Thursday 27 October 2011

Event Correlation: There is no such thing as BAU

BAU or Business as Usual is a term that is used to define a number of different things depending on the nature of your business. For projects going through new implementations or changes, the progress through transition, transformation and testing ultimately leads to the point where it is supported by the normal business and technical management processes that will keep it going until the next major change. More generally, BAU is used to describe the steady state of any process, service or infrastructure, the point at which there are no exceptional changes or problems and where it can easily "tick over" in the same way day after day.

BAU can however lead you into a false sense of security. Achieving a steady state of operation should make it quicker and easier to identify issues which arise. However, problems occur when you start to consider regular issues or low-level low-impact incidents which occur on a daily basis, as normal or part of the BAU operation. Once you do that, you may be ignoring the signs of a larger problem which is bubbling away under the surface.

There are a number of shows on TV now that dissect significant incidents and disasters to examine how they were caused. Typically, these incidents are things that are in the public consciousness, were heavily reported in the news at the time and either threatened or took lives. Plane crashes, train accidents, ferries sinking and industrial accidents of some kind are all subjects of these shows. The key point made by all of these programmes is that these things don't just happen without any warning signs and cannot be attributed to a single issue or failing. These types of incident are a chain of events which have come together to cause a far more significant incident or disaster. The reason that these issues are not identified in time to prevent a disaster is that they are each only visible to different people, are never correlated or viewed in a holistic fashion and, more often than not, are not considered to be issues at all because they are just things that happen as part of Business as Usual.

For a plane crash, the programme will talk about a number of minor factors which could contribute to an accident: the maintenance team not following proper procedures in order to get their job done quicker, the ground crew who ignore an issue with the plane, the fuel truck driver who incorrectly tries to convert litres to gallons, the pilots who don't get enough sleep and are not fully alert, the air traffic controller working long hours with too many planes to manage, the airport with out-of-date equipment to facilitate landings, the company that transports dangerous materials on the flight without appropriate controls or the airline that pushes for faster turnarounds to make or save more money. These are all typical findings, but each is either treated as a normal event or not given visibility at a level where the overall risk to the flight itself can be assessed. It's not until after the event that someone (typically the team investigating the crash) puts all the pieces together to show what led up to it. By then it's too late.

The same applies in information security. Events don't just occur and incidents don't happen without warning. However, there are often minor issues which are ignored as "acceptable failings", such as the patches that don't get applied in time, the ongoing virus detections which are quickly handled by the AV and not investigated, that one ID that always seems to log failed access attempts, the documentation not completed during changes as it holds the process up too much and demands from customers and management to respond faster and achieve more in less time. On their own, these are things that may just be treated as BAU occurrences, but they may actually be symptoms of a larger problem bubbling away under the surface. The only way you're going to identify the true risk posed by the aggregation of these events is firstly to have visibility of them and secondly to understand how these individual issues might ultimately cause a larger problem. This is where event correlation is important.

There are plenty of options for Security Information and Event Management (SIEM) tool sets to correlate event data from the many technical sources within your environment. The signature of a security event can comprise information from many sources in the network which individually may not seem significant. However, SIEM tools are only part of the solution. Although they can sift through potentially millions of alerts and log entries to give a concise and actionable picture of technical events, this then needs to be combined with other information to give you a correlation at a higher level. Process failings and incidents which are not detected through technical measures are also elements which can contribute to a security incident, and it may be that a low-level correlated event from your SIEM system, combined with additional information gathered externally, indicates a more significant threat that you are facing. Security management standards such as ISO 27001 define the importance of measuring the effectiveness of all your security controls, not just the technical ones, as an ineffective manual or procedural control can just as easily contribute to a security incident. The human element can not only be the weakest link but is typically also performing the types of controls where failings and effectiveness shortfalls are far more difficult to detect because no technical monitoring is in place.
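As a rough illustration of the idea, the sketch below groups low-level events from several sources by the system they relate to and flags any system that shows several different kinds of "minor" issue within a short period. Everything in it is an assumption made for the example: the event records, the source names, the one-hour window and the threshold of three event types are all hypothetical, and a real SIEM tool would do this in its own way.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical events: (timestamp, source, entity, event_type).
# The "change" source stands in for a procedural, non-technical record.
EVENTS = [
    (datetime(2011, 10, 27, 9, 0), "av", "host-17", "virus_detected"),
    (datetime(2011, 10, 27, 9, 20), "auth", "host-17", "failed_login"),
    (datetime(2011, 10, 27, 9, 45), "patching", "host-17", "patch_overdue"),
    (datetime(2011, 10, 27, 10, 0), "auth", "host-42", "failed_login"),
    (datetime(2011, 10, 27, 16, 0), "change", "host-42", "docs_missing"),
]


def correlate(events, window=timedelta(hours=1), min_types=3):
    """Return entities showing at least min_types distinct event types in one window."""
    by_entity = defaultdict(list)
    for ts, _source, entity, event_type in events:
        by_entity[entity].append((ts, event_type))

    flagged = {}
    for entity, items in by_entity.items():
        items.sort()
        # Slide a window forward from each event in turn.
        for i, (start, _) in enumerate(items):
            types = {t for ts, t in items[i:] if ts - start <= window}
            if len(types) >= min_types:
                flagged[entity] = sorted(types)
                break
    return flagged


print(correlate(EVENTS))
```

With this sample data only host-17 is flagged, because its three different issues all fall within the same hour. None of the individual events would be worth a second look on its own.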

The upshot is that it is important to know your operating environment and have an overall view of both the minor incidents that may currently be treated as 'normal' and the effectiveness of all your controls, both technical and procedural. Only by correlating the risk of each event, and through an understanding of how the combined risk may be intolerable even when the risk of each individual event is negligible, will you be able to predict and prevent the big incidents or disasters.
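To put some numbers on how negligible individual risks can add up, here is a minimal sketch. It makes a deliberately simple assumption that is not part of the argument above: each minor issue independently has a small chance of contributing to an incident. The 2% figure and the count of 20 issues are made up for illustration.

```python
from math import prod


def combined_risk(probabilities):
    """Chance that at least one of several independent issues leads to an incident."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical: twenty "BAU" issues, each judged to carry only a 2% risk.
minor_issues = [0.02] * 20
print(f"{combined_risk(minor_issues):.1%}")
```

Under these assumptions the combined figure comes out at roughly one in three, even though no single issue looks worth worrying about. Real events are rarely independent, so treat this as a way of thinking about aggregation and not as a risk model.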

Photo: David Castillo Dominici

Thursday 20 October 2011

2012: Thinking beyond The Olympics

The London Olympics next year is very high in the public consciousness and will continue to be so. The increase in traffic and the number of people using many services including public transport means that those businesses in central London are already gearing up for potential disruption and putting plans in place to reduce the impact of the event. This is certainly a good move for companies in the middle of town but what about the rest of us? If you don't have a significant central London presence then you're probably not too concerned. However, here are some other factors to consider:

Geography: other sites and transport links

It's probably fair to say that central London is going to be worst affected in terms of traffic but there are a number of locations outside of the M25 which will also be holding events. Consideration needs to be given to where spectators will be travelling from and by what means. Spectators from other countries may not spend their entire time at the games, so tourist areas around the country, particularly in the South East will see an increase in visitors. If you're doing business in areas of interest to tourists in general, have you considered the potential impact?

Timeline: when does your plan start and end?

What timeframe do you use when planning for any disruption? Do you start and end at the opening and closing ceremonies or consider inclusion of related events such as the Olympic torch relay and the Paralympics? From the torch relay arriving in the UK on 18th May to the closing ceremony of the Paralympics on 9th September is over 100 days. If you've only planned for three weeks then maybe it's time to rethink?
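The length of that window is easy to check for yourself. The short sketch below uses the two dates given above (the torch relay arriving on 18 May 2012 and the Paralympics closing ceremony on 9 September 2012). The variable names are just for illustration.

```python
from datetime import date

torch_relay_arrival = date(2012, 5, 18)
paralympics_closing = date(2012, 9, 9)

# Number of days from the torch arriving to the final closing ceremony.
planning_days = (paralympics_closing - torch_relay_arrival).days
print(planning_days)  # 114
```

A three-week plan covers 21 of those 114 days.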

Suppliers and Customers: secondary and tertiary elements

Your business may not be based in central London or even have a significant presence in and around the capital. What about your suppliers and other third parties on which you rely? Having dealt with the primary consideration of your own business, you then need to consider the secondary impact of your suppliers not being able to fulfil their obligations to you. I've previously discussed the issues of companies relying on trains and the tube to get their staff into work. These are third parties over which you have no control and with which you have no agreed levels of service. It is important therefore to make sure that those third parties with which you do hold such contracts have considered how they are going to continue to provide the required levels of service throughout the games. The tertiary elements are the suppliers to your suppliers. For key third parties, it's one thing to have them guarantee a level of service to you but can you be sure that they have undertaken the same level of due diligence for their suppliers? The secondary consideration is ensuring your suppliers can still provide services to you; the tertiary consideration is ensuring that they are also considering the same risks to their businesses.

Your customers are the other group with whom you have contracts and to whom you have committed an agreed level of service. As well as making appropriate plans during 2012 for your own benefit, customers will want to see that you are also considering the continued provision of service to them. Where that service relies on infrastructure that will be under increasing demand and pressure during the Olympics, the basis of this requirement is well-founded.

Staff attendance: has everyone got tickets?

How many of your employees successfully obtained tickets for an Olympic event? How many have tickets for the same event? You may not get an idea until the opportunity to book time-off for 2012 comes around, but even then you will have people who wait until closer to the time to book their holiday even though they've had their tickets booked and the dates known for over a year.

Minimum notification periods, the ability of management to reject holiday requests and the threat of disciplinary action for taking time off without approval may not seem as important to some individuals as the seemingly once in a lifetime opportunity to be at the London Olympics. You may know how many staff have requested leave on a particular day but how can you be sure until the day arrives and people turn up (or not)? Being ready for staff shortages during key events is important. You may feel comfortable in the knowledge you can discipline or even dismiss those who deliberately do not turn up for critical duties, but that doesn't help you on the day.

Other events in 2012

The Olympics, in isolation, is going to be disruptive enough but don't forget all the regular and special events that happen throughout London that might just add an additional level of complexity and concern to an already busy summer. The Queen's Diamond Jubilee earns us an extra bank holiday in June and will include a number of events around the capital. Wimbledon attracts plenty of crowds and will do the same again across June and July. The Notting Hill carnival promises to once again be bigger and better in 2012 than in previous years and there are plenty of festivals and other events across the capital which will help contribute to the mayhem.

Now is the time to act!

Don't leave it until the last minute to prepare yourself for next summer. Act now to make sure you're ready for possible disruptions:

- Think about which of your suppliers may be affected by disruptions.
- Decide which critical suppliers you want to approach to discuss how they plan to deal with disruptions to their own supply chain.
- Consider how disruptions in London may affect your customers.
- Find out, well in advance, who in your company is planning to take time off.
- Consider the timeframe you want to plan for. How does this match up to critical times in your own business processes?

Photo: xedos4

Thursday 13 October 2011

Do I trust you with my most precious asset?

Getting security into the mindset of others and helping them to appreciate the benefits that security provides from the outset can be difficult. As you may already have seen, I do like to use analogies to put security into the context of a subject that others are familiar with. For service providers and outsourcers, a useful example is to look at potential customers as parents who are choosing a school for their child. Having been through this process recently myself, I've been able to draw some useful comparisons with similar examples in business.

For parents choosing a school for their child, particularly the first school for their first child, this can be quite a daunting experience. Your child is your most valuable asset and for the most part you have been their primary influence, been responsible for defining every facet of their existence and for making every choice in their life. You have had complete visibility of everything they do and how each experience has made them the person they are today. The time has come however to entrust part of that responsibility to someone else, someone you don't know and whose ability to continue the work that you have started you have little visibility of. You can read reports from Ofsted (the schools inspector) which will describe how well the school is performing against others in the area, and you can look back into the history of the school, its performance and the exam results it has produced. You may speak to parents of children already in the school to see what they think of it or perhaps listen to rumours and stories about the school in the local community.

There is certainly a lot of information about to help parents make a judgement on a school for their child but the key element for many is the point when they get to visit the school and meet the head teacher. Until this point, the information gleaned has either been based on empirical results or second hand information from others. Seeing the environment for yourself and meeting the head teacher and most likely other members of staff will be the first opportunity to form your own opinions and ask your own questions. For many, this may be the differentiator and may be the biggest factor in how you make your decision. The figures will show how good the results are, but parents still want to know how their child will be engaged with and treated during their time at the school and will also want to know that they will be protected from harm and any risks that their children may face during their time there. Parents also have varying preferences for reporting and information. Some will be happy to entrust their child to the school with complete faith and rely only on report cards and parent-teacher meetings to get feedback. Other parents will want to know the minutiae of how each lesson is taught. Much of the assurance parents are going to get from this meeting is their confidence in the school and the staff to keep their child safe and secure in an environment outside of their control. Only if information provided in these meetings is to the satisfaction of the parents will they make the decision to entrust their child to the school.

This is very similar to businesses outsourcing elements of their IT or moving to a managed service. At a time when many companies are feeling the pinch from the economic downturn they may be turning to outsourcing to save costs and for many this may either be their first time or be the most significant move to entrust their data to another organisation. Without anywhere near as much information and visibility of prospective providers as they have of their own company, they will seek to gain as much information and assurance as possible. Analysis of the providers' performance and capabilities will give potential customers a good baseline on a shortlist. The really valuable information to differentiate a service provider from its competitors will come from the detailed and specific information that is exchanged throughout the bid process. Assurances that the supplier can deliver the required solution to the required standards will be a key measure, along with the cost effectiveness of the work. Customers also want assurances which may not be as easily set in stone as the technical design and cost. Making sure that their data is available when they need it will be defined within SLAs and recovery objectives but how can they be sure that the measures that make their information so highly available to them and their customers won't make it available to any unauthorised parties?

Security assurance can often be given in terms of certifications or accreditations held and the results of audits conducted. A visit to a service provider's facility will give customers peace of mind that the physical security controls are sufficient to protect their data, as well as the required environmental and power resilience controls. Customers will bring their own security people to talk to the service provider, and whilst the IT people are discussing how many megabits-per-second they need to transmit their data, gigabytes of RAM to process it and terabytes of disk space to store it, the security people will want to make sure that the data is being transmitted, processed and stored in an appropriate manner with controls that suitably mitigate the risk. The security people need to give assurances to their business that the data they entrust to the service provider is safe and properly protected from threats. This can be a make-or-break factor in any deal and will often require more than just assurances of compliance with any mandated regulatory standards. The security person that the service provider includes in that meeting needs to be able to give the customer the assurances they need that their data will be in safe hands and will be treated with the same care and consideration as if they had maintained it in-house.

Security built into a solution from the start is therefore more than just a technical solution consideration but requires a full risk-focussed assurance role to give potential customers the confidence they need that not only their service but their data will be safe in the service provider's hands.

Photo: Arvind Balaraman

Thursday 6 October 2011

Availability: More than technical resilience

It's easy to let the definition of information security controls become biased in terms of just the IT and not the data or wider assets. The IT, although an asset in itself, is very much a supporting tool to store, process and transmit the data assets. This includes your data and your customers' data, which should be classified according to its value and the impact of loss, unauthorised access or change. Data will include general company confidential data, intellectual property, personally identifiable information or data which if exposed may harm the competitive advantages of the business. Confidentiality is always a key concern for data and availability is typically left to the technical resilience of the IT systems in which the data is contained.

It's only when you look at assets other than the data and consider how to protect those and how they interact with the data that you need to think about more than IT. Other assets include people (staff & contractors), buildings, supporting services/utilities and the reputation of the company. Reputation is always a difficult one when you need to quantify the level of impact of any event, but you can certainly think about the type of events you want to mitigate against to keep customers happy and ensure potential customers want to do business with you. How you deal with the risks against other assets will in turn help to protect this one.

Having systems backed up or having a failover system is all well and good, but how do people access that data if their normal routine is disrupted, or if they cannot physically access the location where they normally connect to the systems and are not aware of alternative options?

IT failings are only an element of business continuity events. Common causes include business locations being inaccessible or staff journeys being impacted, predominantly for reasons outside of the control of the organisation, such as bad weather. Whenever it snows in the UK, the country tends to grind to a halt, either through inaccessible transport routes or through staff not wanting to risk travelling or not knowing how to handle the conditions properly. Every time we get more than an inch of snow lots of people will ask why we weren't more prepared and why we didn't learn from last time. Someone will typically relay a story about how the last time they flew to Calgary (or similar location) there were fleets of snow ploughs constantly clearing runways and ask why we don't have the same at Heathrow. For most observers, the answer is obvious: Heathrow only gets disrupted a handful of times a year by snow and typically recovers quickly. Calgary however is somewhere people go because it has so much snow, and therefore it is far more likely to invest in controls and infrastructure to keep planes landing in all conditions. Where snow is something of an irregular irritation for Heathrow, it is an indication of the prime business environment for Calgary, so well worth the investment in the fleet of ploughs (plus sweepers, blowers and melters).

Industrial action is another continuity consideration. We hear how a day of strikes has caused "so-many-millions of pounds of lost business", but this shouldn't need to be the case. If the only way for staff at a company to continue working is to jump on a train or tube and go to a specific building then that's a significant requirement you are putting onto another organisation with which you have no contracted service levels. If a strike takes out this one and only access mechanism and the only backup options are full to capacity with other companies' staff impacted by the same event, then you leave yourself unprepared.

A Business Impact Analysis (BIA) of all elements of the business will help assess each activity and how the loss of it for different periods of time might impact the business as a whole. Typically the technical roles within the company will know what they need to do if the requirement comes to connect remotely and more often than not do this on a regular basis. It's the functions that are often considered to be back-office that are perhaps less ready for a continuity event. Functions like procurement, billing and payroll are traditionally office-based activities, working from desktop PCs with data maintained locally. Although a BIA might indicate minimal disruption to the business if these functions were not able to work for a day, the requirements for a company to buy goods and services, bill its customers and pay its staff become more critical during longer, more protracted outages. Suddenly, without these functions, other measures that organisations have in place for resilience can be affected. Automation and remote management capabilities for these functions are all well and good but if they are only going to be used in an emergency, how do you make sure that the people in question know what to do and when?
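One simple way to record this kind of analysis is to score the impact of losing each function for different lengths of time and then read off how long each one can be out of action before the impact becomes unacceptable. The sketch below is only an illustration: the functions come from the paragraph above, but the impact scores, the outage periods and the tolerance level are all hypothetical.

```python
# Hypothetical impact scores (1 = negligible, 5 = severe) keyed by outage length in days.
BIA = {
    "procurement": {1: 1, 5: 3, 20: 5},
    "billing": {1: 1, 5: 2, 20: 4},
    "payroll": {1: 1, 5: 4, 20: 5},
}


def max_tolerable_outage(impact_by_days, tolerance=3):
    """Longest assessed outage (in days) whose impact stays below the tolerance, or 0."""
    acceptable = [days for days, impact in impact_by_days.items() if impact < tolerance]
    return max(acceptable, default=0)


for function, impacts in BIA.items():
    print(function, max_tolerable_outage(impacts))
```

With these made-up scores every function can cope with a single day's outage, but only billing can last a working week. This is the pattern described above, where back-office functions look low risk for short outages and become critical as the outage drags on.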

Exercising in Business Continuity is just as important as any other availability control. There's no point in implementing continuity measures if nobody knows that they are there or what they need to do with them. Any business will hope that it never needs to use its continuity measures and the chances are that any event may happen long after the measures are implemented. Keeping the measures up-to-date and making sure they technically work is one part of testing. Exercising the people is another. These tests should be defined as part of a wider crisis management plan to test against multiple scenarios.

In conclusion, technical resilience is only a small part of ensuring the availability of services and data. The criticality of each business area, and the impact of losing any of them for varying periods of time, should be understood. Once controls are implemented to reduce the risk or impact of different events, both the controls themselves and the people required to operate them need to be tested and exercised so that they work when they are required.

Photo: think4photop