Thursday 6 October 2011

Availability: More than technical resilience

It's easy to let the definition of information security controls become biased towards the IT alone, rather than the data or the wider assets. The IT, although an asset in itself, is very much a supporting tool to store, process and transmit the data assets. Those data assets include both your data and your customers' data, and should be classified according to their value and the impact of loss, unauthorised access or change. Data will include general company confidential data, intellectual property, personally identifiable information and any data which, if exposed, may harm the competitive advantage of the business. Confidentiality is always a key concern for data, while availability is typically left to the technical resilience of the IT systems in which the data is held.

It's only when you look at assets other than the data, and consider how to protect them and how they interact with the data, that you need to think about more than IT. Other assets include people (staff & contractors), buildings, supporting services/utilities and the reputation of the company. Reputation is always a difficult one when you need to quantify the level of impact of any event, but you can certainly think about the types of event you want to guard against to keep customers happy and ensure potential customers want to do business with you. How you deal with the risks to the other assets will in turn help to protect this one.

Having systems backed up or having a failover system is all well and good, but how do people access that data if their normal routine is disrupted, or if they cannot physically reach the location where they normally connect to the systems and are not aware of the alternative options?

IT failings are only one element of business continuity events. Common causes include business locations being inaccessible or staff journeys being disrupted, predominantly for reasons outside the control of the organisation, such as bad weather. Whenever it snows in the UK, the country tends to grind to a halt, either through impassable transport routes or through staff not wanting to risk travelling or not knowing how to handle the conditions properly. Every time we get more than an inch of snow, lots of people will ask why we weren't more prepared and why we didn't learn from last time. Someone will typically relay a story about how, the last time they flew to Calgary (or a similar location), there were fleets of snow ploughs constantly clearing the runways, and ask why we don't have the same at Heathrow. For most observers the answer is obvious: Heathrow is only disrupted by snow a handful of times a year and typically recovers quickly. Calgary, however, is somewhere people go because it has so much snow, and it is therefore far more likely to invest in controls and infrastructure to keep planes landing in all conditions. Where snow is something of an irregular irritation for Heathrow, it is part of the prime business environment for Calgary, so well worth the investment in the fleet of ploughs (plus sweepers, blowers and melters).

Industrial action is another continuity consideration. We hear how a day of strikes has caused "so-many-millions of pounds of lost business", but this shouldn't need to be the case. If the only way for staff at a company to continue working is to jump on a train or tube and go to a specific building, then that's a significant dependency you are placing on another organisation with which you have no contracted service levels. If a strike takes out this one and only access mechanism, and the only backup options are full to capacity with other companies' staff affected by the same event, then you have left yourself unprepared.

A Business Impact Analysis (BIA) of all elements of the business will help assess each activity and how the loss of it for different periods of time might impact the business as a whole. Typically the technical roles within the company will know what they need to do if they are required to connect remotely, and more often than not they do this on a regular basis anyway. It's the functions that are often considered to be back-office that are perhaps less ready for a continuity event. Functions like procurement, billing and payroll are traditionally office-based activities, working from desktop PCs with data maintained locally. Although a BIA might indicate minimal disruption to the business if these functions were not able to work for a day, the need for a company to buy goods and services, bill its customers and pay its staff becomes more critical during longer, more protracted outages. Suddenly, without these functions, other measures that the organisation has in place for resilience can be affected. Automation and remote management capabilities for these functions are all well and good, but if they are only going to be used in an emergency, how do you make sure that the people in question know what to do and when?
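
As a rough illustration of how a BIA can show an activity's impact growing with the length of an outage, here is a minimal Python sketch. The 1 to 5 impact scale, the day thresholds, the scores and the names Activity, impact_after and critical_activities are all assumptions made up for this example; a real BIA would use figures agreed with the business.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Activity:
    """One business activity in a hypothetical BIA."""
    name: str
    # Outage length in days -> assumed impact score (1 = negligible, 5 = severe).
    # All values used below are illustrative, not real measurements.
    impact_by_days: Dict[int, int]

    def impact_after(self, days: int) -> int:
        """Impact score for the longest listed threshold not exceeding `days`."""
        reached = [d for d in self.impact_by_days if d <= days]
        return self.impact_by_days[max(reached)] if reached else 0


def critical_activities(activities: List[Activity], days: int,
                        threshold: int = 4) -> List[str]:
    """Names of activities whose impact reaches `threshold` after `days` of outage."""
    return [a.name for a in activities if a.impact_after(days) >= threshold]


# Illustrative back-office functions with made-up scores.
ACTIVITIES = [
    Activity("payroll", {1: 1, 5: 3, 20: 5}),
    Activity("billing", {1: 1, 5: 4, 20: 5}),
    Activity("procurement", {1: 1, 5: 2, 20: 4}),
]

if __name__ == "__main__":
    for outage_days in (1, 5, 20):
        print(outage_days, critical_activities(ACTIVITIES, outage_days))
```

With these made-up numbers, nothing is critical after a one-day outage, but every function becomes critical once the outage runs to weeks, which is the pattern described above.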

Exercising in Business Continuity is just as important as any other availability control. There's no point in implementing continuity measures if nobody knows that they are there or what they need to do with them. Any business will hope that it never needs to use its continuity measures and the chances are that any event may happen long after the measures are implemented. Keeping the measures up-to-date and making sure they technically work is one part of testing. Exercising the people is another. These tests should be defined as part of a wider crisis management plan to test against multiple scenarios.

In conclusion, technical resilience is only a small part of ensuring the availability of services and data. The criticality of each business area, and the impact of losing any of them for varying periods of time, should be understood. Once controls are implemented to reduce the risk or impact of different events, both the controls themselves and the people required to operate them should be exercised, so that they work when they are needed.

Photo: think4photop
