Thursday 15 December 2011

Stable systems leave us unprepared for incidents

Many years ago I worked on the shop floor of a national retailer. When the tills failed for one reason or another, there was a manual process that had to be quickly rolled out. Out came the pocket calculators, hand-written receipts and manual credit-card imprinters. At the time, this was not an uncommon occurrence and all the staff consequently knew what they had to do. The process took a bit longer but we were quite slick at keeping the traffic moving through the shop, even the time it happened on the Saturday before Christmas.

Nearly 20 years on, I'm not sure that this would still be the case. As the IT supporting these services becomes more stable, outages happen less often and there is less working knowledge of what needs to be done when a failure occurs. Only through training and practice can businesses be sure that their staff know what to do in the event of an incident. Without this, organisations risk losing business because they cannot sell their goods and services at the time when people want to buy them. Customers' expectations of being able to buy what they want, when they want it, and to be served as quickly as possible are certainly far greater now than they were in the early nineties, and they now have more alternative places to make their purchase.

It was an article in The Register that made me consider this as a topic to cover. Although not a recent finding, the article comments on the outcome of the investigation into the crash of Air France flight 447 in 2009, which concluded that after a failure of the autopilot, the pilots did not have sufficient skills and experience to fly the plane manually. This resulted in the flight plunging into the Atlantic Ocean with the tragic loss of all 228 people on board. The report highlights that pilots have become so dependent on the autopilot, using it for many of the tasks in a flight, that when it is suddenly and unexpectedly unavailable, their skills for piloting a plane the "old fashioned" way may be somewhat rusty.

This highlights the importance of incident training and business continuity exercising. A business continuity event or crisis is something that no business wants to think will happen to it but, as I've mentioned in previous posts, there are many external and uncontrollable factors that can bring one about. Don't just test IT failover or run the generators... Test and exercise the people who will be expected to take the reins, assume "manual control" and make difficult decisions in a short time-frame that may ultimately save costs, reputation and, in many cases... lives.

Image: bk images / FreeDigitalPhotos.net
