Survival Skills
No one's ICT systems are immune from disaster – whether it's
a terrorist attack or a burst water main. But there are steps you
can take to ensure you survive whatever fate throws at you.
We are all familiar with disaster – whether
we’re reading the tragic headlines about another devastating
storm or thrilling to the spectacle of Independence Day. Yet despite,
or perhaps because of, our constant exposure to calamity, few of
us believe a disaster is ever going to hit us, or our organisations,
personally.
While there’s not much likelihood of aliens
annihilating your datacentre or (in the UK at least) of a tornado
taking out your corporate headquarters, every day hundreds of businesses
suffer unscheduled downtime. A recent survey commissioned by Microsoft
found that UK companies could be losing as much as £63 billion
a year due to unplanned outages of their ICT systems. Two-thirds
of the 400 businesses questioned by researcher Vanson Bourne on
behalf of the software giant at the end of 2004 said they had experienced
downtime in the past two years, with two-thirds of these suffering
it at least once a quarter.
While most large corporations have some form of Business Continuity Planning
(BCP) in place, the same survey found that many small and medium-sized
enterprises were woefully under-prepared. The problem is particularly
acute in organisations employing between 200 and 500 people, where
only one in five uses external backup or support services.
The fact is, every company, however large or small, needs to consider
carefully what the impact would be of losing some or all of its
ICT systems or data (either temporarily or permanently) due to
an unforeseen disaster, and to make sure it has appropriate procedures
in place to minimise any negative impact on the business. ICT forms
just part of any overall BCP strategy, but it is a highly critical
component and deserves to be examined in detail.
Of course, the definition of what constitutes a disaster is broader
than that which springs immediately to most people’s minds.
Terrorist attacks, while garnering unprecedented levels of publicity
in recent years, are fortunately still few and far between. Far
more common, though, are thefts of equipment, hardware and software
failures, hacking and virus attacks, power outages, fires and floods.
The storms and flash floods that hit Britain
last year and in 2000 are becoming an ever more common occurrence
as climate change takes hold. The Department for Environment, Food
and Rural Affairs (DEFRA) estimates that 10 per cent of the land
area of the UK, encompassing some 185,000
businesses, is at risk of flooding. And you needn’t
breathe a sigh of relief if you’re outside the designated
danger spots. All it takes is a spell of freezing weather (or a
clumsy road worker) to burst a water main outside your offices
and you could suffer a similarly destructive deluge.
The sheer diversity of potential problems is daunting. Critical
systems could be put out of action by anything from the introduction
of inadequately tested new technologies causing a catastrophic
chain reaction in your infrastructure, through natural disasters,
all the way up to rats chewing through a key communications cable – the
list is endless.
Power outages, too, are becoming increasingly
common as the ageing National Grid struggles to keep up with growing
demand. After 18 years of pretty much uninterrupted electricity
service at its London headquarters, last year Standard Bank suffered
three power cuts, including one which took out key ICT systems
for four hours. And financial companies in Europe were left without
key market price data for 10 hours in October last year following
a power outage at Reuters’ Docklands datacentre, leaving
the company with tricky questions to answer from its customers,
analysts and shareholders about why it didn’t have adequate
business continuity plans in place.
Then there are electronic security threats, which continue to grow
exponentially. Worms, viruses and trojans are becoming easier for
hackers to create because of the profusion of automated hacking
tools and information available on the Internet. And even if you
have good security measures in place, you are not guaranteed safe.
The time between vulnerabilities in software being identified and
exploits appearing is shrinking, so often patches and updates are
not produced in time for companies to prevent exposure to these
threats.
For most organisations, the cost of a single disaster of whatever
nature could be catastrophic. Not only could you lose valuable
records and the ability to conduct your business for a period of
time, but you could also lose your reputation – and in some
cases might face the added problem of legal worries. Corporate
governance regulations across the world, such as Sarbanes-Oxley,
are increasingly requiring that companies have adequate data protection
and backup to ensure key financial and audit data is not lost – with
potentially hefty fines, and even closure, for those organisations
found lacking.
Solid plan
Expert Advice
Effectively, Business Continuity
Planning is a form of insurance and – just like the
insurance market with its complex array of policies, contracts
and clauses – it can be just as confusing to the uninitiated.
While most good BCP strategies involve external partners
at some stage, many organisations find it extremely helpful
to enlist expert assistance as early as the planning stages.
Such a move means you can use the experience and know-how
of an ICT service partner to help you consolidate your systems
before you implement BCP. Such a partner can also help you
to define the appropriate scope of the strategy and assess
the right balance of risks and benefits.
There is absolutely no point in paying for more protection
than you really need, but equally a penny-pinching plan is
a false economy if it won’t work in practice.
So what can be done both to avoid your business falling over and
to bring it back up and running as soon as possible if it does?
The first stage is to develop a solid plan, and the first stage
of that plan should be to allocate clear roles and responsibilities.
You have to know who’s in charge of your business continuity
strategy – lack of accountability is the biggest precursor
to disaster. You should also make sure that the plan has an executive
sponsor, preferably a senior board director. If smooth-running
systems are vital to your business – and in nearly all cases
they will be – then that importance should be reflected by
the personnel who are responsible for it. Don’t put it all
on the shoulders of a handful of techies.
As Computacenter’s Simon Gay points out: “It’s
a sad fact that too often business continuity planning doesn’t
actually involve the business. ICT people are left to dwell on
this on their own in the dark. They’re doing the best they
can, but they’re not sure precisely what they’re meant
to provide for.”
Both technical and non-technical managers need to be involved from
the outset. Once you have set up your business continuity board,
you need to identify precisely what the scope of the plan will
be. “It is still unnervingly common to find people are not
defining precisely what systems need recovering in the event of
a disaster. You need to clearly assess what applications matter
to your business – and what the financial impact of those
applications going down would be,” says Gay.
Prioritise which systems need protecting and identify which are
most vulnerable. The planning phase should consider the risks involved
and weigh them up against the potential costs of various means
of prevention and recovery. Business continuity procedures can
encompass anything from having a regular off-site backup of your
key data to replicating your entire ICT infrastructure at a wholly-owned
remote site and mirroring all your data and transactions in real
time.
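The weighing-up described above can be sketched as simple arithmetic. The following is a minimal illustration only, not a method taken from the article: the system names, downtime figures, hourly losses and protection costs are all invented for the example, and a real assessment would draw them from your own business impact analysis.

```python
from dataclasses import dataclass


@dataclass
class System:
    """One ICT system under review. Every figure here is a hypothetical input."""
    name: str
    outage_hours_per_year: float  # expected unplanned downtime with no protection
    loss_per_hour: float          # estimated financial impact of one hour down (GBP)
    protection_cost: float        # annual cost of the proposed safeguard (GBP)
    downtime_reduction: float     # fraction of downtime the safeguard avoids (0 to 1)

    def expected_loss(self) -> float:
        # Yearly loss if nothing is done
        return self.outage_hours_per_year * self.loss_per_hour

    def net_benefit(self) -> float:
        # Loss avoided by the safeguard, minus what the safeguard costs
        return self.expected_loss() * self.downtime_reduction - self.protection_cost


def prioritise(systems):
    """Rank systems so the most worthwhile protection comes first."""
    return sorted(systems, key=lambda s: s.net_benefit(), reverse=True)


# Illustrative numbers only
systems = [
    System("intranet wiki", 20, 50, 2000, 0.5),
    System("order processing", 8, 5000, 12000, 0.75),
    System("email", 12, 400, 3000, 0.5),
]

for s in prioritise(systems):
    print(f"{s.name}: expected loss £{s.expected_loss():,.0f}, "
          f"net benefit of protection £{s.net_benefit():,.0f}")
```

A negative net benefit suggests the proposed safeguard costs more than the downtime it is expected to prevent, although such a sum ignores harder-to-price factors such as reputation and regulatory exposure.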
Technical measures
There are many technical measures to consider, which are
outlined in more detail in the feature on page 14. Some form of
off-site backup is obviously essential – and not just stored
at the company down the road. You need to place a fair distance
between your site and the location of your backup in case of a
disaster that affects a whole locality, such as a flash flood or
regional power cut. Other options to consider are the frequency
and level of data backup you need – daily, hourly or real-time.
The latter, for instance, may be vital for a bank carrying out
millions of pounds’ worth of transactions every day, but
probably not necessary if you are a specialist manufacturing company.
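As a rough sketch of how backup frequency relates to the data at risk, the snippet below estimates the value of transactions that could be lost if a failure struck just before the next backup, then picks the least frequent schedule that stays within a tolerable loss. It assumes transactions arrive at a steady rate through the day, and the figures for the "bank" and the "manufacturer" are invented for illustration.

```python
# Hours between backups for each option; real-time replication is treated as zero.
BACKUP_OPTIONS = {"daily": 24, "hourly": 1, "real-time": 0}


def worst_case_loss(value_per_day: float, interval_hours: float) -> float:
    """Value of transactions made since the last backup, assuming a steady rate."""
    return value_per_day * interval_hours / 24


def cheapest_adequate_option(value_per_day: float, tolerable_loss: float) -> str:
    """Return the least frequent option whose worst-case loss is tolerable."""
    # Try the longest interval first, since less frequent backups usually cost less
    for name, hours in sorted(BACKUP_OPTIONS.items(), key=lambda kv: kv[1], reverse=True):
        if worst_case_loss(value_per_day, hours) <= tolerable_loss:
            return name
    return "real-time"


# Hypothetical organisations
print(cheapest_adequate_option(5_000_000, 10_000))  # a bank handling millions a day
print(cheapest_adequate_option(20_000, 25_000))     # a specialist manufacturer
```

With these assumed inputs the bank is pushed towards real-time protection while the manufacturer can live with a daily backup, which is the same trade-off the paragraph describes.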
“These things are all about probabilities,” says Gay. “For
instance, there are some organisations that always insist on maintaining three
copies of data at different locations across the globe. Typically, those would
be major financial organisations where there’s potentially substantial
amounts of money at stake in the event of any downtime. If you cannot afford
to lose any data and you’re prepared to spend any sum of money then go
for a third copy. Frankly, though, a second copy is deemed by most people to
be good enough, as long as it’s kept a fair distance from the primary location.”
Do you need backup power generators?
If you are operating in an area prone to outages, then probably
yes. Do you need expert systems to predict problems before they
occur? Again, it depends on how much you might lose in the event
of a failure, and what additional burden such systems put on your
infrastructure and operational costs. Do you need a dedicated recovery site
or will it suffice to be able to take space in a third-party
shared centre if any disaster arises? For all but the largest
organisations, the latter is probably sufficient. But again,
the answer to all these questions depends on assessing the risks
to your business and finding the right balance between cost and
peace of mind. It is certainly worth talking the issues through
with experts before you make any firm decisions.
Of course, it isn’t only your ICT systems that you should
be thinking about. Effective processes and procedures are equally
vital. As well as the technical components of the plan, you need
to have effective measures for notification and escalation in the
event of any disasters, procedures for recovery and restoration,
processes for selecting and managing appropriate third-party suppliers,
as well as for communicating procedures, roles and responsibilities
to staff.
You also need to define the metrics you will use to test, measure
and continually assess that your plans are working and remain current.
As mentioned earlier, it’s also vital to consider the regulatory
and compliance issues that apply to your business and sector, such
as Sarbanes-Oxley. These will vary according to the business sectors
and parts of the globe in which your organisation operates, but
increasingly they are coming to mean that setting out clear and
detailed policies and procedures (and implementing appropriate
testing and metrics) is not simply a matter of best practice – it’s
a legal requirement.
Effective processes
There are other seemingly small but vital considerations.
In the event of an electricity failure, for example, you need to
make sure you aren’t literally left in the dark. Gay recounts
a cautionary tale from one of Computacenter’s customers: “They
had millions of pounds’ worth of disaster-recovery contracts
in place, but the one thing that scuppered them was that they’d
forgotten to spend a few pounds on a torch. They didn’t think
about what it was going to be like scrabbling round in the pitch
black of the computer room in the bowels of their building. You’ve
got to consider the reality of disastrous events.”
Only once all these elements are in place can you move on to the
testing and implementation phase of your plan. A vital part of
this is making sure that everyone knows precisely what their roles
are in the event of any failure or disaster. Consider not only
the role of operational staff, but also your customer-facing employees.
If downtime occurs, you will want to ensure your reputation is
not affected, which means making certain client managers and sales
staff know the right procedures for handling customers and either
shielding them or assuaging any worries they may have.
Realistic testing
Testing also needs to be realistic. Here, you would be
well advised to follow a formal methodology such as the ITIL (IT
Infrastructure Library) process for IT service management, or the
British Standards Institution’s new PAS 56 standard for best
practice in business continuity. Sadly, as Gay points out, testing
is often little more than a back-slapping exercise to bolster a
flawed procedure. “Many companies’ business continuity
and disaster recovery tests are carried out once a year and planned
well in advance.
Everyone’s standing in the right place with the right folder,
all knowing precisely what they’re going to do. Well, I’m
afraid disasters don’t occur like that. It’s actually
very hard to visualise what a real disaster would be like,” he
says.
“For example, if people think they can recover from tape, they are being
highly optimistic. You need all the right people there, who know the systems
back to front. That’s assuming you’ve got a good backup in the first
place, which too often companies haven’t.”
Testing should not be a one-off or an annual event. It needs to
be continual. You must build procedures into your business continuity
plan that ensure any changes to systems or business processes automatically
consider the impact on BCP. Otherwise, your supposed protection
could be worth nought. Gay cites a common example of a customer
in the insurance sector that had changed its backup process shortly
after an annual test, but had neglected to update the business
continuity strategy.
“They had two disaster recovery sites for two different sorts of systems,
and yet they had amalgamated their backup into one process without taking this
into account. It wasn’t until the next annual test that it was spotted.
For an entire year that organisation was flying blind. That clearly shows why
an annual test just doesn’t work. If a disaster had occurred, which fortunately
it didn’t, that customer could not have recovered, and consequences would
have been dire for the organisation,” he says.
But do it right and business continuity planning can ensure not
only that any disaster has a minimal impact, but also that your
organisation drastically reduces the likelihood that it will suffer
any downtime in the first place. And while BCP might not be able
to turn back the floodwaters or fend off the alien invasion, it
can certainly ensure that you’ll sleep far more soundly at
night.