Ready for the Worst
Turning a Business Continuity Planning strategy into practical
action needs the right skills and capabilities
Effective Business Continuity Planning (BCP) and
Disaster Recovery (DR) are all about making sure your company’s
operations continue to run smoothly whether the challenge you
face is a temporary glitch or a major catastrophe. ICT systems
are an essential element in any BCP or DR effort, and ensuring
that your strategies can be converted into practical action when
the time comes means, first, being clear about what you’re
planning for.
“The terms BCP and DR are often confused,” points out Computacenter’s
Trevor Hall. “The former is about trying to prevent a disaster happening
in the first place, using techniques such as risk analysis, business impact analysis,
testing, data replication, clustering and so on, while DR is about helping customers
to recover once something has actually gone wrong.”
Hall explains that Computacenter is involved in
both areas. “On the one hand, we have the consultancy and
technical expertise to assist customers in making their systems
as resilient and robust as they can be by implementing preventative
measures such as the right processes and technologies. And on
the other, we can help customers who have faced disaster to get
hold of the technology and facilities they need to get back up
and running quickly and smoothly,” he says.
Hall is personally responsible for the DR area, which we’ll
come to later. Ideally, though, customers won’t need to use
DR services – and the company’s consultancy practice
leader, Simon Gay, is one of those doing his best to make sure
they won’t. A key consideration here is to make sure you
have more than one copy of your critical business information. “Replicating
your data is utterly critical. You can buy shrink-wrapped software
and hardware components from anywhere. What you can’t buy
is your data. It’s unique – and unless you’ve
got it stored in two different places, you can never feel secure.
Even if you have tape backups, you can never be sure that they’re
going to work. You’ll be completely unsure of the outcome
for 24 hours while you try to reload from tapes that might be unreadable,
broken or contain the wrong version of data,” he cautions.
“Obviously the further apart
your copies of data are, the better. If they’re
in the same computer room and it floods, you
haven’t really achieved much. If it’s
a mile or two away, that’s great – but
if you get a gas leak, road closure or electricity
failure in your region then you’re still
in trouble. If you replicate data 30 miles away
or more, that’s a lot smarter.”
Computacenter can help by offering a service that allows customers
to back up their data remotely via a network connection into the
company’s remote datacentre, explains Gay. “Our datacentre
can receive a second copy of your data either synchronously, which
means as you write a transaction it’s mirrored immediately
in real time, or asynchronously, which means a few seconds or
minutes behind,” he says.
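The distinction Gay draws can be illustrated with a minimal Python sketch. It is a toy model rather than a description of Computacenter’s service: the `ReplicatedStore` class, its method names and the in-memory dictionaries standing in for the two sites are all assumptions made for illustration.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary store that mirrors writes to a remote copy (hypothetical)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.remote = {}        # stands in for the copy at the remote datacentre
        self.pending = deque()  # writes not yet shipped (asynchronous mode only)

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: the remote copy is updated before write() returns.
            self.remote[key] = value
        else:
            # Asynchronous: queue the write, so the remote copy lags behind.
            self.pending.append((key, value))

    def ship_pending(self):
        """Apply queued writes to the remote copy, oldest first."""
        while self.pending:
            key, value = self.pending.popleft()
            self.remote[key] = value

    def unreplicated_writes(self):
        """Number of writes that exist only at the primary site right now."""
        return len(self.pending)


mirror = ReplicatedStore(synchronous=True)
mirror.write("txn-1", "approved")
print(mirror.unreplicated_writes())   # nothing waiting to be copied

lagging = ReplicatedStore(synchronous=False)
lagging.write("txn-1", "approved")
print(lagging.unreplicated_writes())  # one write not yet at the remote site
lagging.ship_pending()
```

In this model the asynchronous queue is the data that would be at risk if the primary site failed before the next shipment, which is the trade-off behind being “a few seconds or minutes behind”.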
The company has a remote hosting
centre in West London, and also has partnership
arrangements which enable it to offer alternative
locations for data hosting across the country. “There’s
pretty much no area of the UK we haven’t
got covered,” says Gay.
Data replication is only one of the ways an ICT service partner
can help prevent disasters. Another key area of prevention is testing
that your business continuity and disaster recovery plans actually
work as you expect them to. Gay notes that many organisations’ testing
is inadequate, and points out that some customers are shocked when
the professionals do some proper, rigorous testing of their plans.
He cites the example of one large financial services customer. “First
National Bank makes its living from being able to provide credit
instantly. If its database is down, it’ll miss vital opportunities
to lend money to potential customers. Before we’d tested
the company’s recovery plans, it believed it could come back
in four hours in the event of any failure. But when we did some
testing, we showed that the best it could hope for was four days – and
that was with a fair bit of manual work and the wind behind it.
At that point the company realised it needed us to come back in,
re-engineer its BCP and then prove it could genuinely recover in
four hours. So that’s exactly what we did,” says Gay.
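The gap between a believed and a measured recovery time is simple arithmetic, sketched below in Python. The function name `recovery_shortfall` is an assumption made for illustration; the two figures are the four-hour target and four-day test result quoted above.

```python
from datetime import timedelta


def recovery_shortfall(measured, target):
    """Return how far a measured recovery time overshoots its target.

    A zero timedelta means the target was met.
    """
    return max(measured - target, timedelta(0))


# Figures from the example above: a four-hour target, four days measured.
target = timedelta(hours=4)
measured = timedelta(days=4)

gap = recovery_shortfall(measured, target)
print(f"Target met: {gap == timedelta(0)}")
print(f"Shortfall: {gap.total_seconds() / 3600:.0f} hours")
```

Recording the measured figure after every test, rather than relying on the target, is what turns a recovery claim into evidence.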
Testing facilities
Although it can conduct in-house testing
for customers, much of Computacenter’s business continuity
planning and testing for clients is carried out
at its state-of-the-art Solutions Centre in Hatfield,
which offers a variety of
services that enable customers to test their plans
without interruption to their live services (see box).
Gay notes that another way the company can help customers make
sure their BCP strategy works is by following the best-practice
guidelines set out by the ITIL standard for IT service management. “This
relates to the need for good processes, good procedures and good
governance – and, in the event of a disaster, being able
to demonstrate that you have all three in place. If you say that
you’ve tested your systems and can recover in two hours,
two days (or however long) in the event of a disaster, ITIL allows
you to prove it,” he says.
Putting it to the test
Computacenter’s
Solutions Centre offers the facilities to put
your business continuity strategies to the test
The Solutions Centre is Computacenter’s
state-of-the-art testing facility, based in
Hatfield. It offers customers the ability to
test systems 24 hours a day, seven days a week
without having to disrupt their own operations.
The centre includes a large datacentre with
a whole raft of equipment from different manufacturers,
plus separate testing laboratories. And because
it sits next to the company’s goods warehouses,
it’s easy to ship in additional kit if
needed. Customers can also bring in their own
hardware if required.
In terms of business continuity planning,
the Solutions Centre gives customers invaluable
benefits, explains technology leader Zahid Din. “Organisations
need to test whether their BCP and DR plans
will work, but often they can’t afford
to do that in a live environment. Here, customers
can show that they’re hitting the recovery
times their business demands,” he says.
One of the customers to take advantage of
this facility was global insurance broker Willis. “They
brought all their tapes, media and documentation
into the Solutions Centre and we carried out
their DR test – which failed miserably,” says
Din. “Fortunately, because we had replicated
their environment in the centre, with all the
same kit, we were able to fix their strategy,
test it again and make sure it worked as needed.
Now, Willis comes back here on a regular basis
to check that it’s all still working.”
Another example is BT Openworld, which needed
to upgrade its web-facing database technology. “The
database serviced all customers logging in
to BT Openworld, so it was the very definition
of mission-critical,” says Din. “Obviously,
they couldn’t test the system in a live
environment because customers are logging on
to the system 24/7 and they simply can’t
afford to be without the service for long periods
of time. In fact, the company only allows one
hour of planned downtime every two weeks.”
The amount of work BT Openworld wanted to
do simply wouldn’t have been possible
in this small window, explains Din. “They
wanted to do three major upgrades as well as
implementing a third site for disaster recovery,
all the while maintaining business continuity.
They also wanted to test their availability
for clustering, and maintain data replication
throughout,” he says.
However, it wasn’t just a question
of testing. “There’s no way they
could have physically implemented all these
changes in the one hour of downtime they had
available,” says Din. Fortunately, the
Solutions Centre was able to help. “We
conducted extensive testing to prove how this
could be rolled out in a series of one-hour
chunks. It even reached the level where we
were timing things with a stop-watch!”
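As a rough illustration of splitting work into one-hour chunks, the Python sketch below packs an ordered list of timed steps into successive downtime windows. The step names and durations are hypothetical, as is the `plan_windows` function; the only figures taken from the account above are the one-hour window and the two-week gap between windows.

```python
def plan_windows(steps, window_minutes=60):
    """Pack ordered (name, minutes) steps into successive downtime windows.

    Steps stay in order; a new window starts when the next step would not fit.
    """
    windows = []
    current, used = [], 0
    for name, minutes in steps:
        if minutes > window_minutes:
            raise ValueError(f"{name!r} does not fit in a single window")
        if used + minutes > window_minutes:
            windows.append(current)
            current, used = [], 0
        current.append(name)
        used += minutes
    if current:
        windows.append(current)
    return windows


# Hypothetical stop-watch timings, in minutes, for illustration only.
timed_steps = [
    ("upgrade part 1", 40),
    ("upgrade part 2", 30),
    ("failover test", 25),
    ("third-site sync", 50),
]

windows = plan_windows(timed_steps)
# With one window every two weeks, the last window falls this many weeks
# after the first.
weeks_elapsed = (len(windows) - 1) * 2
for number, names in enumerate(windows, start=1):
    print(f"Window {number}: {', '.join(names)}")
print(f"Weeks from first to last window: {weeks_elapsed}")
```

A step that cannot fit into one window raises an error here, which mirrors the practical point: any change that overruns the hour has to be broken down further before it can be scheduled.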
As well as proof-of-concept testing, the
Solutions Centre also offers load-testing and
hot-staging services. From a BCP point of view,
load testing prevents downtime by ensuring
customers’ systems have the performance
levels and capacity needed to handle various
loads, which the centre’s systems can
generate artificially. This is useful, for example,
in situations where customers are about to
acquire another company and want to test that
their infrastructure will be able to handle
the additional load.
Hot staging, meanwhile, allows customers
to test new kit just as it would be running
in its own datacentre, together with applications,
but without affecting the organisation’s
live environment. Then the company can roll
out the new systems knowing everything is going
to work fine. “One of our retail customers
literally locks the doors of its datacentre
over the four-month Christmas and January sales
period since data is so crucial at that time,” says
Din. “However, they had a project that
would have been delayed by four months because
of this. Yet because we were able to hot-stage
for them, the project continued unhindered.
Once the datacentre was unlocked again in February
we were able to install the systems, do some
final testing and they were away. Without this
facility, either the project would have been
seriously delayed or they would have rushed
it, which could have resulted in serious downtime.”
Gay explains that ITIL puts in place a rigorous cycle
of configuration management, measurement and testing
which is repeatedly reworked. “Saying, ‘Don’t
worry – we can recover from disasters’ just
doesn’t cut it any more. Under Sarbanes-Oxley
regulations, directors now have to sign legally-binding
forms that state their data is being well-protected
and well-managed. As a result, they are increasingly
demanding to see clearly what processes are in place,
so that if they do end up in court for some reason,
at least they’ll have the evidence to show precisely
what their procedures are, the results of their testing,
the steps they go through to reconfigure their environment
and the results of re-testing to prove that BCP still
works after any amendments have been made to the system.
The continual measurement and analysis demanded by
ITIL is key to this whole cycle, and we are able to
show customers precisely what to do and how to do it,” says
Gay.
Not only can effective testing and planning assuage any worries
about compliance, but it can also save you money. In their anxiety
to be covered against all eventualities, many companies
spend far more on disaster recovery than they actually need to. “Once
we have done proof-of-concept testing for customers, they often
find they can reduce their level of cover. With data replication,
for example, one financial customer was able to remove some very
expensive contracts they had in place with third-party disaster
recovery hosting services,” says Gay.
Unavoidable disasters
But even with all the BCP in the world, there
are some disasters you just can’t avoid. Take
the situation that faced Computacenter customer B&Q.
The company has 396 DIY stores across the country,
and like all large retailers these days it relies on
technology for stock-ordering, tills and so on. Each
store has a couple of servers and because it’s
a retail environment the company can’t afford
to be out of action for very long. “We have a
contract with the company that guarantees we will ship
a replacement server anywhere in the UK within eight
hours and get it back online,” Hall explains.
He adds: “When we had a lot of rain the other year, there
was a huge mudslide in Wales at the bottom of the valleys – right
where B&Q just happened to have a store. All this muck flowed
in through the back door and buried the server. We managed to get
a replacement in as guaranteed, but as you can imagine it was a
pretty messy business!”
Retail and manufacturing customers form an important part of Computacenter’s
DR business. While the company doesn’t offer alternative
premises in the event of a major disaster,
Hall’s team at Computacenter has a large inventory of PCs,
laptops and servers that it can draw on to help customers who
need kit at short notice. “Many organisations
aren’t big enough to warrant subscribing to an expensive
third-party recovery centre, or the geography of their sites doesn’t
lend itself to this approach. In areas such as manufacturing and
retail it’s impossible to move the factory or shop anyway,” he
says.
Another area of DR where Computacenter has a major
presence is with large financial institutions that
often have their own disaster recovery sites, but which
don’t want the expense of keeping these full
of up-to-date IT equipment that most likely will never
be used. “They come to us in the knowledge that
at any one time we’ve got a wide range of up-to-date
equipment that we can deliver to their sites in short
order. Our service can protect them from the investment
overhead of having systems on every desk and the redundancy
of these systems as technology continues to change.
A company could subscribe to our service for between
five and seven years before the service cost would
equal the cost of purchasing the hardware today – and
in that time the technology would have changed four
times,” Hall says.
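Hall’s break-even claim can be expressed as a one-line calculation, sketched here in Python. The hardware cost and annual fees are hypothetical figures chosen only so that the result falls in the five-to-seven-year range he quotes; the `break_even_years` function is likewise an assumption for illustration.

```python
def break_even_years(hardware_cost, annual_fee):
    """Years of subscription after which total fees equal the purchase price."""
    return hardware_cost / annual_fee


# Hypothetical figures, not taken from the article.
hardware_cost = 700_000
for annual_fee in (140_000, 100_000):
    years = break_even_years(hardware_cost, annual_fee)
    print(f"Annual fee {annual_fee}: break-even after {years:.0f} years")
```

The comparison flatters outright purchase, since it ignores the cost of replacing owned equipment each time the technology changes during those years.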
Most of the calls Hall takes from customers are down to fairly
mundane disasters such as hardware failures and theft – although
he notes floods are certainly becoming more common. “We didn’t
have any clients in Boscastle, which was submerged this year, but
the other year when nearly every river in Yorkshire was on flood
alert, we were put on standby by a number of our customers in Leeds,
Bradford and Sheffield. The river was lapping at their doorsteps
and they were on the phone to us at the same time as they were
dumping sandbags,” he recalls.
But it’s worth remembering that although the
biggest incidents are few and far between, they can
be catastrophic to your business. Hall
says: “The biggest disaster we got involved in
was the IRA bombing of a Manchester shopping centre
in 1996. One of our clients wasn’t badly affected
by the bomb itself, but it was located on one of the
access streets, which was closed for weeks, and our
client wasn’t allowed into its offices. We had
to ship 300 PCs and a bunch of servers into a nearby
recovery site, which arguably saved its business from
collapse.”
That illustrates that disaster doesn’t even
have to strike you – a near miss is enough to
cause disruption to the business. But with the right
planning and the right partner to support you, you
can be ready for anything.