Organisations need to ensure survivability when assessing their systems' risk exposures.
Reliance on technology systems and networks, by business and society in general, increases inexorably. Air traffic control and national power distribution services would not function, for example, without their IT systems and networks. Financial services and the banking system would grind to a halt almost immediately without theirs - though some may consider that no bad thing in current market conditions.
Despite the bursting of the dot.com bubble and the collapse of technology stocks, dependence on critical systems and networks grows: increasingly inexpensive and powerful computer hardware, software and networks continue to provide opportunities for more and better function, for innovation, for cost reduction and business advantage.
The financial services industry is today heavily reliant on a complex mesh of networked information systems. Manufacturing and construction rely completely on internal automation and external supply chain management systems. By comparison, the insurance industry is notably less reliant on critical systems whose loss would bring the acquisition and servicing of business to a halt.
The insurance industry is not, however, immune from the forces that drive increasing dependence on systems and networks. Not being a pioneer in this respect has its rewards in that lessons can be learnt from those who have blazed the trail; their mistakes need not be repeated, and the economies and capabilities of the latest technologies can be exploited. One lesson is that the more critical systems and networks are to the functioning of the business, the more important it is that they stay alive. An internet bank, for example, is nothing except data systems and networks, and is closed for business if its web pages are not alive.
The events of September 11 caused society to stop and think about many aspects of modern life. One lesson for business was, surely, that it is essential to ensure that critical resources are protected as far as possible so they have the maximum chance of surviving an attack or a disaster. The most important resource a business has is people, and appropriate technology and procedures have an important role to play here. The design and fabric of buildings should be able to withstand fire, flood and attack at least long enough to allow occupants to escape. There must be adequate means of escape within the time that the building remains viable and there must be procedures that are tested to ensure they work and that everyone knows what to do. So much would seem obvious - protect and survive. And yet too often businesses do not look at their critical data, systems and networks in this way.
Survivability is a cumbersome word but perhaps a useful one. In the event of an attack, a disaster or just a critical failure, a business needs its critical resources to survive and for normal service to be resumed as soon as possible - ideally before the business goes bust. The more critical technology services are to the business, the shorter the time the business can afford them to be off the air.
For society, non-survivability of critical systems is potentially life threatening. In the US, the Presidential Commission on Critical Infrastructure Protection has already issued recommendations following September 11, offering useful guidance pertinent to business.
Continuity and recovery
While survivability is a useful concept and one that can capture the imagination of a board, disaster recovery planning (DRP) and business continuity planning (BCP) sound boring. In organisations where systems and networks are neither intrinsic to the profitability of the business nor, as in the case of the internet bank, the business itself, DRP and BCP may be an afterthought or, worse, a special project divorced from the business mainstream. This is unnecessary and unprofitable, and it is timely to take a fresh look at how businesses can protect critical data, systems and networks. We know that our dependence upon technology services will grow, not diminish. Taking action now will reduce future business risk.
So how to get started? One place to start is with a motto - KIS(S): keep it simple (stupid). The stupid is optional. There is more common sense than rocket science in survival planning; if it all gets more complicated than your mother can understand, you've probably got it wrong.
Then, what about a guiding principle? Electronic records survive far, far more readily than paper records. Digital information can be made very hardy. But what about the backup tapes from last year that couldn't be read when they were needed, and what about the disk that crashed, losing all the data on it? Clearly, digital data does not have a will of its own to survive. Indeed, if anything the opposite seems to be true, so people must ensure that it does survive. This is most easily done when digital data is at the heart of the business.
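One way to guard against the unreadable backup is to check each copy when it is made rather than when it is needed. The sketch below is a minimal illustration, not a description of any particular product: it reads both the original file and its backup in full and compares their SHA-256 fingerprints. The function names and file paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_readable(original: Path, backup: Path) -> bool:
    """True only if the backup can be read in full and matches the original."""
    try:
        return sha256_of(original) == sha256_of(backup)
    except OSError:
        # A missing or unreadable file counts as a failed backup.
        return False
```

Run as part of the daily backup routine, a check of this kind turns "we think we have a copy" into "we have read the copy back and it matches".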
Why bother? Here's why. Two autonomous divisions of the same company occupied offices in the World Trade Center. Division A used document management and collaboration systems to drive their internal business processes and their interactions with clients, suppliers and business partners. Of course, there was still lots of paper about but crucially the primary business record was the digital one not the paper copy. Recognising that their data, systems and networks were crucial to their business, they had planned to cope with the loss of their primary data copies and critical systems. Offsite copies of their data were taken daily, and a resilient enterprise network and systems reinstatement facility was in place.
Division B of the same company had the usual portfolio of office, mail and back office support systems. In acquiring and servicing business however, paper ruled.
The result after September 11 was that Division A was soon up and running with all records and systems intact whereas, a year later, Division B is still rebuilding its business. Clearly no one in Division A had anticipated the disaster that befell them and much recovery planning was required on the fly. But core data, systems and network assets survived and the business restarted successfully.
Division B lost their business records and, tragically, many of their people who understood the business. Division B's business had to be rebuilt from the bottom up. Subsequent investigations by the company found that the cost of doing business in Division B had been higher than in Division A, and unsurprisingly the practices of Division A are now being adopted company-wide.
So, we have a motto and a guiding principle. Of course one cannot just dump paper records and create digital equivalents overnight. Survival planning for data, however, encourages the business to look seriously at how critical documents are stored and secured, and at the cost and functional benefits to be gained from digital rather than paper records at the heart of the business.
Scoping the events that one needs to survive is the next step. Again there are plenty of resources out there to assist with this. "What is the event that we are trying to survive?" is the question here. Is it, for example, the loss of the main business premises? Is it the loss of the main computer system? Is it both? What other resources - human, environmental, systems, networks or whatever - are also assumed to be lost or still working? From this thinking comes a set of scenarios for which survivability of critical data, systems and networks needs to be planned. Which scenarios are appropriate for any business is a matter of business judgment and risk assessment. The more critical the data, systems and networks are to the business, the more comprehensive the scenarios need to be.
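The "is it this, is it that, is it both?" questions above can be made systematic. The following sketch, which assumes an illustrative list of resources rather than any real organisation's, simply enumerates every combination of up to two resources lost at once, giving a checklist of candidate scenarios for the business to accept or rule out.

```python
from itertools import combinations

# Hypothetical resource names, for illustration only.
RESOURCES = ["main premises", "main computer system", "enterprise network"]


def candidate_scenarios(resources, max_losses=2):
    """Return every combination of up to max_losses resources lost together."""
    scenarios = []
    for size in range(1, max_losses + 1):
        for lost in combinations(resources, size):
            scenarios.append(frozenset(lost))
    return scenarios


for scenario in candidate_scenarios(RESOURCES):
    print("Lost:", ", ".join(sorted(scenario)))
```

The list grows quickly as resources are added, which is one reason the choice of scenarios remains a matter of business judgment rather than arithmetic.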
This may all start to sound somewhat daunting and possibly expensive. Done badly, it is certainly both; even done properly, it can be. For the internet bank, for example, the business continuity plan would be a major part of the business and substantial funds would be allocated to it. Without the data, systems and networks, there is no internet bank. For businesses less dependent on critical systems, assuring the survivability of paper-based records is likely to be more complex and costly.
So, we've scoped our disasters and we're ready to go. Next, three inter-related activities need to take place.
The first is to dust off or create an inventory of the systems and networks that the business uses, and to understand the role they and their data play in the business. For large organisations, in particular, this can be daunting. What is required is a scheme of the processes that constitute the business, against which data, systems and networks can be mapped. Smaller organisations have the advantage here: the people are usually more intimately connected to the core business; everyone knows what it does and why, there is usually someone around who knows how each bit works, and there aren't that many bits. This is hardly ever true in a larger, traditional organisation, where getting an accurate map built and then acting on it can be challenging unless a major incident captures the attention of the executive. But if an organisation does not have its business processes, data, systems and networks mapped, the map needs to be created before survivability planning can be carried out.
The good news is, however, that creating the map allows the business to rationalise what its systems and networks currently do for it. An interesting aspect of introducing survivability as an objective is that it can result in simpler rather than more complicated systems and network design. If the map has not previously been created, or has not been reviewed for some time, it often prompts a lot of "What the heck is that for?" and "Why did we do it like that?" discussions. Where the technology used by an organisation has not been refreshed for some time, the map review, coupled with the exploitation of new technology opportunities, can result in a simpler, more resilient, more powerful, more understandable and cheaper configuration.
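In its simplest form the map is no more than a table of business processes against the systems each one needs. The sketch below assumes a made-up inventory and process list (every name in it is hypothetical) and shows two questions the map can answer: which processes stop if given systems are lost, and which systems in the inventory serve no mapped process at all.

```python
# Hypothetical inventory and process map, for illustration only.
INVENTORY = {"policy database", "document store", "email server", "legacy fax gateway"}

PROCESS_MAP = {
    "acquire business": {"document store", "email server"},
    "service claims": {"policy database", "document store"},
}


def halted_processes(process_map, lost_systems):
    """Processes that depend on at least one of the lost systems."""
    return sorted(
        name for name, needs in process_map.items() if needs & lost_systems
    )


def unmapped_systems(inventory, process_map):
    """Systems in the inventory that no mapped process depends on."""
    used = set().union(*process_map.values())
    return sorted(inventory - used)


print(halted_processes(PROCESS_MAP, {"policy database"}))
print(unmapped_systems(INVENTORY, PROCESS_MAP))
```

The second function is the "What the heck is that for?" question in code: anything it returns is either a gap in the map or a candidate for retirement.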
Thus, the business is all set to implement its survival strategy. What happens now?
This is where technology services opportunities can be exploited, and where their applicability can be judged, not only against the usual ROI measures for normal business operations but now also against their ability to ensure the survivability of critical resources in the disaster scenarios.
The importance of digital rather than paper primary records has already been discussed. Aside from this, there are two main technology areas that offer excellent qualities for ensuring survivability. The first is internet protocol (IP) networking, particularly broadband IP and IP virtual private networks. These offer speed and capacity undreamed of even a few years ago, and the ability to build secure virtual private networks to replace private leased lines for enterprise networking. With the right supplier offering the right products, enterprise networks that provide high capacity internet access and private networking can now be built at a fraction of the cost of even a few years ago. If a comprehensive review of your business's enterprise networking has not been undertaken for a while, you are certainly paying more than you need to and are probably not exploiting the capabilities of current networking technology.
The second area is external co-location of services. This is useful in a number of respects. Having business data and servers located in a purpose-built facility designed specifically for survivability and away from the primary business premises is a good thing, from a survival viewpoint. When the real cost of housing and supporting data and servers in-house is measured objectively, including the cost of the building space occupied, external hosting of business services can often prove extremely economical. The external co-location facility should provide resilient and virtually unlimited network bandwidth available on a call-off basis. This can be far cheaper than providing equivalent bandwidth from company premises.
Lloyd's of London, for example, has a computer room in one of its London buildings and its own caged private suite in an external co-location facility. Why? The cost of the space is lower, the cost of bandwidth provision is lower, and the multiple locations allow a resilient network to be implemented and service and data configurations to be designed to survive disaster scenarios. Lloyd's also worked with a service provider which was able to understand its business imperatives and to respond with appropriate networking and co-location services. For example, one unusual aspect of the implementation is the provision of gigabit fibre connections between Lloyd's and the co-location site, allowing the internal servers operating on Lloyd's intranets and LANs to be housed in the co-location suite.
This illustrates that the key to getting the technology piece of the survivability plan right is to work with service providers which really understand the benefits of the appropriate use of available technologies and are able to provide them economically.
There are several points to be taken into consideration when assessing business continuity requirements. These include:
By George Tipper
George Tipper is business development manager at BIS Ltd, a London-based provider of co-location services for Lloyd's and other players within the re/insurance community.