Professor Sidney Dekker looks at lessons of organisational resilience

Our understanding of accidents has changed dramatically over the past decades, and this change in comprehension could be important for insurers. There was a time when accidents were seen as the product of mere chance, as an unfortunate coincidence of space and time. Accidents were regarded as both unpredictable and unmanageable. They were essentially meaningless, and only of marginal societal or scientific interest.

But high-visibility, high-stakes accidents from the 1970s onward have turned both experts and the public away from this vision of misfortune as meaningless, unforeseeable happenstance. Accidents such as the Three Mile Island nuclear mishap and the Space Shuttle Challenger explosion in the US, or the Piper Alpha oil platform fire off the Scottish coast, have fostered a different understanding. Accidents and risk have recaptured centre stage. And to the late modern mind, beneath them lies not a stochastic cesspool that defies foretelling, but clear causes that were not acted upon by those responsible. We now see accidents as the result of risk that was not managed well. The phrase 'man-made disasters', coined by Barry Turner in the late 1970s, marked the conversion of the accident from act of God - out of reach, uncontrollable - into something made by man, and thus within reach, controllable and manageable.

Such an interpretation, of course, builds on a growing societal recognition that many technologically advanced endeavours are risky to begin with.

For example, few people need reminding of the basic risks (whether real or imagined) associated with nuclear power generation. Such a growing recognition of risk, however, has not been accompanied by greater societal acceptance of that risk. On the contrary. Zero tolerance of accidents in, for example, air transportation has been translated into political mandates in a number of countries (so far without tangible result).

If accidents today are seen as the result of risk that was not managed well, then this carries enormous implications for where demands for accountability end up. The late modern mind would like to believe that accidents happen because operators do stupid things or because managers and company boards make immoral trade-offs (between production and safety, for example).

Superficially, there is often a lot of support for these ideas. When we sort through the rubble of an accident, these decisions and errors strike us as egregious, as shocking, as deviant, or even as criminal. If only these people had done their jobs! If only they had done what we pay them to do! Then the accident would never have happened. There seems only one way to go after such discoveries: fire the people who did not do their jobs, perhaps even prosecute them and put them in jail. Make the company hurt so badly financially (punitive damages) that they never touch or wreck a safety-critical system again. In fact, set an example by punishing them.

The problem is that there is no evidence that this strategy leads to greater safety. First, accidents don't just happen because a few people do stupid things or make immoral trade-offs. Second, punishing people or companies does not create progress on safety - it does not prevent such accidents from happening again.

Balanced explanations

Here is why. The balance of scientific opinion after the large man-made disasters of the past three decades says that accidents are almost normal, to-be-expected by-products of systems that operate under resource scarcity and competition; that accidents are the normal side-effect of normal people doing normal work in everyday organisations that operate technology that is exposed to a certain amount of risk. Accidents, when they do happen, happen because entire systems fail, not because people fail. This is called the systems view. The systems view sees the operator errors and managerial trade-offs that we discover on the surface as symptoms, not as causes.

These things do not 'cause' an accident. Rather, they are symptoms of issues that lie much deeper inside a system and the society that buys its goods or services. These issues may have to do with priorities, politics, organisational communication, engineering uncertainties, and much more.

To people who work in these organisations, however, at all levels, such issues are seldom as obvious as they are to outside observers after an accident. To people inside organisations, these issues are not noteworthy or special. They are the stuff of doing everyday work in everyday organisations.

Think of it: there is no organisation where resource scarcity and communication problems do not play some sort of role (just think of your own workplace).

But connecting these issues to an accident, or the potential of an accident, before the accident happens, is impossible. Research shows that imagining our own accidents as possible lies largely outside our ability. We don't believe that an accident will happen to us. And what we don't believe, we cannot predict.

An additional problem is that the potential for having an accident can grow over time. Systems can slowly, and unnoticeably, move towards the edge of their safety envelopes. In their daily work, people - operators, managers, administrators - make numerous decisions and trade-offs. They solve numerous large and small problems. This is part and parcel of their everyday work, their everyday lives. With each solved problem comes the confidence that they must be doing the right thing; a decision was made without obvious safety consequences. But other ramifications or consequences of those decisions may be hard to foresee, or even impossible to predict.

The cumulative effect is called drift; the drift into failure. Drifting into failure is possible because people in organisations make thousands of small and large decisions that to them are seemingly unconnected. But together, eventually, all these little, normal decisions and actions can push a system over the edge. Research shows that recognising drift is incredibly difficult, if not impossible, either from the inside or the outside of the organisation.

Punishing people or organisations in the wake of an accident has yet to prove a fruitful way forward. In fact, there is plenty of evidence to show that it is not. Punishment, financial or otherwise, leads to defensive posturing; to people, departments, organisations ducking the debris; to them investing in an avoidance of 'accountability' rather than embracing it. Defensive manoeuvring can be detrimental to progress on safety because it stifles the flow of safety-related information. It leads to attitude polarisation and an unwillingness or inability to learn from failure.

Threatening operators or companies with cover stories such as "they should be disciplined in carrying out their work" or "they should have an ethical awareness" is equally counterproductive. It misses the point entirely about what makes systems dangerous or safe. What makes systems safe is an awareness of the safety threats that are out there (this requires people to feel free to talk about them, and their companies to share them industry-wide). What makes systems safe is an awareness of where the boundaries of safe performance lie (this also requires people to feel free to talk about errors and problems they encounter) and how close the operation is to those boundaries. What makes systems safe is the realisation that it is the entire system that succeeds, and the entire system that fails. Not individual heroes or antiheroes. All open systems are continually adrift inside their safety envelopes.

Pressures of scarcity and competition, the intransparency and size of complex systems, the patterns of information that surround decision makers, and the incrementalist nature of their decisions over time can cause systems to drift into failure. Drift is generated by normal processes of reconciling differential pressures on an organisation (efficiency, capacity utilisation, safety) while working with uncertain technology and imperfect knowledge. The very processes that normally guarantee safety and generate organisational success can also be responsible for organisational demise. The same complex, intertwined sociotechnical life that surrounds the operation of successful technology is to a large extent responsible for its potential failure. Researchers have pointed out how the role of these invisible and unacknowledged forces can be difficult to identify and disentangle. Harmful consequences can occur in organisations constructed to prevent them. Harmful consequences can occur even when everybody follows the rules.

If we take the interpretation of accidents beyond the current, late modern one, we see that accidents are no longer about breakdowns or malfunctions of components or people. They are about an organisation not adapting effectively to cope with the complexity and dynamics of its own structure and environment.

Organisational safety, in the post-modern sense, is not a property; it is a capability: to recognise the boundaries of safe operations, to steer back from them in a controlled manner, and to recover from a loss of control if it does occur. This means we must find new ways of engineering resilience into organisations, of equipping organisations with a capability to recognise, and recover from, a loss of control.

Safety investment

How can an organisation monitor its own adaptations (and how these bound the rationality of decision makers) to pressures of scarcity and competition, while dealing with imperfect knowledge and uncertain technology? How can an organisation become aware, and remain aware, of its models of risk and danger? Organisational resilience is about finding means to invest in safety even under pressures of scarcity and competition, because that may be when such investments are needed most. Preventing drift into failure, or managing the risk of man-made disasters, requires a new kind of organisational monitoring and learning. It means focusing on higher-order variables, adding a new level of intelligence and analysis to the incident reporting and error counting that often passes for safety management today.

- Sidney Dekker is Professor of Human Factors at the Linköping Institute of Technology in Sweden. He is the author of The Field Guide to Human Error Investigations (Ashgate, 2002) and the forthcoming Ten Questions about Human Error: A New View of Human Factors and System Safety (Erlbaum, 2004).