Jane Toothill and Michel Voronkoff explain the development of the DACH Flood model, and how it highlights the need for better quality exposure data in Europe

Historically, improvements in exposure data quality and the resolution of catastrophe models have progressed alongside each other. Technically speaking, the accuracy of results obtained from a hazard model can be no greater than the accuracy of the worst piece of data used in the model.

In addition, the greater the degree of resolution, the greater the computing power required and the longer the run times needed to obtain results.

In the early days of cat modeling, the portfolio data presented for analysis were often the weakest link. When such data are aggregated to Cresta zone, or even country level, there is little point in compromising software performance by developing a model with an extremely detailed hazard component. Similarly, for the vulnerability module of catastrophe models, in the absence of any detailed structural data in the portfolio, a model can give equally acceptable results using generic assumptions relating to the building stock of a country or region.

The use of a relatively simplistic hazard model cannot be justified, however, once higher quality portfolio data become available for analysis.

The improving quality of insurance portfolio data, in terms of both geographical resolution and structural information, has led to steadily increasing demands on catastrophe models. Alongside this, advances in computing technology have meant that higher quality models can be developed without causing the run-time issues that would have resulted a few years ago. Hence, rising data standards, together with increased knowledge and expectations amongst players in the re/insurance market, have led to steadily improving standards in commercial catastrophe models. Such standards are seen both in the degree of resolution and scientific quality of hazard modeling technology and in the range of vulnerability functions that are supplied.

Given the ready access to computing power available today, there is no reason, however, why the reverse trend cannot also occur. Technical and scientific advances in catastrophe modeling can now be implemented in commercial models in a practical and robust manner that does not impact the usability of such a model. Once the quality of the hazard model is greater than the quality of portfolio data provided for analysis, the onus is passed back to the insurance industry to provide better quality data and enable models to be used to their full potential.

One significant step forward in catastrophe modeling during recent years has been the appearance and expansion of flood hazard models. Flood is a complex hazard to model, in part because of its localised nature, which demands that very accurate modeling is carried out in areas at risk. Outside the insurance business, significant steps have been taken to enable accurate meteorological and hydrological modeling of precipitation patterns and river systems, and these advances mean that flood is one peril for which scientific standards are beginning to overtake the quality of data supplied by the market.

One recent flood modeling innovation in Europe is the development of the DACH Flood project, being carried out jointly by EQECAT and Guy Carpenter.

DACH Flood aims to provide a flood modeling solution for Europe, with the initial stages of development concentrating on the German-speaking territories (Germany - D, Austria - A and Switzerland - CH).

For DACH Flood, as for any other flood model, data collection and incorporation form a critical and complex part of the development. This is partly because of the localised nature of flood events, which demands that very accurate data are used in areas susceptible to flooding. The potential effects of data quality are illustrated in Figure 1, which shows the flooded area for the same flood scenario generated using digital terrain models (DTMs) with vertical accuracies of 10m and 1m, respectively. The 10m DTM generates a much less detailed flood outline and implies a flooded area some 39% larger than that obtained with the more detailed DTM.

Unfortunately, the use of inadequate data has a strong tendency to result in overestimates of flood extent and hence, when incorporated into a hazard model, in overestimates of the calculated loss.
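As a purely illustrative sketch of why this happens, the short Python example below applies a simple 'bathtub' test to a synthetic terrain: a cell is treated as potentially flooded if, within the stated vertical accuracy of the DTM, its elevation could lie below the water level. The terrain, water level and accuracy figures are invented for the example and are not taken from the DACH Flood model.

```python
import numpy as np

# Illustrative sketch only: how coarser vertical accuracy in a digital terrain
# model (DTM) can inflate a flood outline. All figures below are invented.

rng = np.random.default_rng(0)

# Synthetic terrain: a floodplain rising from 0 m to 20 m with local relief.
slope = np.linspace(0.0, 20.0, 400)
terrain = slope[None, :] + rng.normal(0.0, 1.0, size=(400, 400))  # elevations in metres

WATER_LEVEL_M = 5.0  # hypothetical flood stage


def flooded_fraction(dtm: np.ndarray, water_level: float, vertical_accuracy: float) -> float:
    """Treat a cell as potentially flooded if its elevation could lie below the
    water level once the DTM's vertical accuracy is taken into account."""
    return float((dtm - vertical_accuracy <= water_level).mean())


for accuracy in (1.0, 10.0):
    fraction = flooded_fraction(terrain, WATER_LEVEL_M, accuracy)
    print(f"DTM vertical accuracy {accuracy:>4.1f} m -> {fraction:.0%} of cells flagged as flooded")
```

With 1m accuracy only cells genuinely close to the water level are flagged; with 10m accuracy a far larger area must be treated conservatively as at risk, mirroring the overestimation described above.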

The wide range of data inputs required by a flood model is a second reason why data quality and collection is such a critical issue in flood modeling.

Information is required not only on water levels and frequencies in rivers, but also from such widely disparate sources as meteorological stations, geological maps, flood management authorities, building structure databases and insurance records, to name but a few. In some cases, good quality data are hard to find, but more often there are many different sources of information available, and sorting the good from the bad can be as time-consuming as gathering the information in the first place. In addition, care must be taken that the various layers of information are compatible with one another, and that the benefit of a high quality data set is not lost when it is combined with other sources, whether because of incompatibilities between the data sets or because data of equivalent quality are simply not available.

The DACH Flood development has placed particular emphasis on obtaining good quality and compatible data sets, so that users of the software can be sure that the model is based on the best possible information available.

In addition, a modular software structure has been used so that as better quality data become available, these new sources of information can be inserted into the software structure with as little disruption as possible.

In terms of the data that must be analysed, there are two extremes (Figure 2). One is the flood modeler's 'perfect' data set, which is rarely (if ever) available in today's market, but which forms an ideal towards which insurers should work if they wish to obtain the highest possible quality of analytical results in the future. The other end of the spectrum is represented by aggregate data portfolios of the kind commonly presented for analysis today.

In an ideal data set, risks are located by street address or geographical co-ordinates (latitude/longitude), and in addition to the total sum insured (per storey of the building) and the insurance terms and conditions, information relating to structural properties such as height (number of storeys), construction material, presence or absence of a cellar and any special use of the ground floor would be known. In the case of commercial and industrial properties, occupancy per floor of the building would be described and any special flood protection measures would be known.

Today, such datasets largely remain a dream in the eyes of those whose work it is to carry out flood analyses of insurance data portfolios using models whose technical capability far outstrips the quality of data presented for analysis. By way of example, DACH Flood provides a DTM on a 50m x 50m grid, is able to analyse risks located by latitude/longitude, and can accept portfolio information relating to property occupancy, height and construction type/material. Unfortunately, few companies in the market can currently use this functionality to obtain the best possible results.
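For illustration, the sketch below shows one way such an 'ideal' per-risk record might be structured. The field names and enumerations are hypothetical; they simply gather the attributes listed above and do not reproduce the actual DACH Flood input format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

# Hypothetical schema for an 'ideal' per-risk flood exposure record; the field
# names are illustrative and are not the DACH Flood input format.

class Construction(Enum):
    MASONRY = "masonry"
    REINFORCED_CONCRETE = "reinforced concrete"
    TIMBER = "timber"
    STEEL = "steel"

@dataclass
class StoreyExposure:
    storey: int                       # -1 = cellar, 0 = ground floor, 1 = first floor, ...
    sum_insured: float                # sum insured attributed to this storey
    occupancy: Optional[str] = None   # per-floor occupancy for commercial/industrial risks

@dataclass
class Risk:
    latitude: float                   # geographical co-ordinates, or resolved from street address
    longitude: float
    street_address: Optional[str]
    construction: Construction
    number_of_storeys: int
    has_cellar: bool
    ground_floor_use: Optional[str]   # any special use of the ground floor
    storeys: list[StoreyExposure] = field(default_factory=list)
    deductible: float = 0.0
    limit: Optional[float] = None
    flood_protection: Optional[str] = None  # site-specific flood defences, if any

    @property
    def total_sum_insured(self) -> float:
        return sum(s.sum_insured for s in self.storeys)
```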

A more realistic expectation for insurance data presented for flood analysis is an aggregate portfolio containing little or no information regarding flood vulnerability. These data typically provide sums insured aggregated to postcode level (or worse), limits/deductibles, line of business, sometimes coverage and, more rarely, occupancy information. Structural data are extremely rare. Common additional problems relate to multi-site policies, in which individual risks may or may not be located at the postcode specified and are each assigned an equal share of the total sum insured for the whole policy.
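By way of contrast, and again purely as an illustrative sketch with hypothetical field names, an aggregate portfolio record of the kind described above might carry little more than the following:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical aggregate-portfolio record: little of the detail in the
# 'ideal' per-risk record above survives at this level of aggregation.

@dataclass
class AggregateExposure:
    postcode: str                    # or a coarser zone such as a Cresta zone
    line_of_business: str
    total_sum_insured: float         # aggregated over all risks in the zone
    deductible: Optional[float] = None
    limit: Optional[float] = None
    coverage: Optional[str] = None   # sometimes available
    occupancy: Optional[str] = None  # rarely available
    # Structural data (construction, storeys, cellar) are almost never present,
    # and multi-site policies may simply spread the policy sum insured evenly
    # across the listed postcodes.
```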

In order to cope with such data, DACH Flood uses a further component in addition to the usual hazard and vulnerability modules. This is termed the 'built environment' or 'exposure' module and provides information that can act as a proxy for missing data in aggregate portfolios. For each line of business, the built environment module determines the most likely distribution of risk location and structural type within each geographical area in which risks are located (e.g. postcode, Cresta zone, etc). These locations are then cross-referenced with the DTM to determine the best possible estimate of the distribution of value and building type with elevation within a postcode.
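A minimal sketch of the disaggregation idea is shown below, assuming a hypothetical built-environment weighting per 50m grid cell. The postcode total, weights and elevations are invented for the example; the actual DACH Flood built environment module is far richer than this.

```python
# Hedged sketch: spread a postcode-level sum insured across candidate grid
# cells in proportion to an assumed built-environment weighting, then attach
# each share to the cell's elevation from the DTM. All values are invented.

TOTAL_SUM_INSURED = 250_000_000.0  # hypothetical aggregate exposure for one postcode

# (cell id, modelled share of built-up value, elevation from the 50m DTM in metres)
CELLS = [
    ("cell_01", 0.40, 512.0),
    ("cell_02", 0.25, 514.5),
    ("cell_03", 0.20, 516.0),
    ("cell_04", 0.15, 519.5),
]

def disaggregate(total_sum_insured: float, cells):
    """Allocate an aggregate sum insured to grid cells in proportion to the
    built-environment weights, keeping each cell's elevation for hazard lookup."""
    total_weight = sum(weight for _, weight, _ in cells)
    return [
        {
            "cell": cell_id,
            "sum_insured": total_sum_insured * weight / total_weight,
            "elevation_m": elevation,
        }
        for cell_id, weight, elevation in cells
    ]

for row in disaggregate(TOTAL_SUM_INSURED, CELLS):
    print(row)
```

Each cell's share of value can then be run through the hazard and vulnerability modules at its own elevation, rather than treating the whole postcode as a single point.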

The use of a built environment model provides a practical solution for dealing with aggregate information and enables delivery of robust results that attain the highest level of accuracy possible given the standard of the input information. It remains, however, a model, and as such cannot be expected to compensate entirely for the absence of detailed insurance data in the first place. DACH Flood is one modeling solution that provides the ability to analyse a wide range of data standards within a single platform.