Appendix G: Calculations of the likely range of random year-to-year variation in road collision and casualty numbers

Introduction

This Appendix describes the methods that were used to calculate the likely range of random year-to-year variation in road collision and casualty numbers for Scotland as a whole that are shown in Figures 2, 3, 4 and 5. Two different methods were used: a simple method for Figures 2, 3 and 5, and a more complex method for Figure 4.

Calculating the likely ranges of values for Figures 2, 3 and 5

In the case of Figures 2, 3 and 5, the likely ranges of values were calculated on the assumption that the numbers are the outcome of a Poisson process. This is a process in which events occur at random, with the probability of an event occurring depending only upon the underlying rate of occurrence (not upon how long it has been since a previous event, nor upon the number of events that have occurred in a recent period). For the purpose of producing these charts, it was assumed that the underlying rate of occurrence in each year is the same as the value of the 5-year moving average centred on that year. (That is why there are no grey dashed lines for the last two years: one cannot calculate a 5-year moving average centred on 2020 until one has the values for 2021 and 2022.)

A characteristic of a Poisson distribution is that the mean and the (statistical) variance are the same. Because the numbers are all much larger than 100, the assumption of asymptotic normality applies, and one would expect only about 5% of cases to fall outwith a 95% confidence interval range of plus or minus two standard deviations. Therefore, the upper and lower limits shown on the chart were calculated simply as the moving average plus and minus twice the standard deviation (for smaller numbers, exact ranges could have been calculated using the inverse Chi-square distribution). In the case of Figures 2, 3 and 5, the standard deviation was taken to be the square root of the assumed variance (i.e. the square root of the assumed underlying rate, and therefore the square root of the moving average).
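The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names (`centred_moving_average`, `poisson_limits`) and the sample counts are hypothetical and are not taken from the published figures or from the software actually used.

```python
import math

def centred_moving_average(counts, window=5):
    """Return the centred moving average for each year, or None where
    the window would run past either end of the series."""
    half = window // 2
    averages = []
    for i in range(len(counts)):
        if i < half or i + half >= len(counts):
            averages.append(None)  # not enough years on one side
        else:
            averages.append(sum(counts[i - half:i + half + 1]) / window)
    return averages

def poisson_limits(rate):
    """Lower and upper limits: rate minus/plus twice sqrt(rate),
    using the Poisson property that the variance equals the mean."""
    sd = math.sqrt(rate)
    return rate - 2 * sd, rate + 2 * sd

# Hypothetical annual counts, for illustration only (not real data).
counts = [410, 395, 400, 380, 415, 390, 370]
averages = centred_moving_average(counts)
limits = [poisson_limits(a) if a is not None else None for a in averages]
```

In this made-up series the first two and last two years have no centred average, and therefore no limits, which mirrors the absence of grey dashed lines at the end of the charts. For the third year the moving average is 400, so the limits are 360 and 440.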

In terms of statistical theory, this approach is appropriate for the number of fatal collisions (shown in Figure 2). However, it is a simplification in the case of the numbers of casualties of various types (shown in Figures 3, 4 and 5), because they have two random elements: the occurrence of a collision, and the number of casualties in it. The numbers of casualties would therefore be expected to have a greater range of statistical variability than that resulting from a simple Poisson process. However, as it happens, the simple approach appears to suffice for Figures 3 and 5 (probably because the numbers involved are relatively small, and therefore, as discussed in Section 1.4 of the Commentary, the calculated ranges are quite wide in percentage terms) – but the larger numbers in Figure 4 require a more complex method of calculation of the likely range of values.
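The reason for the extra variability can be sketched as follows, under assumptions that are not stated in this Appendix: that the number of collisions N in a year is Poisson with mean λ, and that the numbers of casualties X_1, ..., X_N in the individual collisions are independent of one another and of N, with a common distribution. The symbols T, N, λ and X_j are introduced here purely for illustration.

```latex
T = \sum_{j=1}^{N} X_j, \qquad
\operatorname{E}[T] = \lambda \operatorname{E}[X], \qquad
\operatorname{Var}(T) = \lambda \operatorname{E}[X^2]
  \;\ge\; \lambda \operatorname{E}[X] = \operatorname{E}[T]
```

The inequality holds because each X_j is a non-negative whole number, so that X_j² is at least X_j; it is strict whenever some collisions can produce more than one casualty. A simple Poisson count would instead have a variance equal to its mean, so treating T as Poisson understates its variability.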

Calculating the likely range of values for Figure 4

An initial version of Figure 4 was produced using the approach described above – i.e. the numbers of casualties were assumed to be the result of a Poisson process whose underlying rate for each year was the moving average for that year. The standard deviation was simply calculated as the square root of the moving average, and the ranges were simply +/- twice this standard deviation. However, the initial version of the chart showed that this approach greatly under-estimated the variability of the figures, as over half the years (53%) had values which were outwith the calculated ranges.
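The kind of check that revealed the problem can be sketched as below: count the share of years whose value falls outside the simple Poisson range. The function name `share_outside_range` and the sample values are hypothetical and are used only to illustrate the idea; they are not the figures behind the 53% quoted above.

```python
import math

def share_outside_range(counts, rates):
    """Fraction of years whose count lies outside rate -/+ 2*sqrt(rate).
    Years with no rate (None) are skipped."""
    outside = 0
    checked = 0
    for count, rate in zip(counts, rates):
        if rate is None:
            continue  # no moving average available for this year
        sd = math.sqrt(rate)
        checked += 1
        if count < rate - 2 * sd or count > rate + 2 * sd:
            outside += 1
    return outside / checked if checked else float("nan")

# Hypothetical values, for illustration only (not real data).
counts = [4000, 4300, 3900, 4100]
rates = [4100.0, 4100.0, 4050.0, None]
share = share_outside_range(counts, rates)
```

If the Poisson assumption held, one would expect a share of roughly 5%; a much larger share suggests that the assumed variance is too small, that the underlying rate has shifted, or both.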

It was noted earlier that the variation in the number of casualties is likely to be greater than that which would result from a simple Poisson process. A method to deal with this extra-Poisson variation is discussed in a paper by the Washington State Department of Health, Guidelines for using Confidence Intervals for Public Health Assessment.

The paper discussed the statistical problem of multiple admissions. For example, an asthma patient may be admitted many times, so that multiple admissions for an individual person are not likely to be independent of each other. A person who is hospitalised once for asthma is more likely to be hospitalised for asthma again than someone who has never been hospitalised for asthma. Therefore, the total count of admissions may not follow a Poisson distribution, and it is typical for the total count in such a situation to exhibit greater variability than would be expected from a Poisson process. As a result, simple methods of estimation (like those used to produce Figures 2, 3 and 5) will produce intervals which are too narrow.

The method proposed for calculating the variance in such a case is set out at section 4.6.2 of the Washington State Department of Health paper.

There is a clear analogy here with the road casualty figures. In our terms:

d is the number of killed and seriously injured casualties;
d_j is the number of killed and seriously injured casualties for collision j; and
P is the total number of injury collisions (including slight collisions).

We want to calculate the variance of d.

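Because R is the rate d / P (here, the average number of killed and seriously injured casualties per injury collision), it follows that d = R * P, and the variance of d can be calculated from the variance of R.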

The calculation of the variance of R requires one to sum the squares of the d_j values – i.e. the squares of the numbers of people who were killed or seriously injured in each injury collision. These numbers were extracted from Transport Scotland's computer database, which holds details of individual injury collisions back to 1979. For example, in 1979 there were 23,064 injury collisions. 14,800 of these had only slight casualties, 7,077 had one KSI casualty, 843 had two KSI casualties, 195 had three KSI casualties, and so on. The sum of the squares of the d_j values is then simply (7,077 * 1²) + (843 * 2²) + (195 * 3²) + and so on. The variance of R can therefore be calculated for each year from 1979 onwards. Because figures for the numbers of casualties in each injury collision are not available for earlier years, it is not possible to calculate variances on this basis for years before 1979.
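The sum of squares can be computed directly from a table of collisions grouped by their number of KSI casualties, as in the sketch below. The function name `sum_and_sum_of_squares` is hypothetical, and the example uses only the 1979 counts quoted above; the collisions with four or more KSI casualties (the "and so on") are not given in the text, so the resulting totals are partial and are not the published 1979 figures.

```python
def sum_and_sum_of_squares(collisions_by_ksi):
    """collisions_by_ksi maps k (KSI casualties in a collision) to the
    number of collisions with exactly k KSI casualties.
    Returns (sum of d_j, sum of d_j squared)."""
    total = sum(k * n for k, n in collisions_by_ksi.items())
    total_sq = sum(k * k * n for k, n in collisions_by_ksi.items())
    return total, total_sq

# Only the 1979 counts quoted in the text. Collisions with four or more
# KSI casualties are not listed there, so these totals are incomplete.
partial_1979 = {0: 14800, 1: 7077, 2: 843, 3: 195}
partial_d, partial_sum_sq = sum_and_sum_of_squares(partial_1979)
```

Even on these partial figures the sum of squares (12,204) exceeds the corresponding casualty count (9,348), because every collision with more than one KSI casualty contributes more to the sum of squares than to the count. This is why a variance built on the sum of squares is larger than one that treats the casualty count as a simple Poisson variable.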

There is an added complication in our case as the total number of injury collisions (our P), which was assumed to be the result of a Poisson process, is also subject to random year-to-year variation, and therefore also has a variance associated with it. The standard deviation here can be calculated in the simple way, just the square root of the moving average value.

Then, because d = R * P, the variance of d is calculated as the variance of R plus the variance of P. (There is no covariance between the d_j and the P_j, because the value of P_j is equal to one for every value of d_j, since each P_j is a single injury collision).

The likely ranges of values are then calculated in the usual way, with the interval being +/- twice the standard deviation.

Figure 4 was prepared on this basis. This method appears to produce more realistic measures of the variability of the number of KSI casualties, but there are still many years' figures (around a third) outwith the calculated ranges. The likely reason for this is that statistical variability is not the only reason for year-to-year changes – other factors have contributed to sharp falls and rises in KSI casualty numbers, as discussed in the publication's Commentary. As the Commentary mentioned, such factors in effect change the Poisson process's underlying rate of occurrence of collisions and/or casualties, and therefore introduce a break into the series of moving average values. The method used to calculate the likely range of random year-to-year variation cannot take account of the effect of such changes.
