Road Accident Data Collection Form Design Research Project

3 Literature review

Purpose and scope

The purpose of Task 1 was to review the known literature in order to identify knowledge in the field of form design and data quality that could inform a redesign of the STATS19 form, and to document the historical changes to the design and use of the STATS19 system. Literature was identified from the following sources:

  • documents specified in the Invitation to Tender
  • known references recommended by TRL STATS19 experts
  • references provided by a Local Authority representative during engagement with stakeholders
  • relevant references identified through a literature search by the TRL information centre

Background on the STATS19 form and previous revisions

The term STATS19 was first introduced in 1979 following the development of a new injury collision reporting system in the late 1970s.

The STATS20 documentation (Department for Transport, 2011) details exactly what data are required to be collected by the Police as part of the STATS19 system. This includes data on all road collisions they attend or are made aware of in which at least one person is killed or injured. The data cover the circumstances of the collision (e.g. road layout, speed limit, weather conditions), the vehicles involved (e.g. types, manoeuvres, driver details) and the casualties resulting from the collision (casualty ages, severity of injury, whether they were a driver, passenger or pedestrian).

STATS20 and STATS21 provide instructions for the completion of the STATS19 data, and details of the validity checking processes, respectively. In order to aid systematic collection of the STATS19 data, the Department for Transport (DfT) and Transport Scotland produce illustrative STATS19 forms. Use of the illustrative forms is not mandatory, and many Police force areas choose to use alternative data collection methods. For example, some Police forces design their own paper forms (e.g. Tayside, see D.1) and some collect data via a form on a PDA (e.g. Ayrshire, see D.2).

The criteria for the collection of data at road accidents are reviewed approximately every five years (the review is led by the Standing Committee on Road Accident Statistics: SCRAS); as a result the data collected have changed several times since 1979. Previous updates have involved changing the fields and variables collected and some different illustrative forms have been produced during this time (see Figure 2).

Figure 2: Example page layout used in a) 1999 and b) 2004 Department for Transport illustrative STATS19 forms

Contributory factor system

The contributory factor system was designed to record information related to why and how each road accident might have occurred, in order to provide insight into how such accidents might be avoided in the future.

A quality review in 2002 revealed considerable variability in the methods used to collect and record contributory factors across Police forces (DfT/SCRAS, 2006). Following a recommendation by the University of Southampton's Transportation Research Group (The Scottish Government, 2003), the contributory factor system was simplified from a two-tier system to a single-tier system. The number of possible confidence levels for each contributory factor was also reduced from three to two, simplifying the recording system further. The current system allows Police Officers to record factors as either very likely or possible.

As part of the major update in 2004, the 'contributory factor system' was integrated into the illustrative STATS19 data collection form. The system was further reviewed in 2011, when an additional factor was included. Up to six factors (out of a possible 78) may be recorded by Police via the STATS19 form. Contributory factors are the opinion of the reporting Police Officer based on the evidence presented at the time of the collision and are not necessarily the result of an in-depth investigation. The Department for Transport state that "contributory factors are largely subjective and depend on the skill and experience of the investigating officer", and so advise that "care should be taken in… interpretation" (Department for Transport, 2011, p. 2).

The Middlesex University form

In response to a technical report which identified inconsistency in the reporting of STATS19 road accident data (Lupton, Jarrett and Wright, 1997), researchers from Middlesex University (Wright, 1999) were tasked with designing a new Police accident report form (known as the 'Middlesex University form') based on the 1999 version of STATS19.

Police forces across Great Britain were surveyed to obtain information on current methods of data collection, the types of forms used for recording data and the strengths and weaknesses of these methods. This information informed the design of a new form (see Figure 3) which was subsequently piloted by eight of the Police forces across GB in order to obtain comments and feedback on the design.

Figure 3: Example screenshots of Police accident report form developed by Middlesex University (Transport Scotland, 2013).

From Police feedback, the researchers concluded that the form completion process could be simplified and the likelihood of errors could be reduced by following 'sound principles of graphic design'. However, it was also noted that production of a 'universal' form which met the requirements of all Police forces may be impractical due to differences in the requirements of different Police forces.

Limitations of the STATS19 form

Some previous research has been conducted across GB to assess the limitations of the STATS19 form. For example, the study which influenced the Middlesex University form (Lupton et al., 1997) found inconsistencies in the reporting of road class, breath test, point of impact and school pupil casualties.

A later study by Wright (1999) compared the attributes of a subset of STATS19 collisions with the attributes extracted from the road network to which the collisions had been associated. Inconsistencies were identified which suggested errors in the recording process, in particular for fields relating to junction type, junction control, carriageway type and speed limit. The majority of Police forces across Great Britain responded to a survey about STATS19 data collection. From these responses, Wright (1999) concluded that:

  • STATS19 forms are generally not completed at the scene of road accidents
  • only half of the Police forces surveyed indicated that training was provided on how to complete STATS19 forms
  • accident location, causation factors[1], direction of travel and severity were reported as the most difficult fields to complete
  • an A4 format was preferred for the form since this would be easiest to photocopy and file
  • 61% of those surveyed indicated that they would not want colour to be introduced due to the costs associated with printing
  • most of the Police forces surveyed stated that they would be interested in computerised data collection, subject to cost

Wright (1999) also identified considerable variation in the format of accident reporting adopted by Police forces, including pocket books, full-size A4, single documents and multiple documents, and personal databases used for recording information electronically. Key limitations identified with the form design (relevant to the 1994 version of the illustrative STATS19 form) included:

  • illogical sequence of questions
  • little use of headings or colour to indicate hierarchy
  • difficult to read due to small font
  • different methods of questioning and answering were used on the same form with few instructions
  • requiring officers to enter numbers rather than using a check box

A more recent study by Lupton (2001) identified that, where multiple accidents occur on the same stretch of road, data related to the road layout and features (e.g. speed limit, road type) are captured within the STATS19 data multiple times. The author suggested that it may be possible to define a road network for which all the road features are pre-recorded in a database, so that they do not need to be repeatedly entered on the STATS19 database. However, the report did not make it clear how this would work in practice.

Fraser (2009) analysed the consistency of data recorded in multiple fields within the STATS19 database. Several inconsistencies were identified, including confusion over the coding of pedestrian collisions. Contributory factor number 801 is 'pedestrian crossing road masked by stationary or parked vehicle'. It is also possible to include details of a pedestrian being masked by a parked or stationary vehicle in the field 'pedestrian movement'. Thus, it might be supposed that a large proportion of collisions recorded with these 'pedestrian movement' details would also be recorded with contributory factor 801, and vice versa. However, Fraser's (2009) analysis revealed several inconsistencies in the data recorded in these fields, suggesting that the recording process may need to be simplified. One suggested solution was a 'tick box' layout, which may make it easier for Police to record data accurately than having to select options from a long list of codes.
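
The kind of cross-field check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not Fraser's actual method: the record layout, the field names, the value "masked" and the sample data are all hypothetical; only contributory factor code 801 is taken from the text.

```python
from dataclasses import dataclass, field
from typing import List

# Factor 801: 'pedestrian crossing road masked by stationary or parked vehicle'.
MASKED_FACTOR = 801
# Hypothetical label for the 'pedestrian movement' field, not a real STATS19 code.
MASKED_MOVEMENT = "masked"


@dataclass
class CollisionRecord:
    """Hypothetical, simplified record; not the real STATS19 layout."""
    collision_id: str
    pedestrian_movement: str
    contributory_factors: List[int] = field(default_factory=list)


def find_masked_mismatches(records):
    """Return ids of records where only one of the two 'masked' indicators is set."""
    mismatches = []
    for rec in records:
        factor_set = MASKED_FACTOR in rec.contributory_factors
        movement_set = rec.pedestrian_movement == MASKED_MOVEMENT
        if factor_set != movement_set:
            mismatches.append(rec.collision_id)
    return mismatches


# Illustrative data only; 999 stands in for any other factor code.
records = [
    CollisionRecord("A1", "masked", [801]),
    CollisionRecord("A2", "masked", [999]),
    CollisionRecord("A3", "not masked", [801]),
    CollisionRecord("A4", "not masked", []),
]
print(find_masked_mismatches(records))
```

A mismatch flagged in this way is not necessarily an error, since contributory factors are a matter of officer opinion, but a high mismatch rate would point to the kind of coding confusion Fraser reports.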

Design of forms for non-data specialists

This section presents findings from a review of the available literature as identified by the TRL information centre in relation to general form design. Eight relevant articles were identified.

The literature on how the design of a form can affect its usability has centred on two main topics of interest: the answer input mode and the alignment of the questions or labels[2]. The majority of this work has concentrated on online and computer-based forms and surveys, with little research on paper forms being available. However, some researchers have suggested that a user-centred design for online forms should be derived from a format that is already well known to the user, such as paper forms (Garrett, 2002 cited in Bargas-Avila, Brenzikofer, Roth, Tuch, Orsini, & Opwis, 2010), meaning that similar principles may apply to both.

Heerwegh and Loosveldt (2002) compared the usability of online forms with various answer input modes and found that participants had a slight preference for radio buttons compared to drop down lists. Radio buttons are graphical control elements that change appearance when the user clicks on them to show the answer being selected. However, they also found that neither format had significant consequences for the quality of the data collected via the forms. The researchers concluded that form design should be based on the sample preferences and the overall purpose of the form.

Another piece of research investigating online forms by Bargas-Avila, Brenzikofer, Tuch, Roth, and Opwis (2011) compared the usability of forms using dropdown lists, free text (including several different conditions with differing label alignment), or a calendar to report a specific date shown to the participants at the top of the form. To determine the usability of each format, the authors analysed the answer format (for example, looking to see if the participants use the correct number of digits in the year), the level to which the answer was correct, the completion time, and the satisfaction of each participant.

Both the calendar and dropdown list versions eliminate answer formatting errors by making it impossible to enter a date in the wrong format; however, the wrong date can still be entered. All free text options had significantly higher formatting error rates. The quickest forms to complete were the free text versions, in particular those with a label to the left of the answer box or a label inside the answer box which disappeared once the participant started to enter the date. These versions were significantly quicker to complete than the free text options requiring the day, month and year to be entered into separate text boxes. They were also significantly faster than the dropdown list and calendar versions. Despite the elimination of formatting errors, the calendar version was the only version that scored significantly lower than the others on date accuracy. This may be because the calendar version was the only one to require the use of a mouse, which may result in clicking errors. Also, a wrong date may have been easier to select than the correct date due to its proximity to the cursor and the number of clicks required, whereas entering a wrong date in other versions such as free text takes a similar amount of effort to entering the correct date.
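
The distinction between a formatting error and an accuracy error can be made concrete with a short Python sketch. The DD.MM.YYYY target format, the function name and the sample entries are assumptions chosen for illustration and are not taken from the study.

```python
import re
from datetime import date

# Assumed target format for illustration: DD.MM.YYYY with a four-digit year.
DATE_PATTERN = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")


def classify_entry(entry, target):
    """Classify a free-text date entry as 'format error', 'wrong date' or 'correct'."""
    match = DATE_PATTERN.fullmatch(entry.strip())
    if match is None:
        # The entry does not have the required shape (e.g. two-digit year).
        return "format error"
    day, month, year = (int(part) for part in match.groups())
    try:
        entered = date(year, month, day)
    except ValueError:
        # Well-formed but not a real calendar date (e.g. 31.02.2011),
        # so it is counted here as a wrong date rather than a format error.
        return "wrong date"
    return "correct" if entered == target else "wrong date"


target = date(2011, 3, 14)  # the date participants are asked to report (hypothetical)
for entry in ["14.03.2011", "14.03.11", "15.03.2011"]:
    print(entry, "->", classify_entry(entry, target))
```

A dropdown list or calendar control removes the "format error" outcome by construction, but the "wrong date" outcome remains possible, which is the pattern the study reports.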

Finally, the measures of user satisfaction showed that forms where the labels were inside the answer box were seen as less comfortable to use, whereas single text boxes with a label to the left were rated the most comfortable to use. Overall, this suggests that for quick and satisfactory data entry, a single answer box with a label to the left would be most appropriate. Alternatively, for accurate data entry, drop down lists should be used. Conversely, Nielsen (2000) suggested that dropdown lists reduce usability if not all the options are visible at once and can be frustrating for people entering well known information.

Hogg and Masztal (2001) conducted similar research into answer input modes, comparing dropdown lists with radio buttons and free text boxes. Their results showed that radio buttons were much quicker for the user, but that some users in this condition appeared to tick the same answer box for all the questions, which suggests that dropdown lists may lead to more valid data collection.

Research looking more closely into answer format found that, where the format of a free text box is not fixed, more participants use the desired format when the answer fields convey information about it, for example by using different sized boxes to imply the size of the required answer (Christian, Dillman, & Smith, 2007). This research also found that labels which encourage the desired format, such as "MM" and "YYYY" for the month and year, increased the likelihood of that format being used by participants.

Other literature has looked into the effects of the alignment of the question or label used in a form on data quality and form usability. Das, McEwan, and Douglas (2008) used eye tracking technology on a small sample of participants to evaluate label alignment in online forms. The labels were presented either above or to the left of the answer box. Those presented to the left were either aligned to the left or the right. The analysis found that participants with the labels above or right aligned completed the form substantially faster than those with the label aligned to the left. The authors suggest that for forms with constrained space, using left labels with right alignment would be the better option, whereas if space is not a confining issue, top labels should be used. However, no attempts were made to control the order of the completion of the different forms meaning practice effects may be present within the results. The results may also have been influenced by the increased amount of space between the label and the answer box caused by the left alignment and column spacing used in this condition.

These results are similar to those found by Penzo (2006), who also used eye tracking to analyse both the label alignment and the answer input mode. This research found that left alignment of the label took a single eye movement and led to good form performance based on time and accuracy. However, this eye movement was relatively slow compared to the other conditions, where participants made more eye movements at quicker speeds. This suggests that the left alignment causes a relatively high cognitive load created by the increased distance between the labels and answer box. Participants were also found to pay more attention to drop down boxes; the author suggested this was possibly due to the increased interactive element implying greater importance. However, participants took longer to complete forms using drop down lists due to multiple eye movements towards the label. The form versions using right aligned labels were the fastest to complete and required less visual fixation. The experiment also found that using a bold font in the label increased fixations and form completion time.

Bargas-Avila, Brenzikofer, Roth, Tuch, Orsini, and Opwis (2010) conducted a literature review about online form design and produced 20 guidelines on how to design usable forms. The guidelines are presented in Appendix B. Those with empirical support include placing the label above the input field to enable quick data entry, coordinating the size of the answer field to the expected length of the answer, using check lists for multiple answers, using drop down lists where there are more than four options, and using labels that imply the required format.


Since its inception in 1979, the design, content and appearance of the STATS19 collection system has changed multiple times. Across Scotland there appears to be considerable variation in the methods of data collection used in Legacy Police Force areas, presenting challenges for the production of a universal form which meets the requirements of every area. However, key limitations and inconsistencies have been identified in previous research which may be addressed through a new form design, standardisation and training on how STATS19 data should be collected and recorded.

The literature suggests that the design of a form for non-specialists should be tailored around the user and form purpose. Some designs are quicker for users to complete, whereas others tend to lead to a better quality of data, although they cannot eradicate incorrect data entry. Most studies found that participants preferred tick lists or free text boxes to drop down lists and other more interactive input modes, but agree that drop down lists lead to fewer data entry errors. However, drop down lists have been found to work well if all options are visible simultaneously and no scrolling is required, because they can attract users' attention more than other input formats. The literature also agrees that participants find forms where the label is above or to the left of the answer box with right alignment the most comfortable as well as the quickest to use.