6. Results from automatic data capture
A requirement of the study was to investigate the data on journey time reliability available within Transport and Traffic Scotland, and any other potentially useful data on travel time reliability. A key issue is whether these data are sufficient to develop empirical models that can predict travel time reliability and can be integrated into the existing large-scale traffic model systems. Empirical studies (among others, in the US and the Netherlands; see Mahmassani, 2011, and de Jong and Bliemer, 2015) suggest a near-linear relationship between the standard deviation of route travel time and the route travel time divided by the route distance. In other words, the route standard deviation is a (near) linear function of the inverse of the average speed on the route. Other researchers (e.g. in England and Sweden; see Bates et al, 2001, and Eliasson, 2004) have found a different, time-based specification, in which the relative standard deviation is a function of the congestion index (the ratio of actual travel time to free-flow travel time). The study first looked into what data was available.
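For reference, the two specifications can be written as below. The symbols are introduced here for illustration only and are not taken from the cited studies: σ_r is the standard deviation of travel time on route r, T_r its mean travel time, D_r its length, v̄_r its average speed, T_r^0 its free-flow travel time, α and β are coefficients to be estimated, and f is a function whose form varies between studies.

```latex
\begin{aligned}
\sigma_r &\approx \alpha + \beta\,\frac{T_r}{D_r} = \alpha + \frac{\beta}{\bar{v}_r},
  &\quad \bar{v}_r &= \frac{D_r}{T_r} \\
\frac{\sigma_r}{T_r} &= f\!\left(\mathrm{CI}_r\right),
  &\quad \mathrm{CI}_r &= \frac{T_r}{T_r^{0}}
\end{aligned}
```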
A "Congestion Data Report 2006", produced in 2008 for TS, was inspected. This report used data from automatic traffic counters at fixed monitoring sites throughout the Scottish trunk road network. The counters provide data on traffic volumes and speeds, both broken down by vehicle type, in 15 minute intervals (or bins). This point data was scaled to particular road sections using "Floating Vehicle" or "Moving Observer" surveys. Such surveys are conducted by driving a vehicle along the road section concerned, noting the number of times other vehicles (of that type) overtake it or are overtaken by it. About 260 trips were made at each site. Various measures of Journey Time Reliability were produced. The report itself gave no data usable in the present study, but helped to indicate the sort of data available.
The study also inspected some Moving Observer studies of journey times, but these appeared to be very limited. Even 260 trips at a given site reduce to small numbers when split over months of the year, days of the week, and times of day. With 12 months in the year, 7 days in the week, and four periods within each day, 336 trips would be required just to get one observation in each cell. Each vehicle needs a driver and an observer, so the method is expensive even for low levels of statistical accuracy. The method might have some applicability if used in conjunction with data from automatic traffic count sites, as was done in the Congestion Data Report. In any event, data from these other sources appeared insufficient to meet the needs of the study.
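The cell arithmetic can be checked with a few lines of Python, a sketch using only the figures quoted above (about 260 trips per site, split 12 x 7 x 4). It also makes explicit a consequence left implicit in the text: even with a perfectly even spread of trips, at least 76 cells would contain no observation at all.

```python
# Number of cells when trips are split by month, day of week and period of day
MONTHS, DAYS_OF_WEEK, PERIODS_PER_DAY = 12, 7, 4
TRIPS_PER_SITE = 260  # approximate Moving Observer trips per site (from the report)

cells = MONTHS * DAYS_OF_WEEK * PERIODS_PER_DAY
trips_per_cell = TRIPS_PER_SITE / cells
# Even with a perfectly even spread, this many cells must remain empty
min_empty_cells = max(cells - TRIPS_PER_SITE, 0)

print(cells)                     # 336
print(round(trips_per_cell, 2))  # 0.77
print(min_empty_cells)           # 76
```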
Preliminary investigation of speed data obtained automatically from fixed road sites suggested there might be a good spread of such sites by region and road type. Further enquiries found that data was routinely extracted in one hour bins and could easily be made available to the study; data in 15 minute bins could also be obtained, but at a non-trivial cost. Both these alternatives looked poor. The 60 minute bins would water down any peak effects, which might, for example, occur for 10 minutes either side of the hour. The 15 minute bins proved too costly for the study. A third option emerged: using Vehicle by Vehicle (VBV) data and specifying the bin width. This had obvious merits, though it imposed a severe data extraction burden on expert Transport Scotland staff. The extraction proved to be very time consuming, and the study is extremely grateful for their efforts to extract as much data as they could.
At the time of briefing on this data source, 1791 ATC sites were stated as producing reliable data. Such data goes back to about 2002. Of those sites, 1441 were stated to have speed data available, and 511 sites had VBV data availability. After consulting interested parties within Transport Scotland, data for 38 sites was requested, giving a good coverage both geographically and by road type. Not all had usable data and there was a limit to the time available (both 'input' and 'elapsed') for extraction, so that the study actually received data for only 31 sites, but that was judged adequate.
Several problems arose. The data collection systems clearly came in several forms. The processed data, which was relatively well documented, had clearly used different coding conventions from the extracted raw data. The extracted data came in 4 different formats, which the study had to document, and probably another 2 that were never properly understood. For some sites a high proportion of vehicles could not be classified because their codes fell outside the notified ranges. It was decided to exclude "Cycles", but otherwise all vehicles have been included as ALL. Those with appropriate codes have been classified as either CARS or HEAVIES. In at least one case the average speed of the HEAVIES is over 100 kph, and only a little lower than that for CARS, suggesting that the classification has not worked correctly at that site. Nevertheless, this site has been kept in the analysis, as have the few other oddities encountered. To have done otherwise would have been ad hoc, and might have given a misleading impression of how well behaved the data is. Data for quite a few of the months is missing, and this has been dealt with differently according to circumstance.
It was decided to aggregate the data into 10 minute bins, partly to be different and partly to see whether the data would stand that level of detail. The original data files were very large, and even the reduced data set had entries for a potential 6 x 24 = 144 ten minute bins, for 3 vehicle types (ALL, CARS and HEAVIES), for the 12 months of 2013, and for 31 sites. That gives some 160,000 rows (144 x 3 x 12 x 31 = 160,704). Each row records the 10 minute bin, the number of vehicles observed, and their average speed.
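A minimal sketch of this aggregation step, assuming each VBV record can be reduced to a timestamp, a vehicle class code and a speed in kph. The class-code sets (CAR_CODES, HEAVY_CODES, CYCLE_CODES), the function names and the sample records are all hypothetical placeholders, since the real site-specific codings were not fully documented. Cycles are dropped, and codes outside both known sets count towards ALL only, mirroring the treatment described earlier in this section. The sketch also checks the potential row count.

```python
from collections import defaultdict
from datetime import datetime

BIN_MINUTES = 10
BINS_PER_DAY = 24 * 60 // BIN_MINUTES  # 144 ten-minute bins per day

# Hypothetical class codes: the real site-specific codings were not documented.
CAR_CODES = {"car", "lgv"}
HEAVY_CODES = {"ogv1", "ogv2", "bus"}
CYCLE_CODES = {"cycle"}


def bin_index(ts: datetime) -> int:
    """Return the 10-minute bin (0..143) that contains the timestamp."""
    return (ts.hour * 60 + ts.minute) // BIN_MINUTES


def aggregate(records):
    """Aggregate (timestamp, class_code, speed_kph) VBV records for one site-day
    into {(vehicle_type, bin): (vehicle_count, mean_speed_kph)}."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, summed speed]
    for ts, code, speed in records:
        if code in CYCLE_CODES:
            continue  # cycles are excluded altogether
        types = ["ALL"]  # every other vehicle counts towards ALL
        if code in CAR_CODES:
            types.append("CARS")
        elif code in HEAVY_CODES:
            types.append("HEAVIES")
        # codes outside both sets stay in ALL only (unclassified)
        for vtype in types:
            entry = totals[(vtype, bin_index(ts))]
            entry[0] += 1
            entry[1] += speed
    return {key: (n, total / n) for key, (n, total) in totals.items()}


# Potential rows: bins x vehicle types x months x sites
potential_rows = BINS_PER_DAY * 3 * 12 * 31
print(potential_rows)  # 160704

records = [
    (datetime(2013, 3, 8, 8, 3), "car", 96.0),
    (datetime(2013, 3, 8, 8, 7), "car", 104.0),
    (datetime(2013, 3, 8, 8, 9), "ogv1", 84.0),
    (datetime(2013, 3, 8, 8, 12), "zz", 90.0),     # unknown code: ALL only
    (datetime(2013, 3, 8, 8, 15), "cycle", 20.0),  # excluded
]
out = aggregate(records)
print(out[("CARS", 48)])  # (2, 100.0)
```

Keeping running counts and speed totals, rather than storing every vehicle, keeps memory small even for the very large raw files.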
The initial inclination was to take the average speed for a quiet period (03.00 to 05.00 was chosen) as a proxy for the free-flow speed. The average speed for 08.00 to 08.10 was then divided by that speed, to try to get a measure of the speed reduction in the peak. However, about half the values returned were greater than unity, suggesting that 08.00 to 08.10 was not a peak on that road; indeed, it was often very quiet at that time! As the data was mostly two-directional, it had been expected that there would be a speed reduction at 08.00 in at least one direction big enough to give a ratio below unity. Since conditions appeared to vary from site to site, no single better time window could be identified for use in this method, and so the approach was dropped. Instead, the rather simplistic approach of taking the ratio of the lowest speed to the highest speed over all available 10 minute periods was adopted. There seemed to be no problem with the highest speed calculations, but some of the lowest speeds were very low indeed, suggesting particular incidents affecting the traffic. This is a potential weakness of the approach, since outliers can drive the results. However, there did not seem to be many obvious cases of this, and any severe incident may have affected several 10 minute periods, so simply excluding the lowest speed was not expected to make much difference.
Alternative methods would have been to take the ratio of the lower quartile speed, the mean speed, or the median speed to the highest speed. All of those, though, would have muffled the observed variability in the ratios, which was not that great to begin with. Trunk roads often have sufficient capacity that increases in traffic at particular times have little effect on average speeds at those times. Indeed, at busy times drivers may feel impelled to keep up with other vehicles, while at quiet times they may dawdle at their own preferred speed. On trunk roads in built-up areas, the speed limit may also restrict how much speeds vary between 10 minute intervals.
On average, over the 31 sites, the data indicates that 10 minute average speeds fall by about 35% from the fastest to the slowest. That degree of variability seemed sufficient to analyse, but not so large that it became desirable to replace the "slowest" with one of the alternatives just listed. Working instead with ratios of the mean or median speed to the highest speed might reasonably have been expected to reduce the average fall in speeds to about 20%, making it harder to separate signal from noise.
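The ratio measure adopted above, and the alternatives set aside, can be sketched as follows. This is an illustration only: the function name and the bin speeds are hypothetical, and the highest 10 minute speed is treated as the free-flow proxy, as in the text. A ratio r corresponds to a fall of 1 - r from the fastest to the slowest bin, so an average fall of about 35% corresponds to a mean ratio of about 0.65.

```python
from statistics import mean, median


def speed_ratios(bin_speeds):
    """For one site-day, given the mean speeds (kph) of the observed 10-minute
    bins, return the ratio of the lowest, median and mean bin speed to the
    highest bin speed. The highest bin speed stands in for free-flow speed."""
    top = max(bin_speeds)
    return {
        "min/max": min(bin_speeds) / top,        # measure adopted for Tables 6.1 and 6.2
        "median/max": median(bin_speeds) / top,  # alternatives considered, not used
        "mean/max": mean(bin_speeds) / top,
    }


# Hypothetical bin speeds with one slow peak bin
speeds = [100.0, 98.0, 95.0, 90.0, 70.0, 99.0, 97.0]
ratios = speed_ratios(speeds)
fall = 1 - ratios["min/max"]  # proportional fall from fastest to slowest bin
print(round(ratios["min/max"], 3), round(ratios["median/max"], 3),
      round(ratios["mean/max"], 3))  # 0.7 0.97 0.927
print(f"{fall:.0%}")  # 30%
```

With one slow bin, the median and mean ratios stay close to 1 while the min/max ratio picks up the full 30% fall, which is the muffling effect described above.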
Table 6.1 presents the ratios of minimum speed to maximum speed for CARS. The first 4 columns identify the 31 sites, showing: their reference number (so that readers can cross-check with other sources); the road number; a brief description of the location; and an indicator where only one direction (Northbound or Southbound) is recorded. There is then a column for each month of 2013 giving the ratios, with rows beneath showing each month's mean, median, the rank on each of these, a judgemental rank, and finally a grouping of months based on their ranks. The final 5 columns give similar averages, ranks and grades, but this time for the individual sites. The Grand Mean is 0.64 and the Grand Median 0.65, but these are of no particular significance since the sample of roads is neither random nor representative of all Scottish Trunk Roads.
Looking first at the results by month, there is surprisingly little variability in the means or medians of the ratios, which all lie between 0.59 and 0.69. It must be remembered that the data is noisy, but noise is surely not enough to account for this lack of variability. Perhaps surprisingly, March shows the smallest travel time variability on average. As the day sampled is usually 8 March, this cannot be due to Easter or half term. Second best is August, when the weather is (relatively) good and schools and universities are not in term. When the missing observations are replaced by their row means, October to December remain the worst 3 months, so that result is not driven by the data gaps. These 'INFILL' results may not be correct, however, since the unobserved data may actually have followed the trend of the observed data rather than being at the site average. Accordingly, it is preferred to keep to the results shown in Table 6.1.
Turning to the results by site, filling in the blanks with the row means has, by construction, no effect on the row means and little effect on the row medians, so Table 6.1 can be taken at face value. The 'good' roads (for journey time variability) are: A90, Forfar; M73, Gartcosh; M9, near jn 10; and M876, Bonnybridge; i.e. mostly motorways. The worst motorway was the M80, at Haggs. The spread of mean ratios over the sites was much greater than over months, suggesting that 'noise' in the data is not a great problem. On the best road, the ratio falls no lower than 0.86 at any time during the day, i.e. peak speeds are only 14% below free-flow speeds. The worst road for reliability, ranked 31 on both mean and median, was the A90 at Ferrytoll. Thus the A90 was both the best and the worst, at different places. One explanation for the 'worst' result might be that traffic was only recorded in the southbound direction, so the opportunity (available at most other sites) to average out peak-direction slowness with normal contra-peak running was absent. Even so, the average ratio of worst 10 minute speed to free-flow speed at this site was only 0.37, i.e. not much more than a third. It is, of course, possible that the data is unreliable for some reason. Without knowing the 31 sites more intimately, it would not be sensible to speculate further here on what might be going on.
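The 'INFILL' variant used in the last two paragraphs can be sketched as below, assuming a table of monthly ratios per site with None marking missing months. The function names and figures are hypothetical. Because each gap is filled with its own row's mean, the row (site) means are unchanged by construction, while the column (month) means, and hence the month rankings, can shift.

```python
from statistics import mean


def infill_row_means(table):
    """Replace each missing (None) monthly value with that site's mean over
    its observed months: the 'INFILL' variant."""
    filled = {}
    for site, row in table.items():
        row_mean = mean(v for v in row if v is not None)
        filled[site] = [row_mean if v is None else v for v in row]
    return filled


def month_means(table):
    """Mean over sites for each month, skipping missing values."""
    n_months = len(next(iter(table.values())))
    return [mean(row[m] for row in table.values() if row[m] is not None)
            for m in range(n_months)]


def rank_desc(values):
    """Rank 1 = highest ratio, i.e. the smallest fall from the fastest bin."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Hypothetical min/max ratios for three sites over three months
ratios = {
    "site A": [0.80, 0.82, None],
    "site B": [0.60, None, 0.55],
    "site C": [0.40, 0.45, 0.35],
}
filled = infill_row_means(ratios)
print(rank_desc(month_means(ratios)))  # [2, 1, 3]
print(rank_desc(month_means(filled)))  # [2, 1, 3]
```

In this small example the month ranking survives infilling; with other data it need not, which is why both versions are worth comparing.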
Table 6.2 repeats the analysis of Table 6.1, but for HEAVIES rather than CARS. HEAVIES are all vehicles not included in CARS that were neither coded as a cycle nor given an ambiguous code. Buses and coaches are certainly included. As previously remarked, the study may have inadvertently included too much in HEAVIES, through failing to fully understand the multiplicity of undocumented coding schemes in the (prized) raw data. Any future work with this raw data would need to take extra care, and to document the vehicle type codings rigorously. The results show that the average fall in speeds (from the free-flow speed) is over 40%, noticeably higher than for CARS. That was unexpected: many heavies have a lower maximum legally permitted speed than cars, which caps their free-flow speed and might have been expected to produce a smaller proportional fall. However, that is what the data says, consistently over most sites and all months.
Starting with the results by month, perhaps not surprisingly the ordering is very similar to that for CARS. However, this time March stays the 'best' month (i.e. least reduction from free-flow speed on that day) even when the missing observations are infilled with their row means (not shown). Both methods agree that November and December are the worst months. October is very bad for CARS, but not for HEAVIES.
Table 6.1: Ratios of Minimum to Maximum Traffic Speeds: Cars
Table 6.2: Ratios of Minimum to Maximum Traffic Speeds: Heavies
Table 6.3: Mean Car Speeds
Turning to the results for HEAVIES by site, Table 6.2 clearly shows the M73 at Gartcosh as having the smallest speed reduction relative to free-flow, at less than 10%. However, only 6 months are observed, including the 'best' (March) but excluding the two 'worst' (November and December), so this result may not be reliable. In second place, the M9 south of Junction 10 has even more data missing. In third place, the M8 at Harthill again has both of the worst months missing. Notwithstanding this missing data, it seems no coincidence that the 3 best sites are all Motorways, which appear to give a very good service to the HEAVIES. The worst site is again the A90 at Ferrytoll, with minimum speeds less than a third of the free-flow speed. The remaining 27 sites lie between these extremes, but there is no obvious pattern among them that would justify speculating here about causes.
Table 6.3 shows actual mean speeds for CARS, in the same format. This time there is a noticeable difference between the Grand Mean (71.69 kph) and the Grand Median (75.15 kph). Both correspond to average speeds of around 45 mph, but this is not important in itself, as the sample's mix of urban and rural sites is neither random nor representative. A median greater than the mean indicates negative skew: compared with a symmetric bell shape, there is a longer tail of lower speeds than of higher speeds. Given that some of the sites are urban roads with 30 mph limits, that should not be surprising. At the top end, the 70 mph limit gives a cluster just below that speed, as will be seen.
Looking first at the results by month, the observed data gives a simple, and rather surprising, picture. Moving through the year in conventional order, January and February are the 'fastest' months (above 75 kph), followed by March to May (some 3 kph slower), then June to September (slower still), and finally October to December (with mean speeds below 70 kph). Very neat, but rather hard to explain. On the second Fridays of January and February, one might expect traffic to be heavy, visibility to be poor, and the weather to be disruptive; yet, apparently, that is as good as it gets for Scottish trunk road traffic. Infilling unobserved cells with their row means changes the picture enormously. The INFILL rankings and gradings (not shown) put June and July as the best months, followed by March to May, then January and February, then August and September, and finally October to December as the worst. So it is clear which months are worst, but not which are best. The missing data for the A90 at Ferrytoll at the beginning of the year clearly favours those months. Further scrutiny of individual sites suggests that, this time, the INFILL results really are the more reliable. The difference in speeds between sites is so great that averaging over a different set of sites each month clearly distorts the results, in a way that was not evident for the 'ratio' data in Tables 6.1 and 6.2. It is therefore concluded that June and July really are the 'fastest' months.
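The distortion described above is a composition effect, which a toy example makes concrete. The speeds below are invented for illustration and the function name is hypothetical: June is faster than January at every site, but because the slow urban site is unobserved in January, the naive cross-site mean makes January look about 20 kph faster. Infilling the gap with that site's row mean, as in the INFILL variant, restores the correct ordering.

```python
from statistics import mean


def monthly_means(table, infill=False):
    """Cross-site mean speed (kph) for each month. Missing cells (None) are
    skipped, or with infill=True replaced by that site's observed mean."""
    rows = []
    for row in table.values():
        if infill:
            row_mean = mean(v for v in row if v is not None)
            row = [row_mean if v is None else v for v in row]
        rows.append(row)
    return [mean(r[m] for r in rows if r[m] is not None)
            for m in range(len(rows[0]))]


# Hypothetical speeds (kph) for [January, June]: June is faster at every site,
# but the slow urban site has no January data.
speeds = {
    "motorway": [108.0, 110.0],
    "rural A road": [88.0, 90.0],
    "urban site": [None, 35.0],
}
naive = monthly_means(speeds)
filled = monthly_means(speeds, infill=True)
print([round(x, 2) for x in naive])   # [98.0, 78.33]
print([round(x, 2) for x in filled])  # [77.0, 78.33]
```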
Turning now to the individual sites, the 'fastest' are the M9 south of Junction 10, and the A90 at Forfar. Both have average speeds above 108 kph (67 mph). This is understandable for the Motorway, but it does seem high for an A road. Not much 'slower' are the M8 at Harthill, the M73 at Gartcosh, the M74 at Cambuslang, the M80 near Haggs, and the M876 at Bonnybridge; all Motorways so no raised eyebrows. The 'slowest' road is the A737 Kilwinning Road in Dalry, but this is just one of the several urban sites with appropriate average speeds of around 20 mph.
An important aspect of Table 6.3 is that it shows the degree to which average speeds vary from month to month at a given site. This bears directly on the reliance car travellers can place on their expected journey times. On Motorways there is very little variation from month to month, as might be expected, but the variation is not that great for the other road types either. Individual low observations may merely reflect incidents on the day in question; there are no clear cases of sites where car speeds are greatly worse at some times of year than at others. The nearest case is the M898 Erskine Bridge, where the October to December mean speeds of 47.4 kph, 58.8 kph and 68.4 kph differ greatly from the 12 month mean of 83.7 kph. This, though, does appear to be exceptional.
Following the theory set out in Section 4, the study looked for a linear relationship between the standard deviation of speeds over the 12 sampled days of the year and the inverse of speed. Table 6.4 reports the findings. Only 17 sites had data for all 12 months, and one of those was excluded as a clear outlier. A good R-sq was obtained with the remaining 16. Another two sites had major data interruptions on some days, but excluding them made hardly any difference to the results. Finally, a site which had one bad afternoon was excluded. This actually made the R-sq slightly worse, but the intercept and slope were again hardly affected. This regression line is shown in Figure 6.1. Section 8 will speculate on how such a model might be taken forward to be of use.
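A minimal sketch of the regression, assuming each site contributes one point: the inverse of its mean speed over the 12 sampled days (x) and the sample standard deviation of those 12 daily mean speeds (y). Whether the study used the sample or the population standard deviation, and exactly which mean speed was inverted, is not stated here, so both are assumptions; the function names and site data are hypothetical. Sites lacking any of the 12 months are dropped, and named outliers can be excluded, mirroring the cleaning steps described above.

```python
from statistics import mean, stdev


def site_point(monthly_speeds):
    """For one site with all 12 months observed, return
    (inverse of mean speed in h/km, sample SD of the monthly speeds in kph)."""
    return 1.0 / mean(monthly_speeds), stdev(monthly_speeds)


def ols(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def fit_sites(sites, exclude=()):
    """Fit SD against inverse speed over sites with 12 complete months,
    leaving out any sites named in `exclude` (e.g. outliers)."""
    points = [site_point(v) for name, v in sites.items()
              if name not in exclude and len(v) == 12 and None not in v]
    xs, ys = zip(*points)
    return ols(xs, ys)


# Hypothetical monthly mean speeds (kph) for four sites
sites = {
    "fast site": [110.0] * 6 + [106.0] * 6,
    "middle site": [80.0] * 6 + [70.0] * 6,
    "slow site": [40.0] * 6 + [24.0] * 6,
    "gappy site": [70.0] * 11 + [None],  # incomplete: dropped automatically
}
a, b, r2 = fit_sites(sites)
a2, b2, r2_clean = fit_sites(sites, exclude={"middle site"})
```

Re-running fit_sites with different exclude sets is one way to carry out the kind of sensitivity check summarised in Table 6.4.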
Table 6.4 Effect of Data Cleaning on the Linear Regression of Standard Deviation of Monthly Speeds against the Inverse of Speed
Number of sites | R-sq
Figure 6.1 Linear Regression of Standard Deviation of Monthly Speeds against the Inverse of Speed