Appendix A: Scottish Household Survey - Background information

Interviewing, response rates and weighting
Highest Income Householder
Adult
Household types
Annual net household income
The SHS urban/rural classification
The Scottish Index of Multiple Deprivation (SIMD)
SHS Travel Diary
- Journey definitions
- Impact of analysing journeys over stages
- Mode of transport
- Day of the week
- Bias
- Imputation and Quality Assessment
- Calculating distance
- Calculating duration
Sampling variability and confidence limits
Published results, and anonymised data
Enquiries and further information

A.1 The Scottish Household Survey (SHS) started in February 1999. Its principal purpose is to collect information to inform policy on Transport, Communities and Local Government, but other topics are also covered, such as household composition, amenities, employment or unemployment, income, assets and savings, credit and debt, health, disabilities and care. The SHS provides the first representative Scottish data on many subjects, such as access to the Internet and daily travel patterns.

A.2 Where appropriate, the SHS uses the harmonised concepts and questions for government social surveys which have been developed by the Government Statistical Service, to facilitate comparison with the results of other government surveys. However, differences in sampling and survey methods mean that SHS results will differ from those of other surveys. The SHS is not designed to produce statistics on unemployment or income: it collects such information only for selecting the data for particular groups of people (such as the unemployed or the low-paid) for further analysis, or for use as background variables when analysing other topics.

A.3 The SHS is intended to be a survey of private households. For the purposes of the survey, a household is defined as one person or a group of people living in accommodation as their only or main residence and either sharing at least one meal a day or sharing the living accommodation. A student's term-time address is taken as their main residence, so that students are counted where they live for most of the year.

A.4 The sample was drawn from the Small User file of the Postcode Address File (PAF), which is a listing of all active address points maintained by the Post Office. The Small User file excludes addresses where an average of more than 25 items of post is delivered per day. Blocks of flats etc, which have several dwellings at the same address, are not excluded from the Small User file: in such cases, the file's Multiple Occupancy Indicator is used to count each dwelling separately for the selection of the sample.

A.5 People in certain types of accommodation (such as nurses' homes, student halls of residence etc.) will be excluded from the SHS unless the accommodation is listed on the Small User file of the PAF and it represents the sole or main residence of the people concerned. People living in bed and breakfast accommodation may be included, if it is listed in the Small User file of the PAF and if it is their sole or main residence. Prisons, hospitals and military bases are excluded.

Interviewing, response rates and weighting

A.6 The survey interviews are carried out in respondents' homes using Computer Aided Personal Interviewing (CAPI). Each interview has two parts. The first part is carried out with the Highest Income Householder or their spouse or partner. This collects mainly factual information about the composition and characteristics of the household. Some questions are asked in respect of each household member. The second part is with a randomly-chosen adult (aged 16+) member of the household. This focuses on individual attitudes and behaviours.

A.7 The data are weighted to take account of the unequal probabilities of selection inherent in the sample design: the over-sampling (relative to their numbers of households) of the Councils with smaller populations, in order to obtain a minimum number of interviews in each Council; and the under-sampling (relative to their share of the adult population) of adults living in multi-adult households, because only one random adult is interviewed in each household.
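The adult-selection part of this weighting can be illustrated with a minimal sketch. It assumes only that an adult in a household with k adults has a 1 in k chance of being the random adult, so the raw weight is proportional to k. The function name and the rescaling to the sample size are illustrative assumptions; the actual SHS weights also reflect the over-sampling of smaller Councils, which is not modelled here.

```python
def adult_selection_weights(adults_per_household):
    """Return one weight per interviewed random adult.

    adults_per_household: number of adults in each interviewed household
    (hypothetical input). Weights are rescaled to sum to the sample size.
    """
    if any(k < 1 for k in adults_per_household):
        raise ValueError("every household must contain at least one adult")
    # An adult in a k-adult household is selected with probability 1/k,
    # so the raw weight is the inverse of that probability, i.e. k.
    raw = [float(k) for k in adults_per_household]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


weights = adult_selection_weights([1, 2, 4, 1])
```

In this made-up sample the adult from the four-adult household receives four times the weight of an adult living alone.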

A.8 Totals may appear to differ slightly from the apparent sums of their component parts, in cases where they have been calculated by adding up the unrounded values of the components and then rounding each figure independently. Similarly, percentages may appear not to sum to 100 per cent.
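The effect of independent rounding can be seen in a small sketch with made-up figures (the values below are illustrative only and are not SHS results):

```python
# Hypothetical unrounded component values.
components = [10.4, 20.4, 30.4]

# Total calculated from the unrounded components, then rounded.
total_from_unrounded = round(sum(components))

# Apparent sum obtained by adding the independently rounded components.
sum_of_rounded = sum(round(c) for c in components)

# The same effect for percentages: three equal shares of a whole.
counts = [1, 1, 1]
rounded_percentages = [round(100 * c / sum(counts)) for c in counts]
percentage_total = sum(rounded_percentages)
```

Here the published total would be 61 while the rounded components add up to 60, and the three rounded percentages add up to 99 rather than 100.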

A.9 In tables that analyse the results of questions for which multiple answers were allowed, the percentages may total more than 100 per cent.

A.10 The underlying sample numbers shown in different tables may not be the same. There are a number of reasons for this: the questionnaire is streamed to allow more questions to be asked, so not all respondents are asked all questions; tables may relate to specific populations (e.g. the working-age population); not all questions will be applicable (e.g. households with no children would not be asked questions about children); and, in some cases, respondents were unable to, or did not want to, provide an answer (e.g. for income questions).

Highest Income Householder

A.11 This is the household reference person for the first part of the interview. This must be a person in whose name the accommodation is owned or rented, or who is otherwise responsible for the accommodation (i.e. spouse or partner). In households with joint householders, the person with the highest income is taken as the household reference person. If householders have exactly the same income, the older is taken as the household reference person.

Adult

A.12 For the purposes of the SHS, an adult is someone who was aged 16 or over at the time of the interview; a child is someone who was aged 15 or under.

Household types

Single pensioner household consists of one adult of pensionable age (65+ for women, and 65+ for men) and no children.
Single parent household contains an adult and one or more children.
Single adult household consists of an adult of non-pensionable age and no children.
Older smaller household contains either (a) an adult of non-pensionable age and an adult of pensionable age and no children or (b) two adults of pensionable age and no children.
Large adult household has three or more adults and no children.
Small adult household contains two adults of non-pensionable age and no children.
Large family household consists of either (a) two adults and three or more children or (b) three or more adults and one or more children.
Small family household consists of two adults and one or two children.
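The household type definitions above can be expressed as a simple classification. The following is a minimal sketch, not the survey contractors' derivation: the function name and the use of a plain list of ages as input are assumptions, and the pensionable age of 65 for everyone is taken from the definitions as printed above.

```python
PENSIONABLE_AGE = 65  # as printed in the definitions above, for women and men
CHILD_MAX_AGE = 15    # a child is someone aged 15 or under (see A.12)


def household_type(ages):
    """Classify a household from the ages of its members (hypothetical input)."""
    children = sum(1 for age in ages if age <= CHILD_MAX_AGE)
    adults = len(ages) - children
    pensioners = sum(1 for age in ages if age >= PENSIONABLE_AGE)
    if adults == 0:
        raise ValueError("a household must contain at least one adult")
    if children == 0:
        if adults == 1:
            return "Single pensioner" if pensioners == 1 else "Single adult"
        if adults == 2:
            # One or both adults of pensionable age.
            return "Older smaller" if pensioners >= 1 else "Small adult"
        return "Large adult"
    if adults == 1:
        return "Single parent"
    if adults == 2 and children <= 2:
        return "Small family"
    # Two adults with three or more children, or three or more adults
    # with at least one child.
    return "Large family"
```

For example, `household_type([66, 60])` gives "Older smaller" and `household_type([30, 35, 40, 5])` gives "Large family".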

Annual net household income

A.13 This is the total annual net income (i.e. after taxation and other deductions) from employment, benefits and other sources, which is brought into the household by the highest income householder and/or their spouse or partner. This includes any contribution to household finances made by other household members. Due to refusals or don't knows, full information for the main components of household income was not collected from all households. Subsequently, SHS contractors impute the missing components of income for almost all of these households, using information that was obtained from other households that appeared similar.

The Scottish Index of Multiple Deprivation (SIMD)

A.14 The Scottish Index of Multiple Deprivation (SIMD) is used to rank the data zones used for the production of Scottish Neighbourhood Statistics in order of deprivation. More information can be found at the SIMD website (http://www.scotland.gov.uk/simd).

A.15 Households in the SHS sample have been allocated the SIMD value of the data zone that contains the postcode of the residence. In the small number of cases where a postcode is split between more than one data zone, the SIMD value used is that of the data zone into which the largest number of dwellings in that postcode falls. The SIMD values have further been assigned to one of 5 quintiles, with quintile 1 containing the most deprived 20 per cent of data zones in Scotland, and quintile 5 the least deprived 20 per cent.
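The allocation of data zones to quintiles can be sketched as follows. This is an illustration only: it assumes that data zones are ranked from 1 (most deprived) upwards, and both the function name and the total number of data zones are hypothetical parameters rather than values taken from the SIMD itself.

```python
def simd_quintile(rank, n_data_zones):
    """Return the quintile (1 = most deprived 20 per cent) for a data zone.

    rank: deprivation rank, assumed to run from 1 (most deprived)
          to n_data_zones (least deprived).
    n_data_zones: total number of ranked data zones (supplied by the caller).
    """
    if not 1 <= rank <= n_data_zones:
        raise ValueError("rank must lie between 1 and n_data_zones")
    # Integer arithmetic splits the ranking into five near-equal bands.
    return (rank - 1) * 5 // n_data_zones + 1
```

With ten ranked zones, ranks 1 and 2 fall in quintile 1 and ranks 9 and 10 fall in quintile 5.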

The SHS urban/rural classification

A.16 The urban/rural classification is based on settlement sizes and (for the less-populated areas) the estimated time that would be taken to drive to a settlement with a population of 10,000 or more. The classification is based on postcodes. Six categories were then defined:

Large urban areas - settlements with populations of 125,000 or more.
Other urban areas - other settlements of population 10,000 or more.
Accessible small towns - settlements of between 3,000 and 9,999 people, which are within 30 minutes' drive of a settlement of 10,000+ people.
Remote small towns - settlements of between 3,000 and 9,999 people, which are not within 30 minutes' drive of a settlement of 10,000+ people.
Accessible rural areas - settlements of less than 3,000 people, which are within 30 minutes' drive of a settlement of 10,000+ people.
Remote rural areas - settlements of less than 3,000 people, which are not within 30 minutes' drive of a settlement of 10,000+ people.
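The six categories can be written as a short decision rule. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and a drive time of exactly 30 minutes is treated as "within 30 minutes", which the definitions above do not spell out.

```python
def urban_rural_category(settlement_population, drive_minutes=None):
    """Return the six-fold urban/rural category for a settlement.

    settlement_population: population of the settlement.
    drive_minutes: estimated drive time to a settlement of 10,000+ people;
                   only needed for settlements of fewer than 10,000 people.
    """
    if settlement_population >= 125_000:
        return "Large urban areas"
    if settlement_population >= 10_000:
        return "Other urban areas"
    if drive_minutes is None:
        raise ValueError("drive time is needed for settlements under 10,000")
    accessible = drive_minutes <= 30  # boundary handling is an assumption
    if settlement_population >= 3_000:
        return "Accessible small towns" if accessible else "Remote small towns"
    return "Accessible rural areas" if accessible else "Remote rural areas"
```

For instance, a settlement of 5,000 people that is 45 minutes' drive from a larger settlement would be classed as "Remote small towns".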

A.17 The urban/rural classification used for the SHS data is based on the Settlement file maintained by the National Records of Scotland (NRS).

SHS Travel Diary

A.18 The SHS Travel Diary collects information about travel for private purposes or for work or education, provided the main reason for the journey is not in the process of business. It includes the following types of travel - personal travel for domestic, social or recreational purposes and journeys made to take or escort someone else.

Journey Definitions

A.19 Journeys made by land, air or water within the United Kingdom are included. Journeys which start or end outwith the UK (e.g. a holiday flight from Spain) are excluded. However, if a respondent were to say that they had flown back from a holiday abroad on the previous day, the interviewer should record details of the journey home from the airport (but not record details of the flight to the UK).

A.20 The SHS Travel Diary does not cover journeys which are made in the course of work by people who are employed as drivers or crew of public transport vehicles, to drive lorries, to deliver letters, parcels, leaflets or goods, as police officers etc. (although it does cover their journeys to and from their places of work), nor does it cover travel away from public roads or highways and recreational journeys.

A.21 The basic unit of travel, a journey, is defined as a one-way course of travel having a single main purpose. Outward and return halves of return journeys are treated as two separate journeys. If a single course of travel involves a mid-way change of purpose then it, too, is split into two journeys.

A.22 From 2007, journeys of less than ¼ mile or shorter than 5 minutes on foot are recorded; previously these were excluded. The change is an attempt to reduce any under-reporting of short journeys, which are likely to be made on foot. It has resulted in an increase in the proportion of walking journeys, with corresponding decreases in the proportion of journeys by other modes. Care should be taken when comparing pre- and post-2007 results, as some time series data are not directly comparable. Some time series data are less affected by this change (e.g. driver journeys delayed due to congestion).

A.23 A journey can consist of one or more stages. A new stage is defined when there is a change in the form of transport or when there is a change of vehicle requiring a separate ticket.

A.24 The purpose of a journey is normally taken to be the activity at the destination. Prior to 2007 a journey home was defined by the purpose at the origin of the journey (e.g. a journey from shops to home would be defined as shopping).

A.25 From 2007 onwards only a direct reverse journey of the outward journey (e.g. going straight home from work after travelling directly there earlier in the day) is classed as the origin's purpose (i.e. going to work). Non direct return journeys (e.g. going to the cinema before travelling home) would be defined by their own purpose (e.g. cinema, then going home). Hence from 2007 onwards a new category of "go home" exists (in addition to "go for a walk" resulting from the inclusion of short journeys under 5 min or ¼ mile). Changes to the survey in 2012 resulted in a higher number of journeys being recorded as 'go home' because of changes to the way the return journeys were picked up.

A.26 Some of the categories which are identified in the survey do not appear in subsequent tables presenting detailed analysis, as few journeys were recorded for them.

Impact of analysing journeys over stages

A.27 Given that journeys can potentially be made up of many stages, it might be speculated that figures calculated for journeys would be different from those calculated for stages.

A.28 In practice, comparisons have found that there is little if any difference between the equivalent figures for journeys and stages. This is primarily because multi-stage journeys are rare. In 2011, only 217 journeys out of 17,806 had more than one stage and, prior to 2012, only around 1 per cent of journeys were multi-stage. Since 2012, due to changes in survey methodology, the proportion has increased to nearer 4 per cent, but this does not affect the results (see Table TD2 and Table TD2b for comparisons). Given that the overwhelming majority of journeys are only one stage, it follows that the difference between figures for stages and journeys is slight.

Mode of transport

A.29 Vans are included with cars; taxis and minicabs are in a separate category from ordinary cars; and there are separate categories for rail and underground, and for school bus, works bus and ordinary (service) bus. However, some of these modes of transport do not appear separately in the tables, because few journeys were recorded for them. Therefore, the 'other' category includes motorcycles, ferries, aeroplanes and all other forms of transport that are not shown separately.

A.30 Where a journey involves more than one mode of transport (e.g. a bus then a train), the main mode of the journey is defined as the one used for the longest (in distance) stage (as in the GB National Travel Survey (NTS)). This definition does not use the total of the distances travelled by each of the different modes to determine the main mode - e.g. a journey involving a 1 mile walk to a bus stop, a 1½ mile bus ride and a 1 mile walk to the ultimate destination is classified as 'main mode = bus', as bus is the mode of transport used for the longest stage of the journey, even though more than half the total distance is covered on foot. If there is no single longest stage, and the two (or more) longest stages do not involve the same mode of transport, the main mode of the journey is the mode used for the last of the longest stages. In practice, because of the way that the distances are calculated, it is unlikely that there will be many journeys which have two stages that involve exactly the same distance.
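The main mode rule described above can be sketched in a few lines. The function name and the representation of a journey as a list of (mode, distance) pairs in travel order are assumptions made for illustration; this is not the survey's processing code.

```python
def main_mode(stages):
    """Return the main mode of a journey.

    stages: list of (mode, distance) pairs in the order travelled
            (hypothetical representation).
    The main mode is the mode of the longest stage; where several stages
    share the longest distance, the last of them is used.
    """
    if not stages:
        raise ValueError("a journey must have at least one stage")
    longest = max(distance for _, distance in stages)
    # Scan backwards so that the last of any tied longest stages wins.
    for mode, distance in reversed(stages):
        if distance == longest:
            return mode


# The example from the text: 1 mile walk, 1.5 mile bus ride, 1 mile walk.
example = main_mode([("walk", 1.0), ("bus", 1.5), ("walk", 1.0)])
```

Taking the last of the tied stages also covers the case where all of the longest stages use the same mode, since the result is then that mode in any case.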

Day of the Week

A.31 The Travel Diary collects information about journeys that were made on the day before the interview: so, someone interviewed on Sunday will be asked about the journeys they made on Saturday. Journeys that start on one day and finish on another should be counted on the basis of the day on which they started.

A.32 Interviews are not spread evenly across the week, because some types of people are more likely to be found at home, available for interview, on certain days. Therefore, the results are weighted using factors, which depend upon the day of the week and the adult's current situation (or economic status), so that, within each category of current situation, the weighted numbers of interviews are spread evenly across the days of the week. The weighting process covers all interviews, including those with people who had not made any journeys on the day before the interview. Therefore, the weighted numbers of people who said that they had made journeys, and the weighted numbers of journeys themselves, are not necessarily evenly spread over the days of the week.
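One way such day-of-week factors could be derived is sketched below. This is an assumed, simplified approach rather than the documented SHS procedure: the function name and input layout are hypothetical, and the weighted total for each category is spread evenly over the days actually observed for that category (which would be all seven days in a full sample).

```python
from collections import Counter


def day_of_week_weights(interviews):
    """Return a weight for each (current_situation, day) cell.

    interviews: list of (current_situation, day) pairs, one per interview.
    Within each current situation, the weighted number of interviews is
    made equal across the days on which interviews took place.
    """
    cell_counts = Counter(interviews)
    status_totals = Counter(status for status, _ in interviews)
    # Number of distinct days observed for each current situation.
    days_per_status = Counter(status for status, _ in cell_counts)
    weights = {}
    for (status, day), count in cell_counts.items():
        target = status_totals[status] / days_per_status[status]
        weights[(status, day)] = target / count
    return weights


# Made-up sample: three interviews on one day and one on another.
sample = [("employed", "Sat")] * 3 + [("employed", "Sun")]
sample_weights = day_of_week_weights(sample)
```

In the made-up sample, each Saturday interview is weighted down and the single Sunday interview is weighted up, so that both days carry a weighted count of two.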

A.33 Although the total number of weighted interviews is evenly spread across the week, this is not the case at the local authority level. Therefore, any analysis by day of week should be treated with caution.

Bias

A.34 The SHS results may be biased, tending to over-estimate the number of journeys, because the interviewer asks only about travel on the previous day: e.g. people may be more likely to be interviewed on the days on which they made no journeys than on the days on which they made many journeys, since they are more likely to be available for interview on days on which they have not made any journeys. Therefore, the probability of being interviewed on a particular day depends, to some extent, upon the amount of travel on that day. It follows that the day for which the information about journeys is collected (the day before the interview) does not represent a "completely random" choice of day, and therefore that the Travel Diary results may not be properly representative.

A.35 However, comparisons with (pre-2007) results of the GB National Travel Survey (NTS) suggest that the SHS Travel Diary under-estimates the number of journeys made by adults. This may have been because, prior to 2007, journeys of less than a quarter of a mile, or of less than five minutes on foot, were excluded. Also, details of the previous day's travel are provided 'off the top of the head', as opposed to being logged in a week-long diary (as per the NTS), and therefore some journeys may be overlooked.

A.36 Comparisons between the NTS and SHS Travel Diaries were the subject of an article in the National Travel Survey 2009/10: Scotland Results. The publication can be accessed through the Transport Scotland website: http://www.transportscotland.gov.uk/strategy-and-research/publications-and-consultations/j221325-00

A.37 Detailed Scottish level information can be found at: http://www.transportscotland.gov.uk/analysis/statistics/publications/nts-scottish-results-previous-editions

Imputation & Quality Assessment

A.38 Additional journeys have been imputed, in cases where it is obvious that they are missing - e.g. if the only journey recorded for the day was to work at 8.00 a.m., a return journey was imputed, using the same mode of transport and with the same duration. The imputation process uses information about the time spent at the destination by other people with the same current situation (economic status) who had reported making both an outward journey and a return journey for the same purpose. The average times spent at the destination, and the distributions of such times, are used to impute the times at which the return journeys would start. If the imputed time is after midnight, a return journey is not imputed.

A.39 Quality assurance procedures for Travel Diary data have also been improved, in light of the new Travel Diary structure. This has resulted in some duplicate journeys being deleted and some adjustments to the raw data.

A.40 More information on the methods of imputation & quality assurance can be found in the Travel Diary User Guide, which is available on the SHS website: http://www.scotland.gov.uk/shs

Calculating Distance

A.41 The interviewer asks where the person started from, and where they went to, and records the origin and destination of each stage of each journey. When appropriate, the interviewer can specify that the previous destination is the origin of the current stage/journey. Exact postcodes are determined/checked at a later stage in the processing of the data from the survey. In cases where only an approximate location is recorded (e.g. centre of Edinburgh), an arbitrary postcode (such as that of the main post office) is assigned. In some cases it may not be possible to allocate a postcode from a postal district (e.g. EH10). Inevitably, there are occasions where no exact indication of the location of the origin/destination can be determined. Continuous improvements to interviewers' computer systems result in improved location data over time.

A.42 The length of any journey stage is the estimated straight-line distance, based upon the grid co-ordinates of the centres of the postcodes of the origin/destination of that stage of the journey. In cases where the interviewer could not obtain sufficient details of the origin/destination for a postcode to be assigned, the distance travelled is imputed. The distance of a multi-stage journey is calculated by adding up the distances of each of its component stages. For series of calls journeys, the respondent estimates the total distance.
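The straight-line calculation can be illustrated with a short sketch. It assumes that the grid co-ordinates of each postcode centre are available as (easting, northing) pairs in metres; the function names and the example co-ordinates are hypothetical.

```python
import math


def stage_distance_km(origin, destination):
    """Straight-line distance in km between two (easting, northing) points.

    Co-ordinates are assumed to be in metres.
    """
    d_east = destination[0] - origin[0]
    d_north = destination[1] - origin[1]
    return math.hypot(d_east, d_north) / 1000.0


def journey_distance_km(stages):
    """Sum the straight-line distances of a journey's component stages.

    stages: list of (origin, destination) pairs, one per stage.
    """
    return sum(stage_distance_km(o, d) for o, d in stages)


# Made-up co-ordinates: two stages of 5 km each.
two_stage_journey = [((0, 0), (3000, 4000)), ((3000, 4000), (6000, 8000))]
total_km = journey_distance_km(two_stage_journey)
```

Because the distance of a multi-stage journey is the sum of its stage distances, it can exceed the straight-line distance between the first origin and the final destination whenever the stages do not lie on one line.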

A.43 Distances are reported in kilometres. One kilometre is equivalent to approximately 0.62 miles (or conversely, 1 mile = 1.609 km).

Straight line vs road network distance

A.44 As most journeys are not made in a straight line, the straight-line distance will underestimate the actual distance travelled. Since 2012, the survey contractors have provided an additional variable containing the road network distance. This has been used to recalculate those tables that use journey distance (Tables TD2a, TD4, TD4a, TD5 and TD5a). A piece of work was undertaken in 2009 to investigate the extent of the underreporting of Travel Diary distance as a result of using the straight line distance; this was updated in 2014 using 2012 actual road distance data.

A.45 These reports conclude that:

Straight line distance underestimates total distance travelled by around a third.
Underestimates are greater for shorter journeys, as these are likely to stray further from a straight line. For example, a short journey in a town may have to take 3 sides of a square to get around buildings or follow a one-way system.
There is variation in the scale of impact by mode of transport, but this is a result of journey length, i.e. walking and cycling journeys are shorter than car journeys on average.

A.46 There are caveats with the road network distance which is why it has not been used in the main SHS TD tables at this point. The limitations are:

There are many routes through the road network between two points. The one used in the creation of the variable is the shortest distance, but another possibility would be to use the route that takes the shortest time based on average speeds. Other factors that are harder to model would be route choice variation by time of day, e.g. avoiding busy roads at rush hour.
Road network distance is used for all modes due to the complex and time-consuming nature of the computer processing.
- This is likely to result in an overestimate of distance for cycling and walking, as more direct routes may be used, e.g. roads closed to through traffic that allow cyclists to pass through and short cuts people can take on foot across open ground.
- Bus routes may not use the most direct route between two points, e.g. the service may divert through a housing estate on the way between two points.
- Rail journeys will obviously not use the road network, and as rail journeys are longer they will tend to be closer to a straight line. The road network distance for rail journeys is included as a comparison.

A.47 In future it would be possible to develop a distance measure that used the public transport network for bus and rail journeys, though it would be more difficult to create an accurate estimate of distance for walking and cycling. In the interim, the road network tables are included for use alongside the straight line distance tables to understand the scale of underestimation. Whilst an indicator for distance that used a mix of road and straight line distances would provide improved accuracy for cars and buses, a choice would have to be made over which is the best measure for other modes of transport, and the resulting figure would be much harder to interpret.

A.48 Both reports, from 2012 and 2014, are available on the Transport Scotland website: http://www.transportscotland.gov.uk/statistics/data-sources-and-methodology

Calculating Duration

A.49 Prior to 2007 the duration of a journey was calculated from the start and end times. As the recording process will only be accurate to, at best, say the nearest five minutes, the estimated durations of some journeys would be subject to possibly large percentage errors. Due to coding problems in the CAPI script in October, November and December 1999, the start time and end time are missing for around 4 per cent of journeys for 1999 as a whole. As duration is derived from the start time and end time of journeys, about 7 per cent of journeys in 1999 have a missing duration.

A.50 From 2007 onwards duration is collected direct from the respondent. This aims to improve the accuracy of the data. This means that data prior to 2007 may not be strictly comparable.

Sampling variability and confidence limits

A.46 Although the SHS sample is chosen at random, the people who take part in the survey will not necessarily be a representative cross-section of the people of Scotland. Purely by chance, the sample could include disproportionate numbers of certain types of people, in which case the survey's results would be affected.

A.47 The likely extent of sampling variability can be quantified, by calculating the standard error associated with the estimate of a quantity produced from a random sample. Statistical sampling theory states that, on average only about one sample in three would produce an estimate that differed from the (unknown) true value of that quantity by more than one standard error; only about one sample in twenty would produce an estimate that differed from the true value by more than two standard errors; only about one sample in 400 would produce an estimate that differed from the true value by more than three standard errors. By convention, the 95 per cent confidence interval for a quantity is defined as the estimate plus or minus about twice the standard error (from sampling theory, the interval is plus or minus 1.96 times the standard error), because there is only a 5 per cent chance (on average) that a sample would produce an estimate that differs from the true value of that quantity by more than this amount.
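The proportions quoted above follow from the Normal distribution, and can be checked with a short sketch using only the Python standard library (the function name is illustrative). It computes the chance that an estimate differs from the true value by more than k standard errors, assuming the estimate is Normally distributed.

```python
import math


def two_sided_tail_probability(k):
    """Probability that a Normal estimate lies more than k standard errors
    from the true value, in either direction."""
    # For a standard Normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2))


tail_1 = two_sided_tail_probability(1.0)      # roughly one sample in three
tail_2 = two_sided_tail_probability(2.0)      # roughly one sample in twenty
tail_3 = two_sided_tail_probability(3.0)      # roughly one sample in 370
tail_196 = two_sided_tail_probability(1.96)   # the conventional 5 per cent
```

The exact figure for three standard errors is nearer one in 370 than one in 400, and for two standard errors nearer one in 22 than one in twenty; the text gives these as round approximations.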

Table A shows the 95 per cent confidence limits for estimates of a range of percentages calculated from sub-samples of a range of sizes (NB: the confidence limits for estimates of x per cent and for (100-x) per cent are the same). The formula used to calculate these confidence intervals is:

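CI = DF x 1.96 x SQRT((% x (1 - %)) / n)

Where % is the percentage value of interest (expressed as a proportion, e.g. 0.55 for 55 per cent, with the result multiplied by 100 to give percentage points), n is the sample size it is based on and DF is the design factor for the relevant survey, which varies from year to year as a result of the survey sample (see table below):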


Year	2006	2007	2008	2009	2010	2011	2012	2013
Design Factor	1.2	1.2	1.2	1.3	1.2	1.3	1.15	1.16

A.48 The interpretation of an entry in Table A is best explained by an example:

The value in the cell at the intersection of the 45 per cent or 55 per cent column and the 800 row is 4.5.
This means that the 95 per cent confidence limits for an estimate of 55 per cent which is produced from a sub-sample of 800 are +/- 4.5 percentage-points.
The 95 per cent confidence interval for the estimate is 55 per cent +/- 4.5 percentage-points (i.e. from about 50.5 per cent to around 59.5 per cent, assuming that the value of the estimate is 55.0 per cent).
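The confidence limit formula can be sketched as a small function. The function name is illustrative, and the percentage is converted to a proportion inside the function so that the result is in percentage points. The design factor of 1.3 used in the example is an assumption: it happens to reproduce the 4.5 quoted above, but the text does not state which design factor Table A uses.

```python
import math


def confidence_limit_pp(percentage, n, design_factor):
    """95 per cent confidence limit, in percentage points, for an estimate.

    percentage: the estimate, e.g. 55 for 55 per cent.
    n: the sub-sample size the estimate is based on.
    design_factor: the survey design factor for the relevant year.
    """
    p = percentage / 100.0  # work with a proportion inside the square root
    return design_factor * 1.96 * math.sqrt(p * (1 - p) / n) * 100.0


# The worked example: 55 per cent from a sub-sample of 800.
# A design factor of 1.3 is assumed here for illustration.
limit = confidence_limit_pp(55, 800, 1.3)
interval = (55 - limit, 55 + limit)
```

The limits for x per cent and (100 - x) per cent are identical because the formula depends only on the product of the proportion and its complement.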

A.49 As the survey's estimates may be affected by sampling errors, apparent differences of a few percentage points between the figures for two sub-groups of the population may not be significant: it could be that the true values for the two sub-groups are similar, but the random selection of households for the survey has, by chance, produced a sample which gives a high estimate for one sub-group and a low estimate for the other.

A.50 One way of assessing significance at the 5 per cent level involves comparing the difference with the 95 per cent confidence limits for the two estimates. Suppose that these are +/- 3.0 percentage-points and +/- 4.0 percentage-points, respectively. Clearly a difference which is less than the magnitude of the largest limit (4.0 percentage-points) is not significant; and a difference which is greater than the sum of the magnitudes of the limits (3.0 percentage-points + 4.0 percentage-points = 7.0 percentage-points) is significant. Statistical sampling theory suggests that a difference whose magnitude is between these values is significant if it is greater than the square root of the sum of the squares of the magnitudes of the limits for the two estimates - in this case, (3.0² + 4.0²)^0.5 = 5.0. So, in this case, a difference of more than 5.0 percentage-points would be considered statistically significant (at the conventional 5 per cent level). However, one may well find some apparently significant results that are actually just the result of sampling variability, having arisen by chance.
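The rule of thumb above can be sketched as a short function (the name and argument layout are illustrative). It compares the difference between two estimates with the square root of the sum of the squares of their 95 per cent confidence limits.

```python
import math


def difference_is_significant(estimate_a, estimate_b, limit_a, limit_b):
    """Return True if two estimates differ significantly at the 5 per cent level.

    limit_a, limit_b: the 95 per cent confidence limits (in percentage
    points) for the two estimates.
    """
    threshold = math.hypot(limit_a, limit_b)  # sqrt(limit_a**2 + limit_b**2)
    return abs(estimate_a - estimate_b) > threshold


# With limits of 3.0 and 4.0 percentage points the threshold is 5.0.
threshold_example = math.hypot(3.0, 4.0)
six_point_gap = difference_is_significant(50.0, 56.0, 3.0, 4.0)
four_point_gap = difference_is_significant(50.0, 54.0, 3.0, 4.0)
```

With these limits, a 6 percentage-point gap is flagged as significant while a 4 percentage-point gap is not.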

A.51 The above information relates only to sampling variability. The survey's results could also be affected by non-contact/non-response bias: the characteristics of the people who should have been in the survey but who could not be contacted, or who refused to take part, could differ markedly from those of the people who were interviewed. If that is the case, the SHS results will not be representative of the whole population. Without knowing the true values (for the population as a whole) of some quantities, one cannot be sure about the extent of any such biases in the SHS. However, comparison of SHS results with information from other sources suggests that they are broadly representative of the overall Scottish population, and therefore that any non-contact or non-response biases are not large overall. The Fieldwork Outcomes and Methodology volumes of Scotland's People provide more information on these matters.

Published results, and anonymised data

A.52 SHS results are also included in Scottish Transport Statistics, published in February.

A.53 Transport statistics publications are available on the Transport Scotland Statistics webpages at http://www.transportscotland.gov.uk/analysis/statistics/publications

A.54 The SHS Annual Report is published by the Scottish Government and can be found here: http://www.scotland.gov.uk/Topics/Statistics/16002/PublicationAnnual

A.55 Anonymised copies of the survey data are deposited at the UK Data Archive.

Enquiries and further information

A.56 General enquiries about the SHS should be addressed to the survey's Project Manager:

SHS Project Manager
Communities Analytical Services
Scottish Government
Victoria Quay
Edinburgh, EH6 6QQ

Tel: 0131 244 0824
Fax: 0131 244 7573
E-mail: shs@scotland.gsi.gov.uk

A.57 Enquiries about the statistics in this bulletin should be addressed to:

Transport Statistics
Transport Analytical Services
Transport Scotland
Scottish Government
Victoria Quay
Edinburgh, EH6 6QQ

Tel: 0131 244 1457
E-mail: transtat@transportscotland.gsi.gov.uk

A.58 Further information about the survey can be found on the SHS website at www.scotland.gov.uk/shs

A.59 This website provides some background to the survey, information about the progress of the survey, and the published results. Copies of the Transport Statistics bulletins can be found on the Transport Scotland Statistics webpages at: http://www.transportscotland.gov.uk/analysis/statistics/publications

A.60 Please use the SHS website to register your interest in Population and Household Surveys if you wish to be added to an e-mail mailing list to be kept informed of SHS news and developments. The Project Manager will also, on request, distribute paper copies of information about the survey, and about significant developments when they occur, to people who are unable to access the website.

A.61 To be kept informed of changes to Scottish statistics, please register your interest with ScotStat at www.scotland.gov.uk/scotstat.