NHS: Data quality

The National Household Survey (NHS) contains all of the questions that Statistics Canada contemplated for inclusion in a 2011 Census long-form. The NHS is therefore identical in content to what would have been collected in a 2011 Census long-form.

Data quality

Response rates

In its initial planning, Statistics Canada assumed a response rate for a mandatory 2011 Census long-form of 94%, identical to that achieved for the 2006 Census.

Statistics Canada has assumed a response rate of 50% for the voluntary National Household Survey.

Sample size

In its initial planning, Statistics Canada assumed a sample of one in five households for a mandatory 2011 census long-form, identical to that for the 2006 Census.

Statistics Canada, in consultation with the Minister, has fixed the sampling rate for the National Household Survey at one in three households, an increase of about two-thirds (67%) relative to the initial plan of one in five.

Sampling error

Like the previous long-form census, the National Household Survey aims to produce accurate estimates from the questions asked for a wide variety of geographic areas, ranging from very large (such as provinces and census metropolitan areas) to very small (such as neighbourhoods and municipalities), and for various population subgroups such as Aboriginal peoples and immigrants. These subgroups also vary in size, particularly when cross-classified by geographic area. Such groupings are generally referred to as 'domains of interest'.

For any given domain of interest, assuming random sampling, the sampling error is driven by three factors: the size of the population, the number of survey respondents and the variability of the variables being measured. Amongst these, only the number of survey respondents can be influenced by the survey process. People are familiar with the notion of sampling error through statements in opinion polls about results being 'accurate within plus or minus x%, 19 times out of 20'. The larger the number of respondents, the smaller the value of x will be and therefore the more accurate the survey estimates will be.
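To make the three factors concrete, the sketch below computes an approximate 95% margin of error for an estimated share in a single domain. It assumes simple random sampling without replacement and full response, which is a simplification: it is not Statistics Canada's variance estimation method, and the domain size and 30% share are hypothetical. The population size enters through the finite population correction, variability through p(1 - p), and the number of respondents through n. The result is in percentage points, as in opinion polls.

```python
import math


def margin_of_error(p: float, n: int, N: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error (as a proportion) for an estimated share.

    p: share of the domain with the characteristic (variability is p * (1 - p)).
    n: number of respondents in the domain.
    N: population size of the domain (enters via the finite population correction).
    Assumes simple random sampling without replacement and full response.
    """
    fpc = math.sqrt((N - n) / (N - 1))  # shrinks to 0 as n approaches N
    return z * math.sqrt(p * (1 - p) / n) * fpc


# Hypothetical domain of 10,000 people in which 30% have some characteristic.
for n in (500, 1_000, 2_000):
    moe = margin_of_error(0.30, n, 10_000)
    print(f"{n:>5} respondents: +/- {100 * moe:.1f} percentage points")
```

Doubling the number of respondents shrinks the margin by a bit more than a factor of sqrt(2), because the finite population correction also tightens as n grows.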

With a sampling rate of 1 in 3 and an anticipated response rate of 50%, approximately 16% of the Canadian population will complete the National Household Survey, compared with 19% under a mandatory census long form (i.e., a sampling rate of 1 in 5 and a 94% response rate). Because it is expected to have fewer respondents overall, the National Household Survey will generally have a slightly higher (worse) sampling error across domains of interest than a mandatory long-form census would have achieved. Furthermore, the quality of estimates is expected to vary more from one domain to another: some areas may achieve lower sampling errors than under a mandatory long-form census (because of the higher sampling rate of one in three), while others may see substantially higher sampling errors (because of unusually low response rates to the voluntary survey). Smaller domains of interest are particularly at risk of such fluctuations.
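The respondent shares above come from multiplying the sampling rate by the response rate. The sketch below reproduces that arithmetic (about 16.7% and 18.8%) and applies a rough rule of thumb, assumed here rather than taken from the text: under simple random sampling, a margin of error scales with 1/sqrt(number of respondents). It ignores the finite population correction and non-response bias, so it only indicates the average size of the "slightly higher" sampling error, not the domain-to-domain variability.

```python
import math

# Expected share of the population completing the questionnaire is
# sampling rate x response rate (rates taken from the text above).
census_share = (1 / 5) * 0.94  # mandatory long form: 1 in 5, 94% response
nhs_share = (1 / 3) * 0.50     # voluntary NHS: 1 in 3, 50% response

# Rule of thumb: with simple random sampling, the standard error scales with
# 1 / sqrt(number of respondents). Ignoring the finite population correction
# and any non-response bias, the NHS margin of error relative to the census
# one is therefore about sqrt(census_share / nhs_share).
relative_moe = math.sqrt(census_share / nhs_share)

print(f"Census long form: {census_share:.1%} of the population")
print(f"NHS:              {nhs_share:.1%} of the population")
print(f"NHS margin of error is roughly {relative_moe:.2f} times the census one")
```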

The annex to this note provides actual confidence intervals (i.e. plus or minus x%) from the 2006 Census for various variables for the Toronto Census Metropolitan Area, the Winnipeg Census Metropolitan Area and the Bathurst Census Agglomeration (New Brunswick). Provided for comparison are simulated estimates and their corresponding confidence intervals for the National Household Survey based on a 50% response rate.

Non-sampling error

Besides sampling, many factors can introduce errors into survey results, including respondent mistakes, interviewer effects, data collection methodology, and data capture and processing errors. The move to a voluntary National Household Survey will have little impact on some of these factors (such as data capture and processing errors), but its effect on the other error sources is unknown and impossible to quantify.

However, the most significant source of non-sampling error for the National Household Survey is expected to be non-response bias. All surveys are subject to non-response bias, even a census with a 98% response rate. The risk of non-response bias increases quickly as the response rate declines. This is because non-respondents generally tend to have characteristics that differ from those of respondents, so the results are not representative of the true population. Given that the National Household Survey is anticipated to achieve a response rate of only 50%, there is a substantial risk of non-response bias.
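The two mechanisms in this paragraph can be illustrated with a short sketch; all figures in it are hypothetical. The first function shows how a subgroup that responds less often ends up under-represented among respondents. The second computes worst-case bounds (a generic technique, not one described in this note): if nothing is known about non-respondents, the true share could lie anywhere between "no non-respondent belongs to the subgroup" and "every non-respondent does", a range whose width is exactly the non-response rate.

```python
def respondent_share(p_group: float, rr_group: float, rr_rest: float) -> float:
    """Share of a subgroup among respondents when response rates differ.

    p_group: true share of the subgroup in the population.
    rr_group, rr_rest: response rates inside and outside the subgroup.
    """
    responding_group = p_group * rr_group
    responding_rest = (1 - p_group) * rr_rest
    return responding_group / (responding_group + responding_rest)


def share_bounds(observed_share: float, response_rate: float) -> tuple[float, float]:
    """Worst-case range for the true share when nothing is known about
    non-respondents: none of them, or all of them, belong to the subgroup."""
    low = observed_share * response_rate
    return low, low + (1 - response_rate)


# Hypothetical subgroup making up 20% of the population that responds at 40%,
# while everyone else responds at 52.5% (an overall response rate of 50%).
observed = respondent_share(0.20, 0.40, 0.525)
print(f"Respondents suggest {observed:.1%} instead of 20.0%")

# The fewer people we hear from, the wider the range consistent with the data.
for rate in (0.94, 0.50):
    low, high = share_bounds(observed, rate)
    print(f"response rate {rate:.0%}: true share between {low:.1%} and {high:.1%}")
```

At a 94% response rate the worst case is confined to a 6-point band, whereas at 50% it spans 50 points, which is why the risk grows as the response rate falls.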

Statistics Canada is very much aware of these risks and their adverse effects on data quality. The Agency is currently adapting its data collection and other procedures to mitigate these risks as much as possible. In particular, we will use data on response patterns from the 2006 Census, together with information generated during data collection in 2011, to guide our field follow-up effort and minimize non-response bias. As well, where possible, 2011 Census data will be used as auxiliary information in the National Household Survey estimation procedures to partially offset some of the remaining biases. However, some significant residual bias is certain to remain that will be impossible to measure and correct.
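The note does not say which estimation procedure will use the census data. One common way to use auxiliary totals is post-stratification, sketched below under that assumption with hypothetical cells and respondents: each respondent is weighted so that weighted counts match known population totals per cell. The sketch also shows why residual bias remains: weighting corrects only for differences captured by the auxiliary variable, not for differences between respondents and non-respondents within a cell.

```python
from collections import Counter


def poststratification_weights(cells: list[str], population_totals: dict[str, float]) -> list[float]:
    """Weight each respondent so weighted counts reproduce known cell totals.

    cells: the auxiliary category (e.g. an age group) of each respondent.
    population_totals: known population count per category, the role that
    full-count census data could play as auxiliary information.
    """
    sample_counts = Counter(cells)
    return [population_totals[c] / sample_counts[c] for c in cells]


# Hypothetical respondents: older people are over-represented (3 of 5),
# although they are only 40% of the population.
cells = ["young", "young", "old", "old", "old"]
moved = [True, False, False, False, False]
totals = {"young": 600, "old": 400}

weights = poststratification_weights(cells, totals)
unweighted = sum(moved) / len(moved)
weighted = sum(w for w, m in zip(weights, moved) if m) / sum(weights)
print(f"unweighted share who moved: {unweighted:.0%}, post-stratified: {weighted:.0%}")
```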

To give some appreciation of the potential for non-response bias before any mitigating strategies are implemented, a simulation has been conducted for three geographic areas using the 2006 Census. For selected variables from the Toronto Census Metropolitan Area, the Winnipeg Census Metropolitan Area and the Bathurst Census Agglomeration (New Brunswick), the simulation compares actual 2006 Census long-form data (see Footnote 1) with estimates based on the assumption that 16% rather than 19% of the population responded. Using this and similar information, Statistics Canada will plan its field operations to minimize, to the extent possible, the potential for non-response bias.

Comparability of data over time

Any significant change in the methods of a survey can affect the comparability of data over time, and there is a real risk that this will be the case for the National Household Survey. There will inevitably be some uncertainty as to whether, and to what extent, a change in a variable reflects real change or an artefact of the move from the mandatory long-form census to the voluntary National Household Survey.

Changes in survey processes, however, are inevitable and have precedents even in the Census of Population. In 1971, for example, two major changes were introduced: self-enumeration replaced interviewer enumeration, and some questions were asked of a subsample (then 1 in 3) of the population rather than the entire population (there had been some sampling in previous censuses, beginning in 1941, but on a much more limited scale).

Conclusion

We have never previously conducted a survey on the scale of the voluntary National Household Survey, nor are we aware of any other country that has. The new methodology has been introduced relatively rapidly with limited testing. The effectiveness of our mitigation strategies in offsetting non-response bias and other quality-limiting effects is largely unknown. For these reasons, it is difficult to anticipate the quality of the final outcome.

The significance of any quality shortcomings depends, to some extent, on the intended use of the data. Given that, and our mitigation strategies, we are confident that the National Household Survey will produce usable and useful data that will meet the needs of many users. It will not, however, provide the level of quality that a mandatory long-form census would have achieved.

Annex

The following tables are intended to assist readers in understanding quality issues around the National Household Survey by providing some quantitative indicators developed from 2006 Census data.

The following provides a guide to reading the first line of the Toronto Census Metropolitan Area table. Other lines are read analogously for all three tables.

The variable of interest in this line is the total income in 2005 of the population of the Toronto CMA aged 15 years and over. More specifically, the first line looks at the number of persons in this age group with incomes under $1000 or with no income. The estimated number of such persons from the 2006 Census was 435,580. Based on the actual data, the 95% confidence interval around this estimate (since the long-form census was a sample survey) was plus or minus 0.4%. Assuming that the actual response rate had been 50%, which is the working assumption for the National Household Survey, the 95% confidence interval around the corresponding simulated NHS estimate would be plus or minus 0.5%.
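The worked example above can be turned into explicit interval bounds. The sketch assumes that "+/- %" is relative to the estimate (a percentage of the count, not percentage points), which is how the text applies it to a population count.

```python
def interval(estimate: float, pct: float) -> tuple[float, float]:
    """Lower and upper bounds for an estimate with a +/- pct% margin,
    where pct is taken as a percentage of the estimate itself."""
    half_width = estimate * pct / 100
    return estimate - half_width, estimate + half_width


census_low, census_high = interval(435_580, 0.4)  # 2006 Census, +/- 0.4%
nhs_low, nhs_high = interval(416_415, 0.5)        # simulated NHS, +/- 0.5%
print(f"2006 Census:   {census_low:,.0f} to {census_high:,.0f}")
print(f"Simulated NHS: {nhs_low:,.0f} to {nhs_high:,.0f}")
```

For this line the two intervals do not overlap: the gap between the estimates is larger than either margin of error.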

Continuing with the first line of the Toronto CMA table, the final column reports the results of the simulation of non-response bias for this income group in the absence of mitigation strategies. For this income class, the simulation indicates that the size of the population would be underestimated by 4.4% relative to the 2006 Census estimate.
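The estimated bias is the relative difference between the simulated NHS estimate and the 2006 Census estimate, with the census value treated as the "true" value. A minimal sketch that reproduces the -4.4% above:

```python
def estimated_bias_pct(census_estimate: float, simulated_nhs: float) -> float:
    """Relative difference of the simulated NHS estimate from the 2006 Census
    estimate, in percent (the census value is treated as the 'true' value)."""
    return 100 * (simulated_nhs - census_estimate) / census_estimate


print(round(estimated_bias_pct(435_580, 416_415), 1))  # -4.4
```

The same formula gives, for example, the 17.6% shown for the Chinese population in the Toronto CMA table (486,330 versus 572,040).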

In some instances in the tables, the estimated bias is smaller than the error of estimate at the 95% level of confidence for the 2006 Census. In these instances, one cannot conclude with confidence that the bias exists.
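The rule in the paragraph above can be applied mechanically by comparing the absolute estimated bias with the 2006 Census margin of error. The rows below are copied from the annex tables; the check itself is only the simple comparison the text describes, not a formal statistical test.

```python
# A few rows copied from the annex tables: (label, census +/- %, estimated bias %).
rows = [
    ("Toronto: Under $1,000 or without income", 0.4, -4.4),
    ("Toronto: Management occupations", 0.8, -0.1),
    ("Bathurst: Construction", 16.9, 1.3),
    ("Bathurst: Moved", 12.1, -18.7),
]


def bias_exceeds_error(census_margin_pct: float, bias_pct: float) -> bool:
    """True when the estimated bias is larger than the 2006 Census 95% margin."""
    return abs(bias_pct) > census_margin_pct


for label, margin, bias in rows:
    verdict = "bias exceeds census margin" if bias_exceeds_error(margin, bias) else "cannot conclude bias exists"
    print(f"{label}: {verdict}")
```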

2006 Census (long-form) compared to 2006 simulated NHS – CMA Toronto

Estimated total population: 5,061,815
Number of census respondents (long-form): 974,435
Estimated number of NHS respondents: 728,340

Table Summary

This table shows selected quantitative indicators from the 2006 Census long form. It presents the 2006 Census estimates and the simulated estimates for the National Household Survey, along with the estimated bias. The column headings are: selected quantitative indicators, 2006 Census estimate, plus/minus percentage (2006 Census), simulated National Household Survey estimate, plus/minus percentage (simulated National Household Survey), and estimated bias in percentage. The rows are the selected indicators, grouped by topic, with their corresponding values.

| Selected quantitative indicators | 2006 Census estimate | +/- (2006 Census) | Simulated NHS estimate | +/- (simulated NHS) | Estimated bias |
|---|---|---|---|---|---|
| Total income in 2005 of population 15 years and over | | | | | |
| Under $1,000 or without income | 435,580 | 0.4% | 416,415 | 0.5% | -4.4% |
| $50,000 and over | 966,405 | 0.4% | 1,015,780 | 0.5% | 5.1% |
| Total population 25 to 64 years by highest certificate, diploma or degree | | | | | |
| High school or less | 982,800 | 0.4% | 945,150 | 0.5% | -3.8% |
| College/Cegep | 534,020 | 0.6% | 529,140 | 0.7% | -0.9% |
| University certificate, diploma or degree - Bachelor and above | 962,175 | 0.4% | 1,002,620 | 0.5% | 4.2% |
| Total labour force 15 years and over | 2,815,845 | 0.2% | 2,821,480 | 0.2% | 0.2% |
| Total labour force 15 years and over by industry | | | | | |
| 23 Construction | 148,895 | 1.2% | 134,960 | 1.5% | -9.4% |
| 91 Public administration | 94,195 | 1.6% | 101,295 | 1.8% | 7.5% |
| Total labour force 15 years and over by occupation | | | | | |
| A Management occupations | 320,600 | 0.8% | 320,305 | 0.9% | -0.1% |
| B Business, finance and administration occupations | 590,605 | 0.6% | 614,430 | 0.6% | 4.0% |
| D Health occupations | 124,080 | 1.3% | 123,300 | 1.5% | -0.6% |
| G Sales and service occupations | 611,410 | 0.5% | 594,170 | 0.7% | -2.8% |
| H Trades, transport and equipment operators and related occupations | 327,850 | 0.8% | 302,840 | 0.9% | -7.6% |
| Total visible minority population | 2,174,065 | 0.4% | 2,131,405 | 0.5% | -2.0% |
| Chinese | 486,330 | 1.1% | 572,040 | 1.2% | 17.6% |
| Black | 352,220 | 1.3% | 305,895 | 1.6% | -13.2% |
| Total population by citizenship | | | | | |
| Citizenship other than Canadian | 642,130 | 0.8% | 606,050 | 0.9% | -5.6% |
| Population by Immigrant Status | | | | | |
| Immigrants | 2,320,165 | 0.2% | 2,274,450 | 0.3% | -2.0% |
| Total population by Aboriginal and non-Aboriginal identity | | | | | |
| Total Aboriginal identity population | 26,575 | 3.4% | 25,000 | 4.1% | -5.9% |
| Registered Indian status | | | | | |
| Registered Indian | 9,950 | 3.9% | 8,790 | 4.9% | -11.7% |
| Mobility 1 year | | | | | |
| Moved | 612,130 | 0.9% | 575,780 | 1.1% | -5.9% |
| Not moved | 4,459,945 | 0.1% | 4,496,295 | 0.1% | 0.8% |

2006 Census (long-form) compared to 2006 simulated NHS – CMA Winnipeg

Estimated total population: 681,815
Number of census respondents (long-form): 132,155
Estimated number of NHS respondents: 96,735

| Selected quantitative indicators | 2006 Census estimate | +/- (2006 Census) | Simulated NHS estimate | +/- (simulated NHS) | Estimated bias |
|---|---|---|---|---|---|
| Total income in 2005 of population 15 years and over | | | | | |
| Under $1,000 or without income | 41,590 | 1.4% | 39,715 | 1.6% | -4.5% |
| $50,000 and over | 104,420 | 1.3% | 107,995 | 1.5% | 3.4% |
| Total population 25 to 64 years by highest certificate, diploma or degree | | | | | |
| High school or less | 152,670 | 1.0% | 149,180 | 1.2% | -2.3% |
| College/Cegep | 73,235 | 1.5% | 73,435 | 1.8% | 0.3% |
| University certificate, diploma or degree - Bachelor and above | 90,535 | 1.4% | 92,840 | 1.6% | 2.5% |
| Total labour force 15 years and over | 385,870 | 0.5% | 385,360 | 0.6% | -0.1% |
| Total labour force 15 years and over by industry | | | | | |
| 23 Construction | 18,780 | 3.5% | 17,070 | 4.3% | -9.1% |
| 91 Public administration | 27,105 | 2.9% | 27,830 | 3.3% | 2.7% |
| Total labour force 15 years and over by occupation | | | | | |
| A Management occupations | 35,480 | 2.4% | 34,810 | 2.8% | -1.9% |
| B Business, finance and administration occupations | 76,155 | 1.6% | 79,225 | 1.8% | 4.0% |
| D Health occupations | 25,885 | 2.8% | 26,475 | 3.3% | 2.3% |
| G Sales and service occupations | 95,180 | 1.4% | 93,505 | 1.6% | -1.8% |
| H Trades, transport and equipment operators and related occupations | 51,715 | 1.9% | 49,105 | 2.4% | -5.0% |
| Total visible minority population | 102,945 | 2.3% | 99,340 | 2.8% | -3.5% |
| Chinese | 12,810 | 7.0% | 12,245 | 8.5% | -4.4% |
| Black | 14,470 | 6.6% | 13,845 | 8.0% | -4.3% |
| Total population by citizenship | | | | | |
| Citizenship other than Canadian | 37,545 | 3.3% | 35,770 | 4.0% | -4.7% |
| Population by Immigrant Status | | | | | |
| Immigrants | 121,255 | 1.3% | 117,870 | 1.6% | -2.8% |
| Total population by Aboriginal and non-Aboriginal identity | | | | | |
| Total Aboriginal identity population | 68,385 | 2.0% | 63,845 | 2.5% | -6.6% |
| Registered Indian status | | | | | |
| Registered Indian | 26,610 | 2.4% | 23,225 | 3.0% | -12.7% |
| Mobility 1 year | | | | | |
| Moved | 91,060 | 2.2% | 85,395 | 2.8% | -6.2% |
| Not moved | 594,975 | 0.3% | 600,640 | 0.3% | 1.0% |

2006 Census (long-form) compared to 2006 simulated NHS – Bathurst

Estimated total population: 30,750
Number of census respondents (long-form): 5,910
Estimated number of NHS respondents: 4,280

| Selected quantitative indicators | 2006 Census estimate | +/- (2006 Census) | Simulated NHS estimate | +/- (simulated NHS) | Estimated bias |
|---|---|---|---|---|---|
| Total income in 2005 of population 15 years and over | | | | | |
| Under $1,000 or without income | 2,105 | 6.0% | 1,995 | 7.4% | -5.2% |
| $50,000 and over | 3,805 | 6.7% | 4,040 | 7.7% | 6.2% |
| Total population 25 to 64 years by highest certificate, diploma or degree | | | | | |
| High school or less | 8,425 | 4.1% | 8,110 | 5.0% | -3.7% |
| College/Cegep | 4,075 | 6.4% | 4,205 | 7.5% | 3.2% |
| University certificate, diploma or degree - Bachelor and above | 2,360 | 8.7% | 2,450 | 10.2% | 3.8% |
| Total labour force 15 years and over | 15,830 | 2.6% | 15,625 | 3.2% | -1.3% |
| Total labour force 15 years and over by industry | | | | | |
| 23 Construction | 795 | 16.9% | 805 | 20.1% | 1.3% |
| 91 Public administration | 1,465 | 12.3% | 1,535 | 14.4% | 4.8% |
| Total labour force 15 years and over by occupation | | | | | |
| A Management occupations | 1,145 | 13.3% | 975 | 17.3% | -14.8% |
| B Business, finance and administration occupations | 2,680 | 8.5% | 2,695 | 10.1% | 0.6% |
| D Health occupations | 1,410 | 11.9% | 1,610 | 13.3% | 14.2% |
| G Sales and service occupations | 4,310 | 6.5% | 4,445 | 7.6% | 3.1% |
| H Trades, transport and equipment operators and related occupations | 2,635 | 8.5% | 2,525 | 10.5% | -4.2% |
| Total visible minority population | 300 | 45.9% | 170 | 73.2% | -43.3% |
| Chinese | 40 | 126.4% | 10 | 302.5% | -75.0% |
| Black | 115 | 74.4% | 60 | 123.4% | -47.8% |
| Total population by citizenship | | | | | |
| Citizenship other than Canadian | 100 | 65.4% | 130 | 68.6% | 30.0% |
| Population by Immigrant Status | | | | | |
| Immigrants | 475 | 23.2% | 475 | 27.8% | 0.0% |
| Total population by Aboriginal and non-Aboriginal identity | | | | | |
| Total Aboriginal identity population | 440 | 26.2% | 505 | 29.2% | 14.8% |
| Registered Indian status | | | | | |
| Registered Indian | 185 | 28.7% | 160 | 37.0% | -13.5% |
| Mobility 1 year | | | | | |
| Moved | 3,235 | 12.1% | 2,630 | 16.3% | -18.7% |
| Not moved | 27,695 | 1.2% | 28,300 | 1.2% | 2.2% |

Footnote

Footnote 1

For the purposes of the simulation, the 2006 Census estimates have been assumed to be the 'true' population values. It should be noted, however, that the 2006 Census estimates are themselves subject to response bias and, because they are based on a sample of 1 in 5 households, to sampling error.
