# 3. Sampling [Footnote 1]

## 3.1 The National Household Survey sample

The National Household Survey (NHS) is a sample survey designed to collect detailed demographic, social and economic information about the Canadian population. The sample was drawn from the 2011 Census of Population dwelling list. Only occupied private dwellings and their corresponding households were in scope for the NHS. Thus, unlike in the census, all collective dwellings and all households outside Canada (e.g., diplomats, military personnel) were out of scope for the NHS. At the time the NHS sample was selected, it was not always known which addresses were linked to out-of-scope dwellings, so some out-of-scope dwellings erroneously received an NHS questionnaire. Once a dwelling was determined to be out of scope, no further collection or processing activities were carried out for it. The NHS questionnaire was distributed to about 30% of private households. [Footnote 2]

The proportion of dwellings to be selected for the NHS was determined at the collection unit (CU) level. The census collection method (e.g., mail-out, list/leave, canvasser) used for any given CU was also used to help determine its sampling fraction for the NHS.

All private dwellings in CUs whose census information was collected via canvasser were selected for the NHS. This included dwellings on Indian reserves. Canvasser areas included approximately 200,000 occupied private dwellings. In list/leave areas, private dwellings were sampled at a rate of one in three.

In mail-out areas, private dwellings were selected according to sampling fractions that were calculated at the provincial/territorial level. These fractions were derived to reach the desired fixed national sample size of 4.5 million dwellings while obtaining equal provincial/territorial sampling fractions for the combined list/leave and mail-out areas. Table 3.1.1 presents the different sampling fractions for the provinces and territories in which questionnaires were mailed out. [Footnote 3]

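Table 3.1.1: Mail-out sampling fractions by province and territory

| Province or territory | Sampling fraction |
| --- | --- |
| Newfoundland and Labrador | 0.1973 |
| Prince Edward Island | 0.2091 |
| Nova Scotia | 0.2488 |
| New Brunswick | 0.2678 |
| Quebec | 0.2768 |
| Ontario | 0.2752 |
| Manitoba | 0.2640 |
| Saskatchewan | 0.2473 |
| Alberta | 0.2725 |
| British Columbia | 0.2772 |
| Yukon | 0.2621 |
| Northwest Territories | ... |
| Nunavut | ... |

Note: ... = not applicable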

In mail-out and list/leave areas where self-enumeration was used, sampled households were selected based on a stratified systematic random sampling design (stratified by province or territory and by collection mode). In the list/leave CUs, one out of three dwellings was selected. Dwellings in the mail-out CUs were selected using provincial and territorial sampling fractions that were calculated so as to (1) generate the desired total sample size at the Canada level, and (2) produce the same sampling fraction for each province and territory (mail-out and list/leave collection modes combined).
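The two conditions above determine a province's mail-out fraction once its combined target fraction is fixed. The following is a minimal sketch of that arithmetic only; the function name, the dwelling counts and the combined target of 0.28 are hypothetical illustration values, not NHS figures, and the actual calculation may have involved further adjustments.

```python
def mailout_fraction(target_fraction, n_mailout, n_listleave,
                     listleave_fraction=1 / 3):
    """Return the mail-out sampling fraction that makes the combined
    (mail-out + list/leave) sampling fraction equal target_fraction.

    All inputs are hypothetical illustration values, not NHS figures.
    """
    if n_mailout <= 0 or n_listleave < 0:
        raise ValueError("n_mailout must be > 0 and n_listleave >= 0")
    # Dwellings that must be sampled overall to hit the combined target.
    required = target_fraction * (n_mailout + n_listleave)
    # Dwellings already contributed by list/leave areas at the fixed rate.
    from_listleave = listleave_fraction * n_listleave
    fraction = (required - from_listleave) / n_mailout
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("target cannot be met with a valid mail-out fraction")
    return fraction


# Hypothetical province: 900,000 mail-out and 100,000 list/leave dwellings.
f = mailout_fraction(0.28, n_mailout=900_000, n_listleave=100_000)
combined = (f * 900_000 + 100_000 / 3) / 1_000_000
print(round(f, 4), round(combined, 4))
```

Under this formula, whenever the combined target is below one in three, a province with a larger share of list/leave dwellings ends up with a lower mail-out fraction.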

## 3.2 NHS subsample

It was determined prior to the collection of NHS data that the resources available for NHS non-response follow-up (NRFU) would not allow for follow-up on all non-respondents. It was therefore decided that, as of a certain date, a subsample of the remaining non-responding cases would be selected. This would ensure that field staff would have a manageable number of cases with which to work, and it would reduce the risk of non-response bias [Footnote 4] by allowing staff to focus on certain areas or dwellings. All cases that were not selected in the NRFU subsample would therefore be excluded from further collection activities.

This approach is a form of sampling known as 'two-phase sampling'. The dwellings originally sampled for the NHS form the first-phase sample; the dwellings subsampled for NRFU form the second-phase sample. Figure 3.2.1 illustrates the design. Rectangle $U$ represents the dwellings of the census. The NHS sample is shown by the large ellipse $s_{a}$, within which $s_{a1}$ represents the dwellings that responded to the NHS by July 14, 2011. The remaining portion of the ellipse, which represents dwellings that had not responded by that date, was then split into two parts: the subsample of non-respondents selected for NRFU, represented by the small oval $s_{2}$, and the non-respondents not selected for NRFU, represented by $s_{a2}$.

The two-phase sampling approach was initially proposed by Hansen and Hurwitz (1946) and has proven efficient when the per-unit collection cost is higher in the second phase. It is also effective when a more concentrated effort is used to target a smaller number of non-respondents for NRFU. The goal for the NHS was to put the maximum reasonable effort into fewer selected follow-up cases rather than diluting the effort by spreading the available resources over all non-response cases. At the same time, the NRFU subsampling methodology can take advantage of a combination of frame information and paradata to allocate resources efficiently, leading to a reduction in non-response bias.
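To illustrate why subsampling non-respondents still allows estimates for the full population, the classic Hansen and Hurwitz estimator of a total can be written in the notation of Figure 3.2.1. This is a textbook simplification, not the NHS estimator: it assumes simple random sampling at both phases, full response within $s_{2}$, and no ineligible cases. The symbols $N$ (number of dwellings in $U$), $n_{a}$, $n_{a1}$ and $n_{2}$ (sizes of $s_{a}$, $s_{a1}$ and $s_{2}$) and $y_{k}$ (value for dwelling $k$) are introduced here for illustration only.

$$
\hat{Y} = \frac{N}{n_{a}} \left( \sum_{k \in s_{a1}} y_{k} + \frac{n_{a} - n_{a1}}{n_{2}} \sum_{k \in s_{2}} y_{k} \right)
$$

Each first-phase respondent carries the weight $N / n_{a}$. Each followed-up dwelling carries that weight multiplied by the inverse of the subsampling fraction, $(n_{a} - n_{a1}) / n_{2}$, so that it also represents the non-respondents in $s_{a2}$ that were not followed up.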

The following cases were deemed ineligible for NRFU subsampling:

• Those that were identified as respondents at the time of subsampling.
• Those for which NRFU appointments had already been made, because they were considered to have a relatively greater chance of becoming full responses in the near future.
• Those in areas using the canvasser method of enumeration.
• Those that were added to the NHS sample after the initial sample selection since no questionnaires had been sent.
• Cases corresponding to an unused Visitation Record line or dwellings that could be identified prior to subsampling as a dwelling with only temporary or foreign residents (TR/FR).

## 3.3 Targeted CUs

An advantage of creating a subsample for NRFU is that non-respondents who live in certain regions, or who have characteristics associated with a lower likelihood of responding to the NHS, can be targeted to improve their response rate. Because these non-respondents typically have a lower propensity to respond, it was decided to oversample certain predetermined CUs to ensure good representation of specific populations in the NHS subsample. Table 3.3.1 lists the five populations (target domains) that were oversampled because, based on other surveys or past censuses, they were considered at risk of a lower response rate and are of high interest for statistical analysis.

Table 3.3.1: Target domains for oversampling

| Target domain | Definition | Unit |
| --- | --- | --- |
| Aboriginals (outside reserves) | First Nations (North American Indian) single identity, Métis single identity, Inuk (Inuit) single identity, or any combination of the three | Person |
| Recent immigrants | Non-permanent residents and Canadians attaining immigration status between 1996 and 2006 | Person |
| Visible minorities | All non-White groups except for Chinese [Footnote 1] and Aboriginal | Person |
| Low degree of education | Those in the labour force with no education certificate beyond a high school diploma | Person |
| Low level of income | Household income between \$0 [Footnote 2] and \$20,000 | Household |

The oversampling strategy identified target areas that contained a relatively high proportion of persons or households in one or more target domains. Unweighted data from the 2006 Census were used for the creation of the target domains. The dwellings in 2006 were matched with a CU from 2011, and only CUs containing mail-out or list/leave collection methods were considered for oversampling.

## 3.4 NRFU subsample selection

In order to determine the NRFU subsample, the country was divided into 23,901 strata. Any CU that was targeted for oversampling served as a stratum; otherwise, a stratum was a grouping of usually two adjacent CUs called enumerator zones. All strata were represented in the NRFU subsample. If a stratum had one or two eligible dwellings, those dwellings were automatically added to the subsample. Only 196 dwellings belonged to these strata. For strata with three or more eligible dwellings, the number of dwellings to be selected was calculated to be proportional to the size of the stratum. The subsample size for the strata that were flagged for oversampling was inflated by a factor of 1.6.
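The allocation rules in this paragraph can be sketched as follows. This is a minimal illustration only: the function name and the `base_fraction` parameter are hypothetical, the proportionality constant actually used is not given in this section, and both the cap at the stratum size and the absence of rounding are assumptions made here for the sketch.

```python
import math


def stratum_subsample_size(n_eligible, base_fraction, targeted,
                           inflation=1.6):
    """Return an illustrative NRFU subsample size for one stratum.

    base_fraction is a hypothetical proportionality constant; the value
    used for the NHS is not stated in this section.
    """
    if n_eligible < 0:
        raise ValueError("n_eligible must be non-negative")
    # Strata with one or two eligible dwellings are taken in full.
    if n_eligible <= 2:
        return float(n_eligible)
    # Otherwise the size is proportional to the stratum size.
    size = n_eligible * base_fraction
    # Strata flagged for oversampling are inflated by a factor of 1.6.
    if targeted:
        size *= inflation
    # Assumption: a stratum cannot contribute more dwellings than it has.
    return min(size, float(n_eligible))


regular = stratum_subsample_size(100, 0.25, targeted=False)
oversampled = stratum_subsample_size(100, 0.25, targeted=True)
tiny = stratum_subsample_size(2, 0.25, targeted=False)
print(regular, oversampled, tiny)
```

With the same hypothetical base fraction, a targeted stratum of 100 eligible dwellings receives 1.6 times the allocation of a non-targeted stratum of the same size, while a two-dwelling stratum is taken in full.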

Once the adjusted stratum subsample sizes had been calculated, the NRFU subsample was selected on July 14, 2011 using systematic sampling with a fractional sampling interval. In all, 642,442 dwellings were selected for the NRFU subsample, which represented approximately 33.5% of remaining eligible NHS dwellings. Targeted CUs contributed 169,657 dwellings, which was 26.4% of the entire NRFU subsample. The subsample distribution by province and territory is shown in Table 3.4.1.
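Systematic sampling with a fractional interval lets a fixed number of units be drawn from a list whose length is not a multiple of the sample size. The sketch below shows one common textbook form of the technique, applied to a single stratum; it assumes an integer sample size and uses hypothetical names, and it is not a description of the production system used for the NHS.

```python
import math
import random


def systematic_sample(units, n, rng=None):
    """Select n units from an ordered list by systematic sampling
    with a fractional interval (one common textbook formulation)."""
    rng = rng or random.Random()
    total = len(units)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and len(units)")
    # The interval is generally not an integer, e.g. 10 / 4 = 2.5.
    interval = total / n
    # Random start in [0, interval).
    start = rng.random() * interval
    # Step through the list by the interval and truncate each position;
    # min() guards against a floating-point result equal to total.
    positions = [min(total - 1, math.floor(start + j * interval))
                 for j in range(n)]
    return [units[p] for p in positions]


# Hypothetical stratum of 10 eligible dwellings, subsample of 4.
sample = systematic_sample(list(range(10)), 4, random.Random(2011))
print(sample)
```

Because the interval is at least 1, consecutive truncated positions are always distinct, so each dwelling is selected at most once and every dwelling has the same selection probability, n divided by the stratum size.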

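Table 3.4.1: NRFU subsample size by province and territory

| Province or territory | Dwellings in NRFU subsample |
| --- | --- |
| Newfoundland and Labrador | 11,884 |
| Prince Edward Island | 3,160 |
| Nova Scotia | 19,832 |
| New Brunswick | 16,098 |
| Quebec | 144,767 |
| Ontario | 244,309 |
| Manitoba | 23,770 |
| Saskatchewan | 22,666 |
| Alberta | 66,889 |
| British Columbia | 88,311 |
| Yukon | 613 |
| Northwest Territories | 143 |
| Total | 642,442 |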

