Guide to the Census of Population, 2016
Chapter 9 – Sampling and weighting for the long form
For the 2016 Census, Canadian households are counted using two main types of questionnaires: the short-form questionnaire and the long-form questionnaire. The long-form questionnaire includes the same questions as the short form, as well as a series of questions aimed at providing a comprehensive portrait of the Canadian population and Canadian households. The long-form questionnaire is sent to a sample of the population.
The estimates produced from the responses to questions found on both questionnaires are obtained from a census of population. As such, all respondent households for both types of questionnaires contribute to a given estimate, for example the population count for a specific age group.
The estimates produced from the responses to at least one question from the long form are obtained from a sample survey. In this case, only the respondent households from the long-form sample contribute to the estimate, such as the unemployment rate estimate or the estimate of the population by highest level of education.
Selecting the sample for the census long-form questionnaire
The long-form questionnaire sample is selected from small geographic areas called collection units (CUs) that, together, cover the country. The CUs determine the strata for the sample design. There are five types of CUs: list/leave, mail-out, collective dwellings, Indian reserves and canvasser enumeration. For the last two types of CUs, enumerators conduct personal interviews. In each CU (or stratum), a list of dwellings is drawn up and a systematic sample of private dwellings is chosen, with a sampling fraction of one in four. Collective dwellings are excluded from this draw. There is an exception to the sampling fraction: all private dwellings in CUs where enumerators collect data are selected for the long-form sample. Households in private dwellings selected for the sample are asked to complete the census long form. Other households (those in private dwellings that are not part of the long-form sample, as well as those in collective dwellings, which are excluded from sampling) are asked to fill out the short form.
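The selection rule described above can be illustrated with a minimal Python sketch. It is not Statistics Canada's implementation: the function name, the CU type labels and the random-start option are assumptions made for illustration. The function takes only the private dwellings of one CU, since collective dwellings are excluded from the draw.

```python
import random

# Hypothetical labels for the CU types where enumerators collect data.
# Per the text, every private dwelling in these CUs is selected.
FULL_SAMPLE_CU_TYPES = {"indian_reserve", "canvasser"}


def select_long_form_sample(dwelling_ids, cu_type, interval=4, start=None, rng=None):
    """Return the private dwellings of one CU selected for the long form.

    dwelling_ids: private dwellings in listing order (collectives excluded).
    interval: 4 gives the one-in-four sampling fraction.
    start: 0-based position of the first selected dwelling. If omitted, a
           random start is drawn (an assumption, not stated in the text).
    """
    dwelling_ids = list(dwelling_ids)
    if cu_type in FULL_SAMPLE_CU_TYPES:
        return dwelling_ids  # sampling fraction of one in one
    if start is None:
        rng = rng or random.Random()
        start = rng.randrange(interval)
    # Systematic sample: every `interval`-th dwelling from `start` onward.
    return dwelling_ids[start::interval]
```

For example, calling the function with twelve listed dwellings numbered 1 to 12, a non-enumerator CU type and start=3 selects the dwellings listed 4th, 8th and 12th.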
The sample for the long-form questionnaire is divided equally among geographic areas to ensure estimates are reliable for all regions across the country and to give the same relative importance to all geographic units of a given size. The sampling fraction was increased to one in four in 2016, from one in five for the previous census long-form questionnaire in 2006. In 2011, the response rate for the National Household Survey (NHS), a voluntary survey, was lower than for the 2006 Census long-form questionnaire. For the 2016 Census long-form questionnaire, a sample of one in four households was selected to reduce the risk of lower participation than in the past.
Weighting the sample for the census long-form questionnaire
The final responses to the long-form questionnaire are weighted so that they represent the Canadian population living in private dwellings. Weighting combines the calculation of the sample weight with various adjustments to that weight. These include a weighting adjustment for the coverage of occupied dwellings based on the results of the Dwelling Classification Survey (DCS), an adjustment to correct for the total non-response of sampled households, and a calibration of the weights of respondent households to totals derived from the census.
First, each household is given a sample weight equal to the inverse of its probability of selection in the sample. In CUs where enumerators conducted personal interviews, this weight is 1. In other CUs, this weight is generally 4. In list/leave CUs where the number of dwellings is not a multiple of four, the weight is higher than 4 but no greater than 7 because of how the systematic sample in this type of CU is drawn. In these CUs, the systematic sample is not drawn at random: the sampled dwellings are those listed 4th, 8th, 12th, and so on. For example, if one such CU has seven dwellings, the sampled dwelling, i.e., the fourth one listed, will have a sample weight of 7 to represent all dwellings in its CU.
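The arithmetic behind the list/leave weights can be checked with a short Python sketch. It assumes the weight is the number of dwellings in the CU divided by the number of dwellings sampled, which reproduces the seven-dwelling example above. The function name is hypothetical.

```python
def list_leave_sample_weight(n_dwellings, interval=4):
    """Sample weight in a list/leave CU where the dwellings listed
    4th, 8th, 12th, ... are sampled.

    Assumption: weight = dwellings in the CU / dwellings sampled.
    """
    n_sampled = n_dwellings // interval  # how many of the 4th, 8th, ... exist
    if n_sampled == 0:
        raise ValueError("CU has fewer dwellings than the sampling interval")
    return n_dwellings / n_sampled


# A CU with 8 dwellings (a multiple of four) gives the usual weight of 4;
# a CU with 7 dwellings gives the maximum weight of 7.
weights = {n: list_leave_sample_weight(n) for n in (7, 8, 11)}
```

Under this assumption, the weight equals 4 whenever the number of dwellings is a multiple of four, and the largest value, 7, occurs for a CU with seven dwellings, which matches the range stated in the text.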
In the sample selected for weighting, several types of responses to the long form can be differentiated. First, there are households that answered at least one question from the long form that was not in the short form. These households are defined as "respondent households" for the long form. Then, there is a fraction of households that answered only questions found on both questionnaires or, equivalently, only questions on the short form. Finally, there are certain households that did not respond to any questions. The last two types of households are referred to as "non-respondent households" for the long form.
In CUs where enumerators conducted personal interviews, i.e., Indian reserve CUs and canvasser enumeration CUs, non-response to the long-form questionnaire is accounted for by imputation. Data for households that did not respond to any questions are imputed using data from a respondent household in the same type of CU. The remaining non-response is handled by imputation for partial non-response. All private households in these CUs that are not part of incompletely enumerated Indian reserves and establishments keep their sample weight of 1 for estimation purposes. Other private households and collective households are assigned a final weight of zero and thus do not contribute to the estimates.
In the other types of CUs, several adjustments are made to the weight, and a different imputation method is used. The following describes the processing in these CUs. Only respondent households for the long-form questionnaire are assigned a non-zero weight at the end of the weighting stages, meaning that they are the only ones to contribute to the long-form questionnaire estimates. Partial non-response for these households is compensated for by imputation.
Non-respondent households for the long-form questionnaire are nevertheless taken into consideration in the census figures. In fact, for all enumerated households that did not answer any questions, all responses are imputed for questions found on both questionnaires based on data from a household that answered at least one such question. The remaining non-responses to these questions for all enumerated households are imputed for partial non-response.
Before proceeding with the imputation for the census total non-response, the census undercoverage of occupied dwellings is estimated using the DCS, and this undercoverage is corrected by changing the occupancy status of certain dwellings. The incorrect classification of dwellings on Census Day is in fact one source of coverage error. This error can occur when an occupied dwelling is classified as unoccupied or when an unoccupied dwelling is classified as occupied. The purpose of the DCS is to estimate the number of these classification errors. To this end, a sample of private dwellings for which no census questionnaire was returned is contacted, and information is gathered on their occupancy status on Census Day and, if the dwelling was occupied, on the number of usual residents.
The weighting steps that follow the assignment of the sample weight are carried out after imputing for total non-response and for partial non-response to questions found on both questionnaires. All these weight adjustments are done by calibration. Calibration consists of applying the smallest adjustment possible to the weight so that the weighted estimates coincide with known counts. These known counts are referred to as control counts.
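To make the idea of calibration concrete, here is a minimal Python sketch that adjusts weights until weighted counts match control counts. The text does not say which distance measure or algorithm is used for the census, so raking (iterative proportional fitting) is substituted here as one familiar calibration technique. The function name, the household categories and the control counts are all hypothetical.

```python
def rake(weights, margins, n_iter=100):
    """Adjust weights so weighted counts match control counts on each margin.

    weights: starting weight of each household.
    margins: list of (labels, totals) pairs, where labels[i] is the category
             of household i and totals maps each category to its control count.
    Note: raking is an illustrative stand-in, not the documented method.
    """
    w = list(weights)
    for _ in range(n_iter):
        for labels, totals in margins:
            # Current weighted count in each category of this margin.
            sums = {}
            for wi, label in zip(w, labels):
                sums[label] = sums.get(label, 0.0) + wi
            # Scale every weight by (control count / weighted count).
            w = [wi * totals[label] / sums[label] for wi, label in zip(w, labels)]
    return w


# Hypothetical example: four households, two control margins.
tenure = ["owner", "owner", "renter", "renter"]
size = ["small", "large", "small", "large"]
calibrated = rake(
    [4.0, 4.0, 4.0, 7.0],
    [(tenure, {"owner": 10.0, "renter": 8.0}),
     (size, {"small": 9.0, "large": 9.0})],
)
```

After the loop, the weighted counts of the calibrated households agree with the control counts on both margins to within rounding error, while each weight stays proportionally close to its starting value.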
At each stage, the country is divided into geographic areas, and each area is calibrated independently. Four types of geographic units can be used, depending on the weighting stage: the dissemination area (DA), the aggregate dissemination area (ADA), the census subdivision (CSD) and the super aggregate dissemination area (SADA). DAs are small areas, consisting of one or more neighbouring dissemination blocks and including 400 to 700 people. ADAs are groups of adjoining DAs, most often including 5,000 to 15,000 people. ADAs respect provincial and territorial borders, as well as the boundaries of census divisions (CDs), census metropolitan areas (CMAs) and census agglomerations (CAs) subdivided into census tracts (CTs) in effect for the 2016 Census. CSDs are also groups of DAs that respect the boundaries of CDs. They correspond to municipalities or areas treated as municipal equivalents for statistical purposes. SADAs are groups of adjoining ADAs, most often including 50,000 to 150,000 people. SADAs respect provincial and territorial borders and, most of the time, the boundaries of CDs.
The unit of measure for control counts can be the household or the person. Some control counts are derived from responses to questions found on both questionnaires. They are related to geography, age, sex, marital or common-law status, dwelling type, size of household, family structure and knowledge of official languages. Other control counts are derived from administrative data matched to census records. These are counts derived from individual income tax data, immigration data and data from the Indian Register. However, for a given region, several control counts are eliminated based on certain criteria to maximize the general quality of estimates.
The first sample weight adjustment makes the coverage of the selected sample correspond to that of the census. In fact, the imputation for total non-response and for census undercoverage based on the DCS does not allow the type of questionnaire to be taken into account. This means that the sample coverage after imputation can differ from the census coverage. To make them correspond, the sample weight is calibrated for all households targeted for the long-form questionnaire in the sample, whether or not they responded. This adjustment is made independently by SADA. All control counts are derived at that level, except some counts of households and individuals in the ADAs that make up the SADAs. The weight of households that are not targeted for the long-form questionnaire is set to 0. After the adjustment, the control counts correspond to the weighted counts for the sample.
The weight (adjusted for coverage) of respondent households is then adjusted for non-response using a logistic regression model that predicts the likelihood of response. This is done at the SADA level using a calibration of weights of respondent households based on the model. The control counts are the same as for the first adjustment, and the model's prediction variables are the variables that correspond to these counts. The weight of non-respondent households is set to 0. As a result, the control counts correspond to the weighted counts of respondent households.
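The general idea of a model-based non-response adjustment can be sketched in Python as follows. This is a simplified illustration, not the census procedure: it divides each respondent's weight by a response probability predicted by a logistic model, whereas the text describes a calibration based on the model. The coefficients, the feature vectors and the function names are hypothetical, and fitting the model is outside the scope of the sketch.

```python
import math


def response_propensity(features, coefficients, intercept):
    """Predicted probability of response from a logistic model.

    The coefficients are assumed to have been estimated elsewhere.
    """
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def adjust_for_nonresponse(households, coefficients, intercept):
    """Return adjusted weights: respondents are weighted up by the inverse
    of their predicted response probability; non-respondents get 0."""
    adjusted = []
    for hh in households:
        if hh["responded"]:
            p = response_propensity(hh["features"], coefficients, intercept)
            adjusted.append(hh["weight"] / p)
        else:
            adjusted.append(0.0)
    return adjusted
```

With a predicted response probability of 0.5, for instance, a respondent household with a coverage-adjusted weight of 4 would receive a weight of 8, so that it also represents a similar household that did not respond.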
The final adjustment consists of calibrating the weight (adjusted for non-response) of respondent households to more control counts. This ensures a certain consistency with census counts and attempts to reduce the variability of long-form questionnaire estimates. Calibration is again done independently by SADA. For this adjustment, a greater number of counts is chosen at the ADA level, and household and person counts are chosen by cross-tabulating ADAs and CSDs.
The weighted estimates from the long-form questionnaire may differ from census counts for characteristics found in both. In particular, this is the case when looking at a geography with boundaries that do not correspond to ADAs and SADAs. Furthermore, the smaller the geographic area, the greater the likelihood that estimates from the long-form questionnaire will differ from the census counts. When there are differences, the 2016 Census figures should be considered of higher quality and users should prioritize them, as they are not affected by the sampling variance or the slightly higher non-response error of the long-form questionnaire. Estimates from the long-form questionnaire for characteristics found in both forms should be used as contextual information when analyzing data specific to this questionnaire.
A detailed technical guide to sampling and weighting for the long-form questionnaire will be available in 2018. It will give further details on the weighting and estimation process.