Sampling and Weighting Technical Report, Census of Population, 2021
4. Estimation from the census long-form sample

Any sampling process requires an associated estimation procedure for scaling sample data up to the population level and for ensuring that survey estimates are representative of the population. The choice of an estimation procedure is generally governed by both operational and theoretical constraints. From the operational viewpoint, the procedure must be feasible within the processing system of which it is a part, and from the theoretical viewpoint, the procedure should minimize the statistical error of the estimates it produces.

The estimation procedure produces a set of weights, and the weight for each sample unit corresponds to the number of units in the population that the sample unit represents. These weights are applied to the sample data to produce millions of estimates from the census long-form sample. Estimates are summary measures such as totals, averages, proportions and medians calculated from the sample for various characteristics of interest.

4.1 Considerations in the choice of an estimation procedure

4.1.1 Operational considerations

Mathematically, an estimation procedure can be described by an algebraic formula, or estimator, that shows how the estimate for the population is calculated as a function of the observed sample values and other information from the sample design or external data sources. Most of the time, this estimator is a simple function of weights and of the variable of interest for the responding units. Using a single set of weights to produce all estimates guarantees a certain level of consistency among the different estimates of the survey.

Therefore, the approach taken for the census long-form sample (and in most sample surveys) was to split the estimation procedure into two steps: (a) the calculation of weights (known as the weighting procedure) and (b) the use of weights to produce estimates, such as the estimation of a particular population count by summing the weights of those persons or households with the characteristic of interest. Most of the mathematical complexity is contained in step (a), which is performed just once. Meanwhile, step (b) is reduced to a simple process, such as summing weights whenever tabulation is required. Since the weight attached to each sample unit is the same for any tabulation involving that unit, consistency between different estimates based on sample data is assured.
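The two-step split can be illustrated with a minimal Python sketch. The records, field names and weights below are hypothetical; the only point shown is that step (b) reduces to summing one fixed set of weights, so estimates for complementary groups add up to the estimate for the whole.

```python
# Hypothetical long-form records: step (a) has already attached one final
# weight to each person (the weight of the person's household).
records = [
    {"weight": 4.0, "born_abroad": True},
    {"weight": 4.0, "born_abroad": False},
    {"weight": 3.5, "born_abroad": False},
    {"weight": 5.0, "born_abroad": True},
]


def estimate_total(rows, has_characteristic):
    """Step (b): sum the weights of the units that have the characteristic."""
    return sum(r["weight"] for r in rows if has_characteristic(r))


population = estimate_total(records, lambda r: True)
born_abroad = estimate_total(records, lambda r: r["born_abroad"])
born_in_country = estimate_total(records, lambda r: not r["born_abroad"])

# The same weights are used in every tabulation, so the parts sum to the whole.
print(population, born_abroad, born_in_country)  # 16.5 9.0 7.5
```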

4.1.2 Theoretical considerations

For a given sample design and a given estimation procedure, one can, from sampling theory, make a statement about the chances that a certain interval will contain the unknown population value being estimated. A primary criterion in the choice of an estimation procedure is the minimization of the width of such intervals for a given level of confidence so that these statements about the unknown population values are as precise as possible. A common measure of precision for comparing estimation procedures is known as the standard error. Provided that certain conditions are met, intervals of plus or minus two standard errors from the estimate will contain the true population value for approximately 95% of all possible samples. Chapter 7 details the conditions and methods to compute confidence intervals for the census long-form sample.
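As a small worked illustration of the "plus or minus two standard errors" rule, the sketch below computes such an interval. The estimate and standard error are made-up numbers, and the function name is hypothetical; the conditions under which the interval has approximately 95% coverage are those described in Chapter 7 and are not checked here.

```python
def two_standard_error_interval(estimate, standard_error, multiplier=2.0):
    """Return (lower, upper) bounds at estimate +/- multiplier * standard error."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical estimate of 12,500 persons with a standard error of 300.
lower, upper = two_standard_error_interval(12500.0, 300.0)
print(lower, upper)  # 11900.0 13100.0
```

A procedure with a smaller standard error yields a narrower interval for the same level of confidence, which is why the standard error is used to compare estimation procedures.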

As well as minimizing standard error, a second objective in the choice of an estimation procedure for the long-form sample is to ensure, as far as possible, that sample estimates for census characteristics are consistent with the corresponding known census values. Fortunately, these two objectives are usually complementary in the sense that sampling error tends to be reduced by ensuring that sample estimates for certain basic characteristics are consistent with the corresponding population figures. However, while this is true in general, forcing long-form sample estimates for census characteristics to be consistent with corresponding census figures for very small subgroups can have a detrimental effect on the standard error of estimates for the sample characteristics themselves. For example, if in several dissemination areas only a few subjects have a given characteristic, such as birth in a certain country, ensuring consistency between the sample estimates and the census counts for that place of birth would unduly increase the standard error for the rest of the characteristics.

In cases where no information about the population being sampled is available other than that collected for sample units and unit non‑response has not occurred, the estimation procedure would be restricted to weighting the sample units inversely to their probability of selection. For example, if a unit had a one-in-four chance of selection, then that selected unit would receive a weight of 4. When unit non‑response is observed, the weight must be further adjusted according to the estimated probability of response of the unit, for example. In practice, some supplementary knowledge about the population (e.g., its total size and possibly its breakdown by a certain variable—perhaps by province and territory) is often available. Such information can be used to improve the estimation formula so as to produce estimates with a greater chance of being close to the unknown population value. In the case of the census long-form sample, a large amount of very detailed information about the population being sampled is available from the census short-form data at every geographic level. This wealth of population information is used in the coverage, non‑response and calibration adjustments to improve the estimates made from the long-form sample.
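The basic weighting logic in this paragraph can be sketched as follows. The function names and the response probability of 0.8 are assumptions made for illustration; the actual coverage and non-response adjustments are described in Section 4.4.

```python
def design_weight(selection_probability):
    """Weight a sampled unit inversely to its probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def adjust_for_nonresponse(weight, response_probability):
    """Inflate a respondent's weight by the inverse of its estimated response probability."""
    if not 0 < response_probability <= 1:
        raise ValueError("response probability must be in (0, 1]")
    return weight / response_probability


# A unit with a one-in-four chance of selection receives a weight of 4.
w = design_weight(0.25)
print(w)  # 4.0

# With a hypothetical estimated response probability of 0.8, the respondent
# also represents non-respondents, so its weight rises to about 5.
w_adjusted = adjust_for_nonresponse(w, 0.8)
```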

Nevertheless, the long-form sample estimates for census characteristics cannot be made consistent with all the census counts at every geographic level. Differences between sample estimates and census counts become visible when a cross-tabulation of a sample variable and the corresponding census variable is produced. The tabulation of sample-based estimates of totals for particular characteristics will not necessarily agree with the equivalent census count tabulations for those characteristics.

Adjusting the weights by the smallest amounts possible to achieve perfect agreement between long-form estimates and census counts for certain characteristics and subgroups is known as “calibration.”

4.2 Weighting areas

The various adjustments to design weights were made independently by weighting area. The geographic areas used for this purpose were aggregate dissemination areas (ADAs) and super aggregate dissemination areas (SADAs). ADAs were first introduced with the 2016 Census. SADAs were created specifically for the weighting procedures by ADA aggregation.

4.2.1 Aggregate dissemination areas

In total, for the 2021 Census, Canada was divided into 5,433 ADAs. Households were selected for the long-form sample in 5,191 ADAs. Of the 242 ADAs without sampled households, 237 consisted solely of out-of-scope households. The other five ADAs had only a handful of in-scope households, and none of them were selected.

The 2021 ADAs were constructed by making minimal changes to the 2016 ADAs to accommodate changes at the dissemination area (DA) level. The goal was to allow for historical comparability in ADAs. Because criteria related to size are most relevant to the weighting process, the 2016 ADA delineation criteria are presented below.

ADAs satisfy the following delineation criteria:

  1. ADAs cover the entire country and, where possible, have a population count of 5,000 to 15,000 (based on the population counts from the previous census).
  2. ADAs respect provincial and territorial borders, as well as the boundaries of census divisions (CDs), census metropolitan areas (CMAs) and census agglomerations (CAs) subdivided into census tracts (CTs) in effect for the 2016 Census.
  3. ADAs are based on one of three 2016 Census dissemination geographic areas: DAs, census subdivisions (CSDs) or census tracts (CTs):
    • Within CMAs and CAs with CTs, adjacent CTs are combined to meet the ADA population criterion.
    • In areas without CTs (areas outside CMAs and the largest CAs) where CSDs have a population of fewer than 15,000, adjacent CSDs are combined to meet the ADA population criterion.
    • In areas without CTs where CSDs have a population of over 15,000, adjacent DAs are combined within these CSDs to meet the ADA population criterion.
  4. Every CSD that consists of an Indian reserve and a small number of other areas where the canvasser method is required constitute distinct ADAs.

For more information about aggregate dissemination areas, refer to the Dictionary, Census of Population, 2021, Catalogue no. 98-301-X.

Table 4.2.1.1 shows how closely ADAs with households in the long-form sample aligned with CSD boundaries. The first scenario occurred in most cases (93.26% of CSDs), since ADAs were designed above all to respect the boundaries of CTs and CSDs. Scenario 4 is the only one where CSD boundaries were not respected. CTs were not included in the table because all but one were in the first scenario; the exception was in scenario 3.

Table 4.2.1.1
Number of census subdivisions within the boundaries of aggregate dissemination areas with households in the long-form sample, 2021 Census
Scenario | Description | Census subdivisions (number) | Census subdivisions (percent)
1 | The CSD was small enough to be fully contained in an ADA, and this ADA only had complete CSDs. No CSDs in the ADA were part of another ADA. | 4,526 | 93.26
2 | The CSD was small enough to be fully contained in an ADA, but another CSD in the same ADA was part of a different ADA. | 39 | 0.80
3 | The CSD was large enough to contain full ADAs. No ADAs were part of another CSD. | 262 | 5.40
4 | The CSD was part of two or more ADAs. | 26 | 0.54
Total | | 4,853 | 100.00

Table 4.2.1.2 shows the distribution of ADAs with households in the long-form sample by province or territory.

Table 4.2.1.2
Number of aggregate dissemination areas with households in the long-form sample, by province or territory
Region | Number of ADAs
Newfoundland and Labrador | 83
Prince Edward Island | 23
Nova Scotia | 148
New Brunswick | 129
Quebec | 1,144
Ontario | 1,659
Manitoba | 222
Saskatchewan | 263
Alberta | 515
British Columbia | 912
Yukon | 29
Northwest Territories | 38
Nunavut | 26
Canada | 5,191

Table 4.2.1.3 shows the number of ADAs by the number of in-scope households in the census. The majority of ADAs with households in the long-form sample (59.95%) had from 2,000 to 4,999 households. A considerable number of ADAs were small: 996 (19.19%) had fewer than 500 in-scope households.

Table 4.2.1.3
Distribution of aggregate dissemination areas with households in the long‑form sample, by number of in‑scope households
In-scope households | Number of ADAs | Percent
0 to 499 | 996 | 19.19
500 to 999 | 118 | 2.27
1,000 to 1,999 | 359 | 6.92
2,000 to 2,999 | 1,190 | 22.92
3,000 to 3,999 | 1,189 | 22.91
4,000 to 4,999 | 733 | 14.12
5,000 to 5,999 | 356 | 6.86
6,000 to 6,999 | 143 | 2.75
7,000 to 7,999 | 46 | 0.89
8,000 to 8,999 | 29 | 0.56
9,000 to 9,999 | 13 | 0.25
10,000 and over | 19 | 0.37
Total | 5,191 | 100.00

Table 4.2.1.4 presents the number of ADAs by range of numbers of households that responded to the 2021 Census long-form questionnaire. For the ADAs with the fewest respondents, a specific type of processing was applied to have enough households for weighting purposes (see Section 4.5).

Table 4.2.1.4
Distribution of aggregate dissemination areas with households in the long‑form sample, by number of respondent households for the long‑form questionnaire
Number of respondents | Number of ADAs | Percent
0 to 99 | 690 | 13.29
100 to 199 | 276 | 5.32
200 to 299 | 132 | 2.54
300 to 399 | 128 | 2.47
400 to 499 | 272 | 5.24
500 to 599 | 478 | 9.21
600 to 699 | 559 | 10.77
700 to 799 | 583 | 11.23
800 to 899 | 499 | 9.61
900 to 999 | 411 | 7.92
1,000 to 1,099 | 322 | 6.20
1,100 to 1,199 | 246 | 4.74
1,200 to 1,299 | 189 | 3.64
1,300 to 1,399 | 128 | 2.47
1,400 to 1,499 | 98 | 1.89
1,500 and over | 180 | 3.47
Total | 5,191 | 100.00

4.2.2 Super aggregate dissemination areas

SADAs were created specifically for weighting 2016 Census data, so that certain weighting procedures for which a large number of observations is desirable could be conducted.

The 2021 SADAs were constructed by making minimal changes to the 2016 SADAs to accommodate changes at the ADA level. Since criteria on size are of particular interest for the weighting process, the 2016 SADA delineation criteria are presented below.

SADAs were created according to the following rules (in order of priority):

  1. SADAs are created by combining ADAs (mandatory).
  2. SADAs respect provincial and territorial borders (mandatory).
  3. SADAs have a population of 50,000 to 150,000 persons (except for CDs with a population of 40,000 to 50,000 persons that constitute their own SADA) excluding persons living in canvasser collection units (CUs).
  4. SADAs respect the boundaries of CDs.
  5. SADAs respect the boundaries of CMAs and CAs.
  6. SADAs respect the boundaries of CSDs.
  7. SADAs are single contiguous entities.
  8. SADAs are as compact as possible.

The first two rules were mandatory, and rules 3 to 8 were followed where possible. A total of 409 SADAs were created.

Table 4.2.2.1 shows the distribution of SADAs by province or territory.

Table 4.2.2.1
Number of super aggregate dissemination areas, by province or territory
Region | Number of SADAs
Newfoundland and Labrador | 8
Prince Edward Island | 2
Nova Scotia | 13
New Brunswick | 8
Quebec | 97
Ontario | 150
Manitoba | 15
Saskatchewan | 14
Alberta | 44
British Columbia | 55
Yukon | 1
Northwest Territories | 1
Nunavut | 1
Total | 409

Table 4.2.2.2 shows how closely SADAs aligned with CD and CMA boundaries. SADAs respected the boundaries of the large majority of CDs (289 of 293, scenarios 1 and 3) and of roughly three-quarters of CMAs (32 of 41). The other CMAs were part of at least two SADAs (scenario 4).

Table 4.2.2.2
Number of census divisions and census metropolitan areas within super aggregate dissemination area boundaries, 2021 Census
Scenario | Description | Census divisions (number) | Census divisions (percent) | Census metropolitan areas (number) | Census metropolitan areas (percent)
1 | The CD or CMA was small enough to be fully contained within a SADA, and the SADA included only complete CDs or CMAs. No CDs or CMAs in the SADA were part of another SADA. | 249 | 84.98 | 6 | 14.63
2 | The CD or CMA was small enough to be fully contained within a SADA, but another CD or CMA in the same SADA was also part of another SADA. | 2 | 0.68 | 0 | 0.00
3 | The CD or CMA was large enough to contain complete SADAs. No SADAs were also part of another CD or CMA. | 40 | 13.65 | 26 | 63.41
4 | The CD or CMA was part of two or more SADAs. | 2 | 0.68 | 9 | 21.95
Total | | 293 | 100.00 | 41 | 100.00

Table 4.2.2.3 shows the number of SADAs by the number of in-scope persons.

Table 4.2.2.3
Distribution of super aggregate dissemination areas with households in the long-form sample, by number of in‑scope individuals
In-scope individuals | Number of SADAs | Percent
30,000 to 39,999 | 3 | 0.73
40,000 to 49,999 | 20 | 4.89
50,000 to 59,999 | 23 | 5.62
60,000 to 69,999 | 29 | 7.09
70,000 to 79,999 | 101 | 24.69
80,000 to 89,999 | 66 | 16.14
90,000 to 99,999 | 46 | 11.25
100,000 to 149,999 | 114 | 27.87
150,000 and over | 7 | 1.71
Total | 409 | 100.00

4.3 Design weights

The design weight for each household in the long-form sample was calculated differently, depending on the census delivery method of the area where the corresponding dwelling was located.

Households living in private dwellings attached to collective dwellings were an exception to the rule. As mentioned in Section 2.2, all of these households were included in the sample. They were considered take-all, so their design weight was 1.

4.3.1 Weights for households counted in the sample

Sampled households with a design weight of 1 did not have their weight adjusted. These households kept their weight of 1 after the weighting procedures were completed (coverage and non‑response, as well as calibration to census counts). They either were located in canvasser CUs or were private households that were attached to a collective dwelling.

Total non‑response and partial non‑response for these households were addressed by imputation. Once the missing data were imputed, these households were considered to be respondents for estimation purposes (although they were considered to be non‑respondents for the calculation of response rates in Section 3.11).

4.4 Coverage and total non‑response adjustment

While there are several ways of treating non‑response in surveys, they can be divided into two main categories: imputation and reweighting. The former is usually applied to missing values for individual items (item non‑response) and the latter to total non‑response. A household was considered to be a respondent to the long-form questionnaire when it answered at least one of the long-form questions. With the high response rate to the long-form questionnaire, any non‑response adjustment method would have had, for the most part, only a modest impact on the final survey weights and estimates. Coverage and total non‑response for households in CUs in First Nations communities, Métis settlements, Inuit regions and other remote areas were compensated for with imputation procedures and, for the most part, with whole household imputation (WHI) as described in Section 3.6. In the rest of the country, reweighting procedures were used. The rest of this chapter describes those weighting procedures.

The main purpose of coverage and non‑response adjustments is to minimize the impact of any potential biases from lack of complete coverage (or from duplicates) and from unit non‑response. For the adjustment to actually reduce the potential bias, a rich set of information about the non‑respondents is very useful. Otherwise, the non‑response adjustment that can be applied is limited, and the potential bias will not be greatly lessened. Only geographical information was known for every non‑responding household. The information on non‑respondents was therefore somewhat limited. Fortunately, before the coverage and non‑response adjustments, the process of WHI occurred. An important part of WHI is to impute the short-form characteristics for all non‑respondents to the short form. This included long-form sample non‑respondents who did not answer any short-form questions. This additional information served as the basis for the long‑form sample non‑response adjustment.

The method used to adjust for coverage and total non‑response in the long-form sample was a reweighting calibration-based procedure applied to the design weights. The procedure can be divided into the following main steps:

  1. selection of calibration constraints for steps 2 and 3
  2. non‑linear calibration coverage adjustment
  3. estimation of a non‑response propensity based on non‑linear calibration for non‑response
  4. application of a score method based on the propensity of step 3.

Steps 1 to 4 were applied independently in each SADA. In other words, the non‑response adjustment was applied by SADA. See Section 4.2 for the definition and information about ADAs and SADAs.

The first step consisted of a forward selection of calibration constraints in the SADA. It was performed as follows:

The selection process excluded constraints that occurred in fewer than 250 households in the SADA and constraints that were redundant or almost redundant in terms of collinearity with those constraints or with constraints already selected. Constraints that were redundant with constraints already selected were excluded since they did not add any new information. Given those filters, the order of priority used in the evaluation of constraints ensured that the constraints selected complemented each other and corrected for any potential coverage differential between the long-form and the short-form, as well as for census total non‑response.

The second step applied a coverage non‑linear calibration adjustment to the whole sample in the SADA (i.e., respondents and non‑respondents). The long-form sample weighted counts, for the constraints selected in the first step, were made to coincide with the corresponding population counts. The purpose of this step was to correct for any potential coverage differential between the long-form sample and its complement (i.e., the set of households receiving only the short form). One way in which overcoverage can occur is if some individuals are counted in two different households. The coverage for the two populations could also be different if, for example, occupied dwellings were more likely to be incorrectly treated as unoccupied dwellings for the long-form than for the short-form. Another objective of this step was to isolate as much as possible the sampling error. Without this step, the non‑response calibration carried out in the next step would confound the non‑response error with the sampling error. This step makes the sample estimates coincide with the population estimates. In addition, the same control totals are used in both calibration procedures. As a result, the non‑response propensity estimation done next does not have to correct (directly or indirectly) for the sampling error. Combining a correction for the sampling error and for the non‑response error in the next step would have been inappropriate. The calibration procedure would have failed if the weight of any respondent was required to decrease to match the census counts, because the estimated propensity would have been greater than 1. Moreover, the score method applied in the last step required an estimate of the response propensity alone. To the extent that the variable of interest was related to the selected constraints, the sampling variance was also reduced by this step.

After these two steps, the main non‑response adjustment took place. The weights, adjusted in the previous step, of non‑respondents were set to 0 and the weights of respondents were increased so that the weighted sums in the SADA coincided with the corresponding population counts for the selected constraints. A logistic link function between the response propensity and the characteristics used in calibration enabled the implicit estimation of the response propensity. Folsom and Singh (2000) proposed this non‑linear calibration method as a way of adjusting for non‑response while ensuring both that the estimates coincided with selected population counts and that the estimated response probabilities were between 0 and 1. This last condition does not necessarily hold when linear calibration is used for non‑response adjustment. To the extent that the response propensity was related to the selected constraints, this step reduced the potential non‑response bias without increasing the variance.
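The idea of a logistic link can be sketched in Python under simplifying assumptions. In this reading, each respondent's weight is divided by an implied propensity 1 / (1 + exp(-x·λ)), and λ is chosen so that the respondents' weighted totals match the control totals. The sketch handles only two constraints (number of households and number of persons), uses a plain Newton iteration, and all names and data are hypothetical. It is not the production procedure or the full method of Folsom and Singh (2000).

```python
import math


def logistic_calibration(weights, x, totals, iterations=50, tol=1e-10):
    """Solve sum_i w_i * (1 + exp(-x_i . lam)) * x_i = totals for two constraints.

    Returns (adjusted_weights, implied_propensities) for the respondents.
    """
    lam = [0.0, 0.0]
    for _ in range(iterations):
        # e_i = exp(-x_i . lam); the implied propensity is 1 / (1 + e_i).
        e = [math.exp(-(xi[0] * lam[0] + xi[1] * lam[1])) for xi in x]
        # Gap between the adjusted respondent totals and the control totals.
        g = [sum(w * (1 + ei) * xi[p] for w, ei, xi in zip(weights, e, x)) - totals[p]
             for p in range(2)]
        if max(abs(v) for v in g) < tol:
            break
        # h = sum_i w_i * e_i * x_i x_i^T, the negative Jacobian of g.
        h = [[sum(w * ei * xi[p] * xi[q] for w, ei, xi in zip(weights, e, x))
              for q in range(2)] for p in range(2)]
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        # Newton step lam <- lam + h^{-1} g, using the explicit 2x2 inverse.
        lam[0] += (h[1][1] * g[0] - h[0][1] * g[1]) / det
        lam[1] += (h[0][0] * g[1] - h[1][0] * g[0]) / det
    e = [math.exp(-(xi[0] * lam[0] + xi[1] * lam[1])) for xi in x]
    propensities = [1 / (1 + ei) for ei in e]
    adjusted = [w / p for w, p in zip(weights, propensities)]
    return adjusted, propensities


# Hypothetical respondent households: coverage-adjusted weight and (1, household size).
resp_weights = [4.0, 4.0, 5.0, 5.0]
resp_x = [(1, 1), (1, 3), (1, 2), (1, 4)]
control_totals = [24.0, 58.0]  # households and persons in the area (made up)

new_weights, propensities = logistic_calibration(resp_weights, resp_x, control_totals)
```

Because each adjustment factor has the form 1 + exp(-x·λ), it is always greater than 1, so every implied propensity lies strictly between 0 and 1. A linear adjustment offers no such guarantee.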

The inverse of the estimated response probabilities obtained in the previous step could be directly used to adjust the weights for non‑response. However, the score method was used for the last step of the non‑response adjustment to smooth the estimated probabilities from the previous step. This further ensured the quality of the non‑response adjustment and avoided overly large adjustments. For each SADA, homogeneous weighting classes were formed according to the estimated response probabilities. In each class, the weighted harmonic mean of the response probabilities was calculated. The harmonic mean was used because it is less affected by outliers in the estimated response probabilities. The inverse of this mean was applied to the weights of respondents in the class as the non‑response adjustment. This is equivalent to applying the weighted arithmetic mean of the weight adjustment factors in each homogeneous weighting class, where the adjustment factors would be the inverse of the estimated response propensities.
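The class-level smoothing described above can be sketched as follows. How the homogeneous classes were actually formed is not specified here, so the sketch assumes equal-sized classes of respondents sorted by estimated propensity; the function name and data are hypothetical.

```python
def score_method_factors(respondents, n_classes):
    """Return one non-response adjustment factor per respondent.

    respondents: list of (weight, estimated_response_probability) pairs.
    Assumption: classes are equal-sized groups of respondents sorted by propensity.
    """
    order = sorted(range(len(respondents)), key=lambda i: respondents[i][1])
    class_size = -(-len(order) // n_classes)  # ceiling division
    factors = [0.0] * len(respondents)
    for start in range(0, len(order), class_size):
        members = order[start:start + class_size]
        total_weight = sum(respondents[i][0] for i in members)
        # Weighted harmonic mean of the propensities: sum(w) / sum(w / p).
        harmonic_mean = total_weight / sum(
            respondents[i][0] / respondents[i][1] for i in members
        )
        for i in members:
            factors[i] = 1.0 / harmonic_mean
    return factors


sample = [(4.0, 0.5), (4.0, 0.8), (4.0, 0.9), (4.0, 1.0)]
factors = score_method_factors(sample, n_classes=2)
```

In the first class (propensities 0.5 and 0.8, equal weights), the factor is 1.625, which is also the arithmetic mean of the individual factors 1/0.5 = 2 and 1/0.8 = 1.25. This is the equivalence stated in the paragraph above.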

In summary, the coverage and total non‑response adjustment was a product of two quantities: the coverage adjustment and the inverse of the score-method harmonic mean.

4.5 Final calibration

Final calibration is a linear calibration that was done to minimize the sampling variability of estimates derived from long-form questionnaire responses, while ensuring consistency between estimated totals and Census of Population totals. This weighting step was necessary because consistency between estimated totals and Census of Population totals (that is, satisfying the calibration constraints) was important for a large number of variables and geographic areas.

Only the weights for households in MO, L/L or MODO areas were calibrated, since these households were sampled. Exceptions to this rule were households in these areas that lived in a private dwelling attached to a collective dwelling. Since all these households were included in the long-form sample and all the long-form questionnaire responses for these households were imputed, no calibration was done. The final weights for these households were therefore equal to 1. The weights produced by the calibration process were the final weights used to calculate the long-form estimates, and these weights applied to households as well as families and persons. In other words, all families and persons from the same household received the household weight. For this final adjustment, the variability of the calibrated weights needed to be limited to avoid having an excessive portion of the weight applied to a single household or person. Therefore, weights were constrained to range from 1 to 20.

Calibration constraints were defined at the person, household and census family levels. Additionally, constraints can be selected at two different geographical levels, at the ADA or at the SADA level. These two levels maximize the overall consistency between estimated totals and Census of Population totals, while minimizing the number of calibration constraints. This helps to reduce the variability of estimates. Appendix C lists all the ADA and SADA constraints that were taken into consideration during the calibration process. Characteristics available from the census, administrative sources and the long-form questionnaire and for which consistency was attempted included age, gender, marital status, common-law status, household size, dwelling type, official language spoken, year of immigration and place of birth.

The constraint selection process was applied simultaneously to a SADA and its ADAs, but independently across SADAs. Calibration was then performed using all of the selected constraints. The 2021 calibration process saw the addition of three new constraints. These were the number of persons who live in an apartment in a building that has five or more storeys (APT5PLUS) and two constraints related to the number of persons who immigrated from 2016 to 2021 (YRIMD_2016 and YRIMG1_2016). Additionally, constraints previously based on the 2016 sex concept are now based on the 2021 two-category gender variable (see Note 1). In total, 203 constraints were defined for SADAs and 271 for ADAs. Various factors drove the choice of geographic level for calibration constraints. This choice was made in collaboration with subject-matter experts. For example, some constraints were defined only for SADAs, since they would not have been populated enough at the ADA level. Other constraints, such as age groups, were chosen in a way that ensured they were not only populated enough but also not too similar when assessed by the selection process.

To facilitate their calibration, small ADAs were combined before the selection of calibration constraints to ensure a minimum of 60 long‑form respondent households per ADA. Small ADAs that fell entirely within a CSD were initially combined with other ADAs in the same SADA. Next, small ADAs in CDs were combined with other ADAs in the same SADA. Finally, the remaining small ADAs were combined with an ADA from an adjacent SADA. The ADA grouping procedure produced 4,207 groups of ADAs with 60 or more respondent households.

The first step in the process to select calibration constraints was to categorize each of the constraints into one of three groups:

Mandatory constraints: These constraints had to be used in the calibration because the census counts had to agree with the long-form estimates at the geographic levels that are usual aggregates of ADAs and SADAs (e.g., Canada, provinces and territories). The number of persons and the number of households in the ADAs and SADAs were the two mandatory constraints.

Low-response constraints: Constraints evaluated for a population of 200 or fewer households were not used in the calibration because they can make survey estimates unstable.

All other constraints: These constraints were examined further to see whether they should be used in the calibration.

The second step was to determine which constraints from the third group should be used in the calibration process, in addition to the mandatory constraints. The constraints from the third group were added one by one, by repeatedly choosing the constraint that divided the population of the SADA or ADA in two as evenly as possible. Constraints that were too linearly dependent were excluded. To avoid introducing a bias in the point estimates and to avoid increasing their variance, the number of selected constraints was limited. Evaluations determined that this number had to be smaller than the square root of the number of respondent households involved in the constraint.
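A simplified Python sketch of this second step is given below. It ranks candidate constraints by how evenly they split the population and stops before the number of selected constraints reaches the square root of the number of respondent households. Several points are assumptions: the ranking is done once rather than re-evaluated after each addition, the linear dependence check is omitted, the limit is computed from the respondent households in the area, and the constraint names other than APT5PLUS-style labels are invented.

```python
import math


def select_by_even_split(candidates, population_size, n_respondent_households,
                         mandatory=("persons", "households")):
    """candidates maps a constraint name to the census count of units it covers."""
    limit = math.sqrt(n_respondent_households)
    selected = list(mandatory)
    # A constraint covering close to half of the population splits it most evenly.
    ranked = sorted(candidates,
                    key=lambda name: abs(candidates[name] / population_size - 0.5))
    for name in ranked:
        if len(selected) + 1 >= limit:  # keep the count strictly below sqrt(n)
            break
        selected.append(name)
    return selected


census_counts = {"age_0_14": 200, "renters": 480, "apt5plus": 900, "married": 550}
chosen = select_by_even_split(census_counts, population_size=1000,
                              n_respondent_households=25)
print(chosen)  # ['persons', 'households', 'renters', 'married']
```

With 25 respondent households the limit is 5, so at most four constraints are kept: the two mandatory ones and the two candidates closest to an even split.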

After the calibration constraints were selected, a final edit was done to check whether the set of constraints chosen at the ADA and SADA levels was free of collinearity.

The calibration itself was then carried out for the final set of constraints from the second step. The weights adjusted for coverage and non‑response were modified as little as possible, so that the weighted estimates would be equal to census totals for these constraints. Statistics Canada’s Generalized Estimation System (GES) was used to carry out the calibration.
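Linear calibration can be sketched in Python as follows. The sketch uses the standard linear (chi-square distance) form, in which each weight d is multiplied by 1 + x·λ and λ is obtained by solving a small linear system. It does not reproduce the Generalized Estimation System, and it does not enforce the 1 to 20 bounds on weights described earlier. The data, function names and the two constraints (households and persons) are chosen for illustration only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def linear_calibration(weights, x, totals):
    """Return calibrated weights w_i = d_i * (1 + x_i . lam) that reproduce totals."""
    k = len(totals)
    # m = sum_i d_i x_i x_i^T and gap = totals - sum_i d_i x_i.
    m = [[sum(d * xi[p] * xi[q] for d, xi in zip(weights, x)) for q in range(k)]
         for p in range(k)]
    gap = [totals[p] - sum(d * xi[p] for d, xi in zip(weights, x)) for p in range(k)]
    lam = solve(m, gap)
    return [d * (1.0 + sum(l * v for l, v in zip(lam, xi)))
            for d, xi in zip(weights, x)]


# Hypothetical households: adjusted weight and (1, household size).
input_weights = [4.0, 4.0, 5.0, 5.0]
aux = [(1, 1), (1, 3), (1, 2), (1, 4)]
census_totals = [20.0, 50.0]  # households and persons (made-up counts)

calibrated = linear_calibration(input_weights, aux, census_totals)
```

Before calibration the weighted counts are 18 households and 46 persons; after calibration they equal the control totals of 20 and 50, with each weight changed only modestly.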

Sample estimates can differ from census counts for a few reasons, particularly for small areas, even after the calibration step. A few of these reasons are given below.

4.6 Details on the selection of constraints

Constraints were selected twice during the weighting process: first during the coverage and non‑response adjustment discussed in Section 4.4, and again during the final calibration discussed in Section 4.5. The variables making up the constraints were essentially the same, but the inclusion or exclusion of constraints varied slightly between the two weighting steps to better align with the objective of each step. This section explains how the constraints were selected during these weighting steps.

The constraint selection process, for both adjustments, started from the set of mandatory constraints detailed in the previous sections and then evaluated the addition of every other candidate constraint, one by one. The order in which candidate constraints were evaluated was identical for all SADAs. When a candidate constraint was introduced, the no population and small population criteria were evaluated first, and the constraint was rejected if it failed either criterion. If the constraint passed both, the augmented set of constraints including it was then evaluated against the linear dependency, high collinearity and explanatory redundancy criteria. If the augmented set failed any of these criteria, the constraint was rejected. Otherwise, the constraint was added to the pool of included constraints, and the selection process moved on to the next candidate constraint in the list. Table 4.6.1 summarizes these five criteria, indicates whether each was applied in each of the two processes, and notes the differences in parameterization of the criteria between the two weight adjustment processes.

For each weight adjustment process, the constraint selection was carried out independently in each of the 408 SADAs that had sampled households with an adjusted weight.
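The sequential process described above, run separately within one SADA, can be sketched as a single loop. This is a minimal sketch under stated assumptions: the function name, status labels and constraint names are hypothetical, and the three set-level checks are passed in as placeholder functions rather than implemented. The 250 and 200 household thresholds are the small population parameters given in Table 4.6.1.

```python
def run_sequential_selection(mandatory, candidates, census_households,
                             min_households, is_dependent, is_collinear,
                             is_redundant=None):
    """Evaluate candidate constraints one by one, in a fixed order.

    census_households: {constraint name: households in the population}.
    is_dependent, is_collinear, is_redundant: hypothetical checks applied
    to the augmented set; is_redundant is None when the criterion is N/A.
    """
    selected = list(mandatory)
    status = {}
    for name in candidates:
        count = census_households[name]
        if count == 0:
            status[name] = "ineligible: no population"
            continue
        if count < min_households:
            status[name] = "excluded: small population"
            continue
        trial = selected + [name]  # augmented set of constraints
        if is_dependent(trial):
            status[name] = "excluded: linearly dependent"
        elif is_collinear(trial):
            status[name] = "excluded: high collinearity"
        elif is_redundant is not None and is_redundant(trial):
            status[name] = "excluded: explanatory redundancy"
        else:
            selected = trial
            status[name] = "included"
    return selected, status


# Illustrative inputs only; names and counts are not census figures.
mandatory = ["households", "persons"]
candidates = ["owner", "renter", "lone_parent", "empty_category", "age_65_plus"]
census_households = {"owner": 600, "renter": 400, "lone_parent": 220,
                     "empty_category": 0, "age_65_plus": 300}


def toy_dependent(trial):
    # Owner + renter = all households, so the three cannot coexist.
    return {"households", "owner", "renter"} <= set(trial)


def never(trial):
    return False


# Coverage and non-response adjustment: threshold 250, redundancy applied.
adjustment, adj_status = run_sequential_selection(
    mandatory, candidates, census_households, 250, toy_dependent, never, never)
# Final calibration: threshold 200, explanatory redundancy not applicable.
calibration, cal_status = run_sequential_selection(
    mandatory, candidates, census_households, 200, toy_dependent, never)
```

In this toy run, "lone_parent" (220 households) is excluded as small in the coverage and non-response adjustment but included in the final calibration, which shows how the same candidate list can yield different constraint sets in the two steps.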

See Appendix C for the list of constraints and a frequency distribution of their inclusion or exclusion in each of the two weighting processes.

Table 4.6.1
Criteria applied in selecting coverage, non-response and final calibration adjustment constraints
No population according to the census counts: If the constraint had no population in the weighting area, then the estimate after adjustment must also be 0 for that constraint. These constraints are not classified as excluded but rather as ineligible for the adjustment process.
Adjustment for coverage and non‑response: Applied at the SADA/ADA level.
Final calibration: Applied at the SADA/ADA level.

Small population according to the census counts: If a constraint involves fewer than a certain number of households in the population of the weighting area, then it is considered small and the constraint is excluded. Including such a constraint would unduly increase the variance. However, constraints with a small population can be implicitly calibrated and in this case are included in the total number of calibrated constraints.
Adjustment for coverage and non‑response: Applied at the SADA/ADA level. A constraint is small if the number of households in the population is more than 0 but less than 250 in the weighting area.
Final calibration: Applied at the SADA/ADA level. A constraint is small if the number of households in the population is more than 0 but less than 200 in the weighting area.

Linearly dependent: If the value of a constraint can be calculated by combining the values of other constraints, one of these constraints is not necessary and must be deleted during the adjustment process because of its linear dependency. However, constraints that are excluded because of their linear dependency are implicitly calibrated. They are therefore included in the total number of calibrated constraints.
Adjustment for coverage and non‑response: Applied at the SADA level. The selection of constraints can be compared to the selection of explanatory variables in a linear regression. The VIF (Note 1) and the condition number (Note 2) are thus used to detect high collinearity.
Final calibration: Applied at the SADA/ADA level. Two dependency checks are conducted to identify linearly dependent constraints. The first check is done when the constraints at the SADA/ADA level are selected, and the second check includes all the constraints chosen at both levels of the geographic hierarchy (SADAs and ADAs).

High collinearity: If a constraint value can almost be calculated from a combination of other constraint values, then at least one of those constraints must be left out of the adjustment process. Such a constraint is not perfectly calibrated.
Adjustment for coverage and non‑response: Applied at the SADA level. The selection of constraints can be compared to the selection of explanatory variables in a linear regression. The VIF (Note 1) and the condition number (Note 2) are thus used to detect high collinearity.
Final calibration: Applied at the SADA/ADA level. Two linear dependency checks are conducted to identify constraints that are close to being linearly dependent. The first check is done when the constraints at the SADA level and the ADA level are selected, and the second check includes all the constraints chosen at both levels of the hierarchy simultaneously (SADAs and ADAs).

Explanatory redundancy: If a constraint explains the non‑response (almost) as well as other constraints already selected, then the non‑response calibration procedure would fail. This is equivalent to saying that if a constraint does not add any information about the non‑response mechanism, beyond what is explained by the already‑selected constraints, then it should not be included.
Adjustment for coverage and non‑response: Applied at the SADA level. A sequential procedure (a form of logistic regression) is applied to test whether the logistic regression converges.
Final calibration: N/A
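For reference, the two collinearity diagnostics named in the table are usually defined as shown below. These are the standard textbook forms and are given here as an assumption: the notes attached to Table 4.6.1 may define or scale them differently, and the thresholds applied in the census weighting are not stated in this section. The symbols X (the matrix whose columns are the constraint variables), R_j^2 and lambda are introduced for illustration only.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\kappa(X) = \sqrt{\frac{\lambda_{\max}(X^\top X)}{\lambda_{\min}(X^\top X)}}
```

Here R_j^2 is the coefficient of determination from regressing constraint j on the other constraints in the set, and lambda_max and lambda_min are the largest and smallest eigenvalues of X'X. Exact linear dependence corresponds to R_j^2 = 1 and lambda_min = 0, so both diagnostics grow without bound as a set of constraints approaches linear dependence.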
