7. Evaluation of weighting procedures
As described in Chapter 4, the estimation process for the National Household Survey (NHS) involved the assignment of weights. Each household was first assigned a design weight that was determined by the sample design of the NHS. Some adjustments to these weights were then required to address total non-response. These adjusted weights, known as initial weights or pre-calibration weights, were adjusted further in the calibration process to produce final household weights. These final weights allowed for generally better agreement between the census counts and the NHS estimates for common variables between the two surveys. During calibration, the characteristics from Appendix C were used as constraints. Chapter 4 discussed how some constraints were dropped in order to get better census counts/NHS estimates agreement for other characteristics.
This chapter evaluates certain aspects of the National Household Survey (NHS) weighting procedures. It examines how often certain constraints were dropped during calibration, as well as the effect of dropping them on the NHS estimates. The chapter also examines the distribution of the weights and, for various characteristics, the discrepancies between census counts and sample estimates at various geographic levels.
7.1 Discarding constraints
The purpose of calibration is to adjust the household weights so that the resulting NHS estimates are as close as possible to the census counts for many common characteristics. Calibration must also produce exact agreement between the census counts and the NHS estimates for any mandatory constraints. See Appendix C for the list of constraints and Section 4.6 for more information on calibration. The addition of language variables for the 2011 Census means that there were many more constraints than in the 2006 Census.
Calibration was performed in 5,884 independently processed weighting areas (WA). In each WA, all 60 constraints entered calibration and were only dropped if necessary. The total persons and total households variables were the minimal set of constraints, meaning that they could not be dropped in any of the WAs. Other constraints were dropped or removed as required for the following reasons:
- No population ‒ If the constraint had no population in the WA, then the estimate must also be 0 for that constraint. This constraint does not contribute to the calibration process. We do not classify these constraints as being dropped, but rather as being ineligible for calibration.
- Small sample ‒ If the number of NHS respondents for a constraint in a given WA is more than 0 but fewer than 30, then using that constraint would reduce the accuracy of aggregated estimates. These constraints were therefore dropped.
- Linearly dependent and nearly linearly dependent ‒ If a constraint value can be calculated from a combination of other constraint values, then one of those constraints must be dropped because of linear dependence. For example, the values of the marital status constraints Married, Single, Separated, Divorced, and Widowed add up to the TOTPERS constraint. At least one of these is not required and can be dropped. The age and language variables form other groups of constraints that lead to linear dependence because they also add up to the TOTPERS constraint. The household size constraints likewise add up to the number of households constraint, so at least one of these must be dropped.
If a constraint is dropped for having a small sample, its value can still be determined by subtraction from the other constraint values not yet dropped. In this case, one of the remaining constraints also has to be dropped because of linear dependence; otherwise, the result would be equivalent to retaining the small sample constraint. For example, suppose that the marital status constraint Widowed is dropped for having fewer than 30 respondents in a WA. The value of Widowed can still be retrieved by subtracting the remaining constraints Married, Single, Separated, and Divorced from the TOTPERS constraint. Consequently, an additional constraint needs to be removed. Moreover, because Widowed is small, the sum of the remaining constraints Married, Single, Separated, and Divorced would approximately equal the TOTPERS constraint.
Linear dependence is equivalent to having redundancy with selected constraints or with selected constraints and low population constraints. Near linear dependence is equivalent to having almost redundant constraints.
- Outlier ‒ If retaining a constraint causes a household weight to fall outside the acceptable range of 1 to 100, the constraint is dropped so that it does not cause outlier weights.
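The rules in the list above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in NHS production: the function names, the order of the checks and the tiny person-level indicator matrix are all assumptions made here, and the detection of near linear dependence is not shown. The sketch checks the no-population and small-sample rules, flags weights outside the range of 1 to 100, and uses an exact rank computation to show that the marital status columns together with TOTPERS are linearly dependent until one column is removed.

```python
from fractions import Fraction

MIN_RESPONDENTS = 30      # small-sample threshold stated in the text
WEIGHT_RANGE = (1, 100)   # acceptable range for household weights


def screen_constraint(census_count, respondents):
    """Pre-screen one constraint in one WA (hypothetical helper)."""
    if census_count == 0:
        return "no population"   # ineligible rather than dropped
    if 0 < respondents < MIN_RESPONDENTS:
        return "small sample"    # dropped
    # Assumption: every other case stays in calibration for now and is
    # still subject to the dependence and outlier checks.
    return "keep"


def has_outlier(weights):
    """True if any household weight falls outside the acceptable range."""
    low, high = WEIGHT_RANGE
    return any(w < low or w > high for w in weights)


def matrix_rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


# Illustrative data only: one row per person, one column per constraint.
COLUMNS = ["TOTPERS", "Married", "Single", "Separated", "Divorced", "Widowed"]
persons = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
]

full_rank = matrix_rank(persons)           # smaller than len(COLUMNS)
reduced = [row[:-1] for row in persons]    # remove the Widowed column
reduced_rank = matrix_rank(reduced)        # equals the number of columns left
```

With all six columns the rank is lower than the number of columns, which is the redundancy described above. After one marital status column is removed, the remaining five columns are linearly independent.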
Except for the first case where there is no population, each time that a constraint is dropped, the calibration process does not attempt to make census counts/NHS estimates agree for that constraint in that WA. A constraint dropped frequently will usually have a larger census count/NHS estimate difference than a constraint dropped much less often. This will be apparent if Table 7.1.1 and Table 7.3.1 are compared.
Table 7.1.1 lists all the WA-level constraints as well as the number of times that each constraint was dropped for each reason. The situations above appear, in order, as the columns No population, Small sample, Linearly dependent, Nearly linearly dependent and Outlier.
Table 7.1.1 Frequency of discarding WA-level constraints in 2011 in final weight adjustment
|Variables/constraints|No population|Small sample|Linearly dependent|Nearly linearly dependent|Outlier|
Source: Statistics Canada, 2011 National Household Survey.
Certain unofficial languages tend to be found in certain parts of the country and not in others, so many WAs will have little or no population of a given language. As a result, the constraints for unofficial languages will often have census counts and estimates equal to 0, or the constraints will be dropped for having a small number of respondents with that characteristic. Similarly, some constraints involving French or English have little or no population in certain regions of the country. Consequently, constraints such as FR_EN_BI, FR_FR_BI, and EN_EN_EN were often dropped.
In general, there was no major imbalance in how often the age constraints were dropped, with perhaps the exception of the 75 and over constraint. All age groups were regularly controlled on, so their census count/NHS estimate differences remained relatively small.
There were a few other constraints that were dropped frequently. The number of people aged 15 and over was almost always dropped for being linearly dependent or nearly linearly dependent. This is because the number of persons aged 15 and over can be determined by the age constraints, AGE19, AGE24, AGE29 up to AGE75PL. Finally, the constraints representing people in economic families or having children were frequently dropped because the constraints representing the number of people not in an economic family or not in a family with children caused them to be nearly linearly dependent.
The actual differences between the census counts and the estimates will be examined in Section 7.3.
Table 7.1.2 shows the number of times that each reason for dropping or removing a WA-level constraint occurred. The total number of constraints dropped is the sum of the Small sample, Linearly dependent, Nearly linearly dependent and Outlier categories. As mentioned earlier, the No population category is not included in the total because it does not actually represent dropped constraints. The average number of constraints dropped per WA is simply the total for the category divided by 5,884, the number of WAs.
Table 7.1.2 Summary statistics on discarding WA-level constraints in 2011 in the final weight adjustment
|Constraints|No population|Small sample|Linearly dependent|Nearly linearly dependent|Outlier|Total|
|Total constraints dropped (WA level)|15,530|69,676|10,283|60,088|3|140,050|
|Average number of constraints dropped per WA|2.6|11.8|1.7|10.2|0.0|23.8|
Source: Statistics Canada, 2011 National Household Survey.
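The totals and averages in Table 7.1.2 can be reproduced directly from the category counts. The short sketch below uses only the figures in the table; the variable names are chosen here for illustration.

```python
N_WA = 5884  # number of weighting areas

# Counts of dropped constraints by reason, from Table 7.1.2.
dropped = {
    "Small sample": 69676,
    "Linearly dependent": 10283,
    "Nearly linearly dependent": 60088,
    "Outlier": 3,
}
no_population = 15530  # ineligible constraints, excluded from the total

total_dropped = sum(dropped.values())

# Average per WA: category count divided by the number of WAs.
averages = {reason: round(count / N_WA, 1) for reason, count in dropped.items()}
average_total = round(total_dropped / N_WA, 1)
average_no_population = round(no_population / N_WA, 1)

for reason, avg in averages.items():
    print(f"{reason}: {avg}")
print(f"Total: {total_dropped} dropped, {average_total} per WA")
```

The No population count is kept apart from the dictionary so that it cannot enter the total by accident, mirroring the distinction the text makes between ineligible and dropped constraints.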
7.2 Distribution of weights
Figure 7.2.1 shows the distribution of the weights prior to calibration (initial weights) and the weights after calibration (final weights). Weights were grouped into intervals of size 1 for the lower weights, intervals of size 5 for weights between 5 and 20, and intervals of size 10 for the less frequent weights higher than 20. The figure shows the percentage of households whose weights fall within each range. It can be seen that initial weights, or pre-calibration weights, were between 3 and 4 for approximately 80% of the households. This is due to the NHS design, in which approximately 1 in 3 households received an NHS questionnaire in most areas. Initial weights between 1 and 2 occurred about 6% of the time, as did the range 4 to 5. The remaining 8% of the initial weights were distributed in categories between 2 and 3 and greater than 5. The mean initial weight was 5.0 and the median initial weight was 3.6.
The effect that calibration adjustments had on the weights can also be seen in Figure 7.2.1. A very noticeable difference is that the percentage of households with weights between 3 and 4 fell sharply, from 80% to approximately 22%. Furthermore, the final weights were more evenly distributed within the categories between 1 and 10. However, the percentage of households with weights greater than 10 did not change significantly. The mean for final weights was again 5.0, but the median final weight increased to 4.0.
The changes between the initial weights and the final weights can be observed in Table 7.2.1. This table shows where the changes between categories occurred. For example, it can be observed that the most stable range was 1 to 2, where 93.4% of the households with an initial weight between 1 and 2 stayed in that category after calibration. The second most stable category was 5 to 10, where 53.7% of households with initial weights between 5 and 10 stayed in that category. This stability is likely attributable to the fact that this category is wider than all the ones below it.
Approximately 55% of the households with initial weights between 2 and 3 had their weight reduced to between 1 and 2. However, this was not the case for the households that were initially between 3 and 4; 26.6% of them went into a lower weight group, 47.9% of them went into a higher group, and only 25.5% of them retained a weight between 3 and 4. Some households with very high initial weights had their weights reduced. Categories with weights greater than 20 saw reductions in the number of households after calibration.
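A table of this kind can be built by assigning each weight to its interval and then cross-tabulating initial against final intervals. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the interval boundaries follow the grouping described for Figure 7.2.1 (half-open intervals of width 1 below 5, width 5 from 5 to 20, and width 10 from 20 upward), and the weight pairs are made-up values, not NHS data.

```python
from collections import Counter, defaultdict


def weight_bin(w):
    """Return the half-open interval [lower, upper) that contains weight w."""
    if w < 1:
        raise ValueError("weights below 1 are outside the acceptable range")
    if w < 5:
        lower = int(w)
        return (lower, lower + 1)
    if w < 20:
        lower = 5 * int(w // 5)
        return (lower, lower + 5)
    lower = 10 * int(w // 10)
    return (lower, lower + 10)


def transition_percentages(pairs):
    """Row percentages: for each initial interval, the share of households
    ending in each final interval."""
    counts = defaultdict(Counter)
    for initial, final in pairs:
        counts[weight_bin(initial)][weight_bin(final)] += 1
    return {
        src: {dst: 100.0 * n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


# Made-up (initial weight, final weight) pairs for illustration only.
pairs = [(3.6, 1.8), (3.6, 3.2), (3.6, 4.4), (3.6, 6.0), (1.5, 1.2)]
table = transition_percentages(pairs)
```

Each row of the result sums to 100, so the diagonal entry of a row is the percentage of households that stayed in the same category after calibration.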
Distribution of initial weights and final weights
|Initial weights||Final Weights|
Note: The "[" symbol means the number is included in the interval and the ")" symbol means it is not included in the interval.
Source: Statistics Canada, 2011 National Household Survey.
7.3 Discrepancies between census counts and NHS estimates – Canada
Chapter 4 describes the methods used to calculate the final household weights and Section 7.2 shows some of the relationships between the initial and final weights. Calibration reduced or eliminated discrepancies between the census counts and the corresponding NHS estimates for the 58 constraints at the WA level (see Appendix C). Some discrepancies remain, however, since constraints are sometimes discarded (see Section 7.1). The census count/NHS estimate discrepancy is defined as

$$\text{discrepancy} = \frac{\text{NHS estimate} - \text{census count}}{\text{census count}} \times 100\%$$
The numerator in the above expression (NHS estimate – census count) is referred to as the 'census count/NHS estimate difference'. By dividing this value by the census count, the census count/NHS estimate difference relative to the size of the census count can be seen. In other words, the ratio represents the percentage that the characteristic was overestimated (a positive value) or underestimated (a negative value).
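As a worked example of the difference and the discrepancy, the sketch below applies the two definitions to the FEMALE row of Table 7.3.1. The function names are chosen here for illustration; the three input figures are taken from that row.

```python
def difference(nhs_estimate, census_count):
    """Census count/NHS estimate difference: NHS estimate minus census count."""
    return nhs_estimate - census_count


def discrepancy_pct(nhs_estimate, census_count):
    """Difference relative to the census count, expressed as a percentage."""
    return 100.0 * difference(nhs_estimate, census_count) / census_count


# FEMALE row of Table 7.3.1.
census = 16_700_762
initial_estimate = 16_626_898
final_estimate = 16_689_210

initial_diff = difference(initial_estimate, census)   # -73,864
final_diff = difference(final_estimate, census)       # -11,552
final_disc = round(discrepancy_pct(final_estimate, census), 2)  # -0.07

print(initial_diff, final_diff, final_disc)
```

The negative sign indicates an underestimate, and the smaller magnitude of the final difference compared with the initial one shows the effect of calibration on this characteristic.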
Table 7.3.1 shows the 2011 Canada-level census count/NHS estimate differences for the 58 WA-level constraints for both the initial weights and the final weights. The characteristic FEMALE is not one of the 58 WA-level constraints listed in Appendix C because it can be calculated as the difference between total persons and males. However, it has been included in the table for reference. It can be seen that the final weight differences for males and females sum to 0. It should also be mentioned that the census count for the total persons characteristic (TOTPERS) is less than the published figure of the 2011 Census (33,476,688). There are two reasons for this difference. First, the weighting process used only private households: collective dwellings were out of scope for the NHS, but they were included in the published census figure. Second, the cases mentioned in Section 4.2 (the five census subdivisions (CSDs) corresponding to five Indian reserves that were excluded from the universe because of a very low response rate in the NHS) were not part of the weighting process but were part of the published figure.
It is not enough to simply observe the difference between the NHS estimate and the census count. It is better to consider the difference relative to the size of the census count. Therefore, Table 7.3.1 shows discrepancies based on initial weights and final weights. Most cases in Table 7.3.1 had a discrepancy between -1.00% and 1.00%.
The census count/NHS estimate difference for initial weights tended to be greater than the census count/NHS estimate difference for the final weights. This shows the importance of the calibration process. As mentioned in Section 7.1, a difference between the census count and NHS estimate could occur in a WA for a characteristic if its constraint is dropped during calibration. In other words, the process did not control on any dropped constraint for a given WA. If the constraint is dropped in many WAs, these differences could partially cancel, or they could add up to create a large difference at the Canada level. Total persons (TOTPERS) and total households (TOTHHLDS) were the only mandatory constraints for which census counts/NHS estimates agreement had to be guaranteed for all WAs, so the final weight difference and discrepancy for these were 0. However, all other constraints had to be dropped for some WAs.
Section 7.1 pointed out the constraints that were dropped frequently and where high differences or discrepancies might lie. The effect of dropping those constraints can be seen in Table 7.3.1. The person constraints Total persons aged 15 and over, Single, Married, Number of children, Couple, being in an Economic family, and being in a family with a child were almost always dropped. Of these, the Married and Couple constraints had quite large differences (55,107 and 54,021 respectively). However, because the census counts were so high, the discrepancies for these constraints were not too high (0.43% and 0.34% respectively). On the other hand, the number of separated people had a discrepancy of -3.35%. This relatively small constraint was dropped 4,192 times, which produced a difference of -27,154, that is, an underestimate.
The largest differences and discrepancies were typically found in some of the language variables. Since many languages are not found in high numbers in many parts of the country, these variables were often dropped during calibration, which led to some differences between the census counts and the NHS estimates. Furthermore, because many of them have relatively low census counts, the differences are magnified, resulting in greater discrepancies. The largest difference was in the EN_EN_EN constraint, where people reported English as their mother tongue, home language, and official language. The overestimate of 166,801 was by far the largest difference for any variable. However, because this was the most common language scenario, the discrepancy was 0.99%, meaning that the characteristic was overestimated by about 1%. The largest discrepancy belonged to the FR_EN_BI characteristic, in which people reported having French as the mother tongue, English as the home language, and being able to speak both English and French. This constraint was frequently dropped for having at least one respondent in the WA but fewer than 30 respondents. As a result, this characteristic, which had a census count around 351,000, had an overestimate of nearly 41,000 and a resulting discrepancy of 11.68%. Other language constraints with high discrepancies were African languages (8.49%) and Filipino languages (5.53%).
Most other person-level characteristics had lower discrepancies, particularly the age and gender characteristics, nearly all of which had estimates within 0.2% of their census count. Only the group aged 75 and over exceeded 0.2%, with a discrepancy of 0.28% in absolute value. This was also the age group with the greatest census count/NHS estimate difference (-5,365).
Household variables had mixed results. Households of size 1, 2, and 6 or more were all underestimated. The households with at least 6 people had the greatest underestimate (a difference of -14,493), which resulted in the large discrepancy of -3.69%. On the other hand, households of size 4 had the largest difference in absolute terms (an overestimate of 24,124) and a resulting discrepancy of 1.27%.
The closest estimates for a non-mandatory constraint belonged to the Age 5 to 9 group and the Korean language group, with differences of 69 and 72 respectively. The small Korean language group had a discrepancy of 0.05%. The smallest discrepancies belonged to the Single detached dwellings (0.00%), the Age 5 to 9 group (0.00%), the Age 50 to 54 group (0.01%), and the economic family constraint (-0.02%). Note that these discrepancies have been rounded to the nearest 0.01%.
|Characteristic|Census count|Initial weights: NHS estimate|Initial weights: difference|Final weights: NHS estimate|Final weights: difference|Final weights: discrepancy (%)|
|FEMALE*|16,700,762|16,626,898|-73,864|16,689,210|-11,552|-0.07|
* The characteristic FEMALE is not one of the 58 WA-level constraints listed in Appendix C.
Sources: Statistics Canada, 2011 Census and 2011 National Household Survey.