Statistics Canada

7.  Evaluation of weighting procedures

7.1  Weighting area (WA) formation

7.2  Evaluation of the census weighting methodology

7.2.1  Distribution of weights

7.2.2  Discrepancies between population counts and sample estimates

7.2.3  Discarding constraints

This chapter presents and evaluates certain aspects of the census weighting procedures, such as weighting area formation and the size distribution of the weights. It also examines, for various characteristics, the discrepancies between population counts and sample estimates at the Canada level, and discusses Pass 1 versus Pass 2 results and the different data universes for which census data may be presented. Finally, it looks at how often constraints are discarded and the effect this has on these discrepancies.

7.1  Weighting area (WA) formation

In the 2006 Census, the country was partitioned into 6,607 WAs, each containing, on average, approximately eight whole dissemination areas (DAs). The weighting program attempts to achieve agreement between certain sample estimates and the corresponding population counts for each WA. Each WA was formed by grouping DAs so as to adhere to the following conditions:

  (a) A WA must respect the boundaries of census divisions (CDs).
  (b) A WA should contain between 1,000 and 3,000 households.
  (c) A WA should, where possible, respect (in order of priority) census subdivision (CSD) boundaries and census tract (CT) boundaries.
  (d) A WA should, where possible, be made up of contiguous DAs (i.e., not be in two or more parts or contain any 'holes') and it should be as compact as possible.
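The first two conditions (CD boundaries and the household range) lend themselves to a simple check. The sketch below is illustrative only: the `DisseminationArea` record, its field names and the sample figures are hypothetical, and the CSD/CT and contiguity conditions are not modelled.

```python
from dataclasses import dataclass

MIN_HOUSEHOLDS = 1_000  # lower bound of the desired range
MAX_HOUSEHOLDS = 3_000  # upper bound of the desired range


@dataclass(frozen=True)
class DisseminationArea:
    da_id: str       # hypothetical DA identifier
    cd_id: str       # census division containing the DA
    households: int  # number of households in the DA


def check_weighting_area(das):
    """Check a candidate WA (a list of DAs) against the CD and size conditions."""
    total = sum(da.households for da in das)
    return {
        "single_cd": len({da.cd_id for da in das}) == 1,
        "size_in_range": MIN_HOUSEHOLDS <= total <= MAX_HOUSEHOLDS,
        "households": total,
    }


# Hypothetical candidate WA made of three DAs in the same CD.
candidate = [
    DisseminationArea("DA1", "CD10", 250),
    DisseminationArea("DA2", "CD10", 310),
    DisseminationArea("DA3", "CD10", 480),
]
result = check_weighting_area(candidate)
```

A candidate that fails either check would be a natural target for the kind of manual splitting, amalgamation or realignment described in this section.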

Table 7.1.1 shows that 6,559 (99.3%) of the 2006 WAs are within the desired range of 1,000 to 3,000 households. This is considerably better than in 2001, when only 94.2% of WAs were within the range. The algorithm used to generate WAs in 2006 was the same as in 2001, so the automated results were similar. The improvement is instead due to many more manual adjustments being made at the end of the process in 2006 than in 2001. Many of the abnormal WAs were split, amalgamated or realigned to better follow the conditions mentioned above.

The average number of dwellings per WA was 1,869. The largest WA contained 4,820 dwellings, an improvement from 2001 when the largest WA contained 17,043 dwellings. In 2006, there were five WAs with zero population. In these cases, the WAs contained DAs that were not subject to sampling. These WAs with zero population are in Labrador, the Northwest Territories, and Nunavut.

Agreement between sample estimates and population counts is ensured only for geographic areas which are made up of whole WAs. These areas include provinces and CDs, as well as CSDs and CTs in which no WA within them makes up part of another CSD or CT. Table 7.1.2 looks at the relationship between 2006 Census CSD and CT boundaries and WA boundaries. There are four mutually exclusive scenarios possible:

  1. 'Geographic areas containing only part of one WA while the rest of the WA contains only complete geographic areas of the same kind' – This means that the CSD or CT was small enough to fit entirely within a WA, and that the same WA consisted only of whole CSDs or CTs. None of the CSDs or CTs in that WA crossed into a different WA. Therefore, condition (c) is satisfied. This scenario occurs frequently for CSDs because there are many very small municipalities, such as reservations and villages, that contribute little or no population that is subject to sampling.

  2. 'Geographic areas containing only part of one WA while the rest of the WA does not contain only complete geographic areas of the same kind' – This means that the CSD or CT was small enough to fit entirely within a WA, but a different CSD or CT within that same WA was shared with a different WA. Condition (c) is not satisfied.

  3. 'Geographic areas containing one or more whole WAs' – This means that the CSD or CT was large enough to contain whole WAs. None of the WAs crossed into a different CSD or CT. Therefore, condition (c) is satisfied. This scenario occurs frequently for CTs because CTs occur in urban areas, which are usually subject to sampling, and CTs are designed to be larger than WAs in general.

  4. 'Geographic areas that cross at least one WA boundary' – This means that the CSD or CT is shared by at least two WAs. Condition (c) is not satisfied.
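The four scenarios can be expressed as a small classification routine. The sketch below is one reading of the definitions above, not the procedure actually used to produce Table 7.1.2. It assumes each DA belongs to exactly one WA and one geographic area (CSD or CT), and the identifiers in the example are hypothetical.

```python
from collections import defaultdict


def classify_areas(da_records):
    """Classify each geographic area into scenario 1, 2, 3 or 4.

    da_records: iterable of (wa_id, area_id) pairs, one pair per DA.
    """
    was_of_area = defaultdict(set)  # WAs touched by each area
    areas_of_wa = defaultdict(set)  # areas touched by each WA
    for wa_id, area_id in da_records:
        was_of_area[area_id].add(wa_id)
        areas_of_wa[wa_id].add(area_id)

    scenarios = {}
    for area, was in was_of_area.items():
        if all(areas_of_wa[wa] == {area} for wa in was):
            # Every WA the area touches lies wholly inside it.
            scenarios[area] = 3
        elif len(was) >= 2:
            # The area is shared by several WAs, not all of them whole.
            scenarios[area] = 4
        else:
            # The area sits inside a single WA that also holds other areas.
            (wa,) = was
            others_whole = all(was_of_area[a] == {wa} for a in areas_of_wa[wa])
            scenarios[area] = 1 if others_whole else 2
    return scenarios


# Hypothetical layout: one (wa_id, area_id) pair per DA.
records = [
    ("WA1", "A"), ("WA1", "B"),  # A and B both fit inside WA1
    ("WA2", "C"), ("WA3", "C"),  # C is made up of two whole WAs
    ("WA4", "D"), ("WA4", "E"),  # D fits inside WA4, but E spills over
    ("WA5", "E"),
]
scenarios = classify_areas(records)
```

In this layout, areas A and B fall under scenario 1, C under scenario 3, D under scenario 2 and E under scenario 4. Only scenarios 1 and 3 respect the WA boundaries.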

According to the figures presented in Table 7.1.2, 13.2% of CSDs and 67.0% of CTs are made up of one or more whole WAs. It is here that the closest agreement between population counts and sample estimates is most likely to occur. The results in Table 7.1.2 are very similar to the results from 2001 because the same automated algorithm was used in both censuses. 

For more information about weighting areas and their delineation, see Kruszynski (1999).

Table 7.1.1  Size distribution of weighting areas

Table 7.1.2  Number of census subdivisions and census tracts that respect weighting area boundaries, 2006 Census

7.2  Evaluation of the census weighting methodology

7.2.1  Distribution of weights

The chart comparing 2006 and 2001 final household weights shows that the two distributions are almost identical, although slightly more households had weights less than 4 in 2006 than in 2001. Conversely, fewer households had weights between 4 and 9 in 2006 than in 2001.

The remaining three charts compare successive pairs of the 2006 Census weight distributions: initial and post-stratified weights, post-stratified and first-step weights, and first-step and final weights. The initial weights are tightly clustered around 5 as a result of a one-in-five sample of households being selected. The post-stratified, first-step and final weight distributions become progressively more spread out as the constraints become more restrictive.

Chart  Comparison of 2006 and 2001 final household weights

Chart  Comparison of initial weights and post-stratified weights, 2006 Census

Chart  Comparison of post-stratified weights and first-step weights, 2006 Census

Chart  Comparison of first-step weights and final weights, 2006 Census

7.2.2  Discrepancies between population counts and sample estimates

As discussed in Section 4.4, the final weights are chosen so as to reduce or eliminate discrepancies between the population counts and the corresponding sample estimates for 34 constraints at the WA level (see Appendix B). Some discrepancies remain, however, since constraints are sometimes discarded (see Sections 4.4 and 7.2.3). The population/estimate discrepancy is defined as:

The numerator in the above expression (sample estimate − population count) is referred to as the 'population/estimate difference.' The comparison between sample estimates and population counts is based on occupied private dwellings from sampled CUs.
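The expression itself is not reproduced in this version of the text. A form consistent with the description of the numerator would be the following. It assumes that the discrepancy is taken relative to the population count and expressed as a percentage. Only the numerator is confirmed by the text; the denominator and the factor of 100 are assumptions.

```latex
\[
\text{population/estimate discrepancy}
  = 100 \times \frac{\text{sample estimate} - \text{population count}}{\text{population count}}
\]
```

Under this reading, a positive discrepancy indicates that the sample overestimates the characteristic and a negative one that it underestimates it.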

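The table comparing 2001 and 2006 population/estimate discrepancies for Canada, together with the charts of discrepancies based on initial weights and on final weights, shows the 2006 and 2001 results for the WA-level constraints using either the initial or the final weights. The initial-weight chart is similar to Chart 6.1 in that it is based on initial weights, but it shows population/estimate discrepancies rather than Z statistics, so much of the discussion of Chart 6.1 also applies to it.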

The table of 2001 and 2006 discrepancies and the initial-weight chart also show what the sampling bias would have been in 2006 if the 2001 approach of applying document conversion, rather than whole household imputation (WHI), had been used to deal with long forms with total non-response. To determine this, long forms to which whole household imputation was applied were treated as short forms, and the initial weights were recalculated at the CU level to reflect this. The recalculated initial weights were applied to the reduced sample to generate new population/estimate differences, which appear in the table column labelled 'Without WHI' and, as discrepancies, in the chart under the legend entry '2006 without WHI.' The population/estimate differences under whole household imputation, based on the original initial weights and the unreduced sample, appear in the column labelled 'With WHI' and under the legend entry '2006 with WHI.'
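The effect of treating imputed long forms as short forms can be illustrated with a minimal sketch. It assumes that the initial weight in a CU is the number of households divided by the number of long forms retained in the sample. This is a common design-weight form and is not confirmed by this chapter. The CU figures are hypothetical.

```python
def initial_weight(households_in_cu: int, long_forms_in_cu: int) -> float:
    """Inverse of the realised sampling fraction in one CU (assumed form)."""
    if long_forms_in_cu <= 0:
        raise ValueError("the CU needs at least one long form in the sample")
    return households_in_cu / long_forms_in_cu


# Hypothetical CU: 100 households, 20 long forms, 2 of them whole-household imputed.
households = 100
long_forms = 20
whi_forms = 2

# 'With WHI': imputed long forms stay in the sample.
weight_with_whi = initial_weight(households, long_forms)

# 'Without WHI': imputed long forms are treated as short forms,
# so the sample shrinks and the initial weight rises.
weight_without_whi = initial_weight(households, long_forms - whi_forms)
```

In this example the weight rises from 5.0 to about 5.56 when the two imputed forms are removed from the sample.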

In general, the 2006 'Without WHI' differences resemble the 2001 differences much more closely than the 2006 'With WHI' differences do. Also, in 2006 the population/estimate differences are frequently smaller for 'With WHI' than for 'Without WHI' (e.g., this is the case for females; persons aged less than 15 years or 45 years and over; marital status married, widowed, divorced or separated; households of size 1, 2, 4 and 5; and the single-detached dwelling type and apartments in buildings with fewer than five storeys). Thus, the introduction of whole household imputation in 2006 to deal with total non-response households was generally beneficial.

Although this is not shown in the table or chart, the initial weights for the reduced sample were also calculated a second time, separately for 1-, 2-, 3-, 4-, 5- and 6+-person households at the CU level. Under this approach, the 2006 'Without WHI' differences were much more similar to the 2006 'With WHI' differences. This suggests that whole household imputation gives results similar to those that document conversion would have achieved had the initial weights been post-stratified by household size.
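Post-stratifying initial weights by household size within a CU can be sketched as follows. The weight form (population households divided by sampled households in each size class) is an assumption, and the household sizes are hypothetical. Size classes that are empty in the sample would need to be collapsed with others, which the sketch does not attempt.

```python
from collections import Counter


def size_class(persons: int) -> str:
    """Map a household size to one of the classes 1, 2, 3, 4, 5 or 6+."""
    return "6+" if persons >= 6 else str(persons)


def poststratified_weights(population_sizes, sample_sizes):
    """Weight per size class: population households / sampled households."""
    population = Counter(size_class(p) for p in population_sizes)
    sample = Counter(size_class(p) for p in sample_sizes)
    return {cls: population[cls] / n for cls, n in sample.items()}


# Hypothetical CU: household sizes for all 100 households and for the 21 sampled ones.
population_sizes = [1] * 30 + [2] * 40 + [4] * 20 + [6] * 6 + [7] * 4
sample_sizes = [1] * 5 + [2] * 10 + [4] * 4 + [6] * 2

weights = poststratified_weights(population_sizes, sample_sizes)

# Applying the weights reproduces the household count in every size class,
# and therefore the CU total.
estimated_total = sum(weights[size_class(p)] for p in sample_sizes)
```

Because the weights are calibrated within each size class, the weighted sample matches the population count of households of each size, which is the property the paragraph above relies on.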

The table shows that, compared to 2001, the absolute values of the 2006 population/estimate discrepancies based on final weights are noticeably larger for the age ranges 15 to 19 and 25 to 34, but similar or smaller for the other age ranges. The absolute discrepancies in 2006 are also larger for 4-person, 5-person and 6+-person households. As discussed in Chapter 6, the fact that the number of persons on the 2B paper questionnaire was reduced from 6 to 5 in 2006 is likely a major cause of this.

Comparing the initial-weight and final-weight charts shows that the 2006 population/estimate discrepancies based on final weights are dramatically smaller than those based on initial weights, with the exception of the 5-person and 6+-person households. This is likely due to the difficulty of correcting such large initial biases while correcting for the remaining constraints at the same time. Even so, the discrepancies for these two constraints are significantly smaller with the final weights than with the initial weights.

It should also be noted that the discrepancies based on final weights for the two dwelling type characteristics (single-detached dwellings and apartments in buildings with fewer than five storeys) are noticeably smaller than those based on initial weights, even though these characteristics were not controlled on in all WAs. The reduction in the discrepancy for these characteristics likely resulted in an increase in the discrepancy for other characteristics that were dropped in their stead. The exact impact on the other characteristics cannot be observed because of the many factors at play.

The rescaled chart is the same as the final-weight chart, except that its scale makes the discrepancies for the other constraints easier to see. It shows that, aside from the household size constraints, the 'Common-law status = yes' constraint has the largest discrepancy.

The Pass 1 and Pass 2 comparison table and chart show the 2006 population/estimate differences and discrepancies based on final weights for the 34 WA-level constraints, for Canada (see Section 4.5). The Pass 1 discrepancies are smaller than the Pass 2 discrepancies because the census weights were calculated from Pass 1 results. The chart comparing discrepancies in Pass 1 and Pass 2 differences covers both the 2006 and 2001 censuses. It shows that, with the exception of the common-law, widowed and separated constraints, the difference between Pass 1 and Pass 2 estimates is much smaller in 2006 than in 2001. This may be partially due to the whole household imputation process, which may have resulted in more consistency between the Pass 1 and Pass 2 data than in 2001.

The Pass 1 and Pass 2 comparison table shows no population/estimate difference for the total number of persons, under either Pass 1 or Pass 2 results. It should be noted that this total combines persons from both private households and senior units. When the Pass 1 or Pass 2 results for these two universes are observed separately, the total population for private households is overestimated by 1,982 persons and the total population for senior units is underestimated by the same amount.

The table comparing universes presents the counts and estimates for the three separate universes for which the census data may be observed. These were discussed in more detail in Section 4.7. This weighting report focuses on data from the Private universe. The table shows how the population counts and estimates differ when collectives and institutions are considered, since these are included in published census tabulations.

Table  Comparison of 2001 and 2006 population/estimate discrepancies for Canada

Chart  Comparison of initial weight discrepancies with and without whole household imputation

Chart  Population/estimate discrepancies based on final weights

Chart  Population/estimate discrepancies based on final weights (rescaled)

Table  Comparison of Pass 1 and Pass 2 population/estimate discrepancies based on final weights, for Canada, 2006 Census

Chart  Comparison of Pass 1 and Pass 2 population/estimate discrepancies based on final weights, for Canada, 2006 Census

Chart  Comparison of discrepancies in Pass 1 and Pass 2 differences, 2006 and 2001 censuses

Table  Comparison of universes — Population counts and estimates, 2006 Census

7.2.3  Discarding constraints

For the 2006 Census, 20 parameter combinations were examined in the weighting system for each weighting area (WA), and the combination giving the best results in any given WA was chosen (see Section 4.4).
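The per-WA selection can be pictured with a minimal sketch. What counts as the 'best' result is defined in Section 4.4 and is not restated here. The sketch simply assumes a hypothetical numeric score per parameter combination, with lower being better. All identifiers and scores are invented for illustration.

```python
def pick_best_combination(scores_by_wa):
    """For each WA, pick the parameter combination with the lowest score.

    scores_by_wa: {wa_id: {combination_id: score}}; lower is assumed better.
    Ties go to the combination listed first.
    """
    return {
        wa_id: min(scores, key=scores.get)
        for wa_id, scores in scores_by_wa.items()
    }


# Hypothetical scores for two WAs and three of the parameter combinations.
scores_by_wa = {
    "WA1": {"combo01": 4.2, "combo02": 3.1, "combo03": 3.8},
    "WA2": {"combo01": 1.0, "combo02": 2.5, "combo03": 1.0},
}
chosen = pick_best_combination(scores_by_wa)
```

Because the choice is made independently in each WA, neighbouring WAs can end up with final weights produced by different parameter combinations.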

Appendix B gives a complete list of the 34 constraints used. Thirty-two of these constraints were part of every test involving the different parameter sets. The remaining two, single-detached dwellings and apartments in buildings with fewer than five storeys, were new in 2006 and were added as constraints only for certain parameter combinations.

The table of WA-level constraint frequencies shows how often each of the 34 constraints was discarded in the 6,602 sampled WAs in 2006 and the 6,141 sampled WAs in 2001. The reason a constraint was dropped (i.e., for being small, linearly dependent, nearly linearly dependent or causing outlier weights [see Section 4.4]) can help explain why certain constraints had large population/estimate discrepancies in the final-weight chart. This discussion focuses on the 2006 Census results. First, it should be noted that a constraint such as 'Age 0 to 4' can be discarded frequently for being linearly dependent (which means it is redundant) and still have a small population/estimate difference. If a constraint is discarded frequently for causing outlier weights (such as 'Common-law status = yes' or '5-person households') or for being nearly linearly dependent (such as for 1‑, 3- or 4‑person households), it can cause large population/estimate discrepancies, as was observed in the final-weight chart.

The two dwelling type constraints (single-detached dwellings and apartments in buildings with fewer than five storeys) were new in 2006 and were treated differently from the 32 constraints also used in 2001. The level of non-response for the dwelling type variable was analysed at the DA level. These two constraints were automatically dropped for the 399 WAs that contained a DA with a level of non-response for this variable high enough to make the estimates for these characteristics unreliable. For the remaining 6,203 WAs, the use of these constraints was included as a parameter. Ten of the twenty parameter combinations with which the WAs were processed attempted to control on these characteristics. In the 'cherry-picking' process, 3,688 WAs had their final weights selected from a parameter combination that attempted to control on these two constraints. This means that, for example, although the single-detached dwelling constraint was dropped by the weighting system for only 304 WAs, it was controlled on in only 3,384 WAs.
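The counts in this paragraph fit together as shown in the short check below. All figures are taken from the text except the final share, which is derived here for illustration.

```python
sampled_was = 6_602            # sampled WAs in 2006
auto_dropped = 399             # WAs where both constraints were dropped up front
eligible_was = sampled_was - auto_dropped  # WAs where their use was a parameter

selected_with_dwelling = 3_688  # WAs whose chosen combination tried to control on them
dropped_by_system = 304         # of those, WAs where single-detached was then dropped
controlled_was = selected_with_dwelling - dropped_by_system

# Share of all sampled WAs in which single-detached dwellings were controlled on.
controlled_share = controlled_was / sampled_was
```

The subtraction gives the 6,203 and 3,384 WAs quoted in the text. The derived share shows that the single-detached dwelling constraint was controlled on in roughly half (about 51%) of the sampled WAs.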

The summary statistics table condenses the information found in the table of WA-level constraint frequencies. The total number of constraints dropped is higher in 2006 because there are more WAs (6,602 WAs in 2006, 6,141 WAs in 2001), but the average number of WA-level constraints dropped per WA is fairly consistent between 2001 and 2006.

The summary table also gives the frequency of discarding DA-level constraints on the number of households and the number of persons. If a WA contained 8 DAs, for example, it would have 16 DA-level constraints. Overall, there was a decrease in the average number of DA-level constraints dropped per WA (0.8 in 2006, 1.1 in 2001). The most notable decrease appears in the SMALL category, where only 248 constraints were dropped in 2006, compared to 1,354 in 2001. This is partially due to having more sets of parameters to choose from.

Table  Frequency of discarding weighting area-level constraints in 2001 and 2006 in final weight adjustment

Table  Frequency of discarding constraints at the weighting area and dissemination area levels in 2001 and 2006 in final weight adjustment — Summary statistics
