Statistics Canada

7. Reverse Record Check

7.1 Sampling

The target population, which consisted of all persons who should have been enumerated in the 2006 Census, was formed from six sources (sampling frames) presented in Table 7.1.1. The first five frames were used to estimate undercoverage in the ten provinces, whereas estimates for the three territories were calculated based on samples from the last frame only.

One disadvantage of multiple sampling frames is the possibility that someone will be included in more than one frame. For example, a person in the immigrants frame may have been in Canada on a work permit in May 2001, and thus have been enumerable in the 2001 Census. The person would then be in both the immigrants frame and the census frame if he or she was enumerated, or in the immigrants frame and the missed frame if not enumerated. It is important to identify all cases of frame overlap. If this is not done, estimates may be too high because some people have been included twice in the frames. Though such overlap was identified wherever possible when preparing the sampling frames, some was also identified later based on information provided by the respondents.

Another difficulty is that none of the first five sampling frames covered people who had emigrated, or who were outside the country at the time of the 2001 Census and had returned during the intercensal period ('returning Canadians within a province'). Demographic estimates put this population at 210,406 people. Added to this are 12,817 persons who returned from a territory to a province, and 4,955 persons from Indian reserves or Indian settlements that were partially enumerated in 2001 and enumerated in 2006. Coverage error estimates do not include these populations, which together total an estimated 228,178 people.

Sample allocation was done in two stages. First, the national sample was allocated to the provinces using a combination of proportional allocation to achieve the same variance for all the provincial estimates of the undercoverage rate and optimal allocation to achieve the national estimate of the undercoverage rate with the smallest variance. The second step was to determine the allocation of the provincial samples to the strata. This was also done via optimal allocation based on historical undercoverage rates (overcoverage was also taken into account in 2001, but not in 2006), historical non-response rates, and stratum size. The exception is the missed frame where everyone who was classified missed in the 2001 RRC was selected. It should be noted that the allocations are only approximately optimal because assumptions were made about the size of some populations such as the projected number of intercensal births and immigrants. The total allocated sample was 69,602 people distributed among the frames (67,664 in the provinces, and 1,938 in the territories). Table 7.1.1 presents the sample allocation by sampling frame. Table 7.1.2 gives the allocation by sampling stratum for all provinces.
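The idea of allocating a sample to strata based on expected undercoverage rates, expected non-response rates and stratum size can be illustrated with a textbook Neyman-type allocation. The sketch below is not the allocation procedure actually used for the RRC, which the text does not specify in detail. It assumes that the quantity of interest in each stratum is a proportion, so that the stratum standard deviation is sqrt(p(1 - p)), and that only a fraction r of the selected sample responds, which leads to an allocation proportional to N × S / sqrt(r). The function name, the strata and all input values are hypothetical.

```python
import math

def neyman_allocation(strata, total_sample):
    """Allocate total_sample across strata in proportion to N_h * S_h / sqrt(r_h).

    strata maps a stratum label to (N_h, p_h, r_h):
      N_h: stratum size,
      p_h: assumed undercoverage rate (hypothetical input),
      r_h: assumed response rate (hypothetical input).
    """
    scores = {}
    for label, (size, miss_rate, resp_rate) in strata.items():
        std_dev = math.sqrt(miss_rate * (1.0 - miss_rate))  # std. dev. of a 0/1 variable
        scores[label] = size * std_dev / math.sqrt(resp_rate)
    total_score = sum(scores.values())
    return {label: total_sample * s / total_score for label, s in scores.items()}

# Hypothetical strata, not RRC figures.
example_strata = {
    "single males 20-24": (100_000, 0.08, 0.85),
    "married females 45-54": (300_000, 0.01, 0.95),
}
allocation = neyman_allocation(example_strata, total_sample=1_000)
```

In this illustration the smaller stratum with the higher assumed undercoverage rate receives a much larger sampling fraction than the larger stratum, which is the behaviour described for high-undercoverage subgroups.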

Table 7.1.3 gives the allocation by stratum for all territories.

Since the sample allocation depends on assumptions about the size of some populations such as the projected number of intercensal births and immigrants, the actual sample size for the provincial sample from the births, immigrants, and non-permanent residents frames is not known until after the final sample is selected. This is not the case for the territories' sample. Table 7.1.4 gives the final sample size for each province and territory. The 2006 RRC sample consisted of 67,813 persons in the provinces and 1,938 persons in the territories. In addition, 84,522 persons with a weight of one contributed to the territorial estimates.

The sample design varies by frame according to the nature of the list that was used. In the 2001 Census frame, the sample design was a one-stage stratified design. The population was stratified by province of residence, sex, age, and marital status. People enumerated on Indian reserves in the 2001 Census were placed in separate strata. As mentioned, we used optimal allocation in each stratum. The sample was allocated to strata in order to obtain the largest possible number of 'missed' cases.

Sampling fractions were not the same in all strata. To make the sample design more efficient, higher sampling rates were applied in subgroups for which high undercoverage or a lower tracing rate was expected. For example, as in the 2001 RRC, single males aged 20 to 24 in 2006 had a greater probability of being selected, since it had been observed in previous RRCs that undercoverage was consistently higher in that stratum. As a result of increased interest in the Aboriginal population, the size of the sample in the provincial strata for people on Indian reserves enumerated in the 2001 Census was double the 2001 sample size.

The missed frame is a conceptual frame since there is no list of all persons missed in the 2001 Census. The sample for this frame consists of all cases classified as 'missed' in the 2001 RRC. The sample is not stratified per se, though there is an implicit stratification since the 'missed' cases in 2001 were from different frames and strata.

For the births frame, copies of birth registrations for the intercensal period were obtained from vital statistics. The frame was then stratified by mother's province of residence. Provincial samples were selected systematically, after sorting by date of birth of the child.
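Systematic selection after sorting, as used for the births frame, spreads the sample evenly across the sort variable. The following is a minimal sketch of the general technique, assuming a fractional sampling interval and a random start; the record layout, the function name and the example data are hypothetical and are not taken from the RRC systems.

```python
import random

def systematic_sample(records, sample_size, sort_key, rng=None):
    """Sort the records, then select every k-th one from a random start (k = N / n)."""
    if not 0 < sample_size <= len(records):
        raise ValueError("sample_size must be between 1 and the number of records")
    rng = rng or random.Random()
    ordered = sorted(records, key=sort_key)
    interval = len(ordered) / sample_size   # fractional interval is allowed
    start = rng.random() * interval         # random start within the first interval
    last = len(ordered) - 1
    return [ordered[min(int(start + i * interval), last)] for i in range(sample_size)]

# Hypothetical birth records; ISO date strings sort chronologically.
births = [{"id": i, "birth_date": f"2003-{(i % 12) + 1:02d}-15"} for i in range(120)]
sample = systematic_sample(births, 10, sort_key=lambda r: r["birth_date"],
                           rng=random.Random(1))
```

Because the list is sorted by date of birth before selection, the selected records are spread across the whole intercensal period rather than clustered in a few months.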

The immigrants frame is constructed from immigration records obtained from Citizenship and Immigration Canada, and stratified by province. In 2006, unlike in 2001, there was no yearly stratification for the three provinces that receive the most immigrants (i.e., Quebec, Ontario and British Columbia). Provincial samples were selected systematically, after sorting by year of immigration.

The non-permanent residents frame (permit holders and refugee claimants) was constructed from records obtained from Citizenship and Immigration Canada. Records were sorted by province. Unlike in 2001, for Quebec, Ontario and British Columbia no strata containing refugee claimants or holders of study, minister's or work permits were created. Provincial samples were selected systematically, after sorting by type of permit and refugee status, to ensure each of these groups was adequately represented.

The methodology for the territories was changed in 2006. As with previous RRCs, the sampling frames of the three territories were created from their respective health care files. Some files from other sources were added for the Yukon Territory in order to improve basic coverage. The people listed in the sampling frames of each territory were then matched by name, sex and age with the 2006 Census response database using exact matching. A manual verification was also performed. Matched people were classified as enumerated, and given a weight of 1. People not classified as enumerated were then stratified by age and sex. After sorting by geography, a one‑stage systematic sample was taken from each stratum.
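The matching step for the territories can be pictured as follows. This is only a sketch of exact matching on name, sex and age: the field names are hypothetical, the manual verification step is not represented, and the normalization of names (trimming spaces and ignoring case) is an assumption added for the illustration.

```python
def match_key(person):
    """Exact-match key: normalized name (an assumption), sex and age."""
    return (person["name"].strip().lower(), person["sex"], person["age"])

def split_by_exact_match(frame_persons, census_records):
    """Separate frame persons into matched (weight 1) and unmatched groups."""
    census_keys = {match_key(r) for r in census_records}
    matched, unmatched = [], []
    for person in frame_persons:
        if match_key(person) in census_keys:
            # Matched persons are classified as enumerated with a weight of 1.
            matched.append({**person, "status": "enumerated", "weight": 1.0})
        else:
            # Unmatched persons go on to stratification and sampling.
            unmatched.append(person)
    return matched, unmatched

# Hypothetical records.
frame = [
    {"name": "Ann Lee", "sex": "F", "age": 34},
    {"name": "Bo Roy", "sex": "M", "age": 51},
]
census = [{"name": "ann lee ", "sex": "F", "age": 34}]
matched, unmatched = split_by_exact_match(frame, census)
```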

The next step after selecting samples was to prepare the sample, which included checking the quality of information for different variables of interest (i.e., geographic or demographic). For example, we checked the accuracy of names and the validity of birth dates. Addresses were standardized to facilitate subsequent processing. To update the geographic information, especially for the census and missed frame samples, where the information dated from 2001, we then matched the sample to the Canada Revenue Agency's 2000 to 2005 personal income tax files. We also used these files, along with vital statistics data, to verify whether any selected persons had died.

7.2 Processing and classification

7.2.1 Processing

The goals of processing are:

  1. To determine whether each selected person (SP) was part of the census target population.
  2. If so, to determine whether each SP was enumerated.
  3. To provide further information for the non-response adjustment.

The results of processing were used to determine the classification assigned to an SP for estimation and tabulation (see Section 7.4 and Section 9).

Most of the work in processing involved searching the RRC version of the 2006 Census Response Database (RRC RDB) to determine whether the SP was enumerated at one of the addresses associated with him or her. The addresses were obtained from various sources including:

  • the sampling frame for the selection address
  • updates from tax records
  • the computer-assisted telephone interview (CATI) and paper questionnaires (see Section 7.3)
  • matches with the RRC RDB using the birth date and sex of the SP and members of his or her household, or the SP's name, postal code or telephone number.

The RRC RDB is an early version of the 2006 Census Response Database (RDB) that is available before the end of census processing. There are some minor differences between the RRC RDB and later versions of the census databases. In particular, the RRC RDB, which is a database of persons, contains all census records for persons with three exceptions. The first are imputed census records for imputations made during whole household imputation (WHI). The second group consists of census records with missing or invalid names, or incomplete or invalid birth dates. This group is also known as the 'incompletely enumerated.' The third group consists of all census records that were added late, after the start of RRC processing.

The first step after sample preparation was to process all SPs with the addresses available from the sampling frame and tax data to search the RRC RDB for each SP. There were two outcomes. When the SP was found, the classification of 'enumerated' was usually assigned and no further processing was required. An exception was SPs who were later identified as deceased before the census from vital statistics for deaths. When the SP was not found, the case was sent for collection. While collection was taking place, searching the RRC RDB continued. When data from the CATI interview was available, it could be determined whether or not each SP was part of the census target population. If so, the CATI data could enable further searching.

Searching was done both automatically and manually by clerical staff. Automated searching was done first as follows: for addresses obtained from a match with the RRC RDB, there was a corresponding census questionnaire. First, we calculated a measure of similarity between the census questionnaire and the RRC data. When this measure was above a specified threshold, it was automatically concluded that the SP was enumerated at that address. If so, neither this address nor the SP's other addresses needed to be processed by the clerical staff. Computer programs also determined when one address was a duplicate of another. These duplicate addresses also did not need to be processed.
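The automated decision described above can be sketched as a similarity score compared with a threshold. The text does not define the measure or the threshold used in the RRC, so both are assumptions here: the score is simply the share of four hypothetical fields that agree exactly, and the threshold of 0.7 is arbitrary.

```python
# Hypothetical comparison fields; the actual RRC measure is not described in the text.
FIELDS = ("name", "birth_date", "sex", "postal_code")

def similarity(sp, census_person):
    """Share of comparison fields that are present for the SP and agree exactly."""
    agree = sum(
        1 for field in FIELDS
        if sp.get(field) is not None and sp.get(field) == census_person.get(field)
    )
    return agree / len(FIELDS)

def auto_search(sp, questionnaire_persons, threshold=0.7):
    """Conclude 'enumerated' when the best score is above the threshold."""
    best = max((similarity(sp, p) for p in questionnaire_persons), default=0.0)
    return "enumerated" if best > threshold else "manual search"

# Hypothetical selected person and census questionnaire.
sp = {"name": "Ann Lee", "birth_date": "1972-03-04", "sex": "F", "postal_code": "K1A 0T6"}
same_household = [dict(sp), {"name": "Al Lee", "birth_date": "1970-01-01",
                             "sex": "M", "postal_code": "K1A 0T6"}]
other_household = [{"name": "Al Lee", "birth_date": "1970-01-01",
                    "sex": "M", "postal_code": "K1A 0T6"}]
```

Cases that fall below the threshold are not classified automatically; in this sketch they are simply labelled for the manual search.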

To search manually, the clerical staff used a number of tools. There were often suggested census questionnaire or census collection units that matched the address. Staff could also search the RRC RDB using flexible parameters. Electronic telephone directories were also used. The results of the manual search were then automatically edited to minimize errors. A file containing the search results was then produced. It is the data from this file that was used to classify SPs.

7.2.2 Classification

Processing provides the information required to determine which SPs were:

(a) 'listed'
(b) 'mobile'
(c) included in the 'census target population'
(d) 'enumerated'
(e) 'missed.'

Some SPs belonged to three or four of these categories. Other SPs did not belong to any of these groups. This is explained in more detail in this section. The 'census target population' includes the groups of persons described in Section 2.2. An SP is considered 'out of scope' if he/she is not part of the census target population. Each SP classified as out of scope is assigned a reason for the classification such as death, emigration, or representation by another sampling frame. In order to classify an SP as deceased, the death must have appeared in the vital statistics files as a registered death. SPs classified in the census target population are either 'enumerated' or 'missed.' An SP is considered 'enumerated' if he/she was in the RRC RDB. The 'missed' classification was assigned to SPs in the census target population who were not enumerated.

The definitions of 'listed' and 'mobile' depend on whether or not the addresses and information from the CATI interview were required to determine the classification. In many cases, collection provided addresses that were not available from the other sources. In other cases, all of the addresses obtained during collection were also available from another source. An SP was 'listed' if he/she was classified without using data from the CATI interview. That is, even if collection data were obtained, the addresses collected during the interview were not required. An SP was considered 'mobile' if his or her usual place of residence, as defined in Section 2.4, was only available from the collection data. Further, by definition, SPs that are not in the census target population, and therefore classified as out of scope, are mobile.

Selected persons for whom one or more of characteristics (a) to (e) cannot be determined are considered non-respondents. There are two types of non-respondents:

  • An SP is 'not identified' when it cannot be determined whether or not they are listed.
  • An SP is 'not traced' when it cannot be determined whether or not they are included in the census target population.
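The classification rules described in this section can be summarized as a small decision function. This is a simplified reading of the definitions, not the production classification logic: each input is True, False or None (None meaning 'could not be determined'), the 'mobile' characteristic is left out, and the function and parameter names are hypothetical.

```python
from typing import Optional

def classify(listed: Optional[bool],
             in_target_population: Optional[bool],
             in_rrc_rdb: Optional[bool]) -> str:
    """Return a classification for a selected person (SP)."""
    if listed is None:
        return "not identified"      # non-respondent
    if in_target_population is None:
        return "not traced"          # non-respondent
    if not in_target_population:
        return "out of scope"        # e.g., death, emigration, other frame
    if in_rrc_rdb is None:
        return "not traced"          # enumeration status unknown (assumption)
    return "enumerated" if in_rrc_rdb else "missed"
```

For example, `classify(True, True, False)` returns `"missed"`, the group on which the undercoverage estimate is based, while `classify(False, None, None)` returns `"not traced"`.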

Table 7.2 presents the distribution of the sample by classification and sampling frame. The classification is determined from specific combinations of characteristics (a) to (e). Data for the territories is divided into the matched stratum and the unmatched strata. Among the 67,813 SPs selected in the provinces, 56,789 were classified as 'enumerated,' 5,431 were classified as 'missed' and 2,901 were non-respondents. An adjustment for non-response is done during the estimation (see Section 7.4). Note that the definition of a non-respondent for classification, and therefore for estimation, is not the same as the usual definition of a non-respondent for whom data collection is attempted but not completed. This is because classification uses data from many sources of which one may be collection. To avoid confusion, Section 7.3 on collection refers to 'completed collection' rather than 'response.'

'Traced' SPs are SPs for whom it can be determined whether or not they are included in the census target population. For purposes of estimation and tabulation, traced SPs are our respondents. Since names, including those of household members, and addresses are available in the RRC RDB, and the tools for consulting the database are sufficiently powerful, it can be verified whether an SP is enumerated at an address even if the address is vague. This ensures that SPs are classified as traced only when it is known whether or not they are mobile and whether or not they are enumerated.

The usefulness of knowing whether an SP is enumerated is self-evident. Selected persons who are in the census target population but are not enumerated, and are therefore classified as missed, are the basis for the estimate of undercoverage. We also wanted to classify the respondent SPs according to characteristics (a) to (c), in order to choose the most appropriate respondents to represent the non-respondents. The above definitions implied that:

  • not identified SPs are also not traced
  • not traced identified SPs are not listed
  • enumerated not mobile SPs are listed
  • enumerated mobile SPs are not listed.

We also determined the Census Day address (usual place of residence) of each SP in the census target population. This is the address where, according to census instructions, the SP should have been enumerated. If the SP was enumerated, the enumeration address is considered to be the Census Day address even if other information may have raised doubts about the proper interpretation of census instructions.

More information on classification can be found in Diallo (2008).

7.3 Data collection

7.3.1 Environment

Head office (HO) staff in Ottawa worked closely with staff in five Statistics Canada regional offices (ROs) to collect data during the survey phase of the RRC. These ROs were located in Halifax, Sherbrooke, Toronto, Winnipeg and Edmonton. The suggestions and recommendations made by the ROs as a result of conducting the 2001 RRC were incorporated into the design and operations of the 2006 survey. HO was responsible for providing a computer-assisted telephone interviewing (CATI) application that met the needs of the survey and was interviewer and respondent friendly.

Assignment of the sample to the ROs was based on HO's 'best guess' about where the selected person (SP) was residing during the collection period. Once a case was assigned to an RO, it was never transferred to another RO even if it was determined that the SP moved outside the RO collection area. RO coverage areas and survey counts are shown in Table 7.3.1.

A total of 20,114 cases were sent for collection. Section 7.1 describes the two sample designs used in the RRC for the provinces and for the territories. The cases requiring collection in the territorial sample were the 1,938 SPs sampled from the unmatched strata. For the provincial sample, the number of cases requiring collection could not be determined until all cases had gone through a first attempt at processing, in which the RRC census response database (RRC RDB) was searched. There were two outcomes to this search. When the SP was not found, the case was sent for collection. There were a total of 8,453 such cases, referred to as the 'regular' sample. A sample of 11,231 SPs was selected from among the found SPs. These are referred to as the 'non-response adjustment (NRA)' sample. The collection results for the NRA sample were used to estimate a parameter of the RRC non-response adjustment model described in Section 7.4. RO staff were not told whether a case was NRA or regular.

The 20,114 cases sent to the field represented 28.4% of the RRC sample. Most of the sample not sent for collection was related to SPs who were found on the RRC RDB during the first search. A classification of enumerated could therefore be assigned to these SPs and no further work was required. The remainder of the sample not sent for collection included 729 deceased SPs, SPs from the sample of 2006 births who were not available in time and 24 cases not sent for other reasons including frame overlap or insufficient information to determine exactly who the SP was.

There were three versions of the RRC Survey questionnaire: non-proxy, proxy, and deceased before Census Day. The content of the 2006 RRC Survey questionnaire focused on the collection of addresses, especially those where the SP lived on Census Day and in the month of May 2006. Names and demographic data were collected for all Census Day household members. The three 2006 Census questions on Aboriginal identity were added to the RRC for the first time. Collection was proxy by design for everyone who was less than 18 years of age and SPs presumed deceased. Otherwise, proxy respondents were used when the SP was not available during the survey period or was difficult to reach.

When it was determined at the time of contact that an SP was deceased, it was important to ascertain with a proxy respondent if the SP had died before, on, or after Census Day. Different paper questionnaires and CATI flows were used depending on the date of death. In some cases, it was known that the SP was deceased prior to collection. If two sources such as taxation data and vital statistics indicated the SP was deceased, the case was not sent for collection. If one source indicated that the SP had died, the case was sent for collection with a flag indicating that he/she was 'presumed deceased.'

The main survey data collection method was CATI. The CATI application was developed using many of the standards set for all CATI questionnaires conducted at Statistics Canada. The application was constructed of various interrelated modules and was accessed through the generic interface for ROs. Interviewers were assigned to cases based on language and whether cases required interviewing or tracing.

The 2006 RRC Survey was a multiple collection mode survey. Paper questionnaires in both official languages were available for those SPs who were contacted but requested a paper questionnaire as opposed to giving information by telephone. Selected persons whom the RO could not reach by telephone were mailed a paper questionnaire package, sent to the best address as determined by the RO. Selected persons were asked to return their completed paper questionnaire to the RO. Finally, some responses were obtained by field interviewers using the paper questionnaires. Data capture from the paper questionnaires was done in the ROs using the CATI system. All of the coordination work necessary to operationalize a sequential multiple mode survey was done by the RO managers in partnership with HO. Unlike the 2001 RRC Survey, there was no follow-up survey.

Tracing was a key aspect of the 2006 RRC. Tracing refers to the work done to find telephone and address information for SPs or a proxy person. As part of sample preparation, cases were linked to tax data to provide updated contact data for the SP and their household members. In some cases, initial CATI data was outdated or incomplete and therefore tracing was required.

7.3.2 Operations

Data collection consisted primarily of interviewing and tracing. As data collection began, 82.8% of the cases sent for collection were placed in the queue for interviewing and the remaining 17.2% in the tracing queue. As required, cases were moved back and forth between interviewing and tracing. For SPs initially in the tracing queue, no telephone number had yet been found for the SP or any family member. As tracing leads were found, cases were moved to interviewing. When all tracing leads were exhausted for interviewing cases they were moved to tracing.

The CATI input data were loaded as sample preparation and the first search of the RRC RDB in processing was completed. Data collection began in the ROs on January 10, 2007. Active collection ended on July 15, 2007. In total there were 184 days where at least one RO was actively collecting data. A total of 16,984 questionnaires were completed in this time frame. Between July 16 and July 31, 2007, a passive collection took place wherein returned paper questionnaires or persons calling the RO to do the survey were handled. During this time, 112 questionnaires were completed. It should also be noted that even among the 16,984 questionnaires deemed complete by the ROs, some were later judged in HO to be either seriously incomplete or conducted with an incorrect SP.

Data collection was conducted using two shifts of interviewers working six or seven days per week. Interviewers were given the survey objectives and background along with a detailed training manual. Mock interviews were incorporated into the training sessions using the CATI application. A call scheduler assigned cases to interviewers in normal operations. On occasion, an interviewer could be assigned to manage specific cases. This may have been to take an in‑coming call or to make a call to someone who preferred to speak in a non-official language. Calls were made overseas especially for SPs in the non-permanent resident (NPR) group who had left Canada. Quality management of the collection operation included monitoring of the interviewers, retraining and the discussion of specific data quality issues noted in HO relating to completed questionnaires. Regional office managers allocated resources to the survey while balancing the needs of other surveys taking place in their region. Sustained efforts to interview persons who initially refused to participate in the survey improved response rates.

Table 7.3.2 shows the distribution of cases sent to ROs from HO over time. Interviewing typically began in the RO as soon as new cases arrived. The adjusted total reflects cases that were dropped by the ROs at HO's request. Such requests were made when HO processing was able to resolve a regular sample case that had already gone into the field, either because the SP was confirmed as deceased from the 2005 and 2006 vital statistics (VS) files or because the SP was found on the 2006 RRC RDB. Additionally, some cases were resent to an RO when a case that had been completed and returned did not meet the expected data quality standard.

Survey data were sent electronically to HO from the five ROs each night after interviewing came to a halt. Transmission reports and collected survey data were reviewed each morning by HO staff. Cases considered unsuitable for processing were reactivated and sent back to the RO for follow up.

Three detailed management reports were created at HO to document the progress of the survey. One report gave statistics on the cases currently in the RO (unopened cases, completed cases, and opened cases not yet completed). The second report presented very detailed statistics of a number of RO outcomes. This report was produced on a weekly basis. The third report included progress by variables such as case type, sampling frame and stratum. Case completion projections were made for the ROs to help them meet their collection targets.

Data collected in the field were analyzed at HO for completeness and accuracy. Cases were rejected if data were missing or ambiguous in key fields or if the data had mistakenly been acquired for someone other than the SP. Cases which were not rejected were compiled into batches for processing as described in Section 7.2.

The average duration of the CATI interview was 13 minutes. The actual time spent on each case was longer, however, given the number of contact attempts required and the amount of tracing involved.

7.3.3 Tracing

Tracing was undertaken by both HO and the ROs and was critical to the success of the RRC. Of the 3,456 cases that started in tracing, successful leads that yielded interviews were found for 66% of them. Among the 16,658 cases that started in the interviewing queue and required tracing, the trace rate was higher at 88%. Numerous valuable leads were also found for these cases. Overall, 11,339 of the 16,944 completed cases, 67%, required some tracing effort.

To increase response rates, RO managers contacted provincial government agencies and departments to obtain addresses and telephone numbers for cases where contact had not been established. Once collection began, HO was engaged in providing tracing leads using several large administrative files containing names and addresses but not necessarily telephone numbers. These files included motor vehicle registration, taxation, GST rebate on new homes and the Canada Post change of address. Additional information specific to SPs on the immigrant and the NPR frames was obtained from Citizenship and Immigration Canada in paper format. Vital statistics files for 2005 and 2006 were also searched.

Interviewers used a variety of tracing tools, on-line electronic directories being the most popular. However, the most effective tracing leads came from the CATI application itself. Information loaded into the application included addresses from the RRC RDB (which is from the 2006 Census itself) and older taxation files. In cases where the RO received an address lead from HO, an on-line site such as Canada 411 was used to find a telephone number. If this was not successful, then a paper questionnaire package could be sent to the address. In comparing the HO and RO tracing addresses found independently, it was concluded that the larger HO files offered more tracing information that was unique and useful. There was overlap between the two efforts in that the same or very similar addresses were often obtained.

The response rate achieved was high: 84.2% of the 20,114 cases sent for collection were completed. This accomplishment was due to the extensive tracing carried out by HO and the ROs. Another factor was the persistence of RO staff in calling an SP when they had the correct telephone number but no one was answering. The median number of contact attempts made for completed cases was seven. For cases that were never completed, the median number of contact attempts was 28. These numbers vary by the province and territory of selection, sampling frame and demographic variables. A case may have a high number of contact attempts even though it never required any tracing.

Table 7.3.3 shows the number of contact attempts for completed and not completed cases by the sampling frame. Close to 222,400 calls were made for completed cases and about 104,300 calls were made for cases for which no completed survey was ever obtained.

It was expected that the work involved in initially contacting an SP from the NRA sample would be easier than for an SP in the regular sample because the initial CATI contact data included the SP's most recent address from the RRC RDB. However, there were many NRA sample cases that took more contact attempts to talk with the SP compared to persons in the regular sample.

7.3.4 Collection statistics

Many statistics were monitored throughout the data collection period. An analysis was done after collection was complete.

Table 7.3.4.1 shows provincial and territorial completion rates by type of case as either regular or NRA. The table shows that completion rates are higher for the NRA cases. This is expected because the initial CATI data included the more recent address specified in the 2006 Census. These SPs would have, with a probability close to 1, been classified enumerated. The distribution of the SPs in the regular sample is different. Compared to the entire RRC sample, persons in the regular sample come from strata with a lower expected probability of being classified enumerated and a higher expected probability of being classified 'missed' or 'out of scope.' Evidence from past RRCs indicates that such persons are more difficult to contact.

Table 7.3.4.2 gives completion statistics by frame and case type. The low response rate for SPs in the NPR frame is partly because many NPRs appear to have left Canada before their permit expiry date. Also, in many cases, the permit expiry date came before the start of survey operations. It was frequently very difficult to locate these SPs or a suitable proxy. This was especially true for NPRs with a permit to study in Canada, where the completion rate was just 62.0%.

Table 7.3.4.3 gives completion statistics by stratum and type of case for the sample selected from the demographic strata. As discussed in Section 7.1, demographic strata were used for the 2001 Census frame and the unmatched frames in the territories.

Another statistic of interest is the degree to which questionnaires were completed by proxy. Collection was proxy by design for everyone who was less than 18 years of age and SPs presumed deceased. Otherwise, proxy was used when the SP was not available during the survey period or was difficult to reach. Overall, 6,363 cases representing 37.6% of the completed sample were done by interviewing a suitable proxy.

Table 7.3.4.4 gives, for Canada and the provinces and territories, the number of cases sent for collection, the number of these that required tracing, and the percentage of cases sent for collection that required tracing. Among the provinces, the tracing rate was highest for Ontario and British Columbia; among the territories, it was highest for Yukon Territory and Nunavut.

There were three modes of collection: CATI, self-enumeration using the paper questionnaire, and personal interview also using the paper questionnaire. Of the 16,944 completed questionnaires, 94.1% were done by CATI, 4.5% by self-enumeration, and 1.4% by personal interview. These data show the importance of the multiple mode approach. Without the use of self-enumeration and in-person interviewing, the national completion rate could have been less than 80%. The collection mode varied by province and territory. This may reflect different operational methods in the ROs, differences in the characteristics of the persons requesting a questionnaire, or different demographic distributions. Self-enumeration was particularly important in Ontario, where it accounted for 6.9% of the completed cases, and in British Columbia, where it accounted for 10.6%.

7.4 Estimation

The final weights of the selected persons (SP) began with their initial (or design) weights. The initial weight of an SP from the missed frame is the final weight assigned to him or her during the previous Reverse Record Check (RRC) when the SP was classified as missed. For the other sampling frames, the initial weights are generally equal to the inverse of the probability of selection. The exception is the non-permanent residents frame where the initial weight is higher to account for the small number of non-permanent residents who were not in the sampling frame when the sample was selected. Final non-permanent resident counts were only available after the sample was selected. Initial weights were adjusted to add to these counts.
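The following is a minimal sketch of the initial weighting just described, not the production method. It assumes that the adjustment to the final non-permanent resident counts is a simple proportional scaling; the text only states that the weights were adjusted to add to those counts. The function names, selection probabilities and count are all hypothetical.

```python
def design_weights(selection_probs):
    """Initial (design) weight = inverse of the probability of selection."""
    return [1.0 / p for p in selection_probs]


def scale_to_count(weights, final_count):
    """Scale weights proportionally so that they sum to a known count.

    Proportional scaling is an assumption made for this illustration.
    """
    factor = final_count / sum(weights)
    return [w * factor for w in weights]


# Hypothetical selection probabilities for four selected persons.
probs = [0.5, 0.25, 0.25, 0.125]
initial = design_weights(probs)          # sums to 18.0

# Hypothetical final frame count, available only after sample selection.
npr_weights = scale_to_count(initial, 19.8)
```

Because the hypothetical final count is larger than the sum of the design weights, every scaled weight is higher than its design weight, which mirrors the direction of the adjustment described for the non-permanent residents frame.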

In order to reduce bias, the initial weights of the respondents had to be adjusted to account for non-response. The weight of the non-respondents was redistributed among the respondents. Where possible, this was done by ensuring that the weight of non‑respondents with certain characteristics was redistributed only to respondents with the same characteristics. The following characteristics (or 'metadata') were used: sampling stratum; indication that the SP filled out a tax return for the year preceding the census year thus providing us with an indication that the SP is in the target population; and whether or not the SP was listed, mobile, or part of the target population.
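The redistribution of non-respondent weight within groups of SPs sharing the same characteristics can be sketched as a standard weighting-class adjustment. This is an illustration only: the record layout, the cell labels and the function name are hypothetical, and the actual adjustments were carried out with GES, not with code like this.

```python
from collections import defaultdict


def redistribute_nonresponse(records):
    """Redistribute non-respondent weight to respondents in the same cell.

    records: list of dicts with keys 'cell', 'weight', 'responded'.
    Returns the adjusted weights in the same order (0.0 for non-respondents).
    """
    total = defaultdict(float)      # weight of all SPs in the cell
    responding = defaultdict(float)  # weight of responding SPs in the cell
    for r in records:
        total[r["cell"]] += r["weight"]
        if r["responded"]:
            responding[r["cell"]] += r["weight"]

    for cell in total:
        if responding[cell] == 0.0:
            # No respondent can absorb the weight; the cell would have to
            # be grouped with a similar one before adjusting.
            raise ValueError(f"cell {cell!r} has no respondents")

    adjusted = []
    for r in records:
        if r["responded"]:
            factor = total[r["cell"]] / responding[r["cell"]]
            adjusted.append(r["weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


# Hypothetical sample: two adjustment cells.
sample = [
    {"cell": "A", "weight": 10.0, "responded": True},
    {"cell": "A", "weight": 10.0, "responded": True},
    {"cell": "A", "weight": 20.0, "responded": False},
    {"cell": "B", "weight": 5.0, "responded": True},
    {"cell": "B", "weight": 5.0, "responded": False},
]
adjusted = redistribute_nonresponse(sample)
```

The total weight is unchanged by the adjustment; only its allocation among respondents within each cell changes.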

The weight adjustments were done with the aid of the StatMx module of Statistics Canada's Generalized Estimation System (GES). To redistribute the weight of the non-respondents, the RRC was viewed as a sample in three phases, where each phase corresponds to the 'selection' of a nested sample as follows. The first phase was the selection of the SPs from the sampling frames, the second was the selection of the identified SPs from all of the SPs and the third was the selection of the traced SPs from the identified SPs. When a respondent with the same characteristics as a non-respondent could not be identified in a stratum, the stratum was grouped with another stratum deemed similar.

After adjusting for non-response, the estimated number of enumerated persons in the territories has traditionally been lower than the comparable census count. This is likely due to undercoverage of the census target population in the health care files. To address this bias, the weight of SPs selected in a territory was adjusted so that the estimated number of enumerated persons equalled the comparable census count for that territory.
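A minimal sketch of such an adjustment for one territory is shown here. It assumes that a single ratio factor is applied to every SP selected in the territory; the text states only that the weights were adjusted so that the estimate equalled the census count. The function name and all figures are hypothetical.

```python
def calibrate_to_census(weights, enumerated, census_count):
    """Scale all weights in a territory by one ratio factor so that the
    weighted number of enumerated SPs equals the census count.

    Applying one common factor to every SP is an assumption.
    """
    estimated = sum(w for w, e in zip(weights, enumerated) if e)
    factor = census_count / estimated
    return [w * factor for w in weights]


# Hypothetical territory sample after the non-response adjustment.
weights = [30.0, 30.0, 20.0, 20.0]
enumerated = [True, True, True, False]

# The estimated number of enumerated persons is 80; the hypothetical
# comparable census count is 100.
calibrated = calibrate_to_census(weights, enumerated, 100.0)
```

Under this assumption the weights of SPs classified otherwise than enumerated are raised by the same factor, so estimates for those classifications rise as well.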

Table 7.4 presents the weighted distribution of the sample by classification and sampling frame. Refer to Section 7.2 for the definition of the classification. Note that only SPs found in the RRC RDB are classified as enumerated. The RRC RDB differs from the final census database in that it does not include imputations made during whole household imputation (WHI), enumerations with an invalid or missing name or an incomplete or invalid birth date, or enumerations added after the start of the RRC data processing phase. People from the target population who are not in the RRC RDB are classified as missed. Census population undercoverage is estimated by the weighted number of missed persons less the number of persons excluded from the RRC RDB. The latter number is the 'X' for the database extraction factor referred to in Section 9.
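The arithmetic of this estimate can be illustrated as follows. This is a sketch with hypothetical weights, classifications and exclusion count; the function name is also hypothetical.

```python
def estimate_undercoverage(weights, classifications, excluded_from_rdb):
    """Weighted number of missed persons less the number of persons
    excluded from the RRC RDB."""
    missed = sum(
        w for w, c in zip(weights, classifications) if c == "missed"
    )
    return missed - excluded_from_rdb


# Hypothetical final weights and classifications for four SPs.
weights = [100.0, 50.0, 80.0, 40.0]
classes = ["enumerated", "missed", "missed", "out of scope"]

# Hypothetical number of persons excluded from the RRC RDB.
undercoverage = estimate_undercoverage(weights, classes, 30.0)
```

The subtraction is needed because persons excluded from the RRC RDB are classified as missed even though they are included in the final census database.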

Last, in order to calculate the variance of the estimates, the RRC sampling frame was viewed as a stratified design with selection probabilities proportional to size. The size measures were constructed so as to reproduce the final weights.

You can obtain more information on the 2006 RRC estimation methods from Théberge (2008).
