Statistics Canada

Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please contact us to request a format other than those available.

3. Population coverage error

3.1 Sources

3.2 Control

3.3 Definitions

3.4 Measurement

3.1 Sources

Although high-quality standards govern the collection and processing of census data, it is not possible to eliminate all errors. There are two kinds of population coverage error. Population undercoverage is the extent to which persons who should have been enumerated are not included in census data, while population overcoverage is the extent to which census data include persons who were enumerated more than once, usually twice.

Undercoverage can occur in the first stage of the census if the list of dwellings built to cover the census dwelling universe is incomplete. The risk is higher, for example, for dwellings that are under construction. Conversely, overcoverage can occur if a dwelling is listed twice.

Coverage error is also likely to occur during the field data collection stage. Respondent error causes undercoverage when the person completing the census form omits someone whose usual place of residence, according to the census rules, is the dwelling. It causes overcoverage when that person includes someone whose usual place of residence is not the dwelling and who is also included at his/her usual place of residence or somewhere else. In most cases, it is easy to determine someone's usual place of residence. However, as stated in the previous section, in a number of situations the process is not straightforward, and special rules have been created to define an individual's usual place of residence. Although the rules are set out in the census form, the list is long and respondents may find it difficult to understand. Coverage error may result when the rules are not consulted or are incorrectly applied. The use of Census Day as the reference date for determining usual residence is also critical to the potential for coverage error.

Coverage errors can also be introduced during the processing stage at any point where records for persons or households are added to or removed from the census database. Records can be erroneously cancelled or lost, and questionnaires may be linked to the wrong record or returned too late to be included.

Although efforts are made to enumerate the homeless population, the risk of undercoverage is high for this population. Some other living arrangements are also particularly vulnerable to coverage error. Young adults newly away from home, for example, can be undercovered because neither their roommates nor their parents list them, or overcovered because they are listed on both census forms. Similarly, persons who maintain a second residence because of their employment may be at risk of coverage error.

Users should also be aware of the extent to which Indian reserves and Indian settlements participated in the 2006 Census. In some cases enumeration was not permitted or was interrupted before it could be completed. In other cases the quality of the enumeration was considered inadequate. These geographic areas, a total of 22, are called incompletely enumerated Indian reserves and Indian settlements. Data for 2006 are therefore not available for the incompletely enumerated reserves and settlements, and are not included in tabulations. Similar problems have occurred in previous censuses. In the 2001 Census there were 30 Indian reserves and Indian settlements that were declared incompletely enumerated. Among these, 14 became participating reserves in the 2006 Census.

Model-based estimates are produced to cover persons living on the 22 incompletely enumerated Indian reserves and Indian settlements. Since no reliable source exists to verify the assumptions used in the model, these estimates must be used with caution. More information can be found in Section 12.2.

3.2 Control

Potential sources of coverage error were recognized during the planning of the 2006 Census, and the following measures were taken to minimize them:

  • Collection unit (CU) boundaries were carefully defined and mapped in order to ensure that no geographic areas were left out or included twice.
  • List/leave areas: The enumerator's (EN) manual contained instructions on how to canvass a CU so as to minimize the risk of missing dwellings. The total number of dwellings from the 2001 Census was provided to the field manager so that he/she could identify notable changes. When the listing operation produced a significant difference in the number of dwellings relative to the 2001 Census, the listing was checked. Finally, specific quality control procedures were applied to the ENs' work to assess and, where necessary, correct changes made to the listing. Census frames, including the definition of list/leave and mail-out areas, are described in Section 4.2.
  • Mail-out areas: Mail-out was based upon a list of addresses taken from Statistics Canada's Address Register. This list was verified and updated in the fall of 2005 via a block canvassing field operation. The work of the enumerator was subject to quality control procedures.
  • Collective dwellings: Collective dwellings are identified before collection. Field staff verify that these dwellings are indeed collectives and, if so, determine whether or not they are occupied.
  • Special procedures were developed to enumerate persons who have difficulty responding (e.g., difficulty in English and French or literacy problems) and to enumerate persons who are located in special core areas of major cities.
  • Special procedures were developed to enumerate the population on Indian reserves.
  • Publicity messages informed Canadians about the census and indicated what to do if they did not receive a questionnaire.
  • The Census Help Line was available to answer any questions about the census including questions related to coverage.
  • The questionnaire contained 'Whom to include' instructions telling respondents whom to list.
  • The questionnaire included questions asking whether there were any persons the respondent was unsure whether to list. A telephone follow-up was then done with the respondent to determine whether the person(s) in question should be listed on the questionnaire.
  • Telephone follow-up was done after questionnaire editing when inconsistencies on coverage issues were found, or to verify the status of households consisting only of foreign or temporary residents.
  • Non-response follow-up included some dwelling coverage checks.

These procedures, along with appropriate training, supervisory checks, and quality control systems during census collection and processing, helped to reduce the number of coverage errors.
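The dwelling-count comparison described for list/leave areas can be sketched as follows. This is a hypothetical illustration, not the census processing system: the function name, the dictionary inputs and the 10% threshold are assumptions, since the report does not say how a "significant difference" was defined, and the sketch assumes CUs can be compared directly between 2001 and 2006.

```python
def flag_listing_changes(counts_2001, counts_2006, rel_threshold=0.10):
    """Return the (hypothetical) CU identifiers whose 2006 listing should be checked.

    counts_2001, counts_2006: dicts mapping a CU identifier to a dwelling count.
    rel_threshold: assumed relative change that counts as "significant".
    """
    flagged = set()
    for cu, listed in counts_2006.items():
        previous = counts_2001.get(cu)
        if not previous:
            # No usable 2001 baseline (missing or zero): send for review.
            flagged.add(cu)
        elif abs(listed - previous) / previous > rel_threshold:
            flagged.add(cu)
    # CUs with a 2001 count but no 2006 listing at all also need checking.
    flagged.update(cu for cu in counts_2001 if cu not in counts_2006)
    return sorted(flagged)


print(flag_listing_changes({"A": 100, "B": 50, "D": 30}, {"A": 104, "B": 70, "C": 12}))
# ['B', 'C', 'D']
```

Here CU "A" changes by 4% and is not flagged, while "B" (a 40% increase), "C" (no 2001 baseline) and "D" (no 2006 listing) are.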

3.3 Definitions

Algebraic definitions of coverage errors are given in this section. Let T represent the total or 'true' number of persons in the census target population. Then, let C be the published census count of the number of persons in the census target population. The error in using C instead of T is then:

$$ N = T - C $$

This error, denoted as N, is the net population coverage error.

Let U denote population undercoverage. U is the number of persons not included in C who should have been.

Let O denote population overcoverage, where O is the number of persons included in C who should not have been. There are two components to O. The first is persons who were enumerated more than once; these duplicate enumerations should not have been included in C. The census coverage studies focus on duplicate enumerations. The second component of O is persons included in C who are not in the census target population. Foreign residents visiting Canada, for example, who are listed on a census form as usual residents of a dwelling should not be included in C. Fictitious persons are another example. Previous studies have found the number of such persons to be negligibly small, so the 2006 Census coverage studies did not measure this component of coverage error.

Since U refers to persons who should be included in C and O refers to persons who should not be included in C, the difference between T and C is U less O. That is:

$$ N = U - O $$

The true number of persons in the census target population is then:

$$ T = C + N = C + U - O $$

An estimate of $T$ is given by $\hat{T}$, where:

$$ \hat{T} = C + \hat{N} = C + \hat{U} - \hat{O} $$

Here, $\hat{U}$ is an estimate of the number of persons not included in C who should have been, and $\hat{O}$ is an estimate of the number of persons included in C who should not have been. Overcoverage from persons included in C who are not in the census target population is assumed to be zero, so $\hat{O}$ is restricted to an estimate of the number of duplicate enumerations. The goal of the census coverage studies is to produce $\hat{U}$ and $\hat{O}$.
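The arithmetic that turns the two coverage-study estimates into an estimate of the true population can be sketched as below. The function name and the toy input values are illustrative assumptions, not published census figures; the sketch simply applies N-hat = U-hat − O-hat and T-hat = C + N-hat.

```python
def estimate_true_population(census_count, u_hat, o_hat):
    """Return (n_hat, t_hat) from the published count C and the coverage-study estimates.

    Assumes o_hat counts duplicate enumerations only, as in the 2006 studies.
    """
    n_hat = u_hat - o_hat          # estimated net undercoverage: N-hat = U-hat - O-hat
    t_hat = census_count + n_hat   # estimated true population: T-hat = C + N-hat
    return n_hat, t_hat


# Illustrative toy values only (not census figures).
n_hat, t_hat = estimate_true_population(census_count=1_000, u_hat=40, o_hat=10)
print(n_hat, t_hat)  # 30 1030
```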

Census population coverage error can usefully be expressed as rates relative to the true population $T$. The undercoverage rate $R_U$ is $U$ expressed as a percentage of $T$. The overcoverage rate $R_O$ is $O$ expressed as a percentage of $T$. The net undercoverage rate $R_N$ is the difference between $U$ and $O$ (that is, $N$) expressed as a percentage of $T$. These three rates are estimated by $\hat{R}_U$, $\hat{R}_O$ and $\hat{R}_N$ as follows:

$$ \hat{R}_U = 100\,\frac{\hat{U}}{\hat{T}} = 100\,\frac{\hat{U}}{C + \hat{N}} $$

$$ \hat{R}_O = 100\,\frac{\hat{O}}{\hat{T}} = 100\,\frac{\hat{O}}{C + \hat{N}} $$

$$ \hat{R}_N = 100\,\frac{\hat{N}}{\hat{T}} = 100\,\frac{\hat{U} - \hat{O}}{C + \hat{N}} $$

A positive net undercoverage rate indicates that undercoverage is larger than overcoverage. That is, there are more people not included in the published census count C than the number of duplicated enumerations. This has been, and continues to be, the experience of the Canadian census. For some domains of interest, however, negative net undercoverage has recently been observed.
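The three estimated rates can be computed together from C, U-hat and O-hat, as in the minimal sketch below. The function name and toy values are assumptions for illustration; every rate uses the same denominator, T-hat = C + U-hat − O-hat.

```python
def coverage_rates(census_count, u_hat, o_hat):
    """Return estimated (R_U, R_O, R_N) in percent, each relative to T-hat."""
    n_hat = u_hat - o_hat
    t_hat = census_count + n_hat
    if t_hat <= 0:
        raise ValueError("estimated true population must be positive")
    r_u = 100 * u_hat / t_hat
    r_o = 100 * o_hat / t_hat
    r_n = 100 * n_hat / t_hat  # equals r_u - r_o, up to floating-point rounding
    return r_u, r_o, r_n


# Toy values chosen so that T-hat = 1,000 (not census figures).
print(coverage_rates(970, 40, 10))  # (4.0, 1.0, 3.0)
```

If the overcoverage estimate exceeds the undercoverage estimate for some domain, `r_n` comes out negative. This corresponds to the negative net undercoverage described above.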

As defined above, U is the number of persons not included in C who should have been. The census count C is composed of two elements: 

$$ C = E + I $$

where:

$E$ is the number of enumerations, that is, the number of people who were listed on a census form.
$I$ is the number of imputed persons. This is an estimate of the number of persons missed in non-response dwellings and in occupied dwellings erroneously classified as unoccupied. More information on whole household imputation (WHI) can be found in Section 5.7.

Undercoverage is therefore a subset of all persons who were not listed on a census form but should have been. It does not include persons who were not enumerated either because no completed census form was returned for their dwelling (non-response dwelling) or because the dwelling did not receive a form after being erroneously classified as unoccupied (misclassified occupied dwelling). These persons are instead represented by the imputed persons $I$.

In summary, the true population $T$ is composed of the census count $C$ and net undercoverage $N$. $C$ consists of $E$ plus $I$, the number of persons added through WHI for non-response dwellings and misclassified occupied dwellings. $N$ is undercoverage $U$ less overcoverage $O$.
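Substituting $C = E + I$ and $N = U - O$ into $T = C + N$ gives a single decomposition of the true population that combines the definitions in this section:

$$
\begin{aligned}
T &= C + N \\
  &= (E + I) + (U - O) \\
  &= E + I + U - O
\end{aligned}
$$

In this form, persons in non-response and misclassified occupied dwellings enter through $I$, while $U$ and $O$ carry the coverage error that the coverage studies set out to measure.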

3.4 Measurement

Two postcensal studies were carried out to estimate 2006 Census population coverage error. The Reverse Record Check (RRC) provided estimates for population undercoverage while the Census Overcoverage Study (COS) estimated population overcoverage.

The RRC and the COS were conducted after census field collection and processing were complete. Preliminary estimates of 2006 Census population coverage error were released March 27, 2008. Following a lengthy and detailed validation exercise with the Demography Division and the provincial and territorial statistical focal points, final estimates were released on September 29, 2008. This release was concurrent with the release of new official population estimates reflecting the update of the base population to the 2006 Census. Census population counts adjusted for net population undercoverage formed the updated base population.

The methodology of the two census coverage studies can be briefly described as follows:

Reverse Record Check (RRC)

In the RRC, a random sample of individuals representing the 2006 Census target population was selected from frames independent of the census, such as a list of persons enumerated in the 2001 Census and a list of intercensal births from provincial birth registries. The 2006 RRC sample consisted of 67,813 persons in the provinces and 1,938 persons in the territories. In addition, 84,522 enumerated persons with a weight of one contributed to the territorial estimates (see Note 1). The 2006 Census database was then searched to determine whether the persons selected in the sample had been enumerated.

When required, the regional offices (ROs) conducted a telephone interview using computer-assisted telephone interviewing (CATI). The interview collected the information needed to classify the individual as in scope or out of scope for the census and, for persons in scope, to obtain further data for searching. An interview was achieved for 84.2% of the 20,114 cases sent to the ROs. Sampling weights were adjusted for non-response: the total sampling weight of the non-respondents was redistributed among a group of respondents most similar to the non-respondents in their propensity to respond.
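The non-response adjustment described above resembles a weighting-class adjustment, which can be sketched as follows. This is an illustration under stated assumptions, not the RRC production method. The adjustment groups are taken as given, standing in for the unspecified groups of respondents similar in their propensity to respond. The dictionary keys are hypothetical. Within each group, respondents' weights are scaled up so that they carry the group's total weight.

```python
from collections import defaultdict


def adjust_for_nonresponse(cases):
    """Return adjusted weights, in input order, with non-respondents set to 0.

    cases: list of dicts with hypothetical keys 'weight', 'responded' (bool)
    and 'group' (an assumed, pre-formed response-propensity class).
    """
    group_total = defaultdict(float)      # sum of all weights in the group
    respondent_total = defaultdict(float)  # sum of respondent weights in the group
    for case in cases:
        group_total[case["group"]] += case["weight"]
        if case["responded"]:
            respondent_total[case["group"]] += case["weight"]

    adjusted = []
    for case in cases:
        if case["responded"]:
            factor = group_total[case["group"]] / respondent_total[case["group"]]
            adjusted.append(case["weight"] * factor)
        else:
            adjusted.append(0.0)  # this weight has been shared among respondents
    return adjusted


cases = [
    {"weight": 10.0, "responded": True, "group": "a"},
    {"weight": 10.0, "responded": False, "group": "a"},
    {"weight": 5.0, "responded": True, "group": "b"},
]
print(adjust_for_nonresponse(cases))  # [20.0, 0.0, 5.0]
```

The total weight is preserved within any group that has at least one respondent. A group with no respondents would simply lose its weight in this sketch, so a real procedure would need some way of collapsing such groups.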

Estimates of population undercoverage are based on persons in the RRC sample who were classified as 'missed.' These persons were found to be in scope for the 2006 Census, but no evidence of their enumeration could be found in the 2006 Census Response Database. In all, 5,431 persons in the RRC sample were classified as missed in the provinces and 676 in the territories.
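In its simplest form, an undercoverage estimate built from the sample is a weighted count of the persons classified as missed. The sketch below assumes that form and uses hypothetical status labels and toy weights. The report does not give the full RRC estimator, which may include further adjustments.

```python
def estimate_undercoverage(sample):
    """Return (number of missed sample persons, weighted estimate U-hat).

    sample: iterable of (weight, status) pairs, where status is one of the
    hypothetical labels 'enumerated', 'missed' or 'out_of_scope'.
    """
    missed_weights = [weight for weight, status in sample if status == "missed"]
    return len(missed_weights), sum(missed_weights)


# Toy sample with final (non-response adjusted) weights; not RRC data.
sample = [(100.0, "missed"), (50.0, "enumerated"), (25.0, "missed"), (10.0, "out_of_scope")]
print(estimate_undercoverage(sample))  # (2, 125.0)
```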

Census Overcoverage Study (COS)

Overcoverage was measured by matching the final 2006 Census database to a partial list, constructed from administrative data sources, of persons who should have been enumerated, and by matching the 2006 Census database to itself. The COS applied automated exact matching to the administrative sources and probabilistic matching to the census database. Probabilistic matching identifies matches that are close but not exact. A sample of these potential duplicate pairs was drawn, and their names and demographic characteristics were examined to identify overcoverage.
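The idea of a match that is "close but not exact" can be illustrated with the sketch below. It is a stand-in and not the COS method, which the text does not detail. It requires exact agreement on birth date and sex, then accepts names whose `difflib.SequenceMatcher` similarity ratio meets an assumed threshold of 0.85. The record fields and the threshold are hypothetical. The output corresponds to the pool of potential duplicate pairs from which a sample would be drawn for review.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(records, threshold=0.85):
    """Return id pairs that agree exactly on birth date and sex and have similar names.

    records: list of dicts with hypothetical keys 'id', 'name', 'birth_date', 'sex'.
    """
    pairs = []
    for r1, r2 in combinations(records, 2):
        if r1["birth_date"] != r2["birth_date"] or r1["sex"] != r2["sex"]:
            continue
        if name_similarity(r1["name"], r2["name"]) >= threshold:
            pairs.append((r1["id"], r2["id"]))
    return pairs


records = [
    {"id": 1, "name": "Marie Tremblay", "birth_date": "1985-03-02", "sex": "F"},
    {"id": 2, "name": "Marie Tremblai", "birth_date": "1985-03-02", "sex": "F"},
    {"id": 3, "name": "John Smith", "birth_date": "1970-11-20", "sex": "M"},
]
print(candidate_duplicates(records))  # [(1, 2)]
```

Comparing every pair is quadratic in the number of records, so a census-scale search would need to restrict comparisons, for example to records sharing some key. This sketch omits that step.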

Note:

  1. The large number of persons contributing to the territorial estimates reflects a different methodology. The sample frames were first matched to the entire census database. Matches were classified as enumerated if they were found in the same territory, or as out of scope if they were found elsewhere. All of the matched persons from the sample frames were included in the RRC sample with a weight of one. An additional sample of 1,938 persons was selected from the non-matches.
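The territorial procedure in the note above can be sketched as follows. The function, its inputs and the status labels are hypothetical. The sketch assumes simple random sampling of the non-matches, which the note does not specify, so each sampled non-match receives the inverse of its selection probability as its weight.

```python
import random


def build_territorial_sample(frame_ids, census_location, territory, n_nonmatch, seed=0):
    """Return a list of (person_id, weight, status) tuples.

    frame_ids: person identifiers from the sample frames.
    census_location: dict mapping each matched person to where the census found them.
    territory: the territory being estimated.
    n_nonmatch: number of non-matches to sample (assumed simple random sampling).
    """
    rng = random.Random(seed)
    sample, nonmatches = [], []
    for pid in frame_ids:
        where = census_location.get(pid)
        if where is None:
            nonmatches.append(pid)
        elif where == territory:
            sample.append((pid, 1.0, "enumerated"))    # matched in the same territory
        else:
            sample.append((pid, 1.0, "out_of_scope"))  # matched elsewhere
    n = min(n_nonmatch, len(nonmatches))
    if n > 0:
        weight = len(nonmatches) / n  # inverse selection probability under SRS
        for pid in rng.sample(nonmatches, n):
            sample.append((pid, weight, "to_be_resolved"))
    return sample


demo = build_territorial_sample(["a", "b", "c", "d", "e"], {"a": "YT", "b": "BC"}, "YT", 2)
print(len(demo))  # 4
```

The sampled non-matches would then go through the same searching and follow-up as the other RRC cases before being classified.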
