# Coverage Technical Report, Census of Population, 2016

# 3. Population coverage error

## 3.1 Sources

Although census data collection and processing have to meet high quality standards, it is very difficult to eliminate all potential errors. There are two kinds of population coverage error. Population undercoverage refers to the exclusion of persons who should have been enumerated, and population overcoverage refers to the inclusion of persons who were enumerated more than once (generally twice). Overcoverage also includes persons who were enumerated but should not have been. However, this type of error is considered negligible; consequently, it is not measured.

Undercoverage can occur in the first stage of the census if the list of dwellings used for the dwelling universe is incomplete. This risk is higher, for example, if a dwelling is under construction. Conversely, overcoverage can occur if a dwelling is listed twice.

Coverage error can also occur during the field data collection stage. Respondent error is responsible for coverage error when the person completing the census form omits someone whose usual place of residence, according to census rules, is the dwelling concerned; this is undercoverage. The person may also include someone whose usual place of residence is not the dwelling concerned; there is overcoverage if this person has already been enumerated at their usual place of residence or somewhere else. In most cases, it is easy to determine a person’s usual place of residence. However, as stated in the previous section, the process is sometimes more complex, and special rules have been developed for determining an individual’s usual place of residence. The rules are spelled out in the census questionnaire, but the list is long, and there can be comprehension difficulties. Coverage error may result when the rules are not consulted or are incorrectly applied. The idea of using Census Day as the reference date for determining usual residence may also be misunderstood, which can lead to coverage error.

Coverage errors may also be committed during the processing stage at any point where records for persons or households are added to or removed from the census database. Records can be deleted by mistake. Questionnaires may be linked to the wrong record or returned too late to be included.

Even though efforts are made to enumerate the homeless population, the risk of undercoverage is high. Some other living arrangements are also susceptible to coverage error. For example, young adults newly away from home may be either undercovered, because neither their roommates nor their parents include them in the census questionnaire, or overcovered, because they are included in both census questionnaires. Persons who maintain a second residence because of their employment can also cause coverage error.

Users should also be aware of the extent to which Indian reserves and Indian settlements participated in the 2016 Census. In some cases, enumeration was not permitted by the community or was interrupted before it could be completed. These geographic areas (14 in all in 2016) are considered incompletely enumerated Indian reserves and settlements. There are no 2016 data for incompletely enumerated Indian reserves and settlements, and those areas are not included in the totals. Similar problems have occurred in previous censuses. For example, 22 Indian reserves and settlements were incompletely enumerated in the 2006 Census, and 31 in the 2011 Census. Of those reserves and settlements, 20 participated in the 2016 Census.

The demographic estimates for the 14 incompletely enumerated Indian reserves and settlements are based on a model. However, since no reliable source is available to verify the assumptions in the model, the estimates must be used with caution. For more information, see Section 12.2.

## 3.2 Control

Potential sources of coverage error were recognized during the planning stage of the 2016 Census, and the following measures were taken to minimize the associated risks:

• Collection unit (CU) boundaries were carefully defined and mapped to ensure that no geographic areas were left out or included twice.
• List/leave areas: The enumerator’s manual contained instructions on how to enumerate a CU so as to minimize the risk of missing dwellings. The total number of dwellings from the 2011 Census was provided to field operations supervisors to help them identify significant changes. In addition, when the listing operation resulted in a substantial difference in the number of dwellings relative to the 2011 Census, the listing was checked. Lastly, specific quality control procedures were applied to the CU to evaluate and correct any changes made in the listing.
• Mail-out areas: Mail-out was based on a list of addresses from Statistics Canada’s Address Register. This list was updated regularly, and listing activities were carried out mainly in the fastest-growing areas. These listing activities were carried out continuously, but more intensively in the two years preceding the census. Listing operations led to nearly 30% of the addresses in the mail-out areas being checked. The work of enumerators was closely monitored. Some collective dwellings had to be checked by field staff to verify their occupancy status before the collection stage; those found to be occupied were identified and included in the census.
• Special procedures were developed for the enumeration of persons who have difficulty responding (e.g., people who are fluent in neither English nor French, or are illiterate) and persons located in specific parts of large cities where response or coverage was poor in the past.
• Special procedures were defined for the enumeration of the population residing on Indian reserves.
• The Census Help Line (CHL) was available to answer any questions about the census, including questions about coverage.
• There was a “Whom to include” section in the questionnaire so respondents could determine which persons should be included. Also, almost 70% of the responses to the 2016 Census were obtained over the Internet, and the electronic questionnaire included additional verification questions when respondents reported a dwelling as unoccupied or non-existent, or when they had difficulty determining whether a person should be included.
• In the questionnaire, respondents were asked to indicate whether there were people who had not been listed because they were not sure they should be included. The electronic questionnaire provided guidance so respondents could make the right decision. In the other cases, a telephone follow-up was subsequently carried out with the respondent to determine if the persons in question should or should not be listed in the questionnaire.
• Telephone follow-up was carried out after questionnaires were reviewed for coverage inconsistencies or to verify household status, including questionnaires containing only foreign residents or persons temporarily present.
• Non-response follow-up included a dwelling coverage check.

These procedures, along with appropriate staff training, supervisory checks and quality controls during the collection and processing stages, helped to reduce the number of coverage errors.

## 3.3 Definitions

Algebraic definitions of coverage errors are presented in this section. Let $T$ denote the total or the “actual” number of persons targeted by the Census of Population. Let $C$ denote the published census count of persons in the target population. The error associated with using $C$ instead of $T$ is as follows:

$N=T-C$

This error, denoted as $N$, is the net population coverage error.

Let $U$ denote population undercoverage, the number of persons not included in $C$ who should have been.

The census count $C$ is composed of two elements:

$C=E+I$

Where:

$E$ is the number of persons enumerated. This is the number of persons who were listed on a census questionnaire.

$I$ is the number of persons imputed. This is an estimate of the number of persons missed because their dwelling was either classified as occupied but did not return a questionnaire (non-response dwelling) or was misclassified as unoccupied and therefore received no follow-up. For more information on whole household imputation (WHI), see Section 3.6 of the Sampling and Weighting Technical Report, Census of Population, 2016, Catalogue no. 98-306-X.

Undercoverage compared with the published census count $C$ is therefore what remains of the persons who should have been listed on a census questionnaire and who were not taken into account by the WHI. In other words, it does not include the estimate of the number of persons who were not enumerated either because no completed census questionnaire was returned for the dwelling (non-response dwelling) or because the dwelling was misclassified as unoccupied (classification error) and did not receive a questionnaire.

The concept of undercoverage before the WHI also exists. This is what is referred to as Census of Population collection undercoverage. For more information, see Section 12.1.

Let $O$ denote population overcoverage, the number of excess enumerations included in $C$ that should not have been.

$O$ has two components. The first is the excess enumerations of persons enumerated more than once. Coverage studies focus on these excess enumerations. The second is persons who were enumerated but who were not in the census target population. For example, foreign residents visiting Canada who are listed on a census questionnaire as usual residents of a dwelling should not be included in $C$. Fictitious persons are another example. According to previous studies, the number of persons who are enumerated but are not in the census target population is generally very small and can be ignored. Consequently, census coverage studies do not measure this component of coverage error.

Since $U$ refers to persons who were not enumerated but should be included in $C$ and since $O$ denotes enumerations that should not be included in $C$, the difference between $T$ and $C$ is $U$ less $O$. That is:

$N=U-O$

The actual number of persons in the census target population is therefore:

$T=C+N=C+U-O$

In practice, for reasons of cost and timeliness of the data produced, an estimate of $T$ is given by $\hat{T}$, based on sample studies, where:

$\hat{T}=C+\hat{N}=C+\hat{U}-\hat{O}$

$\hat{U}$ is an estimate of the number of persons not included in $C$ who should have been, and $\hat{O}$ is an estimate of the number of persons included in $C$ who should not have been. We can assume that overcoverage from persons included in $C$ who are not in the census target population is zero, since it is negligible. Consequently, $\hat{O}$ is simply an estimate of the number of duplicate enumerations. The purpose of census coverage studies is to determine the values of $\hat{U}$ and $\hat{O}$.

In summary, the actual population $T$ is composed of the census count $C$ and the net undercoverage $N$. This is referred to as net undercoverage because $U$ is generally larger than $O$ in the context of the current census in Canada. However, the opposite is possible, whereby $N$ would be negative. $C$ consists of $E$ plus the number of persons added in WHI, and this imputation $I$ targets persons living in non-response dwellings or in occupied dwellings misclassified as unoccupied.
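The identities in this section can be checked with a short Python sketch. The class name `CoverageComponents`, its field names and the figures in the example are hypothetical and chosen only for illustration; they are not census results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageComponents:
    """Hypothetical container for the quantities defined in Section 3.3."""
    enumerated: int      # E: persons listed on a census questionnaire
    imputed: int         # I: persons added by whole household imputation
    undercoverage: int   # U: persons missing from C who should be included
    overcoverage: int    # O: excess enumerations included in C

    @property
    def census_count(self) -> int:
        # C = E + I
        return self.enumerated + self.imputed

    @property
    def net_undercoverage(self) -> int:
        # N = U - O (negative when overcoverage exceeds undercoverage)
        return self.undercoverage - self.overcoverage

    @property
    def target_population(self) -> int:
        # T = C + N = C + U - O
        return self.census_count + self.net_undercoverage


# Illustrative figures only, not census data.
example = CoverageComponents(
    enumerated=9_500, imputed=300, undercoverage=450, overcoverage=150
)
print(example.census_count, example.net_undercoverage, example.target_population)
```

With these illustrative inputs, $C$ is 9,800, $N$ is 300 and $T$ is 10,100, so $N=T-C$ holds as defined above.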

Census population coverage errors can generally be expressed as rates relative to the actual population. The undercoverage rate $R_{U}$ is $U$ as a percentage of $T$. The overcoverage rate $R_{O}$ is $O$ as a percentage of $T$. The net undercoverage rate $R_{N}$ is the difference between $U$ and $O$ as a percentage of the census target population. These three rates can be estimated by $\hat{R}_{U}$, $\hat{R}_{O}$ and $\hat{R}_{N}$, as follows:

$\hat{R}_{U}=100\times\frac{\hat{U}}{\hat{T}}=100\times\frac{\hat{U}}{C+\hat{N}}$

$\hat{R}_{O}=100\times\frac{\hat{O}}{\hat{T}}=100\times\frac{\hat{O}}{C+\hat{N}}$

$\hat{R}_{N}=100\times\frac{\hat{N}}{\hat{T}}=100\times\left(\frac{\hat{U}-\hat{O}}{C+\hat{N}}\right)$

A positive net undercoverage rate indicates that the undercoverage rate is higher than the overcoverage rate. That is, the number of people not included in the published census count $C$ is higher than the number of excess enumerations. That is generally the case for all Canadian censuses. For some domains of interest, however, negative net undercoverage is sometimes observed.
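The three estimated rates can be computed directly from the census count and the two study estimates. The following Python sketch is a minimal illustration; the function name `coverage_rates`, its parameter names and the input figures are assumptions made for this example and are not census results.

```python
def coverage_rates(census_count: float, under_hat: float, over_hat: float) -> dict:
    """Return estimated under-, over- and net undercoverage rates in percent.

    census_count is C; under_hat and over_hat are the estimates of U and O.
    """
    net_hat = under_hat - over_hat        # estimated N = U - O
    target_hat = census_count + net_hat   # estimated T = C + N
    if target_hat <= 0:
        raise ValueError("estimated target population must be positive")
    return {
        "R_U": 100 * under_hat / target_hat,
        "R_O": 100 * over_hat / target_hat,
        "R_N": 100 * net_hat / target_hat,
    }


# Illustrative figures only: undercoverage exceeds overcoverage.
rates = coverage_rates(census_count=9_800, under_hat=450, over_hat=150)

# Illustrative domain where overcoverage exceeds undercoverage.
negative_case = coverage_rates(census_count=10_000, under_hat=100, over_hat=200)
print(rates["R_N"] > 0, negative_case["R_N"] < 0)
```

Because all three rates share the same denominator, the net rate always equals the undercoverage rate minus the overcoverage rate, and it becomes negative whenever the overcoverage estimate is the larger of the two.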

## 3.4 Evaluation

Two postcensal studies were carried out to estimate the 2016 Census population coverage error. The Reverse Record Check (RRC) provided estimates for population undercoverage, while the Census Overcoverage Study (COS) estimated population overcoverage. As previously mentioned, the Dwelling Classification Survey (DCS) does not contribute to census coverage error estimates since census counts are already adjusted to take DCS results into account.

The RRC and COS were conducted subsequent to field collection and census processing operations. Preliminary estimates of 2016 Census population coverage error were released on March 29, 2018. Following an in-depth validation exercise with the Demography Division and the provincial and territorial statistical focal points, final estimates were released on September 27, 2018. The data were released at the same time as the new official demographic estimates reflecting the update of the base population to the 2016 Census. Census population counts adjusted for net population undercoverage constituted the updated estimates of the base population.

A brief description of the methodology used in the two census coverage studies is presented below:

### Reverse Record Check (RRC)

In the RRC, a random sample of individuals representing the 2016 Census target population was selected from frames independent of the census. These frames are described in Section 7.1. The 2016 RRC sample consisted of 67,872 persons in the provinces and 2,595 persons in the territories. The 2016 Census database was then searched to determine whether these persons had indeed been enumerated.

Where necessary, interviews were conducted, mostly via computer-assisted telephone interviewing (CATI) from the regional offices (ROs), to collect information for use in additional searches of the 2016 Census database. An interview was completed for 82.1% of the 15,584 cases sent to the ROs. The sampling weight was adjusted for non-response. Specifically, the total sampling weight of non-respondents was redistributed among the groups of respondents whose response probability was most similar to that of the non-respondents.
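The weight adjustment can be illustrated with a simplified Python sketch. It assumes that each sampled person has already been assigned to a response-probability group; how those groups were formed is not described in this section. The function name `adjust_for_nonresponse`, the record keys and the weights are hypothetical.

```python
from collections import defaultdict


def adjust_for_nonresponse(records: list) -> list:
    """Redistribute non-respondent weight to respondents within each group.

    Each record is a dict with the hypothetical keys 'group', 'weight'
    and 'responded'. Only respondents are returned, with inflated weights.
    """
    total_weight = defaultdict(float)       # all sampled persons, by group
    respondent_weight = defaultdict(float)  # respondents only, by group
    for rec in records:
        total_weight[rec["group"]] += rec["weight"]
        if rec["responded"]:
            respondent_weight[rec["group"]] += rec["weight"]

    adjusted = []
    for rec in records:
        if not rec["responded"]:
            continue
        group = rec["group"]
        if respondent_weight[group] == 0:
            raise ValueError(f"group {group!r} has no respondent weight")
        # The factor preserves the group's total weight among respondents.
        factor = total_weight[group] / respondent_weight[group]
        adjusted.append({**rec, "weight": rec["weight"] * factor})
    return adjusted


# Illustrative records only.
sample = [
    {"group": "A", "weight": 10.0, "responded": True},
    {"group": "A", "weight": 10.0, "responded": True},
    {"group": "A", "weight": 20.0, "responded": False},
    {"group": "B", "weight": 5.0, "responded": True},
    {"group": "B", "weight": 5.0, "responded": False},
]
adjusted_sample = adjust_for_nonresponse(sample)
print(sum(rec["weight"] for rec in adjusted_sample))
```

In this sketch the total weight of the sample is unchanged by the adjustment; it is only transferred from non-respondents to respondents in the same group.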

The estimate of population undercoverage is based on the number of persons in the RRC sample who were classified as “missed.” These persons were part of the target population for the 2016 Census, but no evidence of enumeration could be found in the 2016 Census Response Database. Nationally, 4,821 persons in the RRC sample were classified as missed in the provinces and 1,128 in the territories.
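One common way to turn such classifications into a population estimate is to sum the adjusted sampling weights of the persons classified as missed. The Python sketch below assumes this weighted-sum form, which is not spelled out in this section; the function name `estimate_undercoverage`, the record keys, the classification labels and the weights are hypothetical.

```python
def estimate_undercoverage(sample: list) -> float:
    """Sum the weights of sampled persons classified as 'missed'.

    Assumes each record is a dict with the hypothetical keys 'weight'
    (adjusted sampling weight) and 'classification'.
    """
    return sum(
        person["weight"]
        for person in sample
        if person["classification"] == "missed"
    )


# Illustrative records only, not RRC data.
rrc_sample = [
    {"weight": 500.0, "classification": "enumerated"},
    {"weight": 450.0, "classification": "missed"},
    {"weight": 520.0, "classification": "enumerated"},
    {"weight": 480.0, "classification": "missed"},
]
under_hat = estimate_undercoverage(rrc_sample)
print(under_hat)
```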

### Census Overcoverage Study (COS)

Overcoverage was measured by matching the final 2016 Census database to itself, and then matching the final 2016 Census database against a list of persons who should have been enumerated according to administrative data sources. Probabilistic linkage, which identifies matches that are close but not exact, was used for matching. A sample of potential duplicates was selected for each linkage, and demographic characteristics and names were examined to identify true cases of overcoverage.
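The idea of matching records that are close but not exact can be sketched in Python with the standard library. This is a deliberately simplified illustration and not the linkage method used in the COS: it compares only records that share a date of birth and scores name similarity with `difflib.SequenceMatcher`. The function names, record keys, threshold value and records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(left: str, right: str) -> float:
    """Return a similarity score between 0 and 1 for two names."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()


def potential_duplicates(records: list, threshold: float = 0.85) -> list:
    """Flag pairs of records that may refer to the same person.

    Assumption: pairs are compared only when the date of birth agrees
    exactly, to limit the number of comparisons. The threshold is arbitrary.
    """
    flagged = []
    for left, right in combinations(records, 2):
        if left["birth_date"] != right["birth_date"]:
            continue
        score = name_similarity(left["name"], right["name"])
        if score >= threshold:
            flagged.append((left["id"], right["id"], score))
    return flagged


# Illustrative records only.
people = [
    {"id": 1, "name": "Jon Smith", "birth_date": "1990-01-01"},
    {"id": 2, "name": "John Smith", "birth_date": "1990-01-01"},
    {"id": 3, "name": "Mary Jones", "birth_date": "1990-01-01"},
    {"id": 4, "name": "John Smith", "birth_date": "1985-05-05"},
]
candidate_pairs = potential_duplicates(people)
print([(a, b) for a, b, _ in candidate_pairs])
```

As in the study, flagged pairs are only potential duplicates. A further review of names and demographic characteristics would still be needed to confirm true cases of overcoverage.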

