# Coverage Technical Report, Census of Population, 2016 8. Census Overcoverage Study (COS)

## 8.1 Overview and methodology

Prior to 2006, the level of overcoverage caused by duplication of individuals on the census was measured by three studies, each one covering part of the overcoverage: the Automated Match Study (AMS), the Collective Dwelling Study (CDS) and the Reverse Record Check (RRC). Since 2006, given and family names have been included in the Census Response Database,Note 1 and overcoverage is now measured with a single study: the Census Overcoverage Study (COS). Hence, the RRC is no longer used to measure overcoverage, and the CDS was discontinued. The AMS is still conducted for evaluation purposes.

As was the case with both the 2006 and 2011 overcoverage studies, the 2016 COS was based on a series of probabilistic record linkage operations and manual verification of pairs of potential overcoverage cases. These record linkage operations also involved the use of certain administrative data files.

For ease of reference, in the rest of this section, a pair of potential overcoverage cases is referred to as a pair, and a pair that has been confirmed to be the same person is referred to as a duplicate.

The 2016 COS was a statistical study in which overcoverage was estimated with a probabilistic sample selected from a frame of potential overcoverage cases. The COS involves all the steps one would find in a statistical survey:

• sampling frame construction
• sample selection
• data collection
• processing and verification of collected data
• weighting and estimation
• analysis.

However, the COS differs from a statistical survey in the following ways:

• The sampling frame was constructed by means of successive probabilistic and deterministic record linkage operations.
• Collection was based on manual verification of sampled pairs of records and did not involve respondents.

The COS methodology for estimating 2016 overcoverage was based on matching persons without geographic restrictions, while the AMSNote 2 was based on matching private households located in the same geographic area. The 2016 COS took advantage of the fact that the 2016 Census Response Database (RDB) contains respondents’ surnames and given names in two separate variables. This made it possible to produce a more precise estimate of the overcoverage caused by persons enumerated more than once on the census database. The COS used automated matching and manual verification methods. It included individuals living in collective dwellings and did not impose geographic restrictions such as those imposed on the AMS.

The census database used for the COS was the same as the database used for the RRC: the CCS-RDB. For simplicity, it is hereby referred to as the RDB.

As in 2006 and 2011, the 2016 COS sampling frame was constructed in multiple steps. The first step was an internal probabilistic record linkage in which the entire RDB was linked to itself. The second step was an external probabilistic record linkage in which the entire RDB was linked to an administrative frame (ADMIN) built from the Canadian Statistical Demographic Database (CSDD). The CSDD is an administrative database created from multiple administrative data sources for use in the Census Program.

The two probabilistic linkages were conducted with G-Link 3.2, the probabilistic record linkage system designed at Statistics Canada that uses the Fellegi-Sunter method to solve large file linkage problems when there are no direct identifiers common to both sources (Fellegi and Sunter 1969).

## 8.2 Construction of the sampling frame

The COS began with the construction of a sampling frame of potential overcoverage cases using probabilistic and deterministic record linkage. This work consisted of the following four steps:

1. probabilistic record linkage between the RDB and itself
3. extension of the sampling frame based on households
4. supplement frame: additional potential overcoverage identified during evaluation of CSDD prototype.

### 8.2.1 Input files for the construction of the COS sampling frame

The RDB contained a little under 34 million records and included responses from individuals living in both private and collective dwellings. It contained names (including given names and surnames), demography (including date of birth and sex) and geography (including province or territory and postal code). The ADMIN was extracted from the CSDD and contained over 48 million records. As in 2011, the ADMIN included names (given name and surname), demographic information (including date of birth and sex) and geographic information (including province or territory and postal code).

The CSDD from which the ADMIN is drawn was built using multiple administrative data files. These included tax files provided by the Canada Revenue Agency; immigrant and non-permanent resident files provided by Immigration, Refugees and Citizenship Canada; birth records from vital statistics files provided by Vital Statistics; the National Routing System; and the Indian Registry, provided by Indigenous and Northern Affairs Canada.

The records of persons classified as living in one of the three territories according to the CSDD were replaced by the information from territorial health care files used as a sampling frame for the RRC, as these were believed to provide better coverage of individuals living in the territories.

• names: given name and surname variables
• demographic data: date-of-birth and sex variables
• geographic data: province or territory and postal code variables.

### 8.2.2 Steps in constructing the COS frame

The COS frame included all pairs forming potential overcoverage cases that were outputted by G-Link following steps 1 and 2, and those identified by the extension. It also included record pairs identified by the supplement. Construction of the COS frame is illustrated in Figure 8.2.2 below:

Description for Figure 8.2.2

The title of Figure 8.2.2 is “Construction of the 2016 Census Overcoverage Study (COS) sampling frame.”

The first text box in this figure contains Step 1, probabilistic record linkage with G-Link for identifying (RDB-RDB) pairs (i.e., 2016 Census Response Database [RDB] and 2016 Census Response Database pairs) that might contain overcoverage.

This text box points to two types of pairs: potential pairs (linkage weight greater than or equal to the Step 1 threshold) and rejected pairs (linkage weight less than the Step 1 threshold).

Potential pairs are shown above the upper Step 1 threshold.

On the other side of the figure is a text box that contains Step 2, probabilistic record linkage with G-Link for identifying RDB individuals linked to the same administrative database (ADMIN) records.

This text box points to two types of pairs: potential pairs (linkage weight greater than or equal to the Step 2 threshold) and rejected pairs (linkage weight less than the Step 2 threshold).

Potential pairs are shown above the upper Step 2 threshold.

The left-hand side box that contains potential pairs from Step 1 and the right-hand side box that contains potential pairs from Step 2 both point to the next text box, which is in the middle of the figure and contains the following two bullets:

• Convert the (RDB-ADMIN) to (RDB-RDB) pairs.
• Merge Step 1 and Step 2.

An arrow points down to the next text box, which contains three bullets:

• Extract household pairs from steps 1 and 2.
• Step 3: Use deterministic criteria to capture additional pairs.
• Resolve overlaps and duplications.

Beneath that text box is another text box, which contains the following two bullets:

• Step 4: Identify pairs for supplement.
• Resolve overlaps and duplications.

Beneath that text box is an arrow that points to the final text box, 2016 COS Sampling Frame, which contains three bullets:

• Create record groups.
• Split large groups into neighbourhoods.
• Create sampling units.

Source: Statistics Canada, 2016 Census Overcoverage Study

### 8.2.3 Step 1: Probabilistic RDB-RDB linkage

The purpose of Step 1 was to measure overcoverage in persons on the RDB. To optimize the record linkage, provincial and territorial name frequency classes were created to compare the given names and surnames. The name frequency classes were used as outcome levels in the concordance rules for names in G-Link 3.2.

This linkage process was based on the following series of operations:

• Create RDB-RDB potential pairs by applying selection criteria.
• Compute name frequencies (for family names and given names) within each province and territory.
• Create name frequency classes.
• Compare the records for the potential pairs by applying concordance rules.
• Calculate the weights of the results of the rule application using the EM algorithm.
• Calculate provincial and territorial thresholds.
• Select pairs whose weight was greater than the thresholds.

From Step 1, a linked set of 4,254,360 potential pairs was created.

The purpose of Step 2 was to identify additional potential overcoverage not captured by the Step 1 linkage. RDB record pairs were created from groups of multiple RDB records linked to the same ADMIN record. This matching process identified pairs where the errors in the names or date of birth were such that the RDB-RDB linkage was not able to match them when they were compared directly. It also picked out pseudo-duplicates, which were RDB and ADMIN records that shared many match variables with a high linkage weight, but actually represented different individuals. For this reason, a clean-up of the RDB-RDB pairs derived from the RDB-ADMIN pairs was performed.

Step 2 involved the following sequence of operations:

• Create RDB-ADMIN potential pairs by applying selection criteria.
• Compute name frequencies within each province and territory.
• Create name frequency classes.
• Compare the records for the potential pairs by applying concordance rules.
• Calculate the weights of the results of the rule application using the EM algorithm.
• Calculate the provincial and territorial thresholds.
• Select pairs whose weight was greater than the threshold.
• Create RDB-RDB pairs from the set of RDB-ADMIN pairs above the thresholds.
• Verify and clean up the RDB-RDB pairs derived from RDB-ADMIN pairs to reduce the pseudo-duplicates.

### 8.2.5 Step 3: Extension of the sampling frame based on households

The purpose of extending the sampling frame was to find additional overcoverage in households that contained potential overcoverage cases from Step 1 or Step 2. This phase resulted in the creation of additional RDB-RDB record pairs, which were produced in two steps.

First, a household pair was produced for each RDB-RDB person pair created in Step 1 or Step 2 by adding the other household members to it. Second, using sex and date of birth as variables, new RDB-RDB pairs were identified by comparing the persons present in the household pair. Comparison rules were applied to identify pairs that might represent overcoverage cases. The frame extension included pairs from two private households, or pairs where an individual from a private household was linked to an individual from a collective. Pairs where both records were from collectives were excluded.

As was done with the RDB-ADMIN pairs, pairs already identified in the first two steps were removed. When the duplicates were removed, the extension frame captured a further 167,980 pairs, which were added to the set of potential pairs.

### 8.2.6 Creation of the sampling units

Potential overcoverage cases were identified using groups of interconnected RDB records. The RDB-RDB person pairs returned by steps 1 and 2, and the extension, were pooled. Mutually exclusive record groups were created from the set of unduplicated person pairs, meaning that a record group contained all the RDB records connected by potential overcoverage, as identified in steps 1 to 3. For cases where the record groups contained more than 10 RDB records, a graph theory method was applied to reduce the group into small subgroups called “neighbourhoods” to facilitate manual verification.

### 8.2.7 Step 4: Supplement frame

During the evaluation of the 2016 CSDD prototype, the CSDD team identified potential duplicates in the RDB. Their list of potential duplicates was compared with the COS sampling frame constructed from steps 1 to 3. Most of the potential pairs identified by the CSDD team had been captured in the first three steps described above. However, some of these pairs were found to be missing from the set of potential pairs created with the first three steps. An investigation of the missing pairs found a few additional pairs not identified by the CSDD team that should have been included in the set of potential pairs. A total of 97,000 pairs were identified in this manner and were added to the COS sampling frame.

The final COS sampling frame contained a little over 3.6 million sampling units, created from approximately 5.4 million RDB-RDB pairs.

## 8.3 Census Overcoverage Study sample design

To meet sampling requirements, the sampling frame units were separated into four strata:

• Stratum 1: This consists of the intraprovincial units from steps 1 to 3. These are the sampling units created during steps 1 to 3 where all the RDB records that formed the pairs in the unit were in the same province or territory.
• Stratum 2: This consists of the interprovincial pairs from steps 1 to 3. These are the sampling units created during steps 1 to 3 and formed from a single pair of records from two different provinces or territories.
• Stratum 3: This consists of the interprovincial groups and neighbourhoods from steps 1 to 3. These are the sampling units created during steps 1 to 3 and formed from at least two pairs of records, covering at least two different provinces or territories. It should be noted that such a unit could contain intraprovincial pairs. For example, a group could be composed of three pairs linking RDB records in Ontario, and another pair linking an Ontario record with an Alberta record.
• Stratum 4: This consists of the pairs from Step 4. These are pairs of potential duplicates determined by the supplement. This stratum included both intraprovincial and interprovincial pairs.

### 8.3.1 Sample allocation

The 2016 COS manual verification budget was based on a total sample size similar to the one used in 2011—i.e., a total sample of 54,000 pairs to verify for strata 1 to 3. It was decided that, on average, 70% of a province’s sample would be intraprovincial pairs, and the rest would be interprovincial pairs. This represented a reduction in the proportion of intraprovincial pairs in the sample (which was approximately 80% in 2011), given that the 2016 frame contained more interprovincial pairs.

The sample of pairs was divided between the provinces and territories using a power allocation method, with the total number of pairs in the frame for each province and territory as its size, and $x$ = 0.2 as its power. This made it possible to obtain precise estimates for each province, and nationally, while allowing for a sample distribution that was not too different from that of 2011. The next step involved distributing the provincial and territorial sample between the intraprovincial and interprovincial pairs. As mentioned, the ideal overall proportion was for 70% of each provincial sample to be intraprovincial pairs. A smaller proportion was used for Prince Edward Island and the three territories since the 2011 results showed that the contribution of interprovincial overcoverage was larger there. Inversely, a higher proportion was used for Quebec, Ontario and British Columbia. The same proportion was then used in the other provinces and was set to obtain the national proportion of 70%.

Once the sample size of intraprovincial and interprovincial pairs was determined for each province and territory, the next step was to divide these samples between the three strata. The sample of interprovincial pairs was separated between strata 2 and 3 based on the proportion of interprovincial pairs from the frame in each of the two strata. It was not possible to allocate the sample of intraprovincial pairs between strata 1 and 3 in the same manner. In fact, in stratum 3, the groups and neighbourhoods were composed mainly of interprovincial pairs, so selecting the desired number of intraprovincial pairs would have resulted in too many interprovincial pairs being selected. To avoid this problem, simulations showed that 85% of the sample of intraprovincial pairs had to be selected from stratum 1 to make the sample from stratum 3 usable. For Prince Edward Island and the three territories, it was practically a take-all. All the pairs from the stratum 1 frame were therefore selected there.

For stratum 4, the sample size was set at 3,000 pairs, based on the desired level of precision, available resources and the schedules to be met when the pairs from the supplement frame were determined.

### 8.3.2 Sample from stratum 1

Stratum 1 was separated into three substrata within each province: pairs, groups and neighbourhoods. An optimal sample allocation based on manual verification costs was used to allocate the provincial sample of this stratum to substrata. A systematic sample of pairs sorted by linkage weight class, age group, sex, marital status, mother tongue and CMA was then selected. For groups and neighbourhoods, a simple random sample was selected. Table 8.3.2 provides the distribution of sampling units from stratum 1 in the sampling frame and in the sample for each province and territory.

Table 8.3.2
Number of stratum 1 sampling units on the frame and in the sample, Canada, provinces and territories
Table summary
This table displays the results of Number of stratum 1 sampling units on the frame and in the sample. The information is grouped by Provinces and territories (appearing as row headers), Frame totals and Sample totals (appearing as column headers).
Provinces and territories Frame totals Sample totals
Pairs Groups Neighbourhoods Total Pairs Groups Neighbourhoods Total
Canada 1,165,224 192,530 172,610 1,530,364 30,481 1,702 1,420 33,603
Newfoundland and Labrador 9,133 296 159 9,588 2,386 96 74 2,556
Prince Edward Island 1,763 42 20 1,825 1,763 42 20 1,825
Nova Scotia 14,328 626 368 15,322 2,644 137 82 2,863
New Brunswick 13,844 591 309 14,744 2,604 129 60 2,793
Quebec 413,625 102,703 76,085 592,413 3,284 416 308 4,008
Ontario 479,444 71,845 82,517 633,806 3,947 295 392 4,634
Manitoba 18,440 835 512 19,787 2,724 107 76 2,907
Saskatchewan 17,175 833 415 18,423 2,532 118 53 2,703
Alberta 75,638 4,261 3,448 83,347 3,445 126 118 3,689
British Columbia 120,458 10,460 8,775 139,693 3,776 198 235 4,209
Yukon 570 16 2 588 570 16 2 588
Northwest Territories 345 8 0 353 345 8 0 353
Nunavut 461 14 0 475 461 14 0 475

### 8.3.3 Sample from stratum 2

Sampling units from stratum 2 were all interprovincial pairs. Stratum 2 was therefore subdivided based on the province of the pair’s two CCS-RDB records. The pairs for a combination of province 1 and province 2 were then sorted according to the same variables used in stratum 1, and a systematic sample was selected. Table 8.3.3 provides the distribution of the sampling units from stratum 2 in the sampling frame and the sample for each province and territory. It must be noted that because each pair belongs to two different provinces, the sum of the number of pairs in each province and territory corresponds to twice the number of pairs at the national level, since the pairs are counted in two places.

Table 8.3.3
Number of stratum 2 sampling units on the frame and in the sample, Canada, provinces and territories
Table summary
This table displays the results of Number of stratum 2 sampling units on the frame and in the sample. The information is grouped by Provinces and territories (appearing as row headers), Pairs on the frame and Pairs in the sample (appearing as column headers).
Provinces and territories Pairs on the frame Pairs in the sample
Prince Edward Island 15,546 561
Nova Scotia 93,331 695
New Brunswick 77,998 650
Quebec 316,352 490
Ontario 670,953 580
Manitoba 99,259 665
Alberta 293,933 495
British Columbia 348,355 500
Yukon 3,070 466
Northwest Territories 3,147 513
Nunavut 1,543 406

### 8.3.4 Sample from stratum 3

Stratum 3 was composed of interprovincial groups and neighbourhoods. As previously mentioned, a large number of these sampling units contained intraprovincial pairs. The challenge was therefore to select enough intraprovincial pairs, especially for the small provinces and the territories, without selecting too many interprovincial pairs or pairs of units in the large provinces. To do this, units from stratum 3 were divided in the following manner, using dominance rules based on the proportion of intraprovincial pairs belonging to the same province within interprovincial groups and neighbourhoods:

• units where at least 60% of the pairs were intraprovincial for a given province
• units where between 50% and 60% of the pairs were intraprovincial for a given province
• units where less than 50% of the pairs belonged to the same province.

A simulation was conducted to determine the sample size required in each substratum to obtain the targeted sample sizes for the number of intraprovincial and interprovincial pairs in each province. A simple random sample of groups and neighbourhoods was selected in each substratum. For each substratum, Table 8.3.4 presents the dominance rule applied (most frequent proportion and province), the number of sampling units (groups and neighbourhoods) in the substratum, the number of pairs making up these units, and the number of units sampled.

Substrata 1 to 11 were composed of sampling units where at least 60% of intraprovincial pairs were found in the same province or territory (there were none in Yukon or Nunavut). Substrata 13 to 16 were formed from units composed of interprovincial pairs only, and where the most frequent province or territory among all the CCS-RDB records in the unit were respectively Prince Edward Island (9911), Yukon (9960), the Northwest Territories (9961) or Nunavut (9962). Substratum 12 contained all the other units composed of interprovincial pairs only. Substrata 17 to 29 were composed of units where at least 50%, but less than 60%, of intraprovincial pairs came from a given province. Finally, substrata 30 to 42 included all the other sampling units, classified according to the most frequent province among all the intraprovincial pairs. For example, substratum 23 was composed of all units where at least 50%, but less than 60%, of intraprovincial pairs came from Manitoba.

Table 8.3.4
Number of sampling units and pairs on the frame and number of sampled units for each substratum in stratum 3
Table summary
This table displays the results of Number of sampling units and pairs on the frame and number of sampled units for each substratum in stratum 3. The information is grouped by Substratum (appearing as row headers), Province or territory, Number of sampling units on the frame, Number of pairs in the sampling units and Number of sampled units (appearing as column headers).
Substratum Province or territory Number of sampling units on the frame Number of pairs in the sampling units Number of sampled units
60% or more
1 10 98 374 98
2 11 16 56 16
3 12 261 982 168
4 13 234 1,008 139
5 24 35,259 157,968 237
6 35 56,977 250,169 236
7 46 208 837 148
8 47 168 762 117
9 48 2,108 8,465 185
10 59 8,986 37,667 214
11 61 1 3 1
12 9900 424,804 971,198 275
13 9911 10,664 28,813 81
14 9960 1,974 5,310 81
15 9961 1,988 5,329 81
16 9962 681 1,911 81
At least 50% but less than 60%
17 10 1,057 2,314 152
18 11 174 381 102
19 12 2,175 4,773 60
20 13 1,614 3,667 60
21 24 44,438 114,348 78
22 35 112,607 301,663 97
23 46 1,731 3,790 95
24 47 1,262 2,784 95
25 48 12,287 28,069 129
26 59 27,016 69,808 119
27 60 22 44 22
28 61 8 16 8
29 62 5 10 5
Less than 50%
30 10 2,335 9,959 64
31 11 378 1,544 75
32 12 4,461 18,969 44
33 13 3,783 15,606 44
34 24 38,148 162,831 30
35 35 96,087 422,284 30
36 46 3,903 17,103 44
37 47 3,089 13,033 44
38 48 22,495 101,325 30
39 59 73,556 347,254 30
40 60 80 322 80
41 61 55 230 55
42 62 16 67 16

### 8.3.5 Sample from stratum 4

Stratum 4 contained all the record pairs determined using the supplement. Since stratum 4 was added after the groups and neighbourhoods formed by the interconnected pairs from steps 1 to 3 were created, all the sampling units from stratum 4 were treated as pairs. This stratum was substratified by province. A power allocation was used to allocate the sample of 3,000 pairs between the provinces and territories. Also, an optimal allocation, based on the expected overcoverage rates, made it possible to allocate one province’s sample between the intraprovincial and interprovincial pairs. A systematic sample of pairs from the RDB was then chosen in each substratum. For each province and territory, Table 8.3.5 presents the total number of intraprovincial and interprovincial pairs in the frame, and the number of pairs sampled. As mentioned, it is important to note that, for interprovincial pairs, the national-level total corresponds to half the sum of the numbers from each province and territory since each interprovincial pair is counted in total in two provinces.

Table 8.3.5
Number of intraprovincial and interprovincial sampling units (pairs) on the frame and in the sample, Canada, provinces and territories
Table summary
This table displays the results of Number of intraprovincial and interprovincial sampling units (pairs) on the frame and in the sample. The information is grouped by Provinces and territories (appearing as row headers), Pairs on the frame and Pairs in the sample (appearing as column headers).
Provinces and territories Pairs on the frame Pairs in the sample
Intraprovincial Interprovincial Intraprovincial Interprovincial
Newfoundland and Labrador 967 935 144 70
Prince Edward Island 221 314 87 61
Nova Scotia 1,608 1,836 163 93
New Brunswick 1,314 1,377 156 81
Quebec 15,803 5,922 367 95
Ontario 31,264 13,927 444 128
Manitoba 2,109 2,427 177 101
Alberta 7,831 6,982 272 122
British Columbia 11,763 7,979 319 116
Yukon 78 81 78 34
Northwest Territories 56 113 56 53
Nunavut 87 113 87 43

## 8.4 Collection

The collection process involved manually verifying the samples of selected pair groups. Manual verification was done pair by pair. When a group or neighbourhood was sampled, all the pairs it contained were examined manually. The pairs were examined only once, even if they belonged to more than one sampled neighbourhood.

The manual verification process involved an exhaustive review of all of the information available on the CCS-RDB. As in 2011, it included the following steps:

1. comparing persons sampled from the CCS-RDB by name, sex, birth date and relationships between persons
2. comparing members of households in the CCS-RDB according to the same criteria
3. evaluating evidence for or against overcoverage between the two persons in the pair to determine whether the two records actually represent the same person
4. determining the overcoverage scenario, coded only when there was verified overcoverage between non-identical households. Overcoverage scenarios are provided in Table 8.4.
Table 8.4
Overcoverage scenario codes and their meaning
Table summary
This table displays the results of Overcoverage scenario codes and their meaning. The information is grouped by Code (appearing as row headers), Overcoverage scenario between non-identical households (appearing as column headers).
Code Overcoverage scenario between non-identical households
1.1 Student or young adult newly away from home
1.2 Young adult newly away from home because of marriage or common-law relationship
1.3 Adult entering into or leaving married or common-law relationship
2.1 Child or children of parents in separate households
2.2 Child or children with two relatives or adults
4.1 Collective dwelling
5.1 Other

The manual verification sample was divided into batches of 150 pairs. One batch was entirely verified by the same coder. Once a batch was verified, it was evaluated using a statistical quality control operation.

## 8.5 Weighting and estimation

The starting weight of a sampling unit was simply the inverse of its probability of being selected. Given that the sampling units composed of groups or neighbourhoods varied with respect to the number of pairs they contained, a calibration step was added to ensure a good representation of the number of pairs in each province and territory. In stratum 1 (intraprovincial units), the sampling weights were calibrated so the estimated number of pairs corresponded to the total number of pairs in the frame for each province and territory. In stratum 3 (interprovincial groups and neighbourhoods), the sampling weights were calibrated so the estimated number of intraprovincial and interprovincial pairs in each province and territory was equal to the corresponding total from the frame. Thus, 13 control totals were used for stratum 1, compared with 26 for stratum 3. Statistics Canada’s Generalized Estimation System (G-Est) was used for the calibration. No calibration was required for strata 2 and 4, since they contained only pairs. Table 8.5a presents the three calibration factors for each province and territory.

Table 8.5a
Average calibration factor (ratio of frame total to weighted estimate) by stratum and type of pair, provinces and territories
Table summary
This table displays the results of Average calibration factor (ratio of frame total to weighted estimate) by stratum and type of pair. The information is grouped by Provinces and territories (appearing as row headers), Stratum 1 and Stratum 3 (appearing as column headers).
Provinces and territories Stratum 1 Stratum 3
Intraprovincial Interprovincial
Newfoundland and Labrador 1.003 1.049 0.899
Prince Edward Island 1.000 1.057 1.136
Nova Scotia 0.998 1.071 0.945
New Brunswick 0.996 0.901 1.225
Quebec 1.013 0.988 0.942
Ontario 1.014 1.001 1.008
Manitoba 1.008 1.013 1.117
Alberta 1.000 0.971 0.997
British Columbia 0.999 0.956 0.951
Yukon 1.000 1.039 1.415
Northwest Territories 1.000 1.015 1.468
Nunavut 1.000 1.000 0.493

The results of the manual verification were processed to create overcoverage groups for estimation. Overcoverage groups were formed from all the RDB records linked by verified overcoverage. The COS estimates were obtained by adding the estimated overcoverage counted in each overcoverage group. For an overcoverage group that was a simple pair, the overcoverage count was 1. If the overcoverage group was contained in a small group of records (i.e., a group that had not been split into neighbourhoods), then the following formula was applied:

Overcoverage = number of records in the overcoverage group - 1.

For overcoverage groups split into neighbourhoods, overcoverage was counted according to the following two steps:

1. calculating the overcoverage in each neighbourhood for which the anchor (i.e., the CCS-RDB record serving as the centre of the neighbourhood) was involved in verified overcoverage cases for this overcoverage group, as follows:
2. Overcoverage in the neighbourhood =
1. adding the overcoverage of each neighbourhood to obtain the total overcoverage of the group.

Overcoverage for a domain was obtained by multiplying the total overcoverage of the pair, group or neighbourhood by the proportion of CCS-RDB records that were part of the domain among those that belonged to the overcoverage group.

In all the cases described above, the overcoverage calculated for a unit was multiplied by the sampling unit’s post-calibration weight to obtain the weighted estimation. The variance was estimated with G-Est, which uses Taylor linearization.

Similar to what was done in the 2006 and 2011 COS, an adjustment based on the AMS was applied to the COS estimates to take into account the overcoverage measured by the AMS outside the COS universe. The adjustment factor was calculated and applied separately for each province and territory. The final variance took this adjustment into account. Table 8.5b shows the adjustment factor applied for each province and territory, for the last three iterations of the COS.

Table 8.5b
AMS adjustment to the 2006, 2011 and 2016 COS estimates, provinces and territories
Table summary
This table displays the results of AMS adjustment to the 2006. The information is grouped by Provinces and territories (appearing as row headers), 2006, 2011 and 2016 (appearing as column headers).
Provinces and territories 2006 2011 2016
Newfoundland and Labrador 1.029 1.030 1.035
Prince Edward Island 1.028 1.017 1.003
Nova Scotia 1.034 1.021 1.012
New Brunswick 1.040 1.071 1.037
Quebec 1.026 1.016 1.008
Ontario 1.040 1.020 1.015
Manitoba 1.035 1.025 1.010
Alberta 1.058 1.018 1.026
British Columbia 1.037 1.017 1.029
Yukon 1.027 1.014 1.010
Northwest Territories 1.039 1.169 1.015
Nunavut 1.061 1.015 1.023

## 8.6 Final results

### 8.6.1 Overcoverage by step

The contribution of each step, and of the AMS adjustment, to the 2016 COS overcoverage estimates is provided in Table 8.6.1.

Table 8.6.1
2016 Census Overcoverage Study estimated overcoverage by step, Canada, provinces and territories
Table summary
This table displays the results of 2016 Census Overcoverage Study estimated overcoverage by step. The information is grouped by Provinces and territories (appearing as row headers), Step 1, Step 2, Step 3, Step 4 and AMS adjustment, calculated using estimated number and percentage of total units of measure (appearing as column headers).
Provinces and territories Step 1 Step 2 Step 3 Step 4 AMS adjustment
Estimated number Percentage of total Estimated number Percentage of total Estimated number Percentage of total Estimated number Percentage of total Estimated number Percentage of total
Canada 602,995 85.25 1,759 0.25 36,229 5.12 54,845 7.75 11,508 1.63
Newfoundland and Labrador 9,345 84.38 47 0.42 486 4.39 823 7.43 374 3.38
Prince Edward Island 2,082 86.35 9 0.37 98 4.06 214 8.88 8 0.33
Nova Scotia 14,375 84.25 105 0.62 788 4.62 1,599 9.37 196 1.15
New Brunswick 14,334 86.10 42 0.25 591 3.55 1,095 6.58 586 3.52
Quebec 159,996 90.97 367 0.21 5,619 3.19 8,519 4.84 1,385 0.79
Ontario 217,852 84.02 362 0.14 14,591 5.63 22,759 8.78 3,724 1.44
Manitoba 16,975 85.53 60 0.30 1,032 5.20 1,585 7.99 195 0.98
Saskatchewan 17,235 79.60 82 0.38 2,626 12.13 1,586 7.33 122 0.56
Alberta 60,144 82.69 253 0.35 3,853 5.30 6,637 9.12 1,851 2.54
British Columbia 89,000 81.89 418 0.38 6,395 5.88 9,833 9.05 3,036 2.79
Yukon 692 81.51 6 0.71 59 6.95 84 9.89 8 0.94
Northwest Territories 467 83.24 4 0.71 26 4.63 56 9.98 8 1.43
Nunavut 497 78.02 4 0.63 65 10.20 57 8.95 14 2.20

Approximately 85% of the overcoverage was estimated based on possible duplicates identified in Step 1. Because the portion of the frame from Step 1 (linkage of the CCS-RDB to itself) was expanded, the identification of potential RDB duplicates through the linkage with an intermediate administrative file (Step 2) did not contribute much to the overall estimate of census overcoverage. Remember that the potential duplicates identified both in Step 1 and in Step 2 were included in the list from Step 1. The extension (Step 3) added 36,229 persons to the estimated overcoverage. This represents 6.0% of the cases identified in steps 1 and 2 combined, which is consistent with what was observed in 2011, where the extension represented 5.4% of the total estimate from steps 1 and 2. The supplement (Step 4) identified 54,845 overcovered persons. This represents 56.5% of the 97,086 pairs added to the COS sampling frame by the supplement. This proportion is very high, but that was to be expected given that the goal of the supplement was to add pairs that had been flagged as likely to be RDB duplicates but that were missing from steps 1 to 3. The AMS adjustment added 11,508 persons to the overcoverage estimate, which represents 1.6% of the final estimate.

Step 1 contributed less to the total provincial or territorial estimate for Nunavut (78.0%) and Saskatchewan (79.6%), where the contribution from the extension was higher than elsewhere. However, the contribution from Step 1 was higher for Quebec (91.0%), where the contributions from the extension and the supplement were the lowest among all jurisdictions.

### 8.6.2 Distribution of overcoverage by scenario

The overcoverage results by scenario are presented in Table 8.6.2. The overcoverage scenario codes were provided in Section 8.4. As noted previously, the overcoverage scenario was coded only when there was overcoverage between non-identical households.

Table 8.6.2
Distribution of 2016 Census Overcoverage Study estimated overcoverage by scenario, Canada, provinces and territories
Table summary
This table displays the results of Distribution of 2016 Census Overcoverage Study estimated overcoverage by scenario. The information is grouped by Provinces and territories (appearing as row headers), Overcoverage scenario, 1.1, 1.2, 1.3, 2.1, 2.2, 3.1, 3.2, 4.1, 5.1, Student or young adult newly away from home, Young adult newly away from home because of marriage or common-law relationship, Adult entering into or leaving married or common-law relationship, Child or children of parents in separate households, Child or children with two relatives or adults, Adult with other relatives, Adult with other unrelated adults, Collective dwelling and Other, calculated using percent units of measure (appearing as column headers).
Provinces and territories Overcoverage scenario
1.1 1.2 1.3 2.1 2.2 3.1 3.2 4.1 5.1
Student or young adult newly away from home Young adult newly away from home because of marriage or common-law relationship Adult entering into or leaving married or common-law relationship Child or children of parents in separate households Child or children with two relatives or adults Adult with other relatives Adult with other unrelated adults Collective dwelling Other
percent
Canada 18.8 4.3 3.8 42.4 3.9 10.1 4.0 8.1 4.6
Newfoundland and Labrador 26.8 3.6 5.5 39.4 6.0 5.9 3.6 4.8 4.4
Prince Edward Island 25.2 3.7 2.8 48.4 5.9 4.1 2.8 4.4 2.7
Nova Scotia 25.3 5.3 4.2 42.8 3.1 4.3 2.0 7.5 5.5
New Brunswick 24.0 4.7 4.9 37.8 5.7 6.7 1.8 9.2 5.2
Quebec 14.8 7.1 1.9 56.0 1.7 3.9 2.5 8.2 3.9
Ontario 22.1 3.3 5.1 33.1 5.3 15.8 3.8 7.0 4.4
Manitoba 16.9 2.6 5.1 40.8 5.4 8.8 3.7 12.6 4.2
Saskatchewan 16.5 0.8 4.6 51.1 5.4 5.8 3.4 9.0 3.3
Alberta 17.8 3.2 2.7 39.8 5.0 13.6 7.0 6.5 4.4
British Columbia 17.5 1.6 5.3 36.1 2.5 9.7 7.5 12.0 7.8
Yukon 10.3 1.9 2.5 42.4 4.5 16.0 3.2 6.9 12.2
Northwest Territories 14.0 5.2 2.9 28.7 9.2 11.6 6.3 7.2 14.9
Nunavut 9.0 6.9 7.3 16.8 22.8 17.0 2.9 6.4 11.0

As in 2011, the “one or more children of parents in separate households” scenario, where two separated or divorced parents both record their children in their respective survey questionnaires, was observed the most often. This was the case in every province and territory except Nunavut. The “other” category was used less often in 2016 compared with five years earlier. In 2011, “other” was the most frequent scenario for Nunavut and British Columbia. In 2016, British Columbia still had the highest proportion of “other” among the 10 provinces, but the rate was one-third of what it was in 2011.

At the national level, the two most frequent other scenarios remained “student or young adult who recently left the family home” and “adult with other relatives.” There were more cases of overcoverage between a private dwelling and a collective dwelling (scenario 4.1) in 2016 compared with 2011 (8.1% vs. 2.5%). In 2016, the extension was also applied to the comparison of households where one member included in a pair identified in step 1 or 2 lived in a private dwelling and where the other member lived in a collective household, whereas in 2011, the extension was limited to the comparison of private dwellings. However, this change explains only a small portion of the increase. Many of the cases observed in 2011 were likely coded in the “other” category, but confirming this hypothesis would require a return to data from the 2011 COS manual verification, which was not done. The same reasoning could be applied to the proportion of children of parents in separate households: there was not such a large increase in the number of separated parents between 2011 and 2016; further investigation would be required to determine how much of this change was from coding and how much was from a real change in the behaviour of separated parents enumerating their children.

Date modified: