Statistics Canada

Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please contact us to request a format other than those available.

5. Data quality measurement

5.1 General

5.2 Sources of error and evaluation studies

5.3 Data quality and data suppression

5.3.1 Random rounding and area suppression

5.3.2 Incompletely enumerated areas

5.3.3 Data quality index and global non-response

5.3.4 Citizenship and immigration data

5.3.5 Suppression of income data

5.3.6 Data quality and 'on reserve' communities

5.3.6.1 Population and dwelling counts

5.3.6.2 Census data from the core (2A short form) and detailed (2B/2D long form) questions

5.4 Coverage

5.4.1 Coverage error for participating reserves

5.4.2 Coverage error for incompletely enumerated reserves and settlements

5.1 General

The 2006 Census was a large and complex undertaking. While considerable effort was made to ensure high standards throughout all collection and processing operations, the resulting census estimates are inevitably subject to a certain degree of error. Users of census data should be aware that such error exists and should have some appreciation of its main components, so that they can assess the usefulness of census data for their own purposes.

5.2 Sources of error and evaluation studies

For census data in general, the principal types of error are coverage errors, non-response errors, response errors, processing errors and sampling errors.

Coverage errors occur when dwellings or individuals are missed, incorrectly enumerated or counted more than once.

Non-response errors result when responses cannot be obtained from some households and/or individuals, because of extended absence or some other reason, or when answers to some questions are missing from an otherwise complete questionnaire.

Response errors occur when the respondent, or sometimes the census representative, misunderstands a census question, and records an incorrect response or simply uses the wrong response box.

Processing errors can occur at various steps including coding, when 'write-in' responses are transformed into numerical codes; data capture, when responses are transferred from the census questionnaire to an electronic format, by optical character recognition methods or key-entry operators; and imputation, when a 'valid', but not necessarily correct, response is inserted into a record by the computer to replace missing or 'invalid' data ('valid' and 'invalid' referring to whether or not the response is consistent with other information on the record).

Sampling errors apply only to the supplementary questions on the 'long form', which is asked of a one-fifth sample of households. They arise because the responses to these questions, when weighted up to represent the whole population, inevitably differ somewhat from the responses that would have been obtained if the questions had been asked of all households. (Sampling error does not apply on reserves or in remote and northern communities, because all households there receive the 'long form' [2D] questionnaire.)
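As a rough illustration of why sampling error arises, the sketch below compares a full count with an estimate from a one-fifth simple random sample, weighted up by the inverse of the sampling fraction. It is a simplified assumption: the household data are hypothetical, and the actual 2006 Census weighting methodology (described in Appendix B) is more elaborate than a single uniform weight.

```python
import random

def full_count_vs_sample_estimate(households, seed=0):
    """Compare a full count with a weighted one-fifth sample estimate.

    households: list of 0/1 values, one per household (hypothetical data),
    where 1 means the household has the characteristic of interest.
    Returns (full_count, weighted_estimate).
    """
    n_total = len(households)
    n_sample = n_total // 5  # one-fifth sample of households
    if n_sample == 0:
        raise ValueError("need at least 5 households")
    rng = random.Random(seed)
    sample = rng.sample(households, n_sample)
    # Simplified uniform weight: 5 when n_total is a multiple of 5.
    weight = n_total / n_sample
    full_count = sum(households)
    weighted_estimate = weight * sum(sample)
    return full_count, weighted_estimate

# Hypothetical population: every third household has the characteristic.
households = [1 if i % 3 == 0 else 0 for i in range(1000)]
full, estimate = full_count_vs_sample_estimate(households, seed=1)
print(full, estimate)  # the estimate generally differs somewhat from the full count
```

Changing `seed` changes which households are drawn and therefore the size of the difference. Where every household receives the 2D form, the full count is available and this difference does not arise.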

For more information on data quality and sources of error, see: Appendix B Data quality, sampling and weighting, confidentiality and random rounding.

For more information on data quality verification in place for the 2006 Census, see: Data quality verification in place for the 2006 Census.

5.3 Data quality and data suppression

Data disseminated by the census are subjected to a variety of automated and manual processes to determine whether the data needs to be suppressed. This is done primarily for two reasons: (1) to ensure non-disclosure of individual respondent identity and characteristics ('confidentiality') and (2) to limit the dissemination of data of unacceptable quality (data quality).

5.3.1 Random rounding and area suppression

Random rounding is used to prevent the possibility of associating statistical data with any identifiable individual. Under this method, all figures, including totals and margins, are randomly rounded either up or down to a multiple of '5' and, in some cases, '10'.

For 2A (100%) data, all counts are rounded to a base of 5. This means that all 2A counts will end in either 0 or 5. The random rounding algorithm employed controls the results and rounds the unit value of the count according to a pre-determined frequency. The table below shows those frequencies. Note that counts ending in 0 or 5 are not changed and remain as 0 or 5.
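The rounding step can be sketched as follows. The actual pre-determined frequencies are given in the table referred to above, which is not reproduced here; as an assumption, this sketch uses a common unbiased scheme in which a count with remainder r (when divided by 5) is rounded up with probability r/5 and down otherwise, so the expected value of each rounded count equals the original count.

```python
import random

def random_round(count, base=5, rng=random):
    """Randomly round a non-negative integer count to a multiple of `base`.

    Assumption: round up with probability remainder/base (an unbiased scheme);
    the census's own rounding frequencies may differ.
    """
    remainder = count % base
    if remainder == 0:
        return count  # counts ending in 0 or 5 are left unchanged
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower

rng = random.Random(2006)
print([random_round(c, rng=rng) for c in (3, 12, 25, 47)])  # each value ends in 0 or 5
```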

2B (20%) data require a slightly different random rounding algorithm. Counts of 10 or more are rounded to base 5, as is done for 2A data. Counts less than 10 are rounded to base 10, which means that any 2B count less than 10 will always be changed to 0 or 10. The table below shows the effect of rounding on 2B counts with a value less than 10.
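Extending the same idea to 2B counts only changes the base for small counts. As with 2A data, the unbiased up/down probabilities are an assumption standing in for the frequencies in the table, which is not reproduced here, and counts are assumed to be non-negative integers.

```python
import random

def round_2b_count(count, rng=random):
    """Randomly round a 2B (20% sample) count.

    Counts under 10 are rounded to base 10 (so they become 0 or 10);
    larger counts are rounded to base 5. Rounding probabilities follow an
    assumed unbiased scheme, not the census's published frequencies.
    """
    base = 10 if count < 10 else 5
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base: unchanged
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower

rng = random.Random(7)
print([round_2b_count(c, rng=rng) for c in (0, 4, 9, 10, 13)])
```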

In addition to random rounding, area suppression has been adopted to further protect the confidentiality of individual responses.

Area suppression is the deletion of all characteristic data for geographic areas with populations below a specified size.

The specified population size is 40 for all standard areas or aggregations of standard areas, except blocks, block-faces and postal codes. (A block-face is generally one side of a city street between two consecutive intersections; it is also the smallest geographic unit available from Statistics Canada.) Consequently, no characteristic data are released for areas with a population below 40, although these data are included at higher levels of geography. The population and dwelling counts for these small areas are not suppressed.

The suppression is implemented for all products involving subprovincial data (i.e., Profile series, basic summary tabulations, semi-custom and custom data products) collected on a 100% or 20% sample basis.

In all cases, suppressed data are included in the appropriate higher aggregate subtotals or totals.
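A minimal sketch of the area suppression rule, assuming a hypothetical `Area` record holding a dictionary of characteristic counts: characteristic data are withheld for an area with a population under 40, its population and dwelling counts are still released, and its characteristics still contribute to higher-level totals. The data structures are illustrative only, and the different treatment of blocks, block-faces and postal codes is not modelled.

```python
from dataclasses import dataclass, field

MIN_POPULATION = 40  # threshold for standard areas

@dataclass
class Area:
    """Hypothetical record for one geographic area."""
    name: str
    population: int
    dwellings: int
    characteristics: dict = field(default_factory=dict)

def releasable(area):
    """Return the data that may be published for a single area."""
    released = {"population": area.population, "dwellings": area.dwellings}
    if area.population >= MIN_POPULATION:
        released["characteristics"] = dict(area.characteristics)
    return released

def aggregate(areas):
    """Sum characteristics over areas, including areas suppressed individually."""
    totals = {}
    for area in areas:
        for key, value in area.characteristics.items():
            totals[key] = totals.get(key, 0) + value
    return totals

small = Area("Area A", population=35, dwellings=12, characteristics={"employed": 15})
large = Area("Area B", population=520, dwellings=190, characteristics={"employed": 260})
print(releasable(small))          # {'population': 35, 'dwellings': 12}
print(aggregate([small, large]))  # {'employed': 275}
```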

5.3.2 Incompletely enumerated areas

Some Indian reserves and settlements did not participate in the census as enumeration was not permitted, or it was interrupted before completion. In 2006, there were 22 incompletely enumerated reserves, down from 30 in 2001 and 77 in 1996. Data quality rules require these non-enumerated areas to be identified and excluded from products. The data are not available for these reserves.

For more information see: Incompletely enumerated Indian reserves and Indian settlements

5.3.3 Data quality index and global non-response

A data quality index, based on the global non-response rate, is calculated for all census subdivisions (CSDs) to indicate the level of data quality. The global non-response rate is the percentage of required responses left unanswered by the respondents.
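The definition translates directly into a pooled percentage. In this sketch each questionnaire is summarized, hypothetically, as a pair (required responses, unanswered responses); how the census decides which responses are 'required' for a given person is not described here and is assumed to have been done already.

```python
def global_non_response_rate(questionnaires):
    """Percentage of required responses left unanswered, pooled over an area.

    questionnaires: iterable of (required_responses, unanswered_responses) pairs
    (a hypothetical summary format).
    """
    pairs = list(questionnaires)
    required = sum(req for req, _ in pairs)
    unanswered = sum(miss for _, miss in pairs)
    if required == 0:
        raise ValueError("no required responses")
    return 100.0 * unanswered / required

print(global_non_response_rate([(40, 2), (40, 0), (20, 3)]))  # 5.0
```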

Global non-response rates are determined for each census geographic area, and areas are flagged on the database according to their rate. Geographic areas with a global non-response rate of 25% or higher are suppressed from tabulations. Geographic areas with a global non-response rate of 5% or higher but lower than 25% are not suppressed; they are flagged in one of two ranges: between 5% and 10%, or between 10% and 25%.

Tabulations for which the data quality index is greater than or equal to 5% but less than 25% are to be used with caution. Only population and dwelling counts are released for the geographical areas (CSDs) for which the data quality index is greater than or equal to 25%. However, the additional census characteristics data for these areas are included in all tabulations for higher geographic levels. This means that while census characteristics are not published for these individual CSDs, they are counted in the estimates at the provincial or national level.

Geographical areas with no data quality index shown have a global non-response rate lower than 5%.
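The flagging and suppression ranges can be expressed as a small classification function. The text does not say which band a rate of exactly 10% falls into; this sketch assumes it belongs to the higher band, and the returned labels are hypothetical.

```python
def data_quality_flag(gnr_percent):
    """Classify a global non-response rate (in %) under the 5% / 10% / 25% ranges."""
    if gnr_percent < 5:
        return None           # no data quality index shown
    if gnr_percent < 10:
        return "5% to 10%"    # released, flagged: use with caution
    if gnr_percent < 25:
        return "10% to 25%"   # released, flagged: use with caution
    return "suppressed"       # only population and dwelling counts released

for rate in (3.2, 7.5, 10.0, 24.9, 31.0):
    print(rate, data_quality_flag(rate))
```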

For more information on data quality indicators see: Data quality indicators for place of residence - 2006 Census and Data quality and confidentiality standards and guidelines (public): Appendix B.

5.3.4 Citizenship and immigration data

Persons living on Indian reserves and Indian settlements who were enumerated with the 2006 Census Form 2D questionnaire were not asked the questions on citizenship and immigration. Consequently, the citizenship and immigration data are not available for Indian reserves and settlements.

For more information on Indian reserves and Indian settlements for which citizenship, landed immigrant status and period of immigration data are suppressed see: Data quality and confidentiality standards and guidelines (public): Data suppression - Indian reserves.

5.3.5 Suppression of income data

Income distributions and related statistics are suppressed if the area's non-institutional population, from either the 100% or the 20% database, is less than 250 persons, or if the area has fewer than 40 private households.
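The income rule combines two thresholds, either of which triggers suppression. A minimal sketch, with hypothetical parameter names:

```python
def income_data_suppressed(non_institutional_population, private_households):
    """Return True if income distributions and statistics should be suppressed."""
    return non_institutional_population < 250 or private_households < 40

print(income_data_suppressed(400, 35))   # True: fewer than 40 private households
print(income_data_suppressed(260, 95))   # False: both thresholds met
```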

5.3.6 Data quality and 'on reserve' communities

In general, the Census has been able to obtain accurate population and dwelling counts and to maintain high response rates to census questions for all geographic areas in Canada. However, some census subdivisions (CSDs), including some Indian reserves, have high non-response rates for some census questions despite the fact that accurate population and dwelling counts were obtained.

5.3.6.1 Population and dwelling counts

Population and dwelling counts are available at all geographic levels except for the incompletely enumerated Indian reserves and settlements (22). The population and dwelling counts are available for all participating 'on reserve' communities.

Population and dwelling counts are also available for 'on reserve' communities for which other census characteristics data may be suppressed, such as:

  • the 'on reserve' communities (census subdivisions) for which the data quality index is greater than or equal to 25%
  • the 'on reserve' communities where the population size is less than 40.

5.3.6.2 Census data from the core (2A short form) and detailed (2B/2D long form) questions

The global non-response rate is calculated separately for the core set of questions (2A questions 1 to 8) and for the remaining questions that make up the long census questionnaire (2B/2D). The data quality index may therefore differ for the two sets of questions.

For example, a community may have a global non-response rate lower than 5% for the core set of 2A questions and a rate of 25% or higher for the larger, detailed set of 2B/2D questions. In this example, the core 2A data would be available for the community, while, as mentioned in section 5.3.3, the additional 2B/2D data would be included only in tabulations for higher geographic levels (provincial and national).
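The data availability groupings used in this section can be sketched as a decision function. This is an illustrative reading of sections 5.3.2 to 5.3.6 with one stated assumption: when the core (2A) non-response rate is 25% or higher, the detailed (2B/2D) data are treated as unavailable as well. Function and label names are hypothetical.

```python
FULL = "full data"
CORE_ONLY = "population and dwelling counts and 2A core data"
COUNTS_ONLY = "population and dwelling counts only"
SMALL = "population and dwelling counts only (population under 40)"
NO_DATA = "no data (incompletely enumerated)"

def data_availability(incompletely_enumerated, population, gnr_core, gnr_detailed):
    """Classify an 'on reserve' community by the data that can be released."""
    if incompletely_enumerated:
        return NO_DATA
    if population < 40:
        return SMALL        # area suppression: characteristics withheld
    if gnr_core >= 25:
        return COUNTS_ONLY  # assumption: detailed data also withheld
    if gnr_detailed >= 25:
        return CORE_ONLY    # 2B/2D data appear only in higher-level totals
    return FULL

# The example in the text: core rate under 5%, detailed rate 25% or higher.
print(data_availability(False, 650, gnr_core=3.0, gnr_detailed=28.0))
```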

Core questions – (2A short census form)

The core set of 2A questions includes eight questions on basic topics such as relationship to Person 1, age, sex, marital status, and mother tongue.

Detailed questions – (2B/2D long census form)

The additional questions found on the 2B/2D questionnaire include 45 questions on topics such as Aboriginal identity, Aboriginal ancestry (ethnic origin), education, mobility, income and employment.

Data availability for 'on reserve' communities

The tables that follow show the number of communities defined as 'on reserve' for which data are available.

Different levels of data availability are shown, ranging from the full set of data (population and dwelling counts, 2A and 2B/2D data) to no data for incompletely enumerated reserves.

For more information on census subdivision types and selected census subdivisions associated with 'on reserve' population, see: Census subdivision (CSD).

For more information on census subdivisions (communities) for which 2A data or 2B/2D data are suppressed see: Census subdivision suppression list with names - 20% sample data and 100% data.

Data availability for 'on reserve' communities, by region, 2006 Census

In 2006, data availability for 'on reserve' communities varied across the different regions.

The incompletely enumerated reserves were concentrated in Ontario (10) and Quebec (7); another 3 were in Alberta, and there was one each in Saskatchewan and British Columbia.

In the Atlantic provinces, there was full data (population count, 2A core data and 2D detailed characteristics data) for the majority of 'on reserve' communities.

  • In Newfoundland and Labrador there was full data for the two reserves in the province.
  • In Prince Edward Island there was full data for the 3 reserves which had a population size greater than 40.
  • In Nova Scotia, there were 16 reserves with population size greater than 40. Fifteen of them had full data and one had only population and dwelling counts available.
  • In New Brunswick there were 17 reserves with population size greater than 40. Fifteen of them had full data, and two reserves had partial data.

In Quebec, there were 33 reserve communities with population size greater than 40. Full data was available for 32 of them, and one reserve had partial data available. There were 7 reserve communities which were incompletely enumerated (Gesgapegiag, Doncaster, Kanesatake, Kahnawake, Akwesasne, Lac-Rapide, Wendake).

In Ontario, there were 114 reserve communities with population size greater than 40 and there was full data available for 104 of them, and partial data was available for 10. There were 10 reserves which were incompletely enumerated (Fort Severn, Attawapiskat 91A, Factory Island 1, Bear Island 1, Tyendinaga Mohawk Territory, Wahta Mohawk Territory, Six Nations (Part) 40, Six Nations (Part) 40, Oneida 41, Akwesasne (Part) 59).

In the Prairies there was full data (population count, 2A core data and 2D detailed characteristics data) for the majority of reserves with population size greater than 40.

  • In Manitoba there was full data for 65 of 69 reserves with population size greater than 40. There were 4 reserves with a population greater than 40, which had partial data (population and dwelling counts, 2A core data).
  • In Saskatchewan there was full data for 80 of 90 reserves with population size greater than 40. There were 10 reserves which had partial data available. Another 18 reserves were small communities with a population size of less than 40. There was one reserve which was incompletely enumerated (Big Island Lake Cree Territory).
  • In Alberta there was full data for 52 of 63 reserves with population size greater than 40. There were 11 reserves which had partial data. There were 6 reserves with a population size of less than 40, and there were 3 reserves that were incompletely enumerated (Little Buffalo, Saddle Lake 125, Tsuu T'ina Nation 145).

In British Columbia, there was full data for 171 of 238 reserves with population size greater than 40. Sixty-seven reserves had partial data available. Another 132 reserves were small communities with a population size of less than 40. And finally, there was 1 incompletely enumerated reserve in the province (Esquimalt).

In the Yukon and the Northwest Territories, there was full data for the majority of First Nations communities included in the definition of 'on reserve' communities (see section 1.3).
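The regional figures above can be cross-checked with a little arithmetic. The sketch simply re-enters the counts quoted in this section (for Newfoundland and Labrador, the two reserves in the province; elsewhere, reserves with a population size greater than 40), confirms that full and partial data add up to each regional total, and computes the share with full data. 'Partial' here covers every grouping short of full data, including counts only; the territories are omitted because no counts are quoted for them.

```python
# (reserves counted, full data, partial data) as quoted in the text
REGIONS = {
    "Newfoundland and Labrador": (2, 2, 0),
    "Prince Edward Island": (3, 3, 0),
    "Nova Scotia": (16, 15, 1),
    "New Brunswick": (17, 15, 2),
    "Quebec": (33, 32, 1),
    "Ontario": (114, 104, 10),
    "Manitoba": (69, 65, 4),
    "Saskatchewan": (90, 80, 10),
    "Alberta": (63, 52, 11),
    "British Columbia": (238, 171, 67),
}

# Incompletely enumerated reserves by province, 2006
INCOMPLETE = {"Ontario": 10, "Quebec": 7, "Alberta": 3,
              "Saskatchewan": 1, "British Columbia": 1}

def full_data_shares(regions):
    """Percentage of counted reserves with full data, after a consistency check."""
    shares = {}
    for region, (total, full, partial) in regions.items():
        if full + partial != total:
            raise ValueError(f"counts do not add up for {region}")
        shares[region] = 100 * full / total
    return shares

for region, share in full_data_shares(REGIONS).items():
    print(f"{region}: {share:.1f}% with full data")
print("Incompletely enumerated:", sum(INCOMPLETE.values()))  # 22
```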

For the 2011 Census, efforts to achieve fuller participation and higher response rates can be targeted at specific 'on reserve' communities.

The comparison of data availability across the 3 census years should be made with caution. Methodological changes over time could affect the calculation of global non-response rates and limit their comparability across census years.

Additional factors and further analysis need to be considered for a fuller understanding of data availability and quality.

The preliminary analysis shows that overall, when all groupings of data availability are considered, the data available for 'on reserve' communities have increased over time. The proportion of 'on reserve' communities for which full or partial data are available across all groupings has increased from 90.9% (1996) to 97.5% (2006). Most of the increase (5.6 percentage points) occurred from 1996 to 2001, and there was a small increase (1 percentage point) from 2001 to 2006.
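The changes quoted here are differences between percentages, that is, percentage points. The short calculation below uses only the figures in the text to show the implied 2001 share and how a change in percentage points differs from a relative percentage change.

```python
share_1996 = 90.9      # % of 'on reserve' communities with full or partial data
share_2006 = 97.5
gain_1996_2001 = 5.6   # percentage points

share_2001 = share_1996 + gain_1996_2001   # implied 2001 share
gain_2001_2006 = share_2006 - share_2001   # percentage points
relative_change = 100 * (share_2006 - share_1996) / share_1996  # percent

print(f"2001 share: {share_2001:.1f}%")                      # 96.5%
print(f"2001-2006 gain: {gain_2001_2006:.1f} points")         # 1.0 points
print(f"1996-2006 relative change: {relative_change:.1f}%")   # 7.3%
```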

As was noted earlier, there were 22 incompletely enumerated reserves in 2006, down from 30 in 2001 and 77 in 1996. The increase in data availability from 1996 to 2001, considering all groupings, may be the result of the reduction in the number of incompletely enumerated reserves. In other words, the increase in the number of reserves with full or partial data could come from reserves that previously had no data. The proportion of reserve communities for which no data are available (incompletely enumerated reserves) decreased from 9.0% in 1996 to 2.5% in 2006. For a fuller understanding of these increases and decreases, the shifts in data availability for individual reserves over time need to be examined more closely.

The analysis also shows the following for the different data availability groupings:

The proportion of reserves for which the complete set of data are available has gone from 65.3% in 1996 to 67.8% in 2001, and to 64.8% in 2006.

The proportion of reserve communities with a population size greater than 40 for which there are only population counts available increased from 1.2% in 1996 to 7.7% in 2006.

There was an increase from 1996 to 2006 in the two categories of 2A data and 2D data availability. There was a decrease from 1996 to 2006 in the proportion of small reserves with a population size less than 40 for which only population counts are available, from 24.3% in 1996 to 20.7% in 2006.

As noted above, a closer examination will shed light on the shifts over time; regional differences and the size of these communities will also be examined. While overall data availability for reserve communities has increased over time, improving data quality for individual communities remains a key objective of the 2011 Census. The analysis will be used to help inform efforts to improve data quality.

Data availability status of incompletely enumerated reserves over time, 1996, 2001 and 2006

In 1996, there were 77 incompletely enumerated reserves. This table shows the status of these reserves in 2001 and 2006.

Of the 77 reserves that were incompletely enumerated in 1996, full data were available for 35 in 2001 and for 43 in 2006. Partial data (population and dwelling counts, core data only) were available for 2 reserves in 2001 and population and dwelling counts were available for 8 reserves with a population size greater than 40 in both 2001 and 2006.

In 2001, 6 of these reserves had a population size of under 40, compared to 8 in 2006.

Many of the reserves that were incompletely enumerated in 1996 were still incompletely enumerated in 2001 (21 reserves) and in 2006 (14 reserves).

There were 13 reserves that were incompletely enumerated in all three census years:
  • Akwesasne (formerly Akwesasne (Partie)), Quebec
  • Kahnawake, Quebec
  • Kanesatake, Quebec
  • Akwesasne (Part) 59, Ontario
  • Oneida 41, Ontario
  • Six Nations (Part) 40, Ontario
  • Six Nations (Part) 40, Ontario
  • Tyendinaga Mohawk Territory, Ontario
  • Wahta Mohawk Territory, Ontario
  • Big Island Lake Cree Territory, Saskatchewan
  • Little Buffalo, Alberta
  • Saddle Lake 125, Alberta
  • Esquimalt, British Columbia

In 2001, there were 30 incompletely enumerated reserves. This table shows the status of these reserves in 2006.

The status of just over half (16) of the 30 reserves that were incompletely enumerated in 2001 did not change in 2006. Of the others, full data were available for 10 reserves in 2006 and partial data for 3.

Data availability status of 2006 'on reserve' communities with data available for population and dwelling counts only, in 2001 and 1996

In 2006, there were 68 on-reserve communities with a population size greater than 40 for which only population and dwelling counts were available. This table shows the status of these communities in 1996 and 2001.

Most of the 68 'on reserve' communities had full data available in 2001 and in 1996. A small number were incompletely enumerated reserves in 2001 (3) and 1996 (8).

Improvements can be targeted for these communities to regain fuller participation in the census and to increase response rates.

Data availability status of 2006 'on reserve' communities with data available for 2A core data only, in 1996 and 2001

In 2006, there were 34 'on reserve' communities for which only population and dwelling counts and 2A core data were available. This group of communities includes those with a population size greater than 40. This table shows the status of these communities in 2001 and 1996.

Most of the 34 'on reserve' communities had full data available in 2001 and in 1996.

Improvements can be targeted for these communities to regain fuller participation in the census and to increase response rates.

In 2006, there was an increase in the number of 'on reserve' communities with partial data.

The improvement of data quality for individual communities remains a key objective of the 2011 Census. The analysis will be used to help inform efforts to improve data collection activities, data quality and data available at the community level.

Most of the 575 'on reserve' communities that had full data in 2006 also had full data available in 2001 (490) and in 1996 (448).

Partial data (population and dwelling counts, core data only) were available for 20 reserves in 2001 and population and dwelling counts were available for 27 reserves with a population size greater than 40. In 1996, partial data were available for 2 reserves and population and dwelling counts were available for 8 reserves.

In 2001, 17 reserves had a population size of under 40, compared to 19 in 1996.

5.4 Coverage

Throughout the census-taking process, every effort is made to ensure high-quality results. The resulting data, however, are subject to a certain degree of inaccuracy. One such inaccuracy is population coverage error: the extent to which census data exclude persons who should have been enumerated (undercoverage) and include persons who were enumerated more than once (overcoverage). The net of these two errors, net population undercoverage, quantifies the net number of persons missed by the census. The Census Data Quality Measurement Program provides users with information on population coverage error.

Coverage error generally occurs during the field collection stage. Undercoverage, for example, results when someone is not listed on the census questionnaire as a usual resident of the dwelling even though the census rules on whom to include and exclude indicate that they should be listed. An example of overcoverage is when children whose parents live in separate households are listed twice, once with each parent. Some living arrangements could result in either undercoverage or overcoverage. Someone, for example, whose employment requires them to live away from their family for a period of time, is at risk of both undercoverage and overcoverage.

It should be noted that persons living in dwellings for which a census questionnaire was never received or in dwellings that were erroneously classified as unoccupied are not examples of undercoverage. This is because census processing includes a step whereby a statistical imputation is performed to estimate the number of persons living in such dwellings.1

This section presents estimates of census net population undercoverage for 2006 Census tabulations on Aboriginal peoples. Estimates of coverage error for participating reserves are derived from the results of two studies, the Reverse Record Check (RRC) and the Census Overcoverage Study (COS), designed to measure census population undercoverage and census population overcoverage respectively. For incompletely enumerated Indian reserves and Indian settlements, model-based estimates are presented. Since no reliable source exists to verify the assumptions used in the models, these estimates must be used with caution. Estimates of 2006 Census population coverage error off reserve for persons with Aboriginal identity are not available.

5.4.1 Coverage error for participating reserves

The following table gives estimates of 2006 Census net undercoverage for all persons living on participating reserves, including those without Aboriginal identity, for Canada, for the eastern region (Newfoundland and Labrador, Prince Edward Island, Nova Scotia, New Brunswick, Quebec and Ontario) and for the western and northern region (Manitoba, Saskatchewan, Alberta, British Columbia, Yukon Territory and the Northwest Territories). Limitations of the coverage studies do not permit the production of estimates by Aboriginal identity. The rate of census net undercoverage indicates what proportion of the entire population that should have been enumerated is, on a net basis, not included in 2006 Census tabulations. Users are advised to consult the standard error of an estimate to determine its suitability for use.
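Net undercoverage and its rate follow directly from these definitions: net undercoverage is population undercoverage minus population overcoverage, and the rate expresses it as a share of the population that should have been enumerated, that is, the census count plus net undercoverage. The figures in the example are hypothetical, not 2006 estimates.

```python
def net_undercoverage(undercoverage, overcoverage):
    """Net number of persons missed: persons missed minus persons counted twice."""
    return undercoverage - overcoverage

def net_undercoverage_rate(census_count, undercoverage, overcoverage):
    """Net undercoverage as a percentage of the population that should have been enumerated."""
    net = net_undercoverage(undercoverage, overcoverage)
    return 100.0 * net / (census_count + net)

print(net_undercoverage_rate(census_count=9_000, undercoverage=1_200, overcoverage=200))  # 10.0
```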

The estimate of net undercoverage is the estimate of population undercoverage less the estimate of population overcoverage. One limitation of the estimate of overcoverage is that for a particular geography such as participating reserves, the estimate includes persons who appear on questionnaires for two dwellings where at least one of the dwellings is on reserve. The other dwelling may be on the same reserve, on a different reserve, or not on a reserve. Since the Census Overcoverage Study does not determine at which dwelling an individual should have been listed, the assumption is made that the individual is equally likely to belong at either dwelling. Therefore, in order to produce estimates of overcoverage, half of the weight for the person is assigned to each dwelling. This concept is important for small domains such as the 'on reserve' population. About half of the overcoverage cases involving a dwelling on reserve also involved a dwelling off reserve.
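The half-weight convention can be sketched as follows. Each duplicated person is represented, hypothetically, by their overcoverage weight and the domains of the two dwellings on which they were listed; half of the weight is credited to each dwelling's domain.

```python
from collections import defaultdict

def overcoverage_by_domain(duplicates):
    """Allocate overcoverage weight to domains, half to each dwelling.

    duplicates: iterable of (weight, domain_of_first_dwelling, domain_of_second_dwelling).
    """
    totals = defaultdict(float)
    for weight, first, second in duplicates:
        # Equally likely to belong at either dwelling, so split the weight.
        totals[first] += weight / 2
        totals[second] += weight / 2
    return dict(totals)

cases = [
    (5.0, "on reserve", "off reserve"),  # one dwelling on reserve, one off
    (5.0, "on reserve", "on reserve"),   # both dwellings on reserve
]
print(overcoverage_by_domain(cases))  # {'on reserve': 7.5, 'off reserve': 2.5}
```

Only half of the weight of a case that also involves an off-reserve dwelling counts toward on-reserve overcoverage, which is why the convention matters for a small domain like the 'on reserve' population.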

Data sources

The estimates of 2006 Census population coverage error are derived from 2006 Census data and the results of two studies. The Reverse Record Check (RRC) measures population undercoverage while the Census Overcoverage Study (COS) measures population overcoverage. In the RRC, a random sample of individuals representing the census target population is taken from frames independent of the 2006 Census, such as a list of persons enumerated in the last census and a list of intercensal births according to provincial birth registries. Estimates are based on a sample of 3,579 persons in the provinces and a sample of 12,811 persons in the two territories. (Most of these had a weight of one.) The 2006 Census database is searched to determine whether these people were indeed enumerated. When required, a telephone interview is conducted to determine whether the individual is in scope for the census and, if so, to collect further data for the search.

Overcoverage is measured by matching the 2006 Census database to a partial list of persons who should have been enumerated, and by matching the 2006 Census database to itself. Persons with Aboriginal identity are in the 2B universe. Therefore, the 2B sampling weights are applied when estimating overcoverage. The COS applies automated exact matching and statistical matching. Statistical matching identifies matches that are close but not exact. Pairs of potential duplicates are sampled and the sampled person's name and demographic characteristics are used to identify the cases of duplication.

For more information on 2001 Census population coverage error, see: Coverage.

5.4.2 Coverage error for incompletely enumerated reserves and settlements

As discussed in section 5.3.2, some Indian reserves and settlements did not participate in the census as enumeration was not permitted, or it was interrupted before completion. In 2006, there were 22 incompletely enumerated reserves. Census data for these areas are not available, and therefore have not been included in any census tabulation.

These areas pose unique problems for the coverage studies and for the population estimates program. The survey population of the Reverse Record Check (RRC) does not include residents of areas where the census was unable to collect any data. However, the Population Estimates Program requires an estimate of the permanent resident population living in these areas. Since neither the census nor the RRC is in a position to produce an estimate of the population living in these areas, a model-based methodology was used. The resulting estimates should be used with caution as they are based entirely on a model whose assumptions cannot be verified. The validity of these estimates depends on the extent to which the model assumptions capture the true underlying situation.

The following table gives the national model results.

In the 2001 Census, 30 reserves, with approximately 34,500 persons, were classified as 'incompletely enumerated.' Among the 22 reserves and settlements considered as incompletely enumerated in the 2006 Census, six were considered to have had complete enumerations in the 2001 Census while the other 16 were 'incompletely enumerated' or 'refusal'. The 2006 estimates are approximately 7.5% larger than the 2001 estimates.

Estimation model

A two-step estimation model was developed to estimate the population. The first step uses simple linear regression to predict the census count in 2006. The regression was constructed using all Indian reserves that were completely enumerated in both the 2001 and the 2006 Census. The model assumes linear growth from 2001 to 2006 in all provinces, with a separate intercept and slope estimated for each province. The model was checked against the basic regression assumptions of independence of errors, homogeneity of variances and normality of errors.

For each incompletely enumerated reserve, the input variable for the regression model was either the actual census count in 2001 or the best predicted census count from the 2001 model. The output of the model was the estimated census count in 2006.
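The first step can be sketched as an ordinary least squares fit per province, with each reserve's 2006 census count regressed on its 2001 count. The data layout, function names and training figures are hypothetical, and the residual diagnostics mentioned above (independence, equal variances, normality) are not shown.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Simple linear regression y = intercept + slope * x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_by_province(reserves):
    """Fit a separate intercept and slope for each province.

    reserves: iterable of (province, count_2001, count_2006) for reserves
    completely enumerated in both 2001 and 2006.
    """
    groups = defaultdict(lambda: ([], []))
    for province, count_2001, count_2006 in reserves:
        xs, ys = groups[province]
        xs.append(count_2001)
        ys.append(count_2006)
    return {province: fit_line(xs, ys) for province, (xs, ys) in groups.items()}

def predict_2006(models, province, count_2001):
    """Predicted 2006 census count; count_2001 may itself be a 2001 model prediction."""
    intercept, slope = models[province]
    return intercept + slope * count_2001

# Hypothetical training data for one province.
training = [("Province X", 100, 120), ("Province X", 200, 230), ("Province X", 300, 340)]
models = fit_by_province(training)
print(round(predict_2006(models, "Province X", 400)))  # 450
```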

The second step produces consistency with the results of the census coverage studies. An adjustment was made to the estimated 'census' count to account for the net undercoverage to which all census counts are subject. Net undercoverage for the incompletely enumerated reserves was estimated by calculating the net undercoverage rate for all completely enumerated reserves in each province and then applying that rate to the estimated 'census' count of all the incompletely enumerated Indian reserves in the province. The estimated 'census' count and the 'estimated net missed persons' in each reserve were then summed to create an 'estimated' population for the incompletely enumerated Indian reserves.
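The second step can be sketched in the same way. The sketch computes a province's net undercoverage rate from its completely enumerated reserves, using the rate definition in section 5.4.1, and takes "applying that rate to the estimated census count" literally: estimated net missed persons equal the rate multiplied by each reserve's estimated count. That reading, and all names and figures, are assumptions for illustration.

```python
def provincial_net_undercoverage_rate(census_count, net_undercoverage):
    """Net undercoverage rate (as a proportion) for completely enumerated reserves."""
    return net_undercoverage / (census_count + net_undercoverage)

def estimated_population(estimated_census_counts, rate):
    """Sum estimated census counts and estimated net missed persons over reserves."""
    total = 0.0
    for count in estimated_census_counts:
        net_missed = rate * count  # assumed literal reading of 'applying the rate'
        total += count + net_missed
    return total

rate = provincial_net_undercoverage_rate(census_count=9_000, net_undercoverage=1_000)  # 0.1
print(round(estimated_population([450, 1_200], rate)))  # 1815
```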

For provincial estimates please refer to: Incompletely enumerated Indian reserves and Indian settlements

Notes:

  1. The statistical imputation is performed through the census processing step called Whole Household Imputation which uses the results of the Dwelling Classification Survey. For more information see Chapter 7 of the Coverage, 2001 Census Technical Report, Catalogue no.: 92-394-XIE. Note that the 2006 Technical Report on Coverage will be available in March 2010.
