Overview of the Census of Population
Chapter 8 – Data quality assessment

Introduction

Data quality assessment provides an evaluation of the overall quality of census data. The results are used to inform users of the reliability of the data, to make improvements for the next census and, in the case of two coverage studies, to adjust the official population estimates. Quality assessment activities take place throughout the census process, beginning prior to data collection and ending after dissemination.

Sources of error

However well a census is designed, the data it produces will inevitably contain errors. Errors can occur at virtually every stage of the census process, including the preparation of materials, the listing of dwellings, data collection and processing. Users of census data should be aware of the types of errors that can occur so that they can assess the usefulness of census data for their own purposes.

The principal types of error are:

Coverage errors occur when dwellings and/or individuals are missed, incorrectly included or counted more than once.

Non-response errors occur when some or all information about particular individuals, households or dwellings is not provided.

Response errors occur when a question is misunderstood or a characteristic is misreported by the respondent, or by the census enumerator or Census Help Line operator.

Processing errors may occur at any stage of processing. They include keying errors made at data capture; coding errors, made when written responses are transformed into numerical codes; and imputation errors, introduced when valid (but not necessarily correct) values are inserted into a record to replace missing or invalid data.

Sampling errors apply only when answers to questions are obtained from a sample. This type of error does not apply to the 2011 Census.
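The imputation step described under processing errors can be illustrated with a minimal nearest-neighbour donor ("hot-deck") sketch. The text does not say which imputation method the census uses, so treat this as one common technique rather than the census procedure; the record layout, the functions `mismatches` and `donor_impute`, and the mismatch-count distance are all hypothetical. The imputed value is valid, because a similar respondent actually reported it, but it is not necessarily correct for the record being filled.

```python
from typing import Optional

# Hypothetical record layout: variable name -> coded value. None marks a
# response that is missing or was set aside as invalid during editing.
Record = dict[str, Optional[int]]


def mismatches(a: Record, b: Record, keys: list[str]) -> int:
    """Number of matching variables on which two records disagree."""
    return sum(1 for k in keys if a.get(k) != b.get(k))


def donor_impute(records: list[Record], target: str, match_keys: list[str]) -> list[Record]:
    """Fill each missing `target` value with the value from the closest donor.

    A donor is any record with a non-missing `target`. Closeness is the number
    of mismatches on `match_keys`; min() keeps the first donor on ties.
    Records that need imputation are copied, so the input records are not modified.
    """
    donors = [r for r in records if r.get(target) is not None]
    imputed = []
    for rec in records:
        if rec.get(target) is None and donors:
            donor = min(donors, key=lambda d: mismatches(rec, d, match_keys))
            rec = {**rec, target: donor[target]}
        imputed.append(rec)
    return imputed
```

For instance, if one complete record and one record missing `marital_status` share the same `age_group` and `sex`, the missing value is copied from the complete record, even though the true value for that person may differ.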

Data quality measurement

To allow data users to assess the impact of errors, and to improve our own understanding of how and where errors occur, a number of data quality studies have been conducted for recent censuses. For the 2011 Census, special studies examine coverage errors as well as other aspects of data quality, namely non-response, response and processing errors.

Three studies are undertaken to measure coverage errors:

  1. Dwelling Classification Survey – One of the sources of coverage error in the census is the misclassification of dwellings on Census Day. This can occur when a dwelling that is truly occupied is classified as unoccupied, or when an unoccupied dwelling is classified as occupied. This misclassification potentially affects any dwelling for which a census questionnaire is not returned (a non-response dwelling). The purpose of the Dwelling Classification Survey is to study these types of classification error. A sample of dwellings for which no census questionnaire was returned is contacted, and information is collected on the occupancy status and, if occupied, the number of persons living in each dwelling.

    This information is used to adjust the census counts of households and persons for these misclassifications, and to adjust, through imputation, the household size distribution of the non-response dwellings. This is done in time for the initial population count release.

  2. Reverse Record Check – This study provides estimates of persons missed by the census (after accounting for the adjustments described in the Dwelling Classification Survey, above). Estimates are developed for each province and territory and for various subgroups of the population (e.g., age-sex groups, marital status).

    For the provinces, there are two steps in this study:

    • Step 1: Construction of a sample of persons who should be enumerated in the census, using sources which include the previous census, birth registrations, immigration and non-permanent residents' records and the previous Reverse Record Check (to represent those missed in the last census).

    • Step 2: Checking the census response database to determine if these persons have been enumerated. Some persons have to be traced and interviewed to collect additional information. Persons who have died or emigrated prior to Census Day are identified during the tracing or the interviews.

    For the territories, Step 1 differs. For the purpose of sampling, identifying information (such as name, date of birth and sex) from health care records is matched to census records to identify people who were enumerated in the census. The sample for the Reverse Record Check is then selected from among the unmatched persons.

    The results of this study are the major source of information about persons missed by the census. However, unlike the Dwelling Classification Survey, the estimates are not used to adjust census counts before the initial population count release.

  3. Census Overcoverage Study – For the 2011 and 2006 censuses, double-counting of persons was detected by searching the census database for pairs of records that had high-quality matches on sex, date of birth and name. Both deterministic (exact) and probabilistic matching techniques were used. Potential overcoverage pairs were sampled and checked manually, and the results were used to estimate census overcoverage.

    When combined with the results from the Reverse Record Check, the results of the Census Overcoverage Study provide estimates of net coverage error in census data. These are used to derive the official population estimates.
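The arithmetic behind the three coverage studies can be sketched in a few functions. This is a heavily simplified illustration, not the census's estimation method, which the text does not spell out: the data classes and every function name are hypothetical, the DCS step assumes a simple random sample of non-response dwellings, the RRC step assumes each sampled person carries a known weight, and only deterministic (exact) matching is shown for the Overcoverage Study, with a lowercase, whitespace-trimmed name as the assumed match key. Net undercoverage is taken as estimated undercoverage minus estimated overcoverage.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


# --- Dwelling Classification Survey (DCS), hypothetical fields ---------------
@dataclass
class DwellingVisit:
    """Follow-up result for one sampled non-response dwelling."""
    occupied: bool
    persons: int  # 0 when the dwelling was unoccupied on Census Day


def dcs_occupied_share(visits: list[DwellingVisit]) -> float:
    """Share of sampled non-response dwellings found occupied."""
    return sum(v.occupied for v in visits) / len(visits)


def dcs_estimated_persons(visits: list[DwellingVisit], nonresponse_dwellings: int) -> float:
    """Expand the sample mean of persons to every non-response dwelling.

    Assumes a simple random sample; a real survey design would use weights.
    """
    if not visits:
        raise ValueError("need at least one sampled dwelling")
    mean_persons = sum(v.persons for v in visits) / len(visits)
    return mean_persons * nonresponse_dwellings


# --- Reverse Record Check (RRC), hypothetical fields -------------------------
@dataclass
class RrcPerson:
    """Outcome for one person in the RRC sample."""
    weight: float          # number of persons this sampled person represents
    enumerated: bool       # found in the census response database
    in_scope: bool = True  # False if the person died or emigrated before Census Day


def rrc_undercoverage(sample: list[RrcPerson]) -> float:
    """Weighted estimate of in-scope persons missed by the census."""
    return sum(p.weight for p in sample if p.in_scope and not p.enumerated)


# --- Overcoverage Study: deterministic (exact) matching only -----------------
def exact_match_pairs(records: list[dict]) -> list[tuple[int, int]]:
    """Index pairs of records agreeing exactly on sex, date of birth and name.

    Produces candidate pairs only; per the text, a sample of candidates
    was then checked manually. Name normalization here is an assumption.
    """
    groups: dict[tuple, list[int]] = defaultdict(list)
    for i, r in enumerate(records):
        key = (r["sex"], r["dob"], r["name"].strip().lower())
        groups[key].append(i)
    return [pair for idxs in groups.values() for pair in combinations(idxs, 2)]


# --- Net coverage -------------------------------------------------------------
def net_undercoverage(undercoverage: float, overcoverage: float) -> float:
    """Persons missed minus persons counted in error (e.g. counted twice)."""
    return undercoverage - overcoverage


def net_undercoverage_rate(census_count: float, undercoverage: float,
                           overcoverage: float) -> float:
    """Net undercoverage as a share of the estimated true population."""
    net = net_undercoverage(undercoverage, overcoverage)
    return net / (census_count + net)
```

For example, if four sampled dwellings hold 2, 0, 3 and 0 persons and there are 1,000 non-response dwellings, `dcs_estimated_persons` returns 1250.0; with an undercoverage of 500, an overcoverage of 100 and a census count of 9,600, `net_undercoverage_rate` returns 0.04.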

Certification

Certification consists of several activities that rigorously assess the quality of census data at specific levels of geography to ensure that quality standards for public release are met. This evaluation covers population and dwelling counts as well as variables related to dwelling and population characteristics.

Components of the data certification are:

  • a review and analysis of data quality indicators for internal and external consistency
  • the provision of a quality statement
  • recommendations and approval of the data for release
  • the format of the final released data, including advisories to users in the form of special notes, caveats or other data quality indicators.

During certification, response rates, invalid responses, edit failure rates, and a comparison of estimates before and after imputation are among the data quality measures used. Tabulations from the 2011 Census are produced and compared with corresponding data from past censuses, from other surveys, and from administrative sources. Detailed cross-tabulations are also checked for consistency and accuracy.
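Several of the indicators listed above can be sketched for a single categorical variable. The text names the measures but not their formulas, so the definitions here are assumptions: the response rate is the share of records with any answer, the edit failure rate is the share of provided answers outside a set of valid codes, and the before/after-imputation comparison is the largest change in any category's share. All function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def response_rate(values: list[Optional[str]]) -> float:
    """Share of records with any response at all (None = no response)."""
    return sum(v is not None for v in values) / len(values)


def edit_failure_rate(values: list[Optional[str]], valid: set[str]) -> float:
    """Share of provided responses that fail a simple validity edit."""
    provided = [v for v in values if v is not None]
    return sum(v not in valid for v in provided) / len(provided)


def shares(values: list[str]) -> dict[str, float]:
    """Distribution of categories as proportions of the total."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}


def max_share_shift(before: dict[str, float], after: dict[str, float]) -> float:
    """Largest absolute change in any category's share."""
    categories = set(before) | set(after)
    return max(abs(before.get(c, 0.0) - after.get(c, 0.0)) for c in categories)
```

As a usage example, the raw responses `["1", "2", None, "2", "9", "1", None, "2"]` with valid codes `{"1", "2"}` give a response rate of 0.75 and an edit failure rate of 1/6; comparing the shares of the valid reported codes with the shares after the two missing values and the invalid "9" have been imputed shows how far imputation moved the distribution.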

Depending on the results of certification, census data may be released in one of three ways.

  • First, the data may be released unconditionally, reflecting data of suitable quality.

  • Second, the data may be released conditionally or in a constrained manner. In this case, the data are released with a special note or waiver alerting users to possible limitations, or they may be specially treated, for example by combining reporting categories to address quality or confidentiality concerns.

  • Finally, the data may be suppressed for quality reasons.