Guide to the Census of Population, 2016
Chapter 10 – Data quality assessment
Data quality assessment provides an evaluation of the overall quality of census data. The results of this assessment are used to inform users of the reliability of the data, to make improvements for the next census, to adjust census data for non-response and, for two coverage studies (reverse record check and the Census Overcoverage Study), to produce official population estimates. Quality assessment activities take place throughout the census process, beginning prior to data collection and ending after dissemination.
Sources of error
However well a census is designed, the data collected will inevitably contain errors. Errors can occur at virtually every stage of the census process, from material preparation to creation of the list of dwellings, data collection and processing. Census data users should be aware of the types of errors that can occur, so they can assess the usefulness of the data for their own purposes.
Main types of errors:
- Coverage errors occur when dwellings and/or persons are missed, incorrectly enumerated or counted more than once.
- Non-response errors occur when some or all information about individuals, households or dwellings is not provided.
- Response errors occur when a question is misunderstood or a characteristic is misreported by the respondent, the census enumerator or the Census Help Line operator.
- Processing errors can occur at any stage of processing. They include errors made at data capture; during coding operations, when written responses are converted into numerical codes; and during imputation, when valid (but not necessarily accurate) values are inserted into a record to replace missing or invalid data.
- Sampling errors apply only when answers to questions are obtained from a sample. In the 2016 Census, this type of error applies only to the long-form questionnaire.
Measuring data quality
Many data quality studies have been conducted for recent censuses to allow data users to assess the impact of errors and improve their own understanding of how errors occur. For the 2016 Census, special studies examine coverage errors and other data quality errors, i.e., non-response, response and processing errors.
Three studies are conducted to measure coverage errors:
- Dwelling Classification Survey – One of the sources of coverage error in the census is the misclassification of dwellings on Census Day. This error can occur when an occupied dwelling is classified as unoccupied, or when an unoccupied dwelling is classified as occupied. The purpose of the Dwelling Classification Survey is to study these types of classification errors and adjust counts, if necessary. A sample of dwellings for which no census questionnaire was returned is contacted, and information is collected on occupancy status and, for occupied dwellings, on the number of usual residents.

  This information is used to adjust the census data for dwellings, households and persons. This is done by correcting the classification errors and adjusting the household size distribution through imputation for dwellings that did not return the questionnaire. The adjustment is completed in time for the initial population count release.
- Reverse Record Check – This study provides estimates of persons missed by the census (after accounting for the adjustments made through the Dwelling Classification Survey described above). Estimates are developed for each province and territory and for various population subgroups (e.g., age-sex groups and marital status).

  For the provinces, this study comprises two steps:

  - Step 1: Selecting a sample of persons who should have been enumerated in the census, using sources such as the previous census, birth registrations, immigration and non-permanent residents' records, and the sample of persons missed in the Reverse Record Check from the previous census.
  - Step 2: Linking persons selected in Step 1 to the Census Response Database (CRD) to determine whether these persons were enumerated. Persons who could not be linked with certainty to the CRD are then traced and interviewed to collect additional information. Persons who have died or who emigrated prior to Census Day are identified using administrative records, such as the death register, or during tracing or the interviews.

  For the territories, Step 1 consists of linking the persons on health insurance records to the Census Response Database to identify persons who were enumerated in the census. The Reverse Record Check sample is then selected from among the unmatched persons.

  The results of the Reverse Record Check are the most important source of information about persons missed in the census. However, unlike the results of the Dwelling Classification Survey, the estimates are not used to adjust census data before the initial population count release.
- Census Overcoverage Study – In the 2011 and 2016 censuses, double-counting of persons is determined by searching for linked records that have a high degree of matching on sex, date of birth and name. Linked records are sampled and checked manually, and results are used to estimate the census overcoverage (or the number of duplicate persons).

  When combined with the results of the Reverse Record Check, the results of the Census Overcoverage Study provide estimates of net coverage error in census data. This net error is used to calculate the official population estimates.
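The way the two coverage studies combine can be illustrated with a minimal sketch. It assumes the simplest possible relationship: net undercoverage equals the estimated number of persons missed minus the estimated number of duplicate persons, and a coverage-adjusted count adds that net figure to the census count. The function names and all figures are hypothetical and chosen only for illustration; the official population estimates involve further adjustments that are not described in this chapter.

```python
def net_undercoverage(missed: float, duplicates: float) -> float:
    """Net coverage error: persons missed minus persons counted more than once."""
    return missed - duplicates


def coverage_adjusted_count(census_count: float, missed: float, duplicates: float) -> float:
    """Census count corrected for net coverage error (simplified assumption)."""
    return census_count + net_undercoverage(missed, duplicates)


def net_undercoverage_rate(census_count: float, missed: float, duplicates: float) -> float:
    """Net undercoverage as a percentage of the coverage-adjusted count."""
    adjusted = coverage_adjusted_count(census_count, missed, duplicates)
    return 100.0 * net_undercoverage(missed, duplicates) / adjusted


# Hypothetical figures, not census results.
census_count = 1_000_000   # persons enumerated
missed = 40_000            # stand-in for a Reverse Record Check estimate
duplicates = 15_000        # stand-in for a Census Overcoverage Study estimate

net = net_undercoverage(missed, duplicates)
adjusted = coverage_adjusted_count(census_count, missed, duplicates)
rate = net_undercoverage_rate(census_count, missed, duplicates)
print(net, adjusted, round(rate, 2))
```

A positive net figure means the census missed more persons than it double-counted, so the adjusted count is higher than the enumerated count.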
Certification consists of several activities to rigorously assess the quality of census data at specific levels of geography in order to ensure that the quality standards for public release are met. This evaluation includes the certification of population and dwelling counts, and variables related to dwelling and population characteristics.
During certification, response rates, invalid responses, edit failure rates, and a comparison of data before and after imputation are among the data quality measures used. Tabulations for the 2016 Census are produced and compared with corresponding data from past censuses, other surveys and administrative sources. Detailed cross-tabulations are also checked for consistency and accuracy.
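As an illustration of one of these measures, the comparison of data before and after imputation, the sketch below computes an imputation rate for a single variable and the largest change in any category's percentage share once missing values have been filled in. The variable, its categories, the function names and the data are all hypothetical; the measures actually used in certification are not specified in this chapter.

```python
from collections import Counter


def shares(values):
    """Percentage share of each category among non-missing values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


def imputation_rate(before):
    """Percentage of records whose value was missing before imputation."""
    return 100.0 * sum(v is None for v in before) / len(before)


def max_share_shift(before, after):
    """Largest absolute change (percentage points) in any category share."""
    b, a = shares(before), shares(after)
    return max(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


# Hypothetical tenure responses; None marks a missing or invalid response.
before = ["owner", "owner", "renter", None, "owner", "renter", None, "owner"]
# The same records after imputation has replaced the missing values.
after = ["owner", "owner", "renter", "renter", "owner", "renter", "owner", "owner"]

print(imputation_rate(before))
print(round(max_share_shift(before, after), 2))
```

A large shift combined with a high imputation rate would suggest that imputed values materially change the published distribution and that the variable deserves closer review.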
Depending on the certification results, census data can be released in one of three ways:
- First, the data may be released unconditionally, meaning that the data are of suitable quality.
- Second, the data may be released conditionally or with restrictions. In this case, the data will be released with a special note alerting users to possible limitations, or the data may be specially processed, for example, by combining reporting categories to address quality or confidentiality concerns.
- Finally, the data may be suppressed for quality reasons.
For more information on the quality indicators and certification results, see the reference guides for the various domains of interest.
Response rate for the 2016 Census of Population
One of the key data quality measures used for the Census of Population is the response rate. Table 10.1 shows the response rates for the 2016 Census of Population both nationally and for each province and territory. The rates are provided for all occupied private dwellings for which a short form or long form was to be received and for the subset of occupied private dwellings for which a long form was to be received. For the long form, the unweighted response rate and the weighted response rate are provided.
The rates in Table 10.1 were calculated following data processing and data quality assessment. Response rates are calculated as follows: the number of private dwellings for which a questionnaire was filled out divided by the number of private dwellings classified as occupied according to the census database. The final classification of dwelling occupancy status is based on analysis of the data collected by field staff, the data provided by respondents and the results of a quality study on the occupancy status of a sample of dwellings. The rates in Table 10.1 differ from the collection response rates previously disseminated because they take into account data processing and verification of the dwelling occupancy status, and are therefore considered final. Weighted response rates are based on the long form's final sampling weights. They are calculated as follows: the weighted number of sampled private dwellings for which a questionnaire was filled out divided by the weighted number of sampled private dwellings classified as occupied.
| Province/territory | Short and long form response rates (%) | Unweighted response rates from the long form only (%) | Weighted response rates from the long form only (%) |
| --- | --- | --- | --- |
| Newfoundland and Labrador | 97.4 | 96.6 | 96.8 |
| Prince Edward Island | 97.5 | 96.9 | 97.0 |
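The unweighted and weighted response rate definitions given above can be sketched as follows. The `Dwelling` record, the `response_rate` function and the sample values are hypothetical and serve only to show the arithmetic: dwellings not classified as occupied are excluded from both numerator and denominator, and the weighted rate sums final sampling weights instead of counting dwellings.

```python
from dataclasses import dataclass


@dataclass
class Dwelling:
    occupied: bool        # final occupancy classification
    responded: bool       # a questionnaire was filled out
    weight: float = 1.0   # final long-form sampling weight (hypothetical)


def response_rate(dwellings, weighted=False):
    """Response rate (%) among private dwellings classified as occupied."""
    occupied = [d for d in dwellings if d.occupied]
    if not occupied:
        raise ValueError("no occupied dwellings")

    def w(d):
        # Each dwelling counts once when unweighted, by its weight otherwise.
        return d.weight if weighted else 1.0

    numerator = sum(w(d) for d in occupied if d.responded)
    denominator = sum(w(d) for d in occupied)
    return 100.0 * numerator / denominator


# Hypothetical long-form sample of five private dwellings.
sample = [
    Dwelling(occupied=True, responded=True, weight=4.0),
    Dwelling(occupied=True, responded=True, weight=4.0),
    Dwelling(occupied=True, responded=False, weight=2.0),
    Dwelling(occupied=True, responded=True, weight=6.0),
    Dwelling(occupied=False, responded=False, weight=4.0),  # excluded: unoccupied
]

print(response_rate(sample))                 # unweighted: 3 of 4 occupied dwellings
print(response_rate(sample, weighted=True))  # weighted: 14 of 16 weight units
```

In this made-up sample the two rates differ because the non-responding dwelling carries a smaller weight than the responding ones.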