Statistics Canada

2.  Census data processing

2.1 Introduction

2.2 Receipt and registration

2.3 Imaging and data capture from paper questionnaires

2.4 Coverage edits

2.5 Completion edits and failed edit follow-up

2.6 Coding

2.7 Classification and non-response adjustments for unoccupied and non-response dwellings

2.8 Edit and imputation

2.9 Weighting

2.1 Introduction

This part of the census process covered the processing of all completed questionnaires, from the capture of questionnaire data through to the creation of an accurate and complete census database. It comprised questionnaire registration, data capture, questionnaire imaging, editing, error correction, coding, imputation and weighting. The remainder of this chapter summarizes each data processing operation.

The automated processes implemented for the 2006 Census had to be monitored to ensure that every Canadian residence was enumerated once and only once. The Master Control System was built to control and monitor the process flow. It held a master listing of all the dwellings in Canada (each dwelling was identified with a unique identifier and about two-thirds of the dwellings also had an address). The system was updated daily with information on each dwelling's status in the census process flow (i.e., delivered, received, processed). Reports were generated and made available online to census managers to ensure that operations were efficient and effective.

2.2  Receipt and registration

Respondents completing paper questionnaires mailed them back to a centralized data processing centre. Canada Post registered their receipt automatically by scanning the barcode on the front of the questionnaire through the transparent portion of the return envelope. The envelopes were then transported to the Data Processing Centre along with a compact disk containing the list of all of the identifiers for the registered questionnaires.

Responses received through the Internet or the Census Help Line telephone interview were received directly by the Data Processing Centre and their receipt registered automatically.

The registration of each returned questionnaire was flagged on the Master Control System at Statistics Canada. About 10 days after Census Day, a list of all of the dwellings for which a questionnaire had not been received was generated by the Master Control System and then transmitted to Field Operations for follow-up. Registration updates were sent to Field Operations on a daily basis to prevent follow-up on households which had subsequently completed their questionnaire, either by telephone or through the Internet.
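The registration and follow-up logic described above can be pictured with a small sketch. It is illustrative only and is not the Master Control System itself: the class, method and status names are assumptions, and only the three statuses named in the text (delivered, received, processed) are modelled.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DELIVERED = "delivered"
    RECEIVED = "received"
    PROCESSED = "processed"


@dataclass
class Dwelling:
    dwelling_id: str          # unique identifier on the master listing
    status: Status = Status.DELIVERED


class MasterControl:
    """Hypothetical registry keyed by unique dwelling identifier."""

    def __init__(self, dwelling_ids):
        self.dwellings = {d: Dwelling(d) for d in dwelling_ids}

    def register_receipt(self, dwelling_id):
        # A second registration for the same dwelling leaves the status unchanged.
        dwelling = self.dwellings[dwelling_id]
        if dwelling.status is Status.DELIVERED:
            dwelling.status = Status.RECEIVED

    def follow_up_list(self):
        # Dwellings with no registered questionnaire go to field follow-up.
        return sorted(
            d.dwelling_id
            for d in self.dwellings.values()
            if d.status is Status.DELIVERED
        )


mcs = MasterControl(["D001", "D002", "D003"])
mcs.register_receipt("D002")             # paper questionnaire scanned at receipt
initial_list = mcs.follow_up_list()      # list sent to field operations
mcs.register_receipt("D003")             # late Internet response
updated_list = mcs.follow_up_list()      # daily update removes D003
```

Regenerating the list after each registration mirrors the daily updates that kept field staff from following up on households that had since responded.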

2.3  Imaging and data capture from paper questionnaires

The 2006 Census was the first Canadian census to capture data using automated capture technologies rather than manual keying. There were five steps in the imaging process:

  • Document preparation: mailed-back questionnaires were removed from envelopes and foreign objects, such as clips and staples, were detached in preparation for scanning. Forms that were in a booklet format were separated into single sheets by cutting off the spine.
  • Scanning: scanning, using 18 high-speed scanners, converted the paper to digital images (pictures).
  • Automated image quality assurance: an automated system verified the quality of the scanning. Images failing this process were flagged for rescanning or keying from paper.
  • Automated data capture: optical mark recognition and optical character recognition technologies were used to extract respondents' data from the images. Where the systems could not recognize the handwriting with sufficient accuracy, data repair was done by an operator.
  • Check-out: as soon as the questionnaires were processed successfully through all of the above steps, the paper questionnaires were checked out of the system. Check-out is a quality assurance process that ensures the images and captured data are of sufficient quality that the paper questionnaires are no longer required for subsequent processing. Questionnaires that had been flagged as containing errors were pulled at check-out and reprocessed.
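The automated data capture step, in which poorly recognized handwriting was routed to an operator, can be sketched as follows. The confidence scores, the threshold value and the field names are assumptions made for illustration; the source does not state what accuracy level triggered data repair.

```python
REPAIR_THRESHOLD = 0.90  # assumed cut-off, not a documented value


def route_fields(recognized):
    """Split recognized fields into accepted values and an operator repair queue.

    `recognized` maps a field name to a (text, confidence) pair, as a
    hypothetical recognition engine might return it.
    """
    accepted, repair_queue = {}, []
    for name, (text, confidence) in recognized.items():
        if confidence >= REPAIR_THRESHOLD:
            accepted[name] = text
        else:
            repair_queue.append(name)  # an operator keys the value from the image
    return accepted, repair_queue


accepted, repair_queue = route_fields({
    "place_of_birth": ("Manitoba", 0.97),
    "occupation": ("nvrse", 0.55),
})
```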

2.4  Coverage edits

At this stage, a number of automated edits were performed on the respondent data. These edits were designed to detect cases where invalid persons may have been created, either through respondent error or data capture error. Examples include data erroneously entered in a blank person column, crossed-out data that were captured in error, or data provided for the same person more than once, usually due to the receipt of duplicate forms (e.g., a husband completed the Internet version and his wife filled in the paper form and mailed it back). The edits were also designed to detect the possible absence of usual residents, which occurs when data are not provided for every household member listed at the beginning of the questionnaire.

Data from questionnaires that failed the edits were forwarded to processing clerks for verification. An interactive system enabled the clerks to examine the captured data and compare them with the image if available (online questionnaires would not have an image). Edit failures were resolved by manually deleting invalid or duplicate persons and adding missing ones (i.e., creating blank person records), as necessary and appropriate.
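A minimal sketch of two of the coverage checks described above is given here: possible duplicate persons, and listed usual residents for whom no data were provided. The matching key (name plus date of birth) and the record layout are assumptions; the actual edit rules are not given in this chapter.

```python
def coverage_edit(listed_names, persons):
    """Return the reasons a household would be referred to a processing clerk."""
    failures = []
    seen = set()
    for position, person in enumerate(persons, start=1):
        # Assumed matching key: normalized name plus date of birth.
        key = (person["name"].strip().lower(), person["date_of_birth"])
        if key in seen:
            failures.append(f"person {position}: possible duplicate")
        seen.add(key)

    reported = {p["name"].strip().lower() for p in persons}
    for name in listed_names:
        if name.strip().lower() not in reported:
            failures.append(f"{name}: listed as usual resident but no data")
    return failures


failures = coverage_edit(
    listed_names=["Ann Roy", "Paul Roy", "Lea Roy"],
    persons=[
        {"name": "Ann Roy", "date_of_birth": "1970-01-01"},
        {"name": "Paul Roy", "date_of_birth": "1968-05-02"},
        {"name": "ann roy", "date_of_birth": "1970-01-01"},  # second form received
    ],
)
```

An empty list means the household passes; any entry would send the record to a clerk, who decides whether to delete the duplicate or create a blank person record.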

2.5  Completion edits and failed edit follow-up

Following the coverage edits, another set of automated edits was run to detect cases where there were either too many missing responses, or there were indications that data may not have been provided for all usual residents in the household. Households failing these edits were transmitted to the Census Help Line for follow-up. An interviewer telephoned the respondent to resolve any coverage issues and to fill in the missing information, using a computer-assisted telephone interviewing application. The data were then sent back to the Data Processing Centre for reintegration into the system for subsequent processing.
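The "too many missing responses" test can be sketched as a simple share-of-missing check. The 25% threshold and the record layout are assumptions for illustration only; the source does not state the actual failure criteria.

```python
MAX_MISSING_SHARE = 0.25  # assumed threshold, not a documented value


def fails_completion_edit(household):
    """Return True if the household should be sent for telephone follow-up.

    `household` is a list of person records; a missing response is None.
    """
    answers = [value for person in household for value in person.values()]
    if not answers:
        return True  # nothing was reported at all
    missing = sum(1 for value in answers if value is None)
    return missing / len(answers) > MAX_MISSING_SHARE


sparse = [
    {"age": 34, "sex": None, "marital_status": None, "mother_tongue": "French"},
    {"age": None, "sex": "M", "marital_status": "single", "mother_tongue": "French"},
]
complete = [
    {"age": 34, "sex": "F", "marital_status": "married", "mother_tongue": None},
    {"age": 36, "sex": "M", "marital_status": "married", "mother_tongue": "English"},
]
needs_follow_up = fails_completion_edit(sparse)    # 3 of 8 answers missing
passes = not fails_completion_edit(complete)       # 1 of 8 answers missing
```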

2.6  Coding

The long-form questionnaires (2B, 2C, 2D and 3B) contained questions where answers could be checked off against a list, as well as questions requiring a written response from the respondent in the boxes provided. These written responses underwent automated coding to assign each one a numerical code, using Statistics Canada reference files, code sets and standard classifications. Reference files for the automated match process were built using actual responses from past censuses. Specially trained coders and subject-matter specialists resolved cases where a code could not be automatically assigned. The variables for which coding applied were: Relationship to Person 1, Place of birth, Citizenship, Non-official languages, Home language, Mother tongue, Ethnic origin, Population group, Indian band/First Nation, Place of residence 1 year ago, Place of residence 5 years ago, Major field of study, Location of study, Place of birth of parents, Language at work, Industry, Occupation and Place of work.

About 37 million write-ins were coded from the 2006 long-form questionnaires. An average of about 82% of these were coded automatically.
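Automated coding of write-ins can be pictured as a lookup of a normalized response in a reference file, with unmatched responses referred to coders. The sketch below assumes exact matching after normalization; the reference entries and codes are invented for illustration and are not Statistics Canada classifications.

```python
import unicodedata


def normalize(text):
    """Lower-case, strip accents and collapse whitespace before matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(unaccented.lower().split())


# Hypothetical reference file built from past responses (codes are made up).
REFERENCE_FILE = {
    "registered nurse": "H001",
    "infirmiere": "H001",
    "carpenter": "T042",
}


def code_write_ins(write_ins):
    coded, manual_queue = {}, []
    for response in write_ins:
        code = REFERENCE_FILE.get(normalize(response))
        if code is None:
            manual_queue.append(response)  # resolved by trained coders
        else:
            coded[response] = code
    return coded, manual_queue


write_ins = ["Registered  Nurse", "Infirmière", "dragon tamer"]
coded, manual_queue = code_write_ins(write_ins)
auto_rate = len(coded) / len(write_ins)
```

Applied to the figures quoted above, an automatic rate of about 82% of roughly 37 million write-ins corresponds to about 30 million responses coded without manual intervention.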

As the responses for a particular variable were coded, the data for that variable were sent to the edit and imputation phase.

2.7  Classification and non-response adjustments for unoccupied and non-response dwellings

The Dwelling Classification Survey (DCS) was used to estimate the error rates in classifying dwellings in the self-enumerated collection areas as occupied or unoccupied in the field. Based on this information, adjustments were made to the census database. The DCS selected a random sample of 1,405 self-enumerated collection units (CUs) that were revisited in July and August 2006 to reassess the occupancy status as of Census Day for each dwelling for which no response had been received. The DCS found that 17.4% of the 934,564 dwellings classified as unoccupied were actually occupied and that 29.1% of the 366,527 dwellings with no responses that were classified as occupied or with occupancy status classified as unknown were actually unoccupied. Estimates based on the DCS sample were used to adjust the occupancy status for individual dwellings. This resulted in an increase of 3.6% in the number of occupied dwellings, and a decrease of 5.2% in the number of unoccupied dwellings at the Canada level.
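As a rough check on scale, the misclassification rates quoted above can be turned into approximate dwelling counts. This is simple arithmetic on the published figures, not the DCS estimation method, and the variable names are illustrative. The Canada-level changes of 3.6% and 5.2% cannot be reproduced from these figures alone, because the national dwelling totals used as their bases are not given in this section.

```python
classified_unoccupied = 934_564            # dwellings classified as unoccupied
nonresponse_occupied_or_unknown = 366_527  # non-response dwellings, occupied or unknown

# Estimated numbers of dwellings whose field classification was wrong.
actually_occupied = round(0.174 * classified_unoccupied)
actually_unoccupied = round(0.291 * nonresponse_occupied_or_unknown)

print(f"Classified unoccupied but occupied: about {actually_occupied:,}")
print(f"Non-response dwellings actually unoccupied: about {actually_unoccupied:,}")
```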

After this adjustment of the occupancy status by the DCS, occupied dwellings with total non-response had the number of usual residents (if not known) and all the responses to the census questions imputed by borrowing the unimputed responses from another household within the same CU that had the same type of questionnaire (long or short). This process, called whole household imputation (WHI), imputed 96% of the total non-response households. The remaining 4% of total non-response households, for which no donor household was found under the WHI process, were imputed as part of the main edit and imputation (E & I) process. Using a single donor under WHI was computationally more efficient and less likely to produce implausible results than using several donors as part of the main E & I process, as was done in 2001.
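The donor search for whole household imputation can be sketched as below. The text states only that the donor came from the same CU, had the same questionnaire type and supplied unimputed responses; choosing at random among eligible donors is an assumption of this sketch, as are the field names.

```python
import random


def pick_donor(recipient, households, rng):
    """Return a responding household from the same CU with the same form type.

    Returns None when no donor exists, in which case the household would be
    left to the main edit and imputation process.
    """
    candidates = [
        h for h in households
        if h["responded"]
        and h["cu"] == recipient["cu"]
        and h["form"] == recipient["form"]
    ]
    return rng.choice(candidates) if candidates else None


households = [
    {"id": "H1", "cu": "CU-10", "form": "short", "responded": True, "persons": 2},
    {"id": "H2", "cu": "CU-10", "form": "long", "responded": True, "persons": 4},
    {"id": "H3", "cu": "CU-11", "form": "long", "responded": True, "persons": 1},
]
recipient = {"id": "H9", "cu": "CU-10", "form": "long", "responded": False, "persons": None}

donor = pick_donor(recipient, households, random.Random(0))
if donor is not None:
    # All responses, including the number of usual residents, come from one donor.
    recipient["persons"] = donor["persons"]
```

Taking every value from a single household keeps the imputed record internally consistent, which is the advantage over multiple donors noted above.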

More details on the DCS and the whole household imputation procedure can be found in the 2006 Census Technical Report on Coverage, Catalogue no. 92-567-XWE.

2.8  Edit and imputation

The data collected in any survey or census contain some omissions or inconsistencies. For example, a respondent might be unwilling to answer a question, fail to remember the right answer, or misunderstand the question. Also, census staff may code responses incorrectly or make other mistakes during processing.

The final clean-up of data, done in the edit and imputation process, was for the most part fully automated. Two types of imputation were applied. The first type, called 'deterministic imputation,' involved assigning specific values under certain conditions. Detailed edit rules were applied to identify these conditions, and then the variables involved in the rules would be assigned a pre-determined value. The second type of imputation, called 'minimum-change donor imputation,' applied a series of detailed edit rules that identified any missing or inconsistent responses. These missing or inconsistent responses were corrected by changing as few variables as possible. For minimum-change donor imputation, a record with a number of characteristics in common with the record in error was selected. Data from this 'donor' record were borrowed and used to change the minimum number of variables necessary to resolve all missing or inconsistent responses. The CANadian Census Edit and Imputation System (CANCEIS) was the automated system used for nearly all deterministic and minimum-change (hot-deck) donor imputation in 2006.
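The two imputation types can be illustrated with a toy example. This is a sketch under stated assumptions, not the CANCEIS algorithm: the single edit rule, the distance measure (a count of differing reported fields) and the exhaustive search over field subsets are all chosen for brevity.

```python
from itertools import combinations


def deterministic_impute(record):
    # Illustrative rule only: a child under 15 with no marital status is "single".
    fixed = dict(record)
    if fixed["age"] is not None and fixed["age"] < 15 and fixed["marital_status"] is None:
        fixed["marital_status"] = "single"
    return fixed


def passes_edits(record):
    if any(value is None for value in record.values()):
        return False  # missing response
    # Illustrative consistency rule: persons under 15 cannot be married.
    return not (record["age"] < 15 and record["marital_status"] == "married")


def distance(record, donor):
    # Number of reported fields on which the two records disagree.
    return sum(1 for k, v in record.items() if v is not None and v != donor[k])


def minimum_change_impute(record, donors):
    """Copy the fewest fields from the nearest valid donor so all edits pass.

    Raises ValueError if no donor passes the edits.
    """
    valid_donors = [d for d in donors if passes_edits(d)]
    donor = min(valid_donors, key=lambda d: distance(record, d))
    fields = list(record)
    for size in range(len(fields) + 1):      # try the smallest changes first
        for subset in combinations(fields, size):
            candidate = dict(record)
            for field in subset:
                candidate[field] = donor[field]
            if passes_edits(candidate):
                return candidate, subset


child = deterministic_impute({"age": 6, "marital_status": None, "sex": "M"})

donors = [
    {"age": 9, "marital_status": "single", "sex": "F"},
    {"age": 40, "marital_status": "married", "sex": "M"},
]
failed = {"age": 9, "marital_status": "married", "sex": "F"}
repaired, changed = minimum_change_impute(failed, donors)
```

Here the nearest donor differs from the failed record only on marital status, so one field is changed and the reported age and sex are preserved.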

2.9  Weighting

Questions on age, sex, marital status, mother tongue and relationship to Person 1 were asked of 100% of the population, as in previous censuses. However, the bulk of census information was acquired on a 20% sample basis, using the additional questions on the 2B questionnaire. Weighting was used to project the information gathered from the 20% sample to the entire population.

For the 2006 Census, weighting employed the same methodology used in the 2001 Census, known as calibration estimation. This began by first assigning initial weights of approximately 5 to the sampled households. These weights were then adjusted by the smallest possible amount needed to ensure closer agreement between the sample estimates and the population counts for a number of characteristics related to age, sex, marital status, common-law status and household size (e.g., number of males, number of people aged 15 to 19). This method is described in detail in Chapter 4.
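One common form of calibration adjusts the initial weights by the smallest amount, in a chi-square distance sense, needed to reproduce known population totals. The sketch below implements that linear form on a tiny made-up sample; it is not necessarily the exact distance function or set of constraints used in 2006, and the data and function names are assumptions. Each sampled household carries a vector x (here: number of males, number of persons aged 15 to 19), and the calibrated weight is w = d(1 + x·λ), where λ solves (Σ d x xᵀ) λ = T − Σ d x.

```python
def solve(a, b):
    """Solve a small linear system a·z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def calibrate(x, totals, initial_weight=5.0):
    """Return calibrated weights whose weighted sums of x match `totals`."""
    p = len(totals)
    d = [initial_weight] * len(x)
    gram = [
        [sum(di * xi[j] * xi[k] for di, xi in zip(d, x)) for k in range(p)]
        for j in range(p)
    ]
    gap = [totals[j] - sum(di * xi[j] for di, xi in zip(d, x)) for j in range(p)]
    lam = solve(gram, gap)
    return [di * (1 + sum(l * v for l, v in zip(lam, xi))) for di, xi in zip(d, x)]


# Made-up sample: (males, persons aged 15 to 19) per sampled household.
sample = [(1, 0), (2, 1), (0, 1), (1, 0)]
known_totals = [22, 11]   # assumed counts from the 100% data

weights = calibrate(sample, known_totals)
est_males = sum(w * x[0] for w, x in zip(weights, sample))
est_teens = sum(w * x[1] for w, x in zip(weights, sample))
```

With initial weights of 5, the sample estimates are 20 males and 10 persons aged 15 to 19; after calibration the weights move slightly above 5 so that the estimates match the known counts.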
