Chapter 3 – Confidentiality practices

Area suppression for income characteristic data

Area suppression replaces all income characteristic data with an 'x' for geographic areas whose population or number of private households falls below a specified threshold.

If an NHS tabulation contains quantitative income data (e.g., total income, wages), qualitative data based on income concepts (e.g., low income before tax status) or derived data based on quantitative income variables (e.g., indexes) for individuals, families or households, then the following rule applies: income characteristic data are replaced with an 'x' for areas where the population is less than 250 or where the number of private households is less than 40. The private household threshold does not apply for tabulations based on place of work geographies.
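
The rule above can be written as a simple check. The following Python sketch is illustrative only: the function and parameter names are hypothetical and do not describe any Statistics Canada system. Only the two thresholds (250 persons, 40 private households) and the place of work exemption come from the text.

```python
# Thresholds taken from the rule stated in the text.
POPULATION_THRESHOLD = 250
PRIVATE_HOUSEHOLD_THRESHOLD = 40


def income_data_suppressed(population, private_households,
                           place_of_work_geography=False):
    """Return True if income characteristic data for the area are suppressed.

    All names here are hypothetical; this is a sketch of the stated rule.
    """
    if population < POPULATION_THRESHOLD:
        return True
    # The private household threshold is not applied to tabulations
    # based on place of work geographies.
    if place_of_work_geography:
        return False
    return private_households < PRIVATE_HOUSEHOLD_THRESHOLD


def display_value(value, population, private_households,
                  place_of_work_geography=False):
    """Return 'x' for a suppressed area, otherwise the value unchanged."""
    if income_data_suppressed(population, private_households,
                              place_of_work_geography):
        return "x"
    return value
```

Under these assumptions, an area with 300 persons and 39 private households would show 'x' in a standard tabulation but not in one based on a place of work geography.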

Confidentiality adjustment for place of work estimates

The place of work estimates for dissemination blocks are available on a custom basis. These estimates will be adjusted to reinforce the confidential nature of the data: all dissemination block estimates of the employed labour force having a usual place of work or working at home will be rounded to a base of 5. This adjustment will be controlled, however. That is, aggregates (totals) of the adjusted population estimates for dissemination areas will always be within 5 of the actual values.
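
One simple way to round block values to a base of 5 while keeping the total under control is to round the running (cumulative) sum to the nearest multiple of 5 and take successive differences. The Python sketch below assumes this technique purely for illustration. The text does not describe the procedure actually used, and the function name and sample counts are hypothetical.

```python
def controlled_round(block_counts, base=5):
    """Round non-negative integer counts to multiples of `base`.

    Illustrative technique (an assumption, not the documented method):
    round the cumulative total after each block to the nearest multiple
    of `base`, then take differences between successive rounded totals.
    The sum of the adjusted values then equals the rounded grand total,
    so it stays close to the actual total.
    """
    adjusted = []
    running_actual = 0
    previous_rounded_total = 0
    for count in block_counts:
        running_actual += count
        # Nearest multiple of `base`, with halves rounded up.
        rounded_total = base * ((2 * running_actual + base) // (2 * base))
        adjusted.append(rounded_total - previous_rounded_total)
        previous_rounded_total = rounded_total
    return adjusted


# Hypothetical dissemination block counts within one dissemination area.
blocks = [3, 4, 12, 1, 7]
adjusted_blocks = controlled_round(blocks)
print(adjusted_blocks, sum(blocks), sum(adjusted_blocks))
```

In this sketch every adjusted block value is a multiple of 5, and the dissemination area total of the adjusted values differs from the actual total by less than 5.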

Confidentiality adjustment for daytime population estimates

Daytime population estimates will be determined by taking the population living in a specific area, adding the workers who live elsewhere and commute into the area, and subtracting the workers who live in the area and commute out of it. The number of workers will be based on persons in the employed labour force having a usual place of work or working at home. Daytime population estimates will be adjusted to reinforce the confidential nature of the data by controlled rounding of the estimates to a base of 5.
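
The calculation described above amounts to a single line of arithmetic. The Python sketch below uses hypothetical function names and figures. Its rounding helper simply rounds to the nearest multiple of 5 as a stand-in, since the controlled rounding procedure itself is not described in the text.

```python
def daytime_population(resident_population, workers_commuting_in,
                       workers_commuting_out):
    """Residents, plus workers commuting in, minus workers commuting out."""
    return resident_population + workers_commuting_in - workers_commuting_out


def round_to_base(value, base=5):
    """Nearest multiple of `base` for a non-negative integer (halves round up).

    A simplified stand-in for controlled rounding, used here only to show
    the form of the published figure.
    """
    return base * ((2 * value + base) // (2 * base))


# Hypothetical area: 1,000 residents, 350 workers commuting in,
# 120 workers commuting out.
estimate = daytime_population(1000, 350, 120)  # 1000 + 350 - 120 = 1230
published = round_to_base(estimate)
```

Persons who work at home live and work in the same area, so they affect neither the inflow nor the outflow term.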

Preventing disclosure

Prevention of direct or residual disclosure must also be addressed when determining product content. When assessing the potential for disclosure, a number of factors must be considered. The detail of individual variables, cross-classification of variables and the geographic level of the data will all contribute to the risk. For example, there may be no risk in producing households by number of rooms in the dwelling and detailed groupings of dwelling value showing various characteristics of the household members for large geographic areas. However, the risk of disclosure would increase for the lower levels of geography.

The most common method used for preventing disclosure is defining content that is appropriate for a given geographic level. Increasing population thresholds or applying manual suppression as needed are other methods that can be employed. Since these are typically product-specific requirements, they are not part of the automated suppression systems.

Census of Agriculture tabulations

Census of Agriculture and National Household Survey (NHS) data are matched using geographic information and the characteristics of farm operators (i.e., age and sex). Match rates are about 95% and weighting is performed to account for non-matches. Data are available for all members of households where a farm operator resides.

Census of Agriculture data include farm type, farm sales, area of crops and numbers of livestock while the NHS provides socioeconomic data, including education, income and occupation of families and household members. Pre-planned standard products are produced at the province level only.

Custom products are available at provincial levels. The data are random-rounded and low-bounded to ensure confidentiality. Suppressions are done manually if cells are below a specified size.

All verification of tables is done internally by Census of Agriculture staff.

Public Use Microdata Files (PUMFs)

The 2011 NHS PUMF products will consist of two microdata files: the individual file and the hierarchical file. The individual file will contain records from approximately 3% of the Canadian population and the hierarchical file will contain records from 1% of the population in private households.

Microdata files are unique among NHS products in that they give users access to non-aggregated data. This makes the PUMFs a powerful research tool. The files contain a large number of variables. Users can group and manipulate these variables to suit their own data and research requirements. Tabulations not included in other NHS products can be created, or relationships between variables can be analyzed using various analytical tools.

The NHS Public Use Microdata Files (PUMFs) provide quick access to a comprehensive social and economic database about Canada and its people. They consist of samples of anonymous responses to the NHS questionnaire (Forms N1, N2). The PUMFs contain statistical information about Canadians, the families and households to which they belong and the dwellings in which they live and allow researchers to study the relationships between these universes.

Statistics Canada has to protect the confidential information that it collects. Owing to the very nature of a microdata file, various measures are taken to fulfil this commitment. The Microdata Release Committee reviews all requests for release of microdata.

Data for small geographic areas will not be available in these files. The user will find information only for selected census metropolitan areas, the provinces and the territories. Variable data are aggregated to preserve confidentiality while providing as much detail as possible to maintain the analytical value of the file. Some of the values of sensitive variables are suppressed because their combination could be used to identify a person, a family or a household. Also, all quantitative dollar value variables are subjected to random rounding and are top and bottom coded.
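
To illustrate what random rounding and top and bottom coding do to a dollar value, here is a minimal Python sketch. The rounding base, the coding limits and the function names are all hypothetical. The text does not state the values or procedures actually applied to the PUMFs, and the sketch implies no particular order for the two steps.

```python
import random


def random_round(value, base, rng):
    """Randomly round an integer to an adjacent multiple of `base`.

    The value is rounded up with probability remainder / base and down
    otherwise, so the expected result equals the original value.
    Exact multiples of `base` are left unchanged.
    """
    lower = (value // base) * base
    remainder = value - lower
    if remainder == 0:
        return lower
    if rng.random() < remainder / base:
        return lower + base
    return lower


def top_bottom_code(value, bottom, top):
    """Replace values below `bottom` or above `top` with the nearer limit."""
    return max(bottom, min(value, top))


# Hypothetical base and limits, chosen only for illustration.
rng = random.Random(2011)
rounded_income = random_round(48_317, 100, rng)  # either 48_300 or 48_400
coded_income = top_bottom_code(1_250_000, -50_000, 200_000)
```

Random rounding hides the exact reported amount while leaving totals approximately unbiased. Top and bottom coding removes the extreme values that would otherwise make a record stand out.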

