Data quality and confidentiality standards and guidelines (public): Introduction


Data disseminated by the census are subjected to a variety of automated and manual processes to determine whether they need to be suppressed. This is done primarily for two reasons: (1) to ensure non-disclosure of individual respondent identity and characteristics (subsequently referred to as 'confidentiality') and (2) to limit the dissemination of data of unacceptable quality (subsequently referred to as 'data quality').

Additionally, data may be suppressed for product-specific reasons, typically because of formatting issues. The term 'product' refers primarily to tabular output. To apply the required suppression rules, data may be either modified in the product or removed from it altogether. This document summarizes the data quality and confidentiality standards and guidelines to be applied for the 2006 Census Dissemination Project.

Executive summary

The data quality and confidentiality standards were developed for application in the 2006 Census Dissemination Project. The summary below includes revisions made to the rules and practices that were carried over from 2001, along with new rules and practices adopted for 2006.

Revisions to 2001 rules and guidelines

New for 2006 are the following

Census release criteria

The following questions were given consideration when reviewing and assessing the current release criteria and established policy for census data tabulations:

  • Have the release criteria been relaxed over the last three or four census cycles? And, if so, why?
  • What additional data outside the agency, if any, might give cause for concern with respect to current population release thresholds?
  • What new data mining tools or software in use today might affect census data release thresholds?
  • Are our release criteria for tabular data more or less stringent than those of other statistical agencies?
  • Does the census random rounding policy still work as an effective disclosure avoidance technique, and should we be looking at others?

The current release criteria and established policy for census weighted tables have proven to be successful at ensuring non-disclosure. Evolving client demands, data mining, the ever-increasing data combination possibilities and software developments give reason to perform regular assessments, research new methods, and make changes where necessary.

The census release criteria have been modified slightly since the 1981 Census to meet changing needs and conditions. Some population thresholds for geographic area suppression were adjusted downwards between 1981 and 1986 and have remained constant since then; there has been no evidence of any negative impact resulting from this change. In other ways, the criteria have actually been strengthened, through the additional protection added to the random rounding rules, the addition of the higher suppression threshold for postal code information, and the suppression of certain statistics (e.g., standard deviation = 0). The population universes upon which the population thresholds are based were also modified in 1991 and 1996 to add protection against disclosure. In addition, a new suppression method was implemented in 2001 that prevents the dissemination of tables in which the number of units (individuals, families or households) is below a given threshold. The suppression rules and random rounding practices specific to the dissemination of 2006 data are provided in Section 2 of this document. A summary table of historical confidentiality rules, 1981 to 2001, can be found in Appendix A.
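
As an illustration only, the two techniques named above (random rounding and suppression of tables whose unit count falls below a threshold) can be sketched in Python. This is not the census implementation: the rounding base of 5, the unbiased rounding probabilities, the function names and the threshold passed in by the caller are all assumptions made for the example. The actual 2006 rules are those given in Section 2.

```python
import random


def random_round(count, base=5, rng=random):
    """Randomly round a non-negative count to an adjacent multiple of `base`.

    Assumption: unbiased rounding, i.e. round up with probability
    remainder/base, so the expected rounded value equals the true count.
    """
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower


def protect_table(cells, unit_count, min_units, base=5, rng=random):
    """Suppress a whole table below a unit threshold, else round each cell.

    `cells` maps a cell label to its count. `min_units` is a hypothetical
    threshold supplied by the caller; suppressed cells are returned as None.
    """
    if unit_count < min_units:
        return {label: None for label in cells}
    return {label: random_round(value, base, rng) for label, value in cells.items()}
```

For example, `protect_table({"owners": 12, "renters": 7}, unit_count=19, min_units=40)` returns every cell as `None`, while a table above the threshold has each cell rounded to one of the two nearest multiples of 5.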

