Chapter 2 – Confidentiality (non-disclosure) rules

The following describes the various rules used to ensure confidentiality (or non-disclosure) of individual respondent identity and characteristics. All NHS data are subject to confidentiality (non-disclosure) rules.

Area suppression for standard and non-standard geographic areas

Area suppression is used to remove all characteristic data for geographic areas below a specified population size.

The specified population size for all standard areas (see Note 1) or aggregations of standard areas is 40, except for blocks, block-faces or postal codes. Consequently, no characteristics or tabulated data are to be released for areas with a population below 40.

The specified population size for six-character postal codes (forward sortation area - local distribution unit [FSA-LDU]), geocoded areas and custom areas built from the block, block-face or LDU levels is 100. Consequently, no characteristics or tabulated data are to be released if the total population of the area is less than 100. Generally, blocks and individual urban block-faces (one side of the street between two intersections) will be too small to meet the population size thresholds specified above. Where an aggregation of blocks or block-faces falls above the specified population threshold, data can be retrieved through a custom tabulation.
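The two thresholds above can be expressed as a simple check. The following is a minimal sketch, not the routine used in production: the area type labels and function names are hypothetical, and only the values 40 and 100 come from the text.

```python
# Thresholds taken from the text above; the category labels are illustrative.
STANDARD_THRESHOLD = 40     # standard areas and aggregations of standard areas
SMALL_AREA_THRESHOLD = 100  # six-character postal codes, geocoded areas and
                            # custom areas built from blocks, block-faces or LDUs

# Hypothetical labels for the area types subject to the threshold of 100.
SMALL_AREA_TYPES = {"postal_code", "geocoded", "custom_from_blocks"}


def min_population(area_type: str) -> int:
    """Return the population threshold that applies to an area type."""
    if area_type in SMALL_AREA_TYPES:
        return SMALL_AREA_THRESHOLD
    return STANDARD_THRESHOLD


def is_area_suppressed(area_type: str, population: int) -> bool:
    """True when no characteristic data may be released for the area."""
    return population < min_population(area_type)


print(is_area_suppressed("standard", 39))     # True
print(is_area_suppressed("standard", 40))     # False
print(is_area_suppressed("postal_code", 99))  # True
```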

Please refer to section Postal code minimum aggregation rules for additional rules applicable to postal code data.

Population universes used for suppression routines

The population under consideration for all data tabulations is the NHS estimate of population in private households.

For place of work data, the population under consideration is the employed labour force having a usual place of work or worked at home.

Population universes used for suppression routines
Tabulation              Population universe
NHS                     Estimated count of population in private households
POW geographic areas    Employed labour force having a usual place of work or worked at home

For NHS tabulations that are based on place of work geographies or areas, all criteria are to be based on estimates of the employed labour force having a usual place of work or worked at home. That is, the thresholds of 40, 100 and 250 refer to estimates of the employed labour force having a usual place of work or worked at home, rather than to the population of the areas. Tabulations containing both places of residence and places of work as geographic areas have the 40, 100 and 250 size limits applied to both place of residence (population) and place of work (employed labour force having a usual place of work or worked at home).

Postal code minimum aggregation rules

In addition to the confidentiality rules on disseminating National Household Survey data by postal code, the following rules are applied to postal codes. These rules fall under clause 03.01 (n) of the Commercial Non-Mailing licence between Statistics Canada and Canada Post Corporation.

  • All requests must include batches of two or more postal codes. The only exception is for postal codes that have a zero as the second character (rural postal codes).
  • Groups of postal codes are to be assigned a unique classification or number (e.g. K1A 0T6, 0T7, 0T8 = Custom Area 1). Under the terms of the licence cited above, clients cannot be provided with lists of postal codes; only the name specified in the client's request can be used.
  • All other confidentiality rules for custom extractions still apply, as per section Area suppression for standard and non-standard geographic areas.
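The batching rule in the first bullet could be checked along the following lines. This is only a sketch under stated assumptions: the function names are hypothetical, a rural postal code is taken to be one with a zero in the second position, and the postal code "K0A 1A0" is an invented illustration.

```python
def normalize(postal_code: str) -> str:
    """Strip spaces and upper-case a postal code such as 'k1a 0t6'."""
    return postal_code.replace(" ", "").upper()


def is_rural(postal_code: str) -> bool:
    """Assumption: a zero in the second position marks a rural postal code."""
    code = normalize(postal_code)
    return len(code) == 6 and code[1] == "0"


def batch_is_acceptable(postal_codes: list[str]) -> bool:
    """Two or more distinct postal codes, or a single rural postal code."""
    distinct = {normalize(pc) for pc in postal_codes}
    if len(distinct) >= 2:
        return True
    return len(distinct) == 1 and is_rural(next(iter(distinct)))


print(batch_is_acceptable(["K1A 0T6", "K1A 0T7", "K1A 0T8"]))  # True
print(batch_is_acceptable(["K1A 0T6"]))                        # False
print(batch_is_acceptable(["K0A 1A0"]))                        # True (rural)
```

The remaining area suppression rules would still have to be applied to any batch that passes this check.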

Also, the following disclaimer is applicable to all postal code custom requests:

Postal code validation disclaimer: Statistics Canada makes no representation or warranty as to, or validation of, the accuracy of any postal code (OM) data submitted to Statistics Canada.

Please note these rules are applicable to historical postal code requests as well.

Random rounding

All estimates in NHS tabulations are subjected to a process called random rounding. Random rounding transforms all raw estimates to random rounded estimates. This reduces the possibility of identifying individuals within the tabulations.

Estimates greater than 10 are rounded to base 5; estimates less than 10 are rounded to base 10. This means that any estimate less than 10 will always be changed to 0 or 10. The table below shows the effect of rounding on estimates with a value less than 10.

Random rounding frequency
Estimate   Will round to 0      Will round to 10
1          9 times out of 10    1 time out of 10
2          8 times out of 10    2 times out of 10
3          7 times out of 10    3 times out of 10
4          6 times out of 10    4 times out of 10
5          5 times out of 10    5 times out of 10
6          4 times out of 10    6 times out of 10
7          3 times out of 10    7 times out of 10
8          2 times out of 10    8 times out of 10
9          1 time out of 10     9 times out of 10
0          Always               Never

The random rounding algorithm uses a random seed value to initiate the rounding pattern for tables. In these routines, the method used to seed the pattern can result in the same estimate in the same table being rounded up in one execution and rounded down in the next.
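The rounding behaviour described above can be sketched as follows. This is a minimal illustration, not the NHS production routine. The function name random_round and the seed value are hypothetical. The rule that an estimate rounds up with probability remainder/base is read from the table for base 10 and is assumed to carry over to base 5. An estimate of exactly 10, which the text does not cover, is assumed to stay at 10.

```python
import random


def random_round(estimate: float, rng: random.Random) -> int:
    """Randomly round an estimate: base 10 below 10, base 5 otherwise."""
    base = 10 if estimate < 10 else 5
    lower = int(estimate // base) * base
    remainder = estimate - lower
    # Round up with probability remainder / base, so an estimate of 3 becomes
    # 10 about 3 times out of 10 and 0 about 7 times out of 10.
    if rng.random() < remainder / base:
        return lower + base
    return lower


rng = random.Random(2011)  # the seed value is arbitrary
draws = [random_round(3, rng) for _ in range(10_000)]
print(sorted(set(draws)))            # only 0 and 10 can occur
print(draws.count(10) / len(draws))  # close to 0.3
```

Because the outcome depends on the state of the random generator, running the same table with a different seed can round the same estimate up in one execution and down in the next.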

Disclosure avoidance for statistics

Statistics (such as mean, sum, median, percentile, ratio or percentage) are not subject to random rounding. However, when shown in tabulations accompanying the estimates used to calculate the statistic, their presence can result in disclosure of individuals. To prevent this, we use statistic suppression methods or special statistic calculations.

Statistic suppression

The following situations will result in the suppression of statistics:

  1. Statistics for a cell will be suppressed if the range of the data (i.e., the maximum dollar amount in the cell minus the minimum dollar amount) divided by the maximum of the absolute values is below a threshold parameter. This method of suppression is applied only to statistics calculated from quantitative values measured in dollar ($) units, such as income or value of dwellings.
  2. For all quantitative variables, a statistic is suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 4. For quantile statistics, an alternate minimum number of records applies: 20 records are required for quartiles, quintiles and deciles, and 400 records are required for percentiles.
  3. Statistics for a cell will be suppressed if the cell contains an outlier. A cell is considered to contain an outlier if the largest absolute value divided by the sum of the absolute values is above a threshold parameter.

Note: The number of records used in the calculation is not necessarily the number of records in the cell but, rather, the number of records that are applicable or available to the calculation of the statistic in the cell.
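The range test (situation 1) and the outlier test (situation 3) can be sketched as two ratio checks. The threshold parameters are not given in this chapter, so the values 0.1 and 0.5 below are purely hypothetical, as are the function names and the sample values. The sketch also assumes the ratios are computed on unweighted values.

```python
RANGE_THRESHOLD = 0.1    # hypothetical value, not published in this chapter
OUTLIER_THRESHOLD = 0.5  # hypothetical value, not published in this chapter


def fails_range_test(values: list[float], threshold: float) -> bool:
    """Situation 1: (max - min) / max(|value|) is below the threshold."""
    largest_abs = max(abs(v) for v in values)
    if largest_abs == 0:
        return True  # every value is zero, so there is no spread at all
    return (max(values) - min(values)) / largest_abs < threshold


def has_outlier(values: list[float], threshold: float) -> bool:
    """Situation 3: max(|value|) / sum(|value|) is above the threshold."""
    total_abs = sum(abs(v) for v in values)
    if total_abs == 0:
        return False
    return max(abs(v) for v in values) / total_abs > threshold


clustered = [50_000, 50_500, 51_000]  # dollar amounts that are nearly identical
dominated = [16_500, 345_600, 12_900]  # one amount dominates the cell

print(fails_range_test(clustered, RANGE_THRESHOLD))  # True: range ratio is about 0.02
print(has_outlier(dominated, OUTLIER_THRESHOLD))     # True: largest share is about 0.92
```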

Example:

Consider a cell containing the following records:

The eight records in the cell represent 47.5 persons (the sum of the weights). Because only non-zero values are used in the calculation for the variable 'Wages', only three records contribute to the average ($72,574.55 when weighted), which is therefore suppressed.

Example of eight records showing the weight applied and wages of each respondent
Record number   Weight   Wages ($)
1               5.5      16,500
2               2.9      345,600
3               8.1      12,900
4               6.2      0
5               6.6      0
6               5.9      0
7               5.4      0
8               6.9      0

In addition to the situations listed above, for all quantitative variables, all statistics are suppressed if the sum of the weights is less than 10.
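The record-count and weight-sum rules can be applied to the eight example records as follows. This is a sketch under assumptions: the function name is hypothetical, statistics not listed among the quantile minimums are given the general minimum of 4 records, and the sum of the weights is taken over the records used in the calculation (the text does not say whether it covers all records in the cell).

```python
# (weight, wages) pairs copied from the example table
records = [
    (5.5, 16_500), (2.9, 345_600), (8.1, 12_900), (6.2, 0),
    (6.6, 0), (5.9, 0), (5.4, 0), (6.9, 0),
]

DEFAULT_MIN_RECORDS = 4
QUANTILE_MIN_RECORDS = {"quartile": 20, "quintile": 20, "decile": 20, "percentile": 400}
MIN_WEIGHT_SUM = 10


def is_statistic_suppressed(records, statistic="mean"):
    """Suppress when too few records, or too little weight, enter the calculation."""
    used = [(w, x) for w, x in records if x != 0]  # only non-zero wages are used
    min_records = QUANTILE_MIN_RECORDS.get(statistic, DEFAULT_MIN_RECORDS)
    too_few_records = len(used) < min_records
    too_little_weight = sum(w for w, _ in used) < MIN_WEIGHT_SUM
    return too_few_records or too_little_weight


used_count = sum(1 for _, wages in records if wages != 0)
print(used_count)                                # 3
print(is_statistic_suppressed(records, "mean"))  # True: 3 records is fewer than 4
```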

Special statistic calculations

  1. The statistic value is never rounded, except for frequencies.
  2. All statistics based on ranks (medians, percentiles) are calculated the usual way, that is, never rounded.
  3. When a sum is specified, if the program sums a dollar value, a number of weeks, a number of hours, or an age, then the program multiplies the unrounded average of the group in question by the rounded, weighted frequency. Otherwise, the program rounds the actual weighted sum.

When a division is specified (averages, percentages, ratios, etc.), the program must apply point 3 to both the numerator and the denominator before it proceeds with the division.

Note: Statistics based on ranks, such as medians and percentiles, are always calculated via linear interpolation. That means that, for cells with low estimates, these statistics are not reliable. For that reason, no additional confidentiality measures are applied to them.

Note: The average of a dollar value, a number of weeks, a number of hours or an age is not altered by the rounding because the numerator is the product of the true average and the rounded frequency, and the denominator is the same rounded frequency. The two frequencies cancel each other, leaving the true average untouched.
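The cancellation described in this note can be checked numerically. The sketch below uses hypothetical weights and hours worked, a hypothetical function name, and a rounded frequency of 20 as one possible outcome of random rounding.

```python
import math


def published_sum(weights, values, rounded_frequency):
    """Unrounded weighted average multiplied by the rounded weighted frequency."""
    true_average = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return true_average * rounded_frequency


# Hypothetical cell: three respondents with survey weights and hours worked.
weights = [6.5, 4.9, 8.0]
hours = [40, 35, 20]

true_average = sum(w * h for w, h in zip(weights, hours)) / sum(weights)
rounded_frequency = 20  # one possible random rounding of the weighted count 19.4

total = published_sum(weights, hours, rounded_frequency)
published_average = total / rounded_frequency

# The rounded frequency cancels, so the published average equals the true one.
print(math.isclose(published_average, true_average))  # True
```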

Suppression of NHS estimates for confidentiality protection

Section Random rounding discussed random rounding for estimates in NHS tabulations. Random rounding is used as a means of protecting confidentiality in estimates. Analysis of NHS data revealed that even with random rounding in place, in some cases, data with elevated risks of disclosure could be released.

These elevated risks arise because non-response adjustment in the NHS required a relatively wide range of weights. High weights may enable individuals with rare characteristics to be more easily identified in a table, particularly if their characteristics are publicly known.

To minimize these risks, a rule was instituted for estimates that is similar to the rule for quantitative variables described in section Statistic suppression. A cell estimate will be suppressed if the number of records with the attribute or combination of attributes represented by the cell (unrounded and unweighted) is less than 4. In these cases, the cell will show the number 0 instead of the suppressed value, and thus will be indistinguishable from a genuinely empty cell.

Example:

Suppose we have the following records for a given geography:

Example of 15 records showing the weight applied and the age of each respondent
Record number   Weight   Age
1               6.5      20
2               4.9      22
3               8        25
4               6.8      26
5               5.4      27
6               6.1      27
7               4.7      27
8               5.7      29
9               2.8      32
10              6.8      36
11              41.1     39
12              5        39
13              81.4     40
14              5.1      50
15              3.2      54

If only random rounding were applied, the NHS estimates would be published (randomly rounded) as illustrated in the following table. The number of records would never be published; it is shown here only to illustrate the effect of the rule.

Example of estimates that would be published without applying cell suppression
                    Age range
Values              20 to 29   30 to 39   40 to 49   50 to 59   Total
NHS estimate        50         55         80         10         195
Number of records   8          4          1          2          15

Suppression based on cell estimates suppresses both the 40 to 49 and 50 to 59 age ranges, as fewer than four (4) records have the attribute in question, and results in the following table. The total remains unchanged, as the total cell represents at least four individuals.

Example of estimates that would be published with cell suppression applied
                    Age range
Values              20 to 29   30 to 39   40 to 49   50 to 59   Total
NHS estimate        50         55         0          0          195
Number of records   8          4          1          2          15

The primary intent of the rule is to prevent disclosure of personal information related to certain individuals. In the example above, if there is only one individual in the area in question who is in the 40 to 49 age range, the rule does not divulge that they are an NHS respondent, which minimizes the risk that further information from the NHS on this individual is disclosed.
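The cell suppression rule can be reproduced on the 15 example records. This is a sketch only: the helper names are hypothetical, and random rounding is deliberately left out so that the output is deterministic, which means the estimates printed are the unrounded weighted counts rather than the published values in the tables above.

```python
# (weight, age) pairs copied from the table of 15 records
records = [
    (6.5, 20), (4.9, 22), (8.0, 25), (6.8, 26), (5.4, 27),
    (6.1, 27), (4.7, 27), (5.7, 29), (2.8, 32), (6.8, 36),
    (41.1, 39), (5.0, 39), (81.4, 40), (5.1, 50), (3.2, 54),
]
MIN_RECORDS = 4  # cells based on fewer unweighted records are suppressed


def age_range(age: int) -> str:
    """Label the ten-year age range, for example 27 -> '20 to 29'."""
    low = age // 10 * 10
    return f"{low} to {low + 9}"


# Accumulate the weighted estimate and the unweighted record count per cell.
cells = {}
for weight, age in records:
    cell = cells.setdefault(age_range(age), {"estimate": 0.0, "records": 0})
    cell["estimate"] += weight
    cell["records"] += 1

# A suppressed cell shows 0, exactly like a genuinely empty cell.
published = {
    label: (round(cell["estimate"], 1) if cell["records"] >= MIN_RECORDS else 0)
    for label, cell in cells.items()
}

for label, value in published.items():
    print(label, cells[label]["records"], value)
# The 40 to 49 cell (1 record) and the 50 to 59 cell (2 records) print 0.
```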

Footnotes

Note 1

For more information on standard areas, refer to the 2011 Census Dictionary.
