Archived Content
Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please contact us to request a format other than those available.
Chapter 2 – Confidentiality (non-disclosure) rules
The following describes the various rules used to ensure confidentiality (or non-disclosure) of individual respondent identity and characteristics. All NHS data are subject to confidentiality (non-disclosure) rules.
Area suppression for standard and non-standard geographic areas
Area suppression is used to remove all characteristic data for geographic areas below a specified population size.
The specified population size for all standard areas (see Note 1) or aggregations of standard areas is 40, except for blocks, block-faces or postal codes. Consequently, no characteristics or tabulated data are to be released for areas below a population size of 40.
The specified population size for six-character postal codes (forward sortation area - local distribution unit [FSA-LDU]), geocoded areas and custom areas built from the block, block-face or LDU levels is 100. Consequently, no characteristics or tabulated data are to be released if the total population of the area is less than 100. Generally, blocks and individual urban block-faces (one side of the street between two intersections) will be too small to meet the above-specified population size thresholds. Where an aggregation of blocks or block-faces falls above the threshold specified by the population size, data can be retrieved through a custom tabulation.
Please refer to section Postal code minimum aggregation rules for additional rules applicable to postal code data.
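The two population thresholds described above can be sketched as a simple check. This is only an illustration: the area-type labels and function names are hypothetical and are not part of any Statistics Canada system.

```python
# Hypothetical labels for the area types that use the 100 threshold
# (six-character postal codes, geocoded areas, and custom areas built
# from blocks, block-faces or LDUs). The labels are assumptions.
SMALL_BUILDING_BLOCK_AREAS = {"postal_code", "geocoded", "custom_block_based"}


def min_population(area_type: str) -> int:
    """Return the population threshold that applies to an area type."""
    return 100 if area_type in SMALL_BUILDING_BLOCK_AREAS else 40


def is_area_suppressed(area_type: str, population: float) -> bool:
    """An area is suppressed when its population is below the threshold."""
    return population < min_population(area_type)


if __name__ == "__main__":
    print(is_area_suppressed("census_tract", 39))
    print(is_area_suppressed("postal_code", 100))
```

For place of work tabulations, the `population` argument would be the employed labour force estimate rather than the population in private households.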
Population universes used for suppression routines
The population under consideration for all data tabulations is the NHS estimate of population in private households.
For place of work data, the population under consideration is the employed labour force having a usual place of work or worked at home.
NHS geographic areas | Place of work (POW) geographic areas |
---|---|
Estimate count of population in private households | Employed labour force having a usual place of work or worked at home |
For NHS tabulations that are based on place of work geographies or areas, all criteria are to be based on estimates of the employed labour force having a usual place of work or worked at home. That is, the 40 population, 100 population and 250 population thresholds are estimates of the employed labour force having a usual place of work or worked at home, rather than the population of the areas. Tabulations containing both places of residence and places of work as geographic areas have the 40, 100 and 250 size limits applied to both place of residence (population) and place of work (employed labour force having a usual place of work or worked at home).
Postal code minimum aggregation rules
In addition to the confidentiality rules on disseminating National Household Survey data with the postal codes, the following rules are applied to postal codes. These rules fall under clause 03.01 (n) of the Commercial Non-Mailing licence between Statistics Canada and Canada Post Corporation.
- All requests must include batches of two or more postal codes; the only exception is for postal codes which have a zero as the second character (rural postal codes).
- Groups of postal codes are to be assigned a unique classification/number (e.g. K1A 0T6, 0T7, 0T8 = Custom Area 1). Under the terms of the contract listed above, clients cannot be provided with lists of postal codes; only the name specified in the client's request can be used.
- All other confidentiality rules for custom extractions still apply as per section Area suppression for standard and non-standard geographic areas.
Also, the following disclaimer is applicable to all postal code custom requests:
Postal code validation disclaimer: Statistics Canada makes no representation or warranty as to, or validation of the accuracy of any postal code data submitted to Statistics Canada.
Please note these rules are applicable to historical postal code requests as well.
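As an illustration of the batch rule above (two or more postal codes per request, except for rural postal codes), a request could be screened along these lines. The function names are hypothetical, and the sketch covers only the batch-size rule, not the population thresholds or the licence terms.

```python
import re
from typing import Iterable

# Six-character postal code: letter, digit, letter, digit, letter, digit.
POSTAL_CODE = re.compile(r"[A-Z]\d[A-Z]\d[A-Z]\d")


def normalize(postal_code: str) -> str:
    """Strip spaces, upper-case, and check the six-character pattern."""
    code = postal_code.replace(" ", "").upper()
    if not POSTAL_CODE.fullmatch(code):
        raise ValueError(f"not a six-character postal code: {postal_code!r}")
    return code


def is_rural(postal_code: str) -> bool:
    """Rural postal codes have a zero as the second character."""
    return normalize(postal_code)[1] == "0"


def batch_is_acceptable(postal_codes: Iterable[str]) -> bool:
    """Accept batches of two or more distinct codes, or a single rural code."""
    codes = {normalize(pc) for pc in postal_codes}
    if len(codes) >= 2:
        return True
    return len(codes) == 1 and all(is_rural(pc) for pc in codes)


if __name__ == "__main__":
    print(batch_is_acceptable(["K1A 0T6", "K1A 0T7", "K1A 0T8"]))
    print(batch_is_acceptable(["K1A 0T6"]))
```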
Random rounding
All estimates in NHS tabulations are subjected to a process called random rounding. Random rounding transforms all raw estimates to random rounded estimates. This reduces the possibility of identifying individuals within the tabulations.
All estimates greater than 10 are rounded to base 5; estimates less than 10 are rounded to base 10. This means that any estimate less than 10 will always be changed to 0 or 10. The table below shows the effect of rounding on estimates with a value less than 10.
Estimate of | Will round to 0 | Will round to 10 |
---|---|---|
1 | 9 times out of 10 | 1 time out of 10 |
2 | 8 times out of 10 | 2 times out of 10 |
3 | 7 times out of 10 | 3 times out of 10 |
4 | 6 times out of 10 | 4 times out of 10 |
5 | 5 times out of 10 | 5 times out of 10 |
6 | 4 times out of 10 | 6 times out of 10 |
7 | 3 times out of 10 | 7 times out of 10 |
8 | 2 times out of 10 | 8 times out of 10 |
9 | 1 time out of 10 | 9 times out of 10 |
0 | Always | Never |
The random rounding algorithm uses a random seed value to initiate the rounding pattern for tables. In these routines, the method used to seed the pattern can result in the same estimate in the same table being rounded up in one execution and rounded down in the next.
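The rounding behaviour described in this section can be sketched as follows. For estimates below 10, the probabilities come directly from the table above. For estimates above 10, the source states only that base 5 is used; rounding up with probability proportional to the remainder is an assumption made here, as are the function name and the seed value.

```python
import random


def random_round(estimate: float, rng: random.Random) -> int:
    """Randomly round an estimate to base 10 (below 10) or base 5 (otherwise).

    The probability of rounding up equals remainder / base. For base 10 this
    reproduces the table above; for base 5 it is an assumption.
    """
    if estimate < 0:
        raise ValueError("estimates are expected to be non-negative")
    base = 10 if estimate < 10 else 5
    lower = int(estimate // base) * base
    remainder = estimate - lower
    if remainder == 0:
        return lower  # already a multiple of the base (includes 0 and 10)
    return lower + base if rng.random() < remainder / base else lower


if __name__ == "__main__":
    rng = random.Random(2011)  # hypothetical seed
    print([random_round(3, rng) for _ in range(10)])
    print([random_round(12, rng) for _ in range(10)])
```

Changing the seed changes the rounding pattern, which is why the same estimate can be rounded up in one execution and down in the next.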
Disclosure avoidance for statistics
Statistics (such as mean, sum, median, percentile, ratio or percentage) are not subject to random rounding. However, when shown in tabulations accompanying the estimates used to calculate the statistic, their presence can result in disclosure of individuals. To prevent this, we use statistic suppression methods or special statistic calculations.
Statistic suppression
The following situations will result in the suppression of statistics:
- Statistics for a cell will be suppressed if the range of data (i.e., the maximal dollar amount of the cell minus the minimal dollar amount) over the maximum of absolute values is below a threshold parameter. This method of suppression is applied only to statistics calculated from quantitative values measured in dollar ($) units such as income or value of dwellings.
- For all quantitative variables, a statistic is suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 4. For quantile statistics, an alternate minimum number of records applies: for quartiles, quintiles and deciles, 20 records are required, and for percentiles, 400 records are required.
- Statistics for a cell will be suppressed if it contains an outlier. A cell is considered to contain an outlier if the largest absolute value divided by the sum of the absolute values is above a threshold parameter.
Note: The number of records used in the calculation is not necessarily the number of records in the cell but, rather, the number of records that are applicable or available to the calculation of the statistic in the cell.
Example:
Consider a cell containing the following records:
Record number | Weight | Wages ($) |
---|---|---|
1 | 5.5 | 16,500 |
2 | 2.9 | 345,600 |
3 | 8.1 | 12,900 |
4 | 6.2 | 0 |
5 | 6.6 | 0 |
6 | 5.9 | 0 |
7 | 5.4 | 0 |
8 | 6.9 | 0 |
The eight records in the cell represent 47.5 persons (the sum of the weights). Since for the variable 'Wages' only non-zero values are used in the calculation, the average $22,727.27 will be suppressed because only three records are used in the calculation.
- For all quantitative variables, all statistics are suppressed if the sum of the weights is less than 10.
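The suppression checks listed in this section can be sketched as one function applied to the wages example. The two threshold values are hypothetical, since the source refers to "threshold parameters" without giving them. Applying the sum-of-weights check to the records used in the calculation, rather than to every record in the cell, is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Record:
    weight: float
    value: float


# Hypothetical thresholds; the source does not publish the actual values.
RANGE_THRESHOLD = 0.1
OUTLIER_THRESHOLD = 0.8

# Minimum unweighted record counts stated in the text.
MIN_RECORDS = {"quartile": 20, "quintile": 20, "decile": 20, "percentile": 400}


def suppression_reasons(records: Sequence[Record], statistic: str = "mean",
                        dollar_valued: bool = True,
                        nonzero_only: bool = True) -> List[str]:
    """Return the reasons a statistic would be suppressed (empty if none)."""
    # Only records applicable to the statistic count (e.g. non-zero wages).
    used = [r for r in records if r.value != 0] if nonzero_only else list(records)
    reasons = []
    if len(used) < MIN_RECORDS.get(statistic, 4):
        reasons.append("too few records")
    if sum(r.weight for r in used) < 10:
        reasons.append("sum of weights below 10")
    values = [r.value for r in used]
    abs_values = [abs(v) for v in values]
    if abs_values and max(abs_values) > 0:
        spread = (max(values) - min(values)) / max(abs_values)
        if dollar_valued and spread < RANGE_THRESHOLD:
            reasons.append("range too narrow")
        if max(abs_values) / sum(abs_values) > OUTLIER_THRESHOLD:
            reasons.append("outlier")
    return reasons


WAGES_CELL = [Record(5.5, 16500), Record(2.9, 345600), Record(8.1, 12900),
              Record(6.2, 0), Record(6.6, 0), Record(5.9, 0),
              Record(5.4, 0), Record(6.9, 0)]

if __name__ == "__main__":
    print(suppression_reasons(WAGES_CELL))
```

With the example cell, only three records have non-zero wages, so the record-count check alone is enough to suppress the average.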
Special statistic calculations
- The statistic value is never rounded, except for frequencies.
- All statistics based on ranks (medians, percentiles) are calculated the usual way, that is, never rounded.
- When a sum is specified, if the program sums a dollar value, a number of weeks, a number of hours, or an age, then the program multiplies the unrounded average of the group in question by the rounded, weighted frequency. Otherwise, the program rounds the actual weighted sum.
- When a division is specified (averages, percentages, ratios, etc.), the program must apply the sum rule in the previous point to both the numerator and the denominator before it proceeds with the division.
Note: Statistics based on ranks like median and percentiles are always calculated via linear interpolations. That means that, for cells with low estimates, these statistics are not reliable. That is the reason why no additional confidentiality measures are applied to them.
Note: The average of a dollar value, a number of weeks, a number of hours or an age is not altered by the rounding because the numerator is the product of the true average and the rounded frequency, and the denominator is the rounded frequency. The two frequencies cancel each other, leaving the true average untouched.
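The cancellation described in the note above can be checked with a short sketch. It assumes that the "unrounded average" is the weighted average of the group; the function names and sample values are illustrative only.

```python
from typing import Optional, Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Unrounded weighted average of the group."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def special_sum(values: Sequence[float], weights: Sequence[float],
                rounded_frequency: int) -> float:
    """Sum rule: unrounded average times the rounded, weighted frequency."""
    return weighted_average(values, weights) * rounded_frequency


def special_average(values: Sequence[float], weights: Sequence[float],
                    rounded_frequency: int) -> Optional[float]:
    """Division rule: special sum over rounded frequency (None if it is 0)."""
    if rounded_frequency == 0:
        return None
    return special_sum(values, weights, rounded_frequency) / rounded_frequency


if __name__ == "__main__":
    hours = [20, 30, 40]        # illustrative values
    weights = [5.0, 6.0, 7.0]   # weighted frequency 18, randomly rounded to 15 or 20
    for rounded in (15, 20):
        print(rounded, special_sum(hours, weights, rounded),
              special_average(hours, weights, rounded))
```

Whichever way the frequency is rounded, the published sum changes but the average does not.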
Suppression of NHS estimates for confidentiality protection
Section Random rounding discussed random rounding for estimates in NHS tabulations. Random rounding is used as a means of protecting confidentiality in estimates. Analysis of NHS data revealed that even with random rounding in place, in some cases, data with elevated risks of disclosure could be released.
These elevated risks arise because non-response adjustment in the NHS required a relatively wide range of weights. High weights may enable individuals with rare characteristics to be more easily identified in a table, particularly if their characteristics are publicly known.
To minimize these risks, a rule was instituted for estimates that is similar to the rule for quantitative variables described in section Statistic suppression. A cell estimate will be suppressed if the number of records with the attribute or combination of attributes represented by the cell (unrounded and unweighted) is less than 4. In these cases, the cell will show the number 0 instead of the suppressed value, and thus will be indistinguishable from a genuinely empty cell.
Example:
Suppose we have the following records for a given geography:
Record number | Weight | Age |
---|---|---|
1 | 6.5 | 20 |
2 | 4.9 | 22 |
3 | 8 | 25 |
4 | 6.8 | 26 |
5 | 5.4 | 27 |
6 | 6.1 | 27 |
7 | 4.7 | 27 |
8 | 5.7 | 29 |
9 | 2.8 | 32 |
10 | 6.8 | 36 |
11 | 41.1 | 39 |
12 | 5 | 39 |
13 | 81.4 | 40 |
14 | 5.1 | 50 |
15 | 3.2 | 54 |
Applying only random rounding, the NHS estimates would be published (randomly rounded) as illustrated in the following table. The number of records would never be published; it is shown here only to illustrate the effect of the rule.
Age range | 20 to 29 | 30 to 39 | 40 to 49 | 50 to 59 | Total |
---|---|---|---|---|---|
NHS estimate | 50 | 55 | 80 | 10 | 195 |
Number of records | 8 | 4 | 1 | 2 | 15 |
Suppression based on cell estimates suppresses both the 40 to 49 and 50 to 59 age ranges, as fewer than four records have the attribute in question, and results in the following table. The total remains unchanged as the total cell represents at least four individuals.
Age range | 20 to 29 | 30 to 39 | 40 to 49 | 50 to 59 | Total |
---|---|---|---|---|---|
NHS estimate | 50 | 55 | 0 | 0 | 195 |
Number of records | 8 | 4 | 1 | 2 | 15 |
The primary intent of the rule is to prevent disclosure of personal information related to certain individuals. Within the example above, if there is only one individual in the area in question who is in the 40 to 49 age range, it is not divulged that they are an NHS respondent, and thereby it minimizes the risk that further information from the NHS on this individual is disclosed.
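The example above can be reproduced with a short sketch that groups the fifteen records by age range, sums the weights, and flags cells with fewer than four records. Random rounding is deliberately left out so the output is deterministic; the function names and the output layout are assumptions.

```python
from collections import defaultdict

# (weight, age) pairs copied from the example records above.
RECORDS = [(6.5, 20), (4.9, 22), (8.0, 25), (6.8, 26), (5.4, 27),
           (6.1, 27), (4.7, 27), (5.7, 29), (2.8, 32), (6.8, 36),
           (41.1, 39), (5.0, 39), (81.4, 40), (5.1, 50), (3.2, 54)]


def age_range(age: int) -> str:
    """Ten-year age range label, e.g. 27 -> '20 to 29'."""
    low = age // 10 * 10
    return f"{low} to {low + 9}"


def tabulate(records, min_records: int = 4) -> dict:
    """Weighted estimate, record count and suppression flag per age range."""
    weights = defaultdict(float)
    counts = defaultdict(int)
    for weight, age in records:
        key = age_range(age)
        weights[key] += weight
        counts[key] += 1
    return {
        key: {
            "weighted_estimate": round(weights[key], 1),  # before random rounding
            "records": counts[key],
            "suppressed": counts[key] < min_records,      # published as 0 if True
        }
        for key in sorted(weights)
    }


if __name__ == "__main__":
    for key, cell in tabulate(RECORDS).items():
        print(key, cell)
```

The 40 to 49 cell shows why the rule exists: a single record with a weight of 81.4 produces an estimate of about 80 even though only one respondent is behind it.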
Footnotes
Note 1: For more information on standard areas, refer to the 2011 Census Dictionary.