Chapter 2 – Confidentiality (nondisclosure) rules
The following describes the various rules used to ensure confidentiality (or nondisclosure) of individual respondent identity and characteristics. All NHS data are subject to confidentiality (nondisclosure) rules.
Area suppression for standard and nonstandard geographic areas
Area suppression is used to remove all characteristic data for geographic areas below a specified population size.
The specified population size for all standard areas (see Note 1) or aggregations of standard areas is 40, except for blocks, block-faces or postal codes. Consequently, no characteristics or tabulated data are to be released for areas below a population size of 40.
The specified population size for six-character postal codes (forward sortation area - local distribution unit [FSA-LDU]), geocoded areas and custom areas built from the block, block-face or LDU levels is 100. Consequently, no characteristics or tabulated data are to be released if the total population of the area is less than 100. Generally, blocks and individual urban block-faces (one side of the street between two intersections) will be too small to meet the above-specified population size thresholds. Where an aggregation of blocks or block-faces falls above the specified population size threshold, data can be retrieved through a custom tabulation.
Please refer to section Postal code minimum aggregation rules for additional rules applicable to postal code data.
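The two thresholds above can be expressed as a simple check. The following is a minimal sketch only: the thresholds of 40 and 100 come from the text, while the function name and the area-type labels are hypothetical and do not correspond to any Statistics Canada system.

```python
# Thresholds taken from the text above.
STANDARD_MIN_POPULATION = 40
CUSTOM_MIN_POPULATION = 100

# Hypothetical labels for areas built from the block, block-face or LDU
# levels, which are subject to the higher threshold.
CUSTOM_AREA_TYPES = {"postal_code", "geocoded", "custom"}


def is_area_suppressed(population: float, area_type: str) -> bool:
    """Return True if all characteristic data for the area must be withheld."""
    if area_type in CUSTOM_AREA_TYPES:
        threshold = CUSTOM_MIN_POPULATION
    else:
        threshold = STANDARD_MIN_POPULATION
    return population < threshold
```

For instance, under these assumptions a standard area with a population of 39 is suppressed, while a custom area needs a population of at least 100 before any data are released.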
Population universes used for suppression routines
The population under consideration for all data tabulations is the NHS estimate of population in private households.
For place of work data, the population under consideration is the employed labour force having a usual place of work or worked at home.
NHS  POW geographic areas 

Estimate count of population in private households  Employed labour force having a usual place of work or worked at home 
For NHS tabulations that are based on place of work geographies or areas, all criteria are to be based on estimates of the employed labour force having a usual place of work or worked at home. That is, the 40 population, 100 population and 250 population thresholds are estimates of the employed labour force having a usual place of work or worked at home, rather than the population of the areas. Tabulations containing both places of residence and places of work as geographic areas have the 40, 100 and 250 size limits applied to both place of residence (population) and place of work (employed labour force having a usual place of work or worked at home).
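As an illustration of how a size limit applies when a tabulation involves place of residence, place of work, or both, consider the following sketch. The function and parameter names are hypothetical; the only rule taken from the text is that each universe present in the tabulation must meet the limit.

```python
from typing import Optional


def meets_size_limit(
    threshold: int,
    residence_population: Optional[float] = None,
    pow_employed_labour_force: Optional[float] = None,
) -> bool:
    """Return True if every universe present in the tabulation meets the limit.

    residence_population: population in private households, if the
        tabulation uses place of residence areas.
    pow_employed_labour_force: employed labour force having a usual place
        of work or worked at home, if the tabulation uses place of work areas.
    """
    universes = [
        u for u in (residence_population, pow_employed_labour_force) if u is not None
    ]
    if not universes:
        raise ValueError("at least one population universe is required")
    # When both geographies are present, the limit applies to both.
    return all(u >= threshold for u in universes)
```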
Postal code minimum aggregation rules
In addition to the confidentiality rules on disseminating National Household Survey data by postal code, the following rules apply to postal code requests. These rules fall under clause 03.01 (n) of the Commercial Non-Mailing licence between Statistics Canada and Canada Post Corporation.
 All requests must include batches of two or more postal codes; the only exception being for postal codes which have a zero as the second digit (rural postal codes);
 Groups of postal codes are to be assigned a unique classification/number (e.g. K1A 0T6, 0T7, 0T8 = Custom Area 1). Under the terms of the contract listed above, clients cannot be provided with lists of postal codes; only the name specified in the client's request can be used.
 All other confidentiality rules for custom extractions still apply as per section Area suppression for standard and nonstandard geographic areas.
Also, the following disclaimer is applicable to all postal code custom requests:
Postal code validation disclaimer: Statistics Canada makes no representation or warranty as to, or validation of the accuracy of any postal code^{OM} data submitted to Statistics Canada.
Please note these rules are applicable to historical postal code requests as well.
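The batching rule for postal code requests can be sketched as follows. This is an illustrative check only; the function names are hypothetical, and the sketch assumes postal codes are supplied as six-character strings with an optional space.

```python
from typing import Sequence


def normalize(postal_code: str) -> str:
    """Strip spaces and upper-case a postal code such as 'k1a 0t6'."""
    return postal_code.replace(" ", "").upper()


def is_rural_postal_code(postal_code: str) -> bool:
    """Rural postal codes have a zero as their second character."""
    code = normalize(postal_code)
    return len(code) == 6 and code[1] == "0"


def is_valid_batch(postal_codes: Sequence[str]) -> bool:
    """A request needs two or more postal codes, unless it is a single rural code."""
    unique_codes = {normalize(pc) for pc in postal_codes}
    if len(unique_codes) >= 2:
        return True
    return len(unique_codes) == 1 and is_rural_postal_code(next(iter(unique_codes)))
```

The sketch counts distinct postal codes, so repeating the same urban postal code twice does not satisfy the rule; that interpretation is an assumption rather than something stated in the licence text quoted here.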
Random rounding
All estimates in NHS tabulations are subjected to a process called random rounding. Random rounding transforms all raw estimates to random rounded estimates. This reduces the possibility of identifying individuals within the tabulations.
Estimates greater than 10 are rounded to base 5; estimates less than 10 are rounded to base 10. This means that any estimate less than 10 will always be changed to 0 or 10. The table below shows the effect of rounding on estimates with a value less than 10.
Estimate of  Will round to 0  Will round to 10 

1  9 times out of 10  1 time out of 10 
2  8 times out of 10  2 times out of 10 
3  7 times out of 10  3 times out of 10 
4  6 times out of 10  4 times out of 10 
5  5 times out of 10  5 times out of 10 
6  4 times out of 10  6 times out of 10 
7  3 times out of 10  7 times out of 10 
8  2 times out of 10  8 times out of 10 
9  1 time out of 10  9 times out of 10 
0  Always  Never 
The random rounding algorithm uses a random seed value to initiate the rounding pattern for tables. In these routines, the method used to seed the pattern can result in the same estimate in the same table being rounded up in one execution and rounded down in the next.
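The rounding behaviour described above can be sketched as follows. The base-10 probabilities follow the table for estimates below 10. For base 5, the sketch assumes the same proportional scheme (round up with probability equal to the remainder divided by the base); the text does not state those probabilities, so this part is an assumption, as are the function and parameter names.

```python
import random


def random_round(estimate: float, rng: random.Random) -> int:
    """Randomly round a non-negative estimate to base 10 (below 10) or base 5."""
    # The text does not say which base applies to an estimate of exactly 10;
    # it is a multiple of both bases, so it is left unchanged either way.
    base = 10 if estimate < 10 else 5
    lower = int(estimate // base) * base
    remainder = estimate - lower
    if remainder == 0:
        return lower
    # Round up with probability remainder / base, e.g. an estimate of 3
    # becomes 10 three times out of 10 and 0 seven times out of 10.
    if rng.random() < remainder / base:
        return lower + base
    return lower


# Different seeds can round the same estimate in different directions.
print(random_round(3, random.Random(1)), random_round(3, random.Random(2)))
```

Passing the generator explicitly makes the role of the seed visible: a fixed seed reproduces the same rounding pattern, while a new seed can round the same estimate the other way.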
Disclosure avoidance for statistics
Statistics (such as mean, sum, median, percentile, ratio or percentage) are not subject to random rounding. However, when shown in tabulations accompanying the estimates used to calculate the statistic, their presence can result in disclosure of individuals. To prevent this, we use statistic suppression methods or special statistic calculations.
Statistic suppression
The following situations will result in the suppression of statistics:
 Statistics for a cell will be suppressed if the range of data (i.e., the maximal dollar amount of the cell minus the minimal dollar amount) over the maximum of absolute values is below a threshold parameter. This method of suppression is applied only to statistics calculated from quantitative values measured in dollar ($) units such as income or value of dwellings.
 For all quantitative variables, a statistic is suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 4. For quantile statistics, an alternate minimum number of records applies: for quartiles, quintiles and deciles, 20 records are required, and for percentiles, 400 records are required.
 Statistics for a cell will be suppressed if it contains an outlier. A cell is considered to contain an outlier if the largest absolute value divided by the sum of the absolute values is above a threshold parameter.
Note: The number of records used in the calculation is not necessarily the number of records in the cell but, rather, the number of records that are applicable or available to the calculation of the statistic in the cell.
Example:
Consider a cell containing the following records:
Record number  Weight  Wages ($) 

1  5.5  16,500 
2  2.9  345,600 
3  8.1  12,900 
4  6.2  0 
5  6.6  0 
6  5.9  0 
7  5.4  0 
8  6.9  0 
The eight records in the cell represent 47.5 persons (the sum of the weights). Since only nonzero values of the variable 'Wages' are used in the calculation, the weighted average of the three nonzero records ($72,574.55) will be suppressed because only three records are used in the calculation.
 In addition to the situations listed above, for all quantitative variables, all statistics are suppressed if the sum of the weights is less than 10.
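The suppression tests in this section can be combined into a single check, sketched below. The minimum of 4 records and the minimum weight sum of 10 come from the text. The two threshold parameters are not published in this chapter, so the values used here are placeholders, and the function name is hypothetical. The sketch also assumes that the sum of the weights is taken over the records used in the calculation, which the text does not specify.

```python
from typing import Sequence

MIN_RECORDS = 4          # from the text (20 for quartiles/quintiles/deciles, 400 for percentiles)
MIN_WEIGHT_SUM = 10      # from the text
RANGE_THRESHOLD = 0.1    # placeholder: the actual parameter is not published here
OUTLIER_THRESHOLD = 0.9  # placeholder: the actual parameter is not published here


def suppress_statistic(
    values: Sequence[float],
    weights: Sequence[float],
    is_dollar: bool = True,
    min_records: int = MIN_RECORDS,
) -> bool:
    """Return True if a statistic computed from these records must be suppressed."""
    # Too few actual (unweighted, unrounded) records.
    if len(values) < min_records:
        return True
    # Too little weighted population behind the statistic.
    if sum(weights) < MIN_WEIGHT_SUM:
        return True
    abs_values = [abs(v) for v in values]
    largest = max(abs_values)
    total = sum(abs_values)
    # Range test, applied only to dollar-valued variables.
    if is_dollar and largest > 0:
        if (max(values) - min(values)) / largest < RANGE_THRESHOLD:
            return True
    # Outlier test: one record dominates the sum of absolute values.
    if total > 0 and largest / total > OUTLIER_THRESHOLD:
        return True
    return False


# The wages example: only the nonzero values are used in the calculation.
wages = [16500, 345600, 12900, 0, 0, 0, 0, 0]
cell_weights = [5.5, 2.9, 8.1, 6.2, 6.6, 5.9, 5.4, 6.9]
used = [(v, w) for v, w in zip(wages, cell_weights) if v != 0]
print(suppress_statistic([v for v, _ in used], [w for _, w in used]))  # True
```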
Special statistic calculations
 1. The statistic value is never rounded, except for frequencies.
 2. All statistics based on ranks (medians, percentiles) are calculated the usual way, that is, never rounded.
 3. When a sum is specified, if the program sums a dollar value, a number of weeks, a number of hours, or an age, then the program multiplies the unrounded average of the group in question by the rounded, weighted frequency. Otherwise, the program rounds the actual weighted sum.
When a division is specified (averages, percentages, ratios, etc.), the program must apply point 3 to both the numerator and the denominator before it proceeds with the division.
Note: Statistics based on ranks like median and percentiles are always calculated via linear interpolations. That means that, for cells with low estimates, these statistics are not reliable. That is the reason why no additional confidentiality measures are applied to them.
Note: The average of a dollar value, a number of weeks, a number of hours or an age is not altered by the rounding, because the numerator is the product of the true average and the rounded frequency, and the denominator is the rounded frequency. The two frequencies cancel each other, leaving the true average untouched.
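The cancellation described in the note above can be checked with a short sketch. The function names are hypothetical, and the sketch assumes that "the unrounded average of the group" is the weighted average of the records; returning no value when the rounded frequency is 0 is also an assumption.

```python
from typing import Optional, Sequence


def published_sum(
    values: Sequence[float], weights: Sequence[float], rounded_frequency: int
) -> float:
    """Sum for dollar, weeks, hours or age variables:
    unrounded weighted average multiplied by the rounded weighted frequency."""
    weighted_total = sum(v * w for v, w in zip(values, weights))
    true_average = weighted_total / sum(weights)
    return true_average * rounded_frequency


def published_average(
    values: Sequence[float], weights: Sequence[float], rounded_frequency: int
) -> Optional[float]:
    """Average computed from the adjusted sum and the rounded frequency."""
    if rounded_frequency == 0:
        return None  # assumption: no average can be shown for a zero frequency
    numerator = published_sum(values, weights, rounded_frequency)
    # The rounded frequency appears in both numerator and denominator,
    # so it cancels and the true average is recovered.
    return numerator / rounded_frequency


hours = [10, 20, 30]
weights = [2, 3, 5]  # weighted frequency 10, true weighted average 23
print(published_sum(hours, weights, 15), published_average(hours, weights, 15))
```

Whether the weighted frequency of 10 is rounded to 10 or 15, the average stays at 23, while the published sum scales with the rounded frequency.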
Suppression of NHS estimates for confidentiality protection
The section Random rounding described random rounding for estimates in NHS tabulations. Random rounding is used as a means of protecting confidentiality in estimates. Analysis of NHS data revealed that, even with random rounding in place, data with elevated risks of disclosure could in some cases be released.
These elevated risks arise because nonresponse adjustment in the NHS required a relatively wide range of weights. High weights may enable individuals with rare characteristics to be more easily identified in a table, particularly if their characteristics are publicly known.
To minimize these risks, a rule was instituted for estimates that is similar to the rule for quantitative variables described in section Statistic suppression. A cell estimate will be suppressed if the number of records with the attribute or combination of attributes represented by the cell (unrounded and unweighted) is less than 4. In these cases, the cell will show the number 0 instead of the suppressed value, and thus will be indistinguishable from a genuinely empty cell.
Example:
Suppose we have the following records for a given geography:
Record number  Weight  Age 

1  6.5  20 
2  4.9  22 
3  8  25 
4  6.8  26 
5  5.4  27 
6  6.1  27 
7  4.7  27 
8  5.7  29 
9  2.8  32 
10  6.8  36 
11  41.1  39 
12  5  39 
13  81.4  40 
14  5.1  50 
15  3.2  54 
Applying only random rounding, the NHS estimates would be published (randomly rounded) as illustrated in the following table. The number of records would never be published; it is shown here only to illustrate the effect of the rule.
Age range

Values  20 to 29  30 to 39  40 to 49  50 to 59  Total 
NHS estimate  50  55  80  10  195 
Number of records  8  4  1  2  15 
Suppression based on cell estimates suppresses both the 40 to 49 and 50 to 59 age ranges, as fewer than four (4) records have the attribute in question, and results in the following table. The total remains unchanged because the total cell represents at least four individuals.
Age range

Values  20 to 29  30 to 39  40 to 49  50 to 59  Total 
NHS estimate  50  55  0  0  195 
Number of records  8  4  1  2  15 
The primary intent of the rule is to prevent disclosure of personal information related to certain individuals. In the example above, if there is only one individual in the area in question who is in the 40 to 49 age range, the suppression avoids divulging that they are an NHS respondent, and thereby minimizes the risk that further information from the NHS on this individual is disclosed.
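The example above can be reproduced with the following sketch. It applies the minimum of four records from the text to each age range and to the total. To keep the output deterministic, the sketch rounds each estimate to the nearest multiple of 5 instead of rounding randomly; that stand-in is an assumption made for illustration and is not the NHS rounding routine. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

MIN_RECORDS = 4  # from the text

# (weight, age) pairs from the example table.
RECORDS: List[Tuple[float, int]] = [
    (6.5, 20), (4.9, 22), (8.0, 25), (6.8, 26), (5.4, 27),
    (6.1, 27), (4.7, 27), (5.7, 29), (2.8, 32), (6.8, 36),
    (41.1, 39), (5.0, 39), (81.4, 40), (5.1, 50), (3.2, 54),
]


def nearest_five(estimate: float) -> int:
    """Deterministic stand-in for random rounding (illustration only)."""
    return 5 * round(estimate / 5)


def age_range(age: int) -> str:
    lower = (age // 10) * 10
    return f"{lower} to {lower + 9}"


def tabulate(
    records: List[Tuple[float, int]],
    round_fn: Callable[[float], int] = nearest_five,
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (published estimates, unweighted record counts) by age range."""
    weight_sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for weight, age in records:
        for label in (age_range(age), "Total"):
            weight_sums[label] += weight
            counts[label] += 1
    estimates = {
        # A cell backed by fewer than four records shows 0.
        label: 0 if counts[label] < MIN_RECORDS else round_fn(weight_sums[label])
        for label in counts
    }
    return estimates, dict(counts)


estimates, counts = tabulate(RECORDS)
print(estimates)
```

With these records, the 40 to 49 and 50 to 59 cells are backed by one and two records respectively, so both show 0, while the total is unaffected.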
Footnotes
 Note 1: For more information on standard areas, refer to the 2011 Census Dictionary.