9. Sampling variance

Sampling error can be divided into two components: variance and bias. The variance measures the variability of the estimate about its average value in hypothetical repetitions of the survey process, while the bias is the difference between the average value of the estimate in hypothetical repetitions and the true value being estimated. Chapter 6 presented results on sampling bias, describing the nature and extent of bias in the census sample prior to weighting. Even with a perfectly unbiased sampling method, the results would still be subject to variance, simply because the estimates are based on a sample rather than on the whole population. The variance can be estimated using the data collected by the sample survey (Note 1). The sampling variance was studied to estimate the effect of the sampling and estimation procedures on those census figures that are based on sample data.

On the basis of the 2B sample data, Statistics Canada produces thousands of tables. Conceptually, an estimated sampling variance, which measures precision, can be associated with every estimate in these tables. This measure takes into account both the sample design and the estimation method. In practice, however, it cannot be calculated for every census estimate because of high data processing costs. Sampling variance is therefore estimated for only a subset of census estimates. From this subset, the combined effect of the sample design and the estimation method on the sampling variance can be estimated. Simple estimates of sampling variance, which are inexpensive to calculate, can then be adjusted for this effect to produce estimates of sampling variance for any census estimate.
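The idea of measuring the design-and-estimation effect on a subset of estimates, then applying it to cheap simple estimates, can be sketched as follows. This is a hypothetical illustration, not the report's production method: the helper `adjustment_factors` and all standard-error values are invented. Each factor is taken as the ratio of a design-based standard error to the corresponding simple (1-in-5 simple random sampling) standard error, so its square is the design effect mentioned in Note 2. A percentile summary across areas, of the kind referred to later in this section, is included.

```python
import statistics

def adjustment_factors(design_se, simple_se):
    """Ratio of design-based to simple standard errors, estimate by estimate.

    Hypothetical helper. design_se holds standard errors reflecting the actual
    sample design and estimation method; simple_se holds the standard errors
    computed as if 1-in-5 simple random sampling with weights of 5 were used.
    """
    return [d / s for d, s in zip(design_se, simple_se)]

# Invented standard errors for four weighting areas (illustration only).
design_se = [12.0, 15.0, 20.0, 9.0]
simple_se = [10.0, 10.0, 10.0, 10.0]

factors = adjustment_factors(design_se, simple_se)
design_effects = [f ** 2 for f in factors]  # squared factors = design effects

# Percentiles describe how the factor varies across areas; the "inclusive"
# method keeps every cut point within the range of the observed factors.
cuts = statistics.quantiles(factors, n=100, method="inclusive")
p75, p99 = cuts[74], cuts[98]
print(factors, design_effects, p75, p99)
```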

The square root of the sampling variance, known as the standard error, can be approximated using Tables 9.1 and 9.2. Table 9.1 gives non-adjusted (simple) standard errors of census sample estimates. The figures in this table were obtained by assuming that 1-in-5 simple random sampling and simple weighting by 5 were used. In Table 9.1, the standard errors are expressed as a function of both the size of the census estimate and the size of the geographic area. For example, for an estimate of 250 persons in a geographic area with a total of 1,000 persons, the non-adjusted standard error is 25.

Table 9.1 gives standard errors for only a limited number of values of the estimated total and of the total number of persons, households, dwellings or families in the area. The following formula may be used to calculate the non-adjusted standard error (NASE) of any estimated total for an area of any size:

$$\mathrm{NASE} = \sqrt{4E\left(1 - \frac{E}{N}\right)}$$

where E is the estimated total and N is the total number of persons, households, dwellings or families in the area. The factor 4 equals (1 − f)/f for the sampling fraction f = 1/5. For example, for an estimated total of 750 persons in an area with a total of 9,000 persons, the non-adjusted standard error would be:

$$\mathrm{NASE} = \sqrt{4 \times 750 \times \left(1 - \frac{750}{9\,000}\right)} = \sqrt{2\,750} \approx 52.4$$

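The NASE formula, the square root of 4E(1 − E/N), translates directly into code. The following is a minimal sketch; the function name `nase` and the range check are our own additions, not part of the report.

```python
import math

def nase(estimate, total):
    """Non-adjusted standard error under 1-in-5 simple random sampling.

    estimate: estimated total E (persons, households, dwellings or families)
    total:    total number N of the same kind of unit in the area
    """
    if not 0 <= estimate <= total:
        raise ValueError("estimate must lie between 0 and the area total")
    return math.sqrt(4 * estimate * (1 - estimate / total))

print(round(nase(750, 9_000), 1))  # 52.4
```

Applied to the national immigration example later in this section (E = 1,954,605, N = 31,241,030), the same function gives about 2,707.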
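The adjusted standard error of an estimate is obtained by multiplying its non-adjusted standard error by the appropriate adjustment factor from Table 9.2. The following rules apply when calculating adjusted standard errors: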

• When determining the standard error of an estimate relating to families or households, the number of families or households in the area, not the number of persons, should be used to select the appropriate column in Table 9.1 (or as N in the NASE formula).

• Unless otherwise specified, family characteristics involving husband, wife, lone‑parent or family reference person have the same adjustment factors as population characteristics. For example, the adjustment factor for the characteristic 'Highest level of schooling of husband, wife, or lone parent of a census family' is the same as that for the population characteristic 'Highest level of schooling'.

• For cross-classifications of two or more characteristics, the largest adjustment factor for those characteristics should be used.

• All the standard error adjustment factors are for estimates of the number of persons, households, dwellings, or families, as opposed to, for example, dollar values. For example, the household income adjustment factors are for estimates of the number of households whose income falls in a certain dollar range, and not for estimates such as average household income.
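The first and third rules can be combined in a short sketch: the area total must be in the same unit as the estimate, and for a cross-classification the largest of the characteristics' adjustment factors is applied. The function name `adjusted_se_cross` and the factors in the example are hypothetical, not taken from Table 9.2.

```python
import math

def nase(estimate, total):
    """Non-adjusted standard error under 1-in-5 simple random sampling."""
    return math.sqrt(4 * estimate * (1 - estimate / total))

def adjusted_se_cross(estimate, total, factors):
    """Adjusted standard error for a cross-classification of characteristics.

    total:   area total in the same unit as the estimate, e.g. the number
             of households for an estimate of households
    factors: adjustment factors for each characteristic being
             cross-classified; the largest one is applied
    """
    return nase(estimate, total) * max(factors)

# Hypothetical: 400 households in an area of 2,000 households, in a
# cross-classification of two characteristics with invented factors.
se = adjusted_se_cross(400, 2_000, [1.3, 1.8])
print(round(se, 1))
```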

The following example illustrates how to calculate an adjusted standard error. Suppose the estimate of interest is the number of persons who immigrated to Canada between 1996 and 2006. The 2006 Census estimate for this characteristic was 1,954,605, and the 2006 Census count of the population of Canada for sampled variables was 31,241,030. Since neither number is close to any of the values given in Table 9.1, the formula for the non-adjusted standard error should be used instead; it gives 2,707. From Table 9.2, the national-level adjustment factor for the characteristic 'period of immigration' (after 1990) is 1.67. Consequently, the adjusted standard error for this estimate is 2,707 × 1.67 ≈ 4,520.
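The national calculation above can be reproduced in a few lines. This sketch uses only the figures quoted in the example; the function and variable names are our own.

```python
import math

def nase(estimate, total):
    """Non-adjusted standard error under 1-in-5 simple random sampling."""
    return math.sqrt(4 * estimate * (1 - estimate / total))

E = 1_954_605   # persons who immigrated 1996 to 2006 (2006 Census estimate)
N = 31_241_030  # population of Canada for sampled variables
FACTOR = 1.67   # national adjustment factor, 'period of immigration'

simple = nase(E, N)
adjusted = simple * FACTOR
print(round(simple), round(adjusted))  # 2707 4521
```

Carrying full precision gives an adjusted standard error of about 4,521, which the text rounds to 4,520.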

The sample estimate and its standard error can be used to construct an interval that is expected to contain the unknown population value with a prescribed level of confidence. The particular sample selected in this survey is one of a large number of possible samples of the same size that could have been selected using the same sample design, and estimates derived from the different samples would differ from one another. If an interval running from two standard errors below the estimate to two standard errors above it were constructed from each possible sample, then approximately 19 out of 20 of these intervals would include the value that would be obtained in a complete census. Such an interval is called a 95% (19 ÷ 20 = 95%) confidence interval. To guarantee 95% confidence, however, these intervals must be calculated using the true standard errors of the sample estimates, whereas the adjusted standard errors calculated from Tables 9.1 and 9.2 are only estimates of them. For sample estimates at the provincial and national levels, the adjusted standard errors should be close enough to the true standard errors to produce approximate 95% confidence intervals of reasonable precision. Below the provincial level, the adjusted standard errors may not be accurate enough for this purpose.
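The "19 out of 20" interpretation can be checked by simulation. The sketch below assumes a synthetic population (10,000 units, of which 2,000 have the characteristic) and a design that is exactly 1-in-5 simple random sampling with weights of 5, so the non-adjusted standard error is appropriate here; these parameters are illustrative, not census data. Because ±2 standard errors corresponds to roughly 95.4% under a normal approximation, the coverage should come out close to, or slightly above, 0.95.

```python
import math
import random

def nase(estimate, total):
    """Non-adjusted standard error under 1-in-5 simple random sampling."""
    return math.sqrt(4 * estimate * (1 - estimate / total))

def coverage(n_units=10_000, true_total=2_000, reps=1_000, seed=1):
    """Share of estimate +/- 2 SE intervals that contain the true total.

    Synthetic population: units 0 .. true_total-1 have the characteristic.
    Each replicate draws a 1-in-5 simple random sample without replacement
    and weights every sampled unit by 5.
    """
    rng = random.Random(seed)
    population = range(n_units)
    sample_size = n_units // 5
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, sample_size)
        estimate = 5 * sum(1 for unit in sample if unit < true_total)
        se = nase(estimate, n_units)
        if abs(estimate - true_total) <= 2 * se:
            hits += 1
    return hits / reps

print(coverage())
```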

Using the adjusted standard error calculated above, an approximate 95% confidence interval for the number of persons who immigrated to Canada between 1996 and 2006 is 1,954,605 ± 2(4,520), or 1,954,605 ± 9,040, that is, from 1,945,565 to 1,963,645.
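The interval arithmetic is simple enough to express as a small helper. The name `confidence_interval` and the `multiplier` parameter are our own; the multiplier of 2 follows the text.

```python
def confidence_interval(estimate, se, multiplier=2):
    """Approximate 95% confidence interval: estimate +/- 2 standard errors."""
    half_width = multiplier * se
    return estimate - half_width, estimate + half_width

low, high = confidence_interval(1_954_605, 4_520)
print(f"{low:,} to {high:,}")  # 1,945,565 to 1,963,645
```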

Estimates for small areas can be unreliable, as the following example shows. For a community with a population of 500 persons and an estimate of 50 persons who immigrated to Canada between 1996 and 2006, Table 9.1 gives a non-adjusted standard error of 15. Since this community is below the provincial level, an adjustment factor for the weighting area (WA) level must be selected from Table 9.2. Taking the most conservative figure, the 99th percentile factor of 2.46, gives an adjusted standard error of 15 × 2.46 = 36.9 and an approximate 95% confidence interval of 50 ± 2(36.9), or 50 ± 73.8. In other words, the actual number of persons in this community who immigrated to Canada between 1996 and 2006 could be anywhere from 0 to 123 with 95% confidence. Even the somewhat less conservative 75th percentile adjustment factor (1.52) gives 50 ± 2(15 × 1.52), or 50 ± 45.6, a 95% confidence interval that ranges from 5 to 95.
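The small-area ranges can be reproduced under one assumption, which is ours rather than the report's: the interval bounds are rounded inward to whole persons, and the lower bound is truncated at zero because a count cannot be negative. The helper name `whole_person_interval` is hypothetical.

```python
import math

def whole_person_interval(estimate, se, multiplier=2):
    """Whole-person values inside estimate +/- 2 SE (hypothetical helper).

    Assumption: bounds are rounded inward to integers and the lower bound
    is truncated at zero, since a count of persons cannot be negative.
    """
    low = max(0, math.ceil(estimate - multiplier * se))
    high = math.floor(estimate + multiplier * se)
    return low, high

simple_se = 15  # Table 9.1 value quoted for E = 50, N = 500
for label, factor in [("99th percentile", 2.46), ("75th percentile", 1.52)]:
    adjusted = simple_se * factor
    print(label, round(adjusted, 1), whole_person_interval(50, adjusted))
```

Under this rounding rule, the sketch yields the ranges quoted above: 0 to 123 with the 99th percentile factor and 5 to 95 with the 75th percentile factor.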

Table 9.1 Non-adjusted estimates of standard errors of sample estimates

Table 9.2 Standard error adjustment factors at national or provincial and weighting area levels

Notes:

1. Unfortunately, the sampling variance does not provide any indication of the extent of non-sampling error.
2. The squares of the adjustment factors are commonly known as 'design effects.'
3. For example, '$10,000 to $19,999' was one of the categories for which estimates of sampling variance were calculated for the characteristic 'Number of persons in total income intervals.'