# Appendix C Statistics used in sampling bias study

In Chapter 6, it is stated that under random sampling, the statistic

$$Z = \frac{\hat{X} - X}{\sqrt{v(\hat{X})}}$$

should follow an approximately normal (0,1) distribution, where $\hat{X}$ is the sample estimate of the known population count $X$, $v(\hat{X})$ is the estimated variance of $\hat{X}$ and $V(\hat{X})$ is its true variance. A justification for this is given here.

Sampling was done independently in each CU. Therefore $\hat{X}$ is the sum of *H* independent random variables, where *H* is the number of CUs in Canada. There are 46,510 CUs in Canada that contain sampled households, so *H* is very large. Thus, according to the central limit theorem, $(\hat{X} - X)/\sqrt{V(\hat{X})}$ will follow an approximately normal (0,1) distribution (see Kendall and Stuart [1963], p. 193), as will $Z$ if $v(\hat{X}) \approx V(\hat{X})$. $Z$, however, would not have a mean of 0 if the CU-level samples of households were significantly biased for any reason.
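The central limit theorem argument above can be checked with a small simulation. The sketch below is illustrative only and rests on assumptions that are not taken from the report: a made-up population of CUs of equal size, simple random sampling without replacement within each CU, and the textbook variance estimator for a stratified total. The function names (`make_population`, `z_statistic`, `simulate`) and all parameter values are hypothetical. Under unbiased sampling, the simulated values of the standardized statistic should have a mean near 0 and a standard deviation near 1.

```python
import math
import random
import statistics


def make_population(num_cus, households_per_cu, rng):
    # Hypothetical population: each household carries a small count (1 to 6).
    return [[rng.randint(1, 6) for _ in range(households_per_cu)]
            for _ in range(num_cus)]


def z_statistic(population, sample_size, rng):
    """Return (x_hat - x_true) / sqrt(v_hat) for one independent sample per CU."""
    x_true = sum(sum(cu) for cu in population)  # known population count
    x_hat = 0.0
    v_hat = 0.0
    for cu in population:
        n_pop = len(cu)
        sample = rng.sample(cu, sample_size)  # SRS without replacement (assumed design)
        weight = n_pop / sample_size          # initial weight
        x_hat += weight * sum(sample)
        # Sample variance within the CU, then the usual SRS variance of a total
        # with finite population correction.
        mean = sum(sample) / sample_size
        s2 = sum((y - mean) ** 2 for y in sample) / (sample_size - 1)
        v_hat += n_pop ** 2 * (1 - sample_size / n_pop) * s2 / sample_size
    return (x_hat - x_true) / math.sqrt(v_hat)


def simulate(num_reps=200, num_cus=200, households_per_cu=25,
             sample_size=5, seed=1):
    # Fixed population, repeated independent samples from it.
    rng = random.Random(seed)
    population = make_population(num_cus, households_per_cu, rng)
    return [z_statistic(population, sample_size, rng) for _ in range(num_reps)]


if __name__ == "__main__":
    zs = simulate()
    print("mean of Z:", round(statistics.mean(zs), 3))
    print("std dev of Z:", round(statistics.stdev(zs), 3))
```

Replacing `rng.sample` with a selection rule that favours, say, larger household values would shift the mean of the simulated statistic away from 0, which is the behaviour the test for sampling bias relies on.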

An additional statistic will now be derived which allows us to test if the bias between two regions or two censuses is the same. Let $\hat{X}_1$ and $\hat{X}_2$ be estimators (based on initial weights) of the known population counts $X_1$ and $X_2$ for two distinct geographic areas or for two different censuses. Let $B_1$ and $B_2$ be the relative biases of $\hat{X}_1$ and $\hat{X}_2$. We wish to test if the null hypothesis $B_1 = B_2$ is true. This can be done using the statistic

$$Z^{*} = \frac{b_1 - b_2}{\sqrt{\mathrm{Var}(b_1) + \mathrm{Var}(b_2)}}$$

where $b_1$ and $b_2$ are unbiased estimators of $B_1$ and $B_2$ respectively. Thus, if the null hypothesis above is true, the expectation of $Z^{*}$ is zero. Note also that the denominator of $Z^{*}$ is the standard error of the numerator of $Z^{*}$ (there is no covariance term because estimates from separate regions or from different censuses are independent) and hence $Z^{*}$ has a variance of 1. Now if $\hat{X}_i$ approximately follows a normal distribution (again based on the central limit theorem), $b_i$ will also approximately follow a normal distribution, as will $b_1 - b_2$ and $Z^{*}$. Thus, $Z^{*}$ follows approximately a normal (0,1) distribution if the null hypothesis is true.
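As a rough illustration of how such a test could be computed, the sketch below assumes that the relative bias of each estimate is estimated by (estimate − known count) / known count, so that its variance is the variance of the estimate divided by the square of the known count. That estimator, the function names `relative_bias` and `bias_difference_test`, and the input figures are assumptions made for the example, not taken from the report. In practice the variances would be replaced by variance estimates.

```python
import math


def relative_bias(x_hat, x_true):
    """Estimated relative bias (x_hat - x_true) / x_true (assumed estimator)."""
    return (x_hat - x_true) / x_true


def bias_difference_test(x_hat1, x1, var1, x_hat2, x2, var2):
    """Return (z, two-sided p-value) for the null hypothesis of equal relative bias.

    var1 and var2 are variance estimates for x_hat1 and x_hat2. Because x1 and
    x2 are known constants, the variance of each relative bias is var / x**2.
    """
    b1 = relative_bias(x_hat1, x1)
    b2 = relative_bias(x_hat2, x2)
    # No covariance term: the two estimates are assumed to be independent.
    se_diff = math.sqrt(var1 / x1 ** 2 + var2 / x2 ** 2)
    z = (b1 - b2) / se_diff
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up figures for illustration only.
    z, p = bias_difference_test(1020.0, 1000.0, 100.0, 1980.0, 2000.0, 400.0)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

A large absolute value of the statistic (for example, beyond 1.96 at the 5% level) would suggest that the relative biases of the two regions or censuses differ.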