Sampling errors associated with census estimates

1.0 Introduction

This section discusses the sampling errors associated with the data from the main processing phase of the 1976 Census. Data from the preliminary processing phase are not subject to sampling error because all schedules were included. By contrast, the final data from the main processing phase are based on the processing of all census schedules from non-private dwellings, all schedules from the Northern Territory and a 50% sample of schedules from private dwellings in the other States and the A.C.T. Any estimate for the Northern Territory from either the preliminary or main processing phase is therefore not subject to sampling error, since all schedules from the Northern Territory were processed. Counts of the total number of males, total number of females and total number of persons for a CD or LGA from the main processing phase were constrained to agree with those from the preliminary processing phase. These totals are therefore also not subject to sampling error.

2.0 What is sampling error

Since only a 50% sample of private dwelling schedules was processed, it is likely that the estimates derived from this 50% sample would differ from figures which would have been obtained if all schedules were included. These differences are called sampling errors. The sampling error associated with any estimate can be estimated from the sample results and one measure so derived of this sampling error is the standard error. The particular 50% sample selected was one of a large number of possible 50% samples. Each possible 50% sample would have yielded different estimates and the standard error measures the variation of all the possible 50% sample estimates around the figures which would have been obtained if all schedules had been processed.

Given an estimate and the standard error on that estimate, there are about two chances in three that the sample estimate will differ by less than one standard error from the figure that would have been obtained if all schedules had been processed, and about nineteen chances in twenty that the difference will be less than two standard errors.

Another measure of the sampling error is the relative standard error which is obtained by expressing the standard error as a percentage of the estimate:

Relative Standard Error = ( Standard Error / Estimate ) x 100

Both standard error and relative standard error are used in the following discussion of the reliability of the estimates. An example of their application is as follows:

Example:
If an estimate of 70 has a relative standard error of 10% then the standard error of that estimate is 10% of 70 or 7. Thus there are 2 chances in 3 that the figure that would have been obtained if all schedules had been processed will lie in the range 63 to 77 and about 19 chances in 20 that this figure is between 56 and 84.
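
The arithmetic in this example can be sketched in a few lines of Python. The function names are illustrative only and do not come from the census documentation; the sketch simply converts a relative standard error to a standard error and forms the one and two standard error ranges described above.

```python
def standard_error(estimate, rse_percent):
    """Convert a relative standard error (in percent) to a standard error."""
    return estimate * rse_percent / 100.0


def interval(estimate, rse_percent, n_standard_errors):
    """Range of the estimate plus or minus a given number of standard errors."""
    se = standard_error(estimate, rse_percent)
    return (estimate - n_standard_errors * se, estimate + n_standard_errors * se)


se = standard_error(70, 10)
two_in_three = interval(70, 10, 1)        # about two chances in three
nineteen_in_twenty = interval(70, 10, 2)  # about nineteen chances in twenty
print(se, two_in_three, nineteen_in_twenty)
```

For the estimate of 70 with a relative standard error of 10%, this reproduces the standard error of 7 and the ranges 63 to 77 and 56 to 84.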

3.0 Presentation of sampling errors

It would have been impracticable to publish standard errors of all census estimates because difficulties in presentation would have been encountered with the large number of estimates. In addition, computer production of all standard errors would have been costly.

Consequently, tables which relate the relative standard error of an estimate to the size of the estimate are given at the end of this document. As can be seen from the tables, the larger an estimate, the greater its reliability and thus the smaller the relative standard error. The tables are not intended to give a precise measure of the error for a particular estimate, but provide an indication of the likely magnitude of the relative standard error for estimates of any particular size.

4.0 How to determine the sampling error on an estimate

There is no sampling error on an estimate if:

    (a) the estimate is total males, total females or total persons in a CD, an LGA or an aggregation of these areas; or
    (b) the estimate refers to the Northern Territory.

If the estimate is greater than 1000 persons or dwellings, then the relative standard error will be less than 2.5% and so the sampling error can be assumed to be negligible for most practical purposes.

The relative standard error or standard error for any other estimate may be found by reference to the tables given at the end of this document. A complete description of the methods to be used to obtain the relative standard error for any estimate is given in the following sections.

5.0 Sampling errors on dwelling and person estimates

Sampling errors depend on the type of estimate concerned.

    (1) For dwelling estimates the relative standard errors are given by LINE D in TABLE 1.

    (2) Sampling errors of person estimates depend on the particular topic of interest. Two groups of topics have been identified:

      Use LINE A in TABLE 1 if your estimate involves any of the following topics:

      Year of arrival in Australia; Birthplace (if overseas); Country of Citizenship (if overseas); Religion; Languages regularly used; Racial Origin; Period of Residence.

      Use LINE B in TABLE 1 for all other topics related to persons.

This difference between the relative standard errors for different person estimates arises because some characteristics are generally similar for persons in the same dwelling but differ between persons in different dwellings. That is, these characteristics are clustered by dwelling (for example, religion and racial origin). The sampling scheme involved the inclusion of ALL persons in selected dwellings rather than the selection of every second person in a dwelling. Hence, for characteristics which are clustered by dwelling, there is a greater chance that persons with a given characteristic would have been either undersampled or oversampled. Thus estimates of the number of persons classified by characteristics which are clustered by dwelling will have somewhat higher relative standard errors.

If an estimate is known to include a large number of persons from non-private dwellings, where all schedules were processed (for example, an estimate of males ten to fifteen years of age in a CD with a large boarding school for boys), then the relative standard error as read from the table will overestimate the true relative standard error.

Example:
Consider an estimate of the number of female university graduates in an LGA. The relative standard error will be derived from LINE B of TABLE 1. If the number of female university graduates in the LGA is 50 then reading from this line, the relative standard error is approximately 12%. The standard error on the estimate is 50 X 12/100 = 6. Therefore, there are nineteen chances in twenty that the number of female university graduates in the LGA is in the range 38 to 62.
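
Reading TABLE 1 for a size that falls between two tabulated sizes requires some form of interpolation. The following is a minimal sketch of one way to do this. It assumes interpolation that is linear in the logarithm of both the size and the relative standard error; the document does not prescribe an interpolation method, so this choice and the function name `rse_from_table` are assumptions made for illustration. The table values are those printed in TABLE 1.

```python
import math

# Estimate sizes and relative standard errors (%) transcribed from TABLE 1.
SIZES = [2, 5, 10, 15, 20, 30, 40, 50, 75, 100, 500, 1000]
TABLE_1 = {
    "A": [80, 53, 38, 32, 28, 23, 20, 18, 15, 13, 6.4, 4.7],
    "B": [62, 39, 27, 22, 19, 15, 13, 12, 9.6, 8.3, 3.6, 2.5],
    "D": [70, 44, 31, 25, 22, 18, 15, 14, 11, 9.6, 4.2, 3.0],
}


def rse_from_table(estimate, line):
    """Approximate relative standard error (%) for an estimate of a given size.

    Assumption: values between tabulated sizes are interpolated linearly
    in log(size) and log(RSE); the document does not specify a method.
    """
    values = TABLE_1[line]
    if estimate < SIZES[0] or estimate > SIZES[-1]:
        raise ValueError("estimate is outside the range covered by TABLE 1")
    for i in range(len(SIZES) - 1):
        lo, hi = SIZES[i], SIZES[i + 1]
        if lo <= estimate <= hi:
            t = (math.log(estimate) - math.log(lo)) / (math.log(hi) - math.log(lo))
            log_rse = (1 - t) * math.log(values[i]) + t * math.log(values[i + 1])
            return math.exp(log_rse)


# The female university graduates example: 50 persons on LINE B.
rse = rse_from_table(50, "B")   # a tabulated size, so no interpolation is needed
se = 50 * rse / 100
print(round(rse), round(se), (round(50 - 2 * se), round(50 + 2 * se)))
```

Under this assumption the interpolated values come out close to the approximate readings quoted in the worked examples of this document (for instance about 8% for an estimate of 300 on LINE A), but they remain indicative only, as the tables themselves are.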

6.0 Sampling errors on estimates of proportions and percentages

Proportions and percentages formed from the ratio of two Census estimates are also subject to sampling errors and the size of the error depends on the accuracy of both the numerator and the denominator. The formula for the relative standard error of a proportion is given below.

    Relative Standard Error (x/y)
    = SQRT{[Relative Standard Error (x)]^2 - [Relative Standard Error (y)]^2}

The relative standard error on a percentage is the same as for the corresponding proportion. Thus the relative standard error on an estimate of 58% is the same as that on the proportion of 0.58.

Example:
Consider an estimate of the labour force participation rate for persons born overseas for a particular LGA. If the number of persons born overseas who are in the labour force is 100 and the total number of persons born overseas is 160 then the estimated proportion is 100/160 = 0.63. The relative standard errors for both the numerator and denominator will be derived from TABLE 1 LINE A. Reading from this table, the relative standard error of the numerator (ie the number of persons born overseas who are in the labour force) is approximately 13% and the relative standard error of the denominator (ie the number of persons born overseas) is approximately 11%. The relative standard error of the estimate of the proportion is therefore

    SQRT{13 X 13 - 11 X 11} = SQRT{48} = 6.9%

The standard error on the proportion is 6.9 X 0.63/100 = 0.04. Therefore, there are nineteen chances in twenty that the labour force participation rate for persons born overseas for the LGA is in the range 0.55 to 0.71.
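
The calculation for the labour force participation rate can be sketched as follows. The function name is illustrative, and the guard against a denominator relative standard error larger than that of the numerator is an added assumption, since the formula has no real solution in that case. The sketch works with unrounded intermediate values, so its results may differ in the last digit from the rounded figures in the text.

```python
import math


def rse_of_proportion(rse_numerator, rse_denominator):
    """Relative standard error (%) of x/y from the RSEs (%) of x and y."""
    diff = rse_numerator ** 2 - rse_denominator ** 2
    if diff < 0:
        # Added assumption: reject inputs for which the formula is undefined.
        raise ValueError("numerator RSE must not be smaller than denominator RSE")
    return math.sqrt(diff)


proportion = 100 / 160                  # quoted as 0.63 in the text
rse = rse_of_proportion(13, 11)         # SQRT{169 - 121} = SQRT{48}
se = rse * proportion / 100
low, high = proportion - 2 * se, proportion + 2 * se
print(round(rse, 1), round(se, 2), round(low, 2), round(high, 2))
```

When the denominator is a total with no sampling error, passing a denominator relative standard error of zero returns the relative standard error of the numerator unchanged.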

As can be seen from the above formula the relative standard error of a proportion will always be less than the relative standard error of the numerator. However, whenever a proportion is small (ie the denominator is considerably greater than the numerator) it will be reasonable to approximate the relative standard error of the proportion by the relative standard error of the numerator.

For proportions or percentages where the denominator is the total number of males, females or persons in a CD or group of CDs, the relative standard error of the denominator is zero because these totals were derived from the preliminary processing phase. In these cases, the relative standard error of the proportion or percentage is given simply by the relative standard error of the numerator.

Example:
Consider an estimate of the percentage of persons born overseas for a particular CD. If the number of persons born overseas in the CD is 300 and the total number of persons in the CD is 1000, then the estimated percentage is (300/1000) X 100 = 30%. The relative standard error on the denominator is zero since estimates of total persons in a CD are not subject to sampling error. The relative standard error on the numerator can be obtained by interpolating TABLE 1 LINE A. This table gives the relative standard error on the numerator as approximately 8.1%. Therefore, the relative standard error on the percentage is also 8.1% and hence the standard error on the estimate of the percentage is 8.1 X 30/100 = 2.4 percentage points. Therefore, there are nineteen chances in twenty that the percentage of persons born overseas in the CD will lie within the range of approximately 25% to 35%.

Relative standard errors for estimates of proportions or percentages may also be determined from TABLE 2 which sets out relative standard errors for selected percentages or proportions.
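
As a spot check, a few entries of TABLE 2 can be compared with the result of applying the proportion formula to TABLE 1 readings. The sketch below does this for four cells whose numerator and denominator sizes both appear in TABLE 1. It is an observation from these few cells only, not a statement of how the tables were produced, and small differences are to be expected from rounding.

```python
import math

# Relative standard errors (%) from TABLE 1 for the sizes needed below.
LINE_A = {10: 38, 15: 32, 50: 18, 100: 13}
LINE_B = {15: 22, 100: 8.3}

# (numerator RSE, denominator RSE, value printed in TABLE 2)
checks = [
    (LINE_A[15], LINE_A[100], 29),   # Table 2A: 15% of a denominator of 100
    (LINE_A[10], LINE_A[50], 34),    # Table 2A: 20% of a denominator of 50
    (LINE_B[15], LINE_B[100], 20),   # Table 2B: 15% of a denominator of 100
    (LINE_A[15], LINE_B[100], 31),   # Table 2C: 15% of a denominator of 100
]

for rse_num, rse_den, tabulated in checks:
    computed = math.sqrt(rse_num ** 2 - rse_den ** 2)
    print(f"computed {computed:.1f}%  tabulated {tabulated}%")
```

In each of these cells the computed value agrees with the tabulated one to within one percentage point.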

7.0 Sampling errors on estimates of differences

The relative standard error on differences between two estimates of numbers or between two estimates of proportions (or percentages) can also be derived from the tables of relative standard errors.

For differences between the 1976 Census and the 1971 Census the standard error of the difference will be identical to the standard error of the 1976 estimate alone, since 1971 estimates are not subject to sampling error.

Example:
If estimates for the 1971 and 1976 Censuses are 500 dwellings and 800 dwellings respectively, then the difference is estimated as 300 dwellings. The 1971 estimate has no sampling error, whilst the 1976 estimate has a relative standard error (as read from TABLE 1 LINE D) of approximately 3% and hence a standard error of approximately 3% of 800, or 24. The standard error of the difference is therefore 24, and there are nineteen chances in twenty that the difference which would have been observed if all schedules from the 1976 Census had been processed lies within the range 252 to 348.

For differences between two 1976 Census estimates the standard error of the differences may be approximated by the following formula.

    Standard Error (x-y) = SQRT{[Standard Error (x)]^2 + [Standard Error (y)]^2}

This formula will be exact for differences between estimates of the same characteristic in two different areas (eg LGAs, CDs) or for differences between separate and uncorrelated characteristics in the same area. If, however, there is a positive correlation between the characteristics (eg comparison of the number of lawyers with the number of persons with law degrees), the formula will overestimate the true standard error. If there is a negative correlation between the characteristics (eg comparison of the number of persons who travel to work by train and by car), it will underestimate the true standard error.

Example:
If the estimates for two LGAs of the total number of occupied dwellings are 1000 and 800 and the number of occupied dwellings with outer walls of brick are 250 and 650 respectively, then the percentage of occupied dwellings with brick walls in each of these LGAs is (250/1000) X 100 = 25% and (650/800) X 100 = 81.2% respectively. The difference between these estimated percentages is therefore 56.2%. The calculation of the standard error of this difference requires the standard error of each of the percentages to be calculated. The relative standard errors on each of the estimates of numbers (1000, 800, 250 and 650) can be derived from TABLE 1 LINE D and are approximately 3.0, 3.3, 6.0 and 3.7 respectively. Using the formula given in the previous section, the relative standard errors on each of the percentages are:

    SQRT{(6.0)^2 - (3.0)^2} = 5.2% and SQRT{(3.7)^2 - (3.3)^2} = 1.7%

The standard errors on each of the percentages are

    5.2 X 25/100 = 1.3 and 1.7 X 81.2/100 = 1.4

Finally the standard error on the difference is

    SQRT{(1.3)^2 + (1.4)^2} = 1.9 percentage points

Therefore, there are nineteen chances in twenty that the difference between the percentages of occupied dwellings with brick walls in the two LGAs will be within the range 52.4% to 60.0%.
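
The steps of the brick wall example can be chained together as in the sketch below. The function names are illustrative, and the relative standard errors are the TABLE 1 LINE D readings quoted in the example. Because the sketch does not round intermediate results, its final figures may differ slightly in the last digit from those in the text.

```python
import math


def se_of_difference(se_x, se_y):
    """Approximate standard error of x - y, assuming x and y are uncorrelated."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


def se_of_percentage(numerator, denominator, rse_num, rse_den):
    """Return (percentage, standard error in percentage points)."""
    percentage = 100 * numerator / denominator
    rse = math.sqrt(rse_num ** 2 - rse_den ** 2)
    return percentage, rse * percentage / 100


# Brick wall example: RSEs (%) as read from TABLE 1 LINE D in the text.
pct_1, se_1 = se_of_percentage(250, 1000, 6.0, 3.0)
pct_2, se_2 = se_of_percentage(650, 800, 3.7, 3.3)
difference = pct_2 - pct_1
se_diff = se_of_difference(se_1, se_2)
print(round(difference, 1), round(se_diff, 1))

# 1971 to 1976 comparison: the 1971 figure has no sampling error, so se_y = 0.
print(se_of_difference(24, 0))
```

The same `se_of_difference` function covers the comparison with the 1971 Census, where one of the two standard errors is zero and the result reduces to the standard error of the 1976 estimate.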

Table 1: Relative Standard Errors of Dwelling and Person Estimates

    (relative standard errors in per cent, by size of estimate)

    Estimate       2     5    10    15    20    30    40    50    75   100   500  1000

    A-LINE        80    53    38    32    28    23    20    18    15    13   6.4   4.7
    B-LINE        62    39    27    22    19    15    13    12   9.6   8.3   3.6   2.5
    D-LINE        70    44    31    25    22    18    15    14    11   9.6   4.2     3

Table 2: Relative Standard Errors on Percentages and Proportions

Table 2A: Clustered Person Topics (A-LINE)

                   Percentage
    Denominator    15    20    30    45    60    75

             50    40    34    26    19    14    10
            100    29    25    19    14    10   7.3
            200    21    18    14    10   7.5   5.3
            500    14    12   9.1   6.7   5.0   3.5
            750    11   9.8   7.6   5.5   4.1   2.9
           1000    10   8.6   6.6   4.9   3.6   2.6

Table 2B: Unclustered Person Topics (B-LINE)

                   Percentage
    Denominator    15    20    30    45    60    75

             50    29    24    19    13   9.9   7.0
            100    20    17    13   9.4   6.9   4.9
            200    14    12   9.1   6.6   4.8   3.4
            500   8.9   7.5   5.7   4.1   3.0   2.1
            750   7.3   6.1   4.6   3.3   2.5   1.7
           1000   6.3   5.2   4.0   2.9   2.1   1.5

Table 2C:

Numerator - Clustered Person Topics (A-LINE)

Denominator - Unclustered Person Topics (B-LINE)

                   Percentage
    Denominator    15    20    30    45    60    75

             50    42    36    29    24    20    17
            100    31    27    22    17    15    13
            200    23    19    16    13    11   9.5
            500    15    13    11   8.5   7.3   6.4
            750    12    11   8.8   7.1   6.1   5.3
           1000    11   9.5   7.7   6.3   5.3   4.7

Table 2D: Dwelling Topics (D-LINE)

                   Percentage
    Denominator    15    20    30    45    60    75

             50    33    28    21    15    11   7.9
            100    23    19    15    11   7.9   5.6
            200    16    14    10   7.5   5.5   3.9
            500    10   8.6   6.5   4.7   3.5   2.5
            750   8.3   7.0   5.3   3.8   2.8   2.0
           1000   7.2   6.0   4.6   3.3   2.4   1.7

 
