Contributed by Marc Salmon
The maximum probable error (MPE) method can be used to determine the number of analytical samples[1] required to estimate the arithmetic mean of a population (µ). This method has been referred to in the assessment of site contamination (ASC) literature since the mid-1980s (Provost 1984, USEPA 1985, and Gilbert 1987).
As noted by Provost (1984), “One measure of the quality of an estimate of an average is the confidence limits (or maximum probable error) for the estimate”; with quality in this sense referring to the representativeness and precision of the estimate in relation to the population parameter. The MPE method assumes independent samples from a nearly-normal distribution, although for populations that are not normally distributed, the method provides an approximation that improves as n increases (Provost 1984).
Using this method, the number of analytical samples required is controlled by the selected confidence level, the variability of the underlying population, and the desired precision of the decision. Once a confidence level is chosen, and given that the variability of a population is generally a fixed feature of that population, only the number of analytical samples can be changed to provide more precise confidence intervals. Note that variability in sample data can be reduced by using stratified sampling designs, or by grouping sample data like with like when conducting data analysis and interpretation.
In ASC, the confidence level is traditionally set at 95%, but this can be adjusted given appropriate rationale and justification. Where less certainty is required, such as in scoping studies, preliminary investigations and some estimation problems, 80% to 90% may suffice. Where significant and costly decisions are required, confidence levels of 95% to 99% are generally more appropriate.
Margin of error
In the reporting of sample results, it is desirable to also quantify the uncertainty associated with any point estimates as the width of an uncertainty band or interval. In survey and polling data, this is commonly expressed as the margin of error (MOE). For example, a poll for the 2016 US presidential election had a 3-point MOE, so while the “point estimate” for Trump was 43%, the pollsters estimated that support for Trump in the total population was between 40% and 46% (43% ± 3%). For Clinton at 42%, support was estimated as from 39% to 45% (42% ± 3%). The point estimates and associated MOEs which make up the intervals are illustrated in Figure 1.
The MOE is calculated as the product of the critical value (z or t) at the desired confidence level and the variability of the sample data, as standard deviation (s) divided by the square root of the number of samples (√n). This is shown in Equation 1.
The function s/√n is known as the standard error of the mean (SEx̄) and describes the variability in the sampling distribution, i.e. the distribution of means from multiple sampling events of the same population, not the variability in the underlying population. A key feature of the SEx̄ is that it decreases as the sample size increases (Devore and Farnum 2005), and consequently so does the MOE.
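Equation 1 can be sketched in Python as follows. This is an illustration only: the function name and example values are hypothetical, and the normal (z) critical value is used for simplicity, whereas for small sample sizes a Student's t critical value gives a somewhat larger MOE.

```python
import math
from statistics import NormalDist

def margin_of_error(s, n, confidence=0.95):
    """Equation 1: MOE = z * s / sqrt(n), using the normal (z)
    critical value; for small n a Student's t value should be used."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    se = s / math.sqrt(n)  # standard error of the mean, s / sqrt(n)
    return z * se

# Hypothetical example: s = 120 mg/kg from n = 25 samples at 95% confidence
print(round(margin_of_error(120, 25), 1))  # 47.0 (mg/kg)
```

Doubling the precision (halving the MOE) requires four times the samples, since n appears under a square root.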
Confidence intervals
For environmental data, the MOE is more commonly expressed as the confidence interval, where the MOE can be thought of as the “radius” of the interval, i.e. half its width. Confidence intervals are constructed as the range from the sample mean minus the MOE to the sample mean plus the MOE, as shown in Equations 2 and 3.
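Equations 2 and 3 can be sketched as below (a hedged illustration with hypothetical values; the normal critical value is again used in place of Student's t):

```python
import math
from statistics import NormalDist

def confidence_interval(mean, s, n, confidence=0.95):
    """Equations 2 and 3: lower = mean - MOE, upper = mean + MOE."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * s / math.sqrt(n)  # Equation 1
    return mean - moe, mean + moe

# Hypothetical example: mean 350 mg/kg, s = 120 mg/kg, n = 25
lower, upper = confidence_interval(mean=350, s=120, n=25)
print(f"{lower:.0f} to {upper:.0f}")  # 303 to 397 (mg/kg)
```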
Alternatively, for estimation problems as defined in the data quality objectives (DQOs) process[2], an acceptable width of the confidence interval can be specified to provide some measure as to the required precision or statistical quality of the data. In these instances, the confidence interval is constructed as a two-sided test (1 − α/2), as for the determination of the MOE.
MPE method
Equation 1 can be rearranged to give the number of independent analytical samples required to estimate the arithmetic mean for a specified MOE, as shown in Equation 4.
As the variability of the sample data is required to determine n using this method, either existing site data, data from similar studies, or from a small-scale scoping study should be used as an estimate of the sample standard deviation. Following any investigation, the sample data should be used to confirm the actual variability.
Conveniently, the MOE and s can be standardised as relative values by dividing by the mean, giving the MPE (MOE/mean) and the relative standard deviation (RSD) (s/mean), which is also known as the coefficient of variation (CV). Either method can be used, as long as the variables are consistent, i.e. MOE and s in mg/kg or MPE and RSD in %, noting that units are not used and no conversion is required. The standardised MPE method is calculated as shown in Equation 5.
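Equations 4 and 5 can be sketched in Python. As above, this is an approximate illustration: the function name is hypothetical, and the normal (z) critical value is used, so a Student's-t-based calculation (as in ProUCL) will give slightly larger n for small sample sizes.

```python
import math
from statistics import NormalDist

def samples_required(moe, s, confidence=0.95):
    """Equation 4: n = (z * s / MOE)^2, rounded up.
    Equation 5 is the same in standardised form: n = (z * RSD / MPE)^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * s / moe) ** 2)

# Absolute form: MOE and s both in mg/kg (hypothetical values)
print(samples_required(moe=50, s=120))    # 23
# Standardised form: MPE and RSD both as fractions (MPE 35%, RSD 60%)
print(samples_required(moe=0.35, s=0.60)) # 12
```

Because the variables enter as a ratio, the absolute and standardised forms give the same n, provided the units are consistent.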
As Equations 4 and 5 reduce to n = n, as shown in Equation 6, the MPE method cannot be used directly to determine retrospectively if sufficient analytical samples were analysed; i.e. the equation solves to the number of samples collected.
Rather, as part of the sampling design, it can be used to estimate the number of samples required, based on the assumed variability of the sample data, the selected confidence level and the desired precision of the data. The resulting sample data can then be used to determine the MPE achieved for the sample data RSD, and to decide if additional analyses are required to achieve a more precise MPE.
If the data shows too large an MPE (> 35–50%) for reasonable RSDs (< 120%), then the data is probably not sufficiently precise. The determination of an appropriate MPE should be based on the decision required and the site-specific CSM and DQOs.
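The check described above, determining the MPE achieved by the sample data from its RSD, can be sketched as follows. The function name and concentration values are hypothetical, and the normal approximation is used for the critical value.

```python
import math
from statistics import NormalDist, mean, stdev

def achieved_mpe(data, confidence=0.95):
    """Relative margin of error (MPE) achieved by the sample data,
    together with the RSD, using the normal approximation."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    rsd = s / xbar                  # relative standard deviation (CV)
    mpe = z * rsd / math.sqrt(n)    # standardised Equation 1
    return mpe, rsd

# Hypothetical concentrations (mg/kg) from 10 analytical samples
results = [210, 340, 180, 420, 260, 310, 390, 230, 280, 350]
mpe, rsd = achieved_mpe(results)
print(f"RSD {rsd:.0%}, MPE {mpe:.0%}")  # RSD 27%, MPE 16%
```

For this hypothetical data the achieved MPE is well inside the 35–50% range noted above, so no additional analyses would be indicated on precision grounds alone.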
Table 1 shows the number of samples required to estimate the arithmetic mean based on the MPE method for various MPEs and RSDs. The data in Table 1 was developed using ProUCL statistical software[3].
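A small lookup of the same kind as Table 1 can be generated as below. Note this sketch uses the normal (z) approximation at 95% confidence, whereas ProUCL iterates with Student's-t critical values, so its tabulated n will be somewhat larger for small sample sizes; the MPE and RSD values shown are illustrative choices only.

```python
import math
from statistics import NormalDist

# z critical value for 95% confidence (two-sided)
z95 = NormalDist().inv_cdf(0.975)

mpes = [0.20, 0.35, 0.50]        # desired MPE (relative MOE)
rsds = [0.30, 0.60, 0.90, 1.20]  # relative standard deviation

print("     RSD " + "".join(f"{m:>6.0%}" for m in mpes))
for rsd in rsds:
    # standardised Equation 5: n = (z * RSD / MPE)^2, rounded up
    row = [math.ceil((z95 * rsd / m) ** 2) for m in mpes]
    print(f"{rsd:>8.0%} " + "".join(f"{v:>6}" for v in row))
```

As expected from Equation 5, halving the desired MPE roughly quadruples the required n, and doubling the RSD quadruples it.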
Approximation only
It should be stressed that these determinations, whether using the information herein or ProUCL, give approximate minimum sample sizes for media that are not too heterogeneous[4], and should not be considered definitive. Rather, determination of site- or decision-specific sample sizes should be based on robust CSMs, using a weight of evidence[5] approach. For example, the NEPM (2013, B2) states that:
Determining grid size/sampling density from mathematical formulae (for example, Appendix D of Standard AS 4482.1-2005) is not an acceptable approach without consideration of likely contaminant distribution and acceptable hotspot size.
USEPA (2015) recommends a minimum of 8 to 10 samples, as for smaller sample sizes the critical values (t) are large and unstable. With fewer samples than recommended, the confidence intervals are driven mainly by those critical values.
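The instability of small-sample critical values can be illustrated with two-sided 95% Student's-t values taken from standard tables (the values below are standard; the framing of the comparison is this author's):

```python
import math

# Two-sided 95% Student's-t critical values by degrees of freedom (n - 1),
# from standard statistical tables
t95 = {2: 4.303, 4: 2.776, 7: 2.365, 9: 2.262, 29: 2.045}

# The MOE multiplier t/sqrt(n) falls steeply up to about n = 8-10,
# then flattens, which is consistent with the USEPA (2015) minimum
for df, t in t95.items():
    n = df + 1
    print(f"n = {n:>2}: t = {t:.3f}, t/sqrt(n) = {t / math.sqrt(n):.2f}")
```

Moving from n = 3 to n = 10 cuts the MOE multiplier by roughly a factor of three, while moving from n = 10 to n = 30 only halves it.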
References
Devore J. and Farnum N. (2005) Applied Statistics for Engineers and Scientists, 2nd Edition, Brooks/Cole, Cengage Learning, Belmont, CA.
Gilbert R.O. (1987) Statistical Methods for Environmental Pollution Monitoring, John Wiley & Sons Inc., Brisbane.
Provost L.P. (1984) Statistical Methods in Environmental Sampling, in Schweitzer G. E. and Santolucito J. A. (Eds.) Environmental Sampling for Hazardous Wastes, American Chemical Society, Washington D.C.
United States Environmental Protection Agency (USEPA) (1985) Characterization of Hazardous Waste Sites – A Methods Manual; Volume I – Site Investigations, (Ref. EPA/600/4-84/075).
[1] Analytical samples are those that are subjected to quantitative or semi-quantitative analysis; field samples are those subjected to visual and olfactory observations, descriptions, and field logging, which can be field-screened and then subject to other non-laboratory assessments and tests. An analytical sample is a field sample, but a field sample may not necessarily be an analytical sample.
[2] Reference to: Design of Assessment of Site Contamination Investigations
[3] (https://www.epa.gov/land-research/proucl-software)
[4] Although somewhat subjective, for statistical analysis only “similar materials” should be grouped, as by definition the objective is to estimate population parameters; if the material grouped is not from a logical population, then the estimates may be meaningless. A greater number of samples may be required when there is a large range in contaminant concentrations or material types.
[5] Weight of evidence describes the process to collect, analyse and evaluate a combination of different qualitative, semi-quantitative and quantitative lines of evidence to make an overall assessment of contamination. Applying a weight of evidence process incorporates judgements about the quality, quantity, relevance and congruence of the data contained in the different lines of evidence (ANZG 2018); all of which need to be synthesised into robust ASC conclusions.