Measuring dispersion

Here, the objective is to quantify the spread of the data about the centre of the distribution. The principal measures of dispersion are the range, variance, standard deviation and coefficient of variation.

Range
The range is the difference between the largest and smallest data values in the sample (the extremes) and has the same units as the measured variable. The range is easy to determine, but is greatly affected by outliers. Its value may also depend on sample size: in general, the larger this is, the greater will be the range. These features make the range a poor measure of dispersion for many practical purposes.
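
As a minimal sketch of how sensitive the range is to a single extreme value, the following Python snippet computes the range of a small sample with and without an outlier. The function name sample_range and the data values are hypothetical, chosen only for illustration.

```python
def sample_range(values):
    """Return the difference between the largest and smallest data values."""
    return max(values) - min(values)


# Hypothetical measurements (e.g. lengths in mm); illustrative only.
lengths = [12.1, 12.4, 11.9, 12.6, 12.2]

# The same sample with one extreme value added.
with_outlier = lengths + [19.8]

print("Range without outlier:", round(sample_range(lengths), 2))
print("Range with outlier:   ", round(sample_range(with_outlier), 2))
```

A single extreme value dominates the result, which illustrates why the range is a poor summary of the dispersion of the bulk of the data.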

Variance and standard deviation
For symmetrical frequency distributions of quantitative data, an ideal measure of dispersion would take into account each value's deviation from the mean and provide a measure of the average deviation from the mean. Two such statistics are the sample variance, which is the sum of squared deviations from the mean, $\sum (Y - \bar{Y})^2$, divided by n − 1 (where n is the number of data values), and the sample standard deviation, which is the positive square root of the sample variance.
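
The definitions above can be sketched directly in Python. This is an illustrative sketch, not a prescribed method: the function names and the data are hypothetical, and the result is compared with the standard library's statistics.variance and statistics.stdev, which also use the n − 1 divisor.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)


def sample_sd(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical data values; illustrative only.
data = [4.0, 7.0, 5.0, 9.0, 5.0]

print("Variance (from definition):", sample_variance(data))
print("Variance (statistics):     ", statistics.variance(data))
print("SD (from definition):      ", sample_sd(data))
print("SD (statistics):           ", statistics.stdev(data))
```

For these values the mean is 6, the squared deviations sum to 16, and dividing by n − 1 = 4 gives a variance of 4 and a standard deviation of 2.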

The sample variance ($s^2$) has units which are the square of the original units, while the sample standard deviation (s) is expressed in the original units, which is one reason s is often preferred as a measure of dispersion. Calculating s or $s^2$ longhand is a tedious job and is best done with the help of a calculator or computer. If you don't have a calculator that calculates s for you, an alternative formula that simplifies calculations is:

⇒ Equation [40.1]
  $$ s = +\sqrt{\frac{\sum Y^2 - (\sum Y)^2 / n}{n - 1}} $$

To calculate s using a calculator:
  1. Obtain ∑ Y, square it, divide by n and store in memory.
  2. Square Y values, obtain $\sum Y^2$, subtract memory value from this.
  3. Divide this answer by n − 1.
  4. Take the positive square root of this value.
Take care to retain significant figures, or errors in the final value of s will result. If continuous data have been grouped into classes, the class mid-values or their squares must be multiplied by the appropriate frequencies before summation. When data values are large, longhand calculations can be simplified by coding the data, e.g. by subtracting a constant from each datum, and decoding when the simplified calculations are complete.
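
The four calculator steps, the treatment of grouped data and the idea of coding can be sketched as follows. This is a minimal illustration under stated assumptions: the function names shortcut_sd and grouped_sd, the coding constant of 1000 and all data values are hypothetical. Note that subtracting a constant does not change s, so only the mean needs decoding (by adding the constant back).

```python
import math


def shortcut_sd(values):
    """Sample standard deviation via the four calculator steps."""
    n = len(values)
    sum_y = sum(values)
    sum_y2 = sum(y * y for y in values)
    memory = sum_y ** 2 / n          # step 1: (sum of Y) squared, divided by n
    numerator = sum_y2 - memory      # step 2: sum of Y squared minus memory value
    variance = numerator / (n - 1)   # step 3: divide by n - 1
    return math.sqrt(variance)       # step 4: positive square root


def grouped_sd(mid_values, frequencies):
    """Same formula for grouped data: weight mid-values and their squares
    by the class frequencies before summing."""
    n = sum(frequencies)
    sum_fy = sum(f * y for y, f in zip(mid_values, frequencies))
    sum_fy2 = sum(f * y * y for y, f in zip(mid_values, frequencies))
    return math.sqrt((sum_fy2 - sum_fy ** 2 / n) / (n - 1))


# Hypothetical large data values; illustrative only.
data = [1004.0, 1007.0, 1005.0, 1009.0, 1005.0]

# Coding: subtract a constant (here 1000) to give smaller working numbers.
constant = 1000.0
coded = [y - constant for y in data]

print("s from raw data:  ", shortcut_sd(data))
print("s from coded data:", shortcut_sd(coded))

# Decoding is only needed for the mean, not for s.
coded_mean = sum(coded) / len(coded)
print("Decoded mean:     ", coded_mean + constant)

# Hypothetical grouped data: class mid-values and their frequencies.
mid_values = [5.0, 15.0, 25.0]
frequencies = [2, 5, 3]
print("s for grouped data:", grouped_sd(mid_values, frequencies))
```

With a computer, rounding during intermediate steps is rarely a concern for small data sets like these, but the subtraction in step 2 is where lost significant figures do the most damage when working longhand, which is one reason coding large values is helpful.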

Coefficient of variation
The coefficient of variation (CoV) is a dimensionless measure of variability relative to location which expresses the sample standard deviation as a percentage of the sample mean, i.e.:

⇒ Equation [40.2]
  $$ \mathrm{CoV} = \frac{100\,s}{\bar{Y}} \ (\%) $$

This statistic is useful when comparing the relative dispersion of data sets with widely differing means or where different units have been used for the same or similar quantities.

A useful application of the CoV is to compare different analytical methods or procedures, so that you can decide which involves the least proportional error: create a standard stock solution, then compare the results from several sub-samples analysed by each method. You may find it useful to use the CoV to compare the precision of your own results with those of a manufacturer, e.g. for an autopipettor. The smaller the CoV, the more precise (repeatable) is the apparatus or technique (note: this does not mean that it is necessarily more accurate).
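
The comparison of two methods described above might be sketched as follows. The function name coefficient_of_variation, the method labels and the replicate readings are all hypothetical, made up purely for illustration; the sketch also assumes the sample mean is not zero, since the CoV is undefined in that case.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean.

    Assumes the mean is non-zero.
    """
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical replicate readings of the same stock solution by two methods.
method_a = [10.1, 9.9, 10.0, 10.2, 9.8]
method_b = [10.4, 9.5, 10.1, 10.6, 9.4]

cov_a = coefficient_of_variation(method_a)
cov_b = coefficient_of_variation(method_b)

print(f"Method A CoV: {cov_a:.2f} %")
print(f"Method B CoV: {cov_b:.2f} %")

# The method with the smaller CoV is the more precise (repeatable) one;
# this says nothing about which is closer to the true value.
more_precise = "A" if cov_a < cov_b else "B"
print("More precise method:", more_precise)
```

Both hypothetical methods have the same mean here, so the difference in CoV reflects only the difference in spread between the two sets of readings.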