
Algorithms for calculating variance

In statistics, a formula for calculating the variance of a population of size n is:

\mathrm{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n^2}.

A formula for calculating the unbiased estimate of the population variance from a finite sample of n observations is:

\mathrm{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n(n-1)}.
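
Both formulas follow from the more familiar definition of variance as an average of squared deviations from the mean. The short derivation below is a sketch of that step; the symbol \bar{x} for the mean does not appear in the text above and is introduced here only for illustration.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
  = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Bigl(\sum_{i=1}^{n} x_i\Bigr)^2

\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \frac{n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n^2}, \qquad
\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \frac{n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n(n-1)}
```

Dividing the sum of squared deviations by n gives the population formula, and dividing by n − 1 gives the unbiased estimate.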

The calculation may be easier to follow with a worked example. The table below uses five data values whose mean is 8, and the results listed after it use the unbiased (n − 1) formula.

i        x_i       x_i − mean   (x_i − mean)^2
(index)  (datum)   (deviation)  (squared deviation)
1        5         −3           9
2        7         −1           1
3        8         0            0
4        10        2            4
5        10        2            4
n = 5    sum = 40  sum = 0      sum = 18
  • mean = 40/5 = 8
  • variance = (5 · 338 − 40^2)/(5 · 4) = 4.5
  • standard deviation = \sqrt{\mathrm{Variance}} = 2.12

Note: Details of the variance calculation:

338 = [5^2 + 7^2 + 8^2 + 10^2 + 10^2]
40 = [5 + 7 + 8 + 10 + 10]
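
The worked example can be checked numerically. The following Python sketch recomputes the variance three ways: from the sums formula, from the squared deviations in the table, and with the standard library's statistics.variance as a cross-check. The variable names are illustrative and not part of the original text.

```python
import statistics

data = [5, 7, 8, 10, 10]
n = len(data)

total = sum(data)                    # sum of the data: 40
total_sq = sum(x * x for x in data)  # sum of the squares: 338

# Sums formula for the unbiased estimate:
# (n * sum(x^2) - (sum x)^2) / (n * (n - 1))
var_from_sums = (n * total_sq - total ** 2) / (n * (n - 1))

# Same quantity from the squared deviations in the table: 18 / (n - 1)
mean = total / n
var_from_devs = sum((x - mean) ** 2 for x in data) / (n - 1)

# Cross-check against the standard library's sample variance
var_stdlib = statistics.variance(data)

# Standard deviation is the square root of the variance
std_dev = var_from_sums ** 0.5

print(var_from_sums, var_from_devs, var_stdlib, round(std_dev, 2))
```

All three variance values should agree for this data set, since the sums formula and the deviation formula are algebraically the same.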

Algorithm

A simple algorithm for the unbiased variance estimate therefore accumulates the count, the sum, and the sum of squares in a single pass over the data, as in the following pseudocode:

long n = 0;
double sum = 0;
double sum_sqr = 0;
double variance;

foreach x in data:
  n += 1;
  sum += x;
  sum_sqr += x*x;
end for

variance = (sum_sqr - sum*sum/n)/(n-1); // unbiased estimate; requires n >= 2
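
For readers who want something runnable, here is a minimal Python sketch of the same single-pass idea. The function name naive_variance and the check for fewer than two values are assumptions added for illustration; they are not part of the pseudocode above.

```python
def naive_variance(data):
    """Unbiased sample variance from a running count, sum, and sum of squares."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    if n < 2:
        # Assumption: treat fewer than two values as an error,
        # since the divisor n - 1 would be zero.
        raise ValueError("need at least two values")
    return (total_sq - total * total / n) / (n - 1)


example = naive_variance([5, 7, 8, 10, 10])
print(example)
```

One caveat with this approach: when the values are large compared with their spread, total_sq and total * total / n are nearly equal, and subtracting them in floating point can lose most of the significant digits.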

Algorithm

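Another algorithm avoids accumulating large numbers in sum_sqr by updating a running mean and a running sum of squared deviations from that mean as each value is read (the one-pass update usually attributed to Welford):

long n = 0;
double avg = 0;
double m2 = 0; // running sum of squared deviations from avg
double delta;

foreach x in data:
  n += 1;
  delta = x - avg;       // deviation from the previous mean
  avg += delta/n;        // updated mean
  m2 += delta*(x - avg); // uses both the previous and the updated mean
end for

return m2/(n-1); // resulting variance (unbiased; divide by n for the population variance)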
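
A running-update scheme of this kind can be sketched in Python as follows. This is Welford's one-pass method, which may differ in detail from the pseudocode above; the function name running_variance and the check for fewer than two values are illustrative assumptions, and the function returns the unbiased (n − 1) estimate.

```python
def running_variance(data):
    """Unbiased sample variance via a one-pass running update (Welford's method)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean          # deviation from the previous mean
        mean += delta / n         # updated mean
        m2 += delta * (x - mean)  # combines previous and updated deviations
    if n < 2:
        # Assumption: treat fewer than two values as an error.
        raise ValueError("need at least two values")
    return m2 / (n - 1)


small = running_variance([5, 7, 8, 10, 10])
# Same data shifted by a large constant; the variance should be unchanged.
shifted = running_variance([1e9 + x for x in [5, 7, 8, 10, 10]])
print(small, shifted)
```

Because the update works with deviations from the running mean rather than with raw squares, the intermediate values stay on the scale of the data's spread, which is why shifting the data by a large constant should have little effect on the result.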


07-14-2008 23:18:10
The contents of this article are licensed from Wikipedia.org under the GNU Free Documentation License.
BiologyDaily.com 2005.