Maths Corner

 

by Glenn Crouch

 

Using Statistics in Delphi - Part II

This Issue we continue our multi-issue look at developing Statistical Routines to use in your Delphi Applications. These are designed to use Open Array Parameters where possible, so that you can use them with Standard Arrays or with the new Dynamic Arrays.

Measures of Dispersion

These are Statistics that describe the spread of the data in a single value. We use this concept frequently in terms such as:

- Salary Range
- Within acceptable parameters
- Low variance in yield

There are 3 common measurements: Range, Variance & Standard Deviation.

Calculating the Range

Range is the difference between the Largest and Smallest values in the Data. Often it is given as a single value, though some prefer to express the range in a form like [Smallest, Largest].

If we already have the Data in an Array sorted in ascending order, then the Range is easy to find: subtract the first value from the last value. If not, then we need routines to find the largest and smallest values:

function MaxEArray (const B: array of Extended): Extended;
// Returns the Maximum value in an Extended Array
// B must hold at least one value
var
  I: Integer;
begin
  Result := B [Low (B)];
  for I := Low (B) + 1 to High (B) do
    if B [I] > Result then
      Result := B [I];
end;

function MinEArray (const B: array of Extended): Extended;
// Returns the Minimum value in an Extended Array
// B must hold at least one value
var
  I: Integer;
begin
  Result := B [Low (B)];
  for I := Low (B) + 1 to High (B) do
    if B [I] < Result then
      Result := B [I];
end;
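
With both extremes available, the Range itself is one subtraction. As a minimal sketch, the routine below finds both in a single pass instead of scanning the array twice. The name RangeEArray is only an illustration and not part of the original routines, and it assumes the array holds at least one value.

```pascal
function RangeEArray (const B: array of Extended): Extended;
// Hypothetical helper: returns Largest - Smallest for an Extended Array
// Assumes B holds at least one value
var
  I: Integer;
  Smallest, Largest: Extended;
begin
  Smallest := B [Low (B)];
  Largest := Smallest;
  for I := Low (B) + 1 to High (B) do
  begin
    // A value below the current Smallest cannot also be the Largest
    if B [I] < Smallest then
      Smallest := B [I]
    else if B [I] > Largest then
      Largest := B [I];
  end;
  Result := Largest - Smallest;
end;
```

For example, RangeEArray ([3.0, 7.0, 1.0, 9.0]) gives 8.0, the same result as MaxEArray minus MinEArray on that data.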

Calculating the Variance

Assuming that we have the mean, we can measure dispersion by examining how each value in the data list varies from the mean.

However, if we just sum up the differences between the values and the mean, the positive and negative values would cancel each other out.

So the Variance is the "average" of the square of the difference of each value from the mean. In Mathematical terms:
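
Written out with the conventional symbols (assumed here: $\mu$ and $N$ for the Population mean and size, $\bar{x}$ and $n$ for the Sample mean and size), the Population Variance and the Sample Variance are:

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
\qquad
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```

These match the divisors used in the routines that follow: the number of values for a Population, and one less than that for a Sample.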

Note that for Samples we divide by (n - 1), so that the Sample Variance is an unbiased estimator of the Population Variance (no need to worry too much about what this means at this stage). As mentioned last month, Greek Letters are used for Population Statistics and English Letters are used for Sample Statistics.
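
A small worked example may help. The three values 2, 4 and 6 are made up for illustration. They are treated first as a Sample and then as a whole Population, and $\bar{x}$, $s^2$ and $\sigma^2$ are the conventional symbols for the Sample mean, Sample Variance and Population Variance:

```latex
\begin{aligned}
\bar{x} &= \frac{2 + 4 + 6}{3} = 4 \\
\sum (x_i - \bar{x}) &= (-2) + 0 + 2 = 0 \\
\sum (x_i - \bar{x})^2 &= 4 + 0 + 4 = 8 \\
s^2 &= \frac{8}{3 - 1} = 4 \qquad \sigma^2 = \frac{8}{3} \approx 2.667
\end{aligned}
```

The second line shows the cancelling problem: the plain differences sum to zero, while the squared differences do not.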

function SampleVariance (const X: array of Extended): Extended;
// Returns the Sample Variance for an Extended Array
// X must hold at least 2 values, since we divide by n - 1
var
  I: Integer;
  SumSq: Extended; // Sum of Squares of Difference
  Mean: Extended;
begin
  Mean := ESBMean (X); // Supplied Last Month
  SumSq := 0.0;
  for I := Low (X) to High (X) do
    SumSq := SumSq + Sqr (X [I] - Mean);
  Result := SumSq / (High (X) - Low (X))
end;

function PopulationVariance (const X: array of Extended): Extended;
// Returns the Population Variance for an Extended Array
var
  I: Integer;
  SumSq: Extended; // Sum of Squares of Difference
  Mean: Extended;
begin
  Mean := ESBMean (X); // Supplied Last Month
  SumSq := 0.0;
  for I := Low (X) to High (X) do
    SumSq := SumSq + Sqr (X [I] - Mean);
  Result := SumSq / (High (X) - Low (X) + 1)
end;

Notice that the Mean is calculated inside the Variance calculation. Normally we want both the Variance and the Mean, so to save calculating the Mean more than once, we can use the following Routines, which pass the Mean back through a var parameter:

function SampleVarianceAndMean (const X: array of Extended;
  var Mean: Extended): Extended;
// Returns the Sample Variance & Mean for an Extended Array
// X must hold at least 2 values, since we divide by n - 1
var
  I: Integer;
  SumSq: Extended; // Sum of Squares of Difference
begin
  Mean := ESBMean (X); // Supplied Last Month
  SumSq := 0.0;
  for I := Low (X) to High (X) do
    SumSq := SumSq + Sqr (X [I] - Mean);
  Result := SumSq / (High (X) - Low (X))
end;

function PopulationVarianceAndMean (const X: array of Extended;
  var Mean: Extended): Extended;
// Returns the Population Variance & Mean for an Extended Array
var
  I: Integer;
  SumSq: Extended; // Sum of Squares of Difference
begin
  Mean := ESBMean (X); // Supplied Last Month
  SumSq := 0.0;
  for I := Low (X) to High (X) do
    SumSq := SumSq + Sqr (X [I] - Mean);
  Result := SumSq / (High (X) - Low (X) + 1)
end;

Standard Deviation

Whilst the Mean is in the same units as the data (i.e. if the data is in kg, so is the Mean), the Variance is in units squared, which can cause some problems: e.g. what is a square kilogram?

Standard Deviation is the positive Square Root of the Variance:
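
In symbols (assuming the usual notation, with $\sigma^2$ for the Population Variance and $s^2$ for the Sample Variance):

```latex
\sigma = \sqrt{\sigma^2}
\qquad
s = \sqrt{s^2}
```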

Rather than write specialised routines to get the Standard Deviation, we can just use the Delphi Sqrt function on the results from our previous routines.
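
As a sketch of how the pieces fit together, the small console program below gets the Mean and Sample Variance in one call and then applies Sqrt. It makes two assumptions: the ESBMean shown here is only a stand-in for last issue's routine, taken to be a plain arithmetic mean, and the data values are made up for illustration.

```pascal
program VarianceDemo;

{$APPTYPE CONSOLE}

// Stand-in for the ESBMean routine from last issue (assumed behaviour:
// the arithmetic mean of the array); the original may differ in detail
function ESBMean (const X: array of Extended): Extended;
var
  I: Integer;
begin
  Result := 0.0;
  for I := Low (X) to High (X) do
    Result := Result + X [I];
  Result := Result / (High (X) - Low (X) + 1);
end;

function SampleVarianceAndMean (const X: array of Extended;
  var Mean: Extended): Extended;
// Returns the Sample Variance & Mean for an Extended Array
var
  I: Integer;
  SumSq: Extended; // Sum of Squares of Difference
begin
  Mean := ESBMean (X);
  SumSq := 0.0;
  for I := Low (X) to High (X) do
    SumSq := SumSq + Sqr (X [I] - Mean);
  Result := SumSq / (High (X) - Low (X));
end;

const
  Data: array [1..3] of Extended = (2.0, 4.0, 6.0); // illustrative values
var
  Mean, Variance, StdDev: Extended;
begin
  Variance := SampleVarianceAndMean (Data, Mean);
  StdDev := Sqrt (Variance); // back in the same units as the data
  WriteLn ('Mean     = ', Mean:0:4);
  WriteLn ('Variance = ', Variance:0:4);
  WriteLn ('Std Dev  = ', StdDev:0:4);
end.
```

For this data the Mean is 4, the Sample Variance is 8 / 2 = 4, and the Standard Deviation is 2.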

Conclusion

Next Issue we will continue our look at Statistics with measures such as Quartiles, IQR and the Coefficient of Variation.

 


 Copyright © 2001 Australian Delphi User Group and respective copyright owners.
All Rights Reserved