Using Statistics in Delphi - Part II
In this issue we continue our multi-issue look at developing Statistical Routines for use in your Delphi Applications. Where possible, these are designed to use Open Array Parameters so that you can use them with Standard Arrays or with the new Dynamic Arrays.
Measures of Dispersion
These are Statistics that describe the spread of the data in a single value. We use this concept frequently in terms such as:
- Salary Range
- Within acceptable parameters
- Low variance in yield
There are three common measures: Range, Variance and Standard Deviation.
Calculating the Range
Range is the difference between the Largest and Smallest values in the Data. Often it is given as a single value, though some prefer to express the range in a form like [Smallest, Largest].
If we already have a Sorted Array for the Data, then the Range is easy to find: subtract the first value from the last value. If not, then we need routines to find the largest and smallest values:
function MaxEArray (const B: array of Extended): Extended;
// Returns the Maximum value in an Extended Array
var
  I: Integer;
begin
  Result := B [Low (B)];
  for I := Low (B) + 1 to High (B) do
    if B [I] > Result then
      Result := B [I];
end;

function MinEArray (const B: array of Extended): Extended;
// Returns the Minimum value in an Extended Array
var
  I: Integer;
begin
  Result := B [Low (B)];
  for I := Low (B) + 1 to High (B) do
    if B [I] < Result then
      Result := B [I];
end;
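As a rough sketch of how the Range itself might be obtained from unsorted data, the following small program finds both extremes in a single pass. RangeEArray and its parameter names are hypothetical and are not part of the routines supplied with this series; the sample values are made up for illustration.

```pascal
program RangeDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper (an assumption, not one of the article's routines):
// finds the smallest and largest values in one pass and returns their
// difference. Assumes the array holds at least one element.
function RangeEArray (const B: array of Extended;
  out Smallest, Largest: Extended): Extended;
var
  I: Integer;
begin
  Smallest := B [Low (B)];
  Largest := B [Low (B)];
  for I := Low (B) + 1 to High (B) do
  begin
    if B [I] < Smallest then
      Smallest := B [I]
    else if B [I] > Largest then
      Largest := B [I];
  end;
  Result := Largest - Smallest;
end;

var
  DataMin, DataMax, DataRange: Extended;
begin
  // Illustrative data only
  DataRange := RangeEArray ([12.5, 3.0, 8.25, 20.0], DataMin, DataMax);
  WriteLn ('Range = ', DataRange:0:2,
    ' [', DataMin:0:2, ', ', DataMax:0:2, ']');
end.
```

Returning the two extremes through out parameters lets the caller report the Range either as a single value or in the [Smallest, Largest] form.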
Calculating the Variance
Assuming that we have the mean, we can measure dispersion by examining how each value in the data list varies from the mean.
However, if we just sum up the differences between the values and the mean, the positive and negative values would cancel each other out.
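To see the cancellation with concrete numbers, take the values 2, 4 and 9 (chosen purely for illustration), whose mean is 5:

```latex
(2 - 5) + (4 - 5) + (9 - 5) = -3 - 1 + 4 = 0
```

The differences always sum to zero in this way, whatever the data, so the plain sum tells us nothing about the spread.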
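So the Variance is the "average" of the squares of the differences of the values from the mean. In Mathematical terms, for a Population of N values with mean \mu, and for a Sample of n values with mean \bar{x}:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$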
Note that for Samples we divide by (n - 1) rather than n. This is so that the Sample Variance is an unbiased estimator of the Population Variance (no need to worry too much about what this means at this stage).
As mentioned last month, Greek Letters are used for Population Statistics and English Letters are used for Sample Statistics.
function SampleVariance (const X: array of Extended): Extended;
// Returns the Sample Variance for an Extended Array
var
  I: Integer;
  SumSq: Extended; // Sum of Squares of Difference
  Mean: Extended;
begin
  Mean := ESBMean (X); // Supplied Last Month
  SumSq := 0.0;
  for I := Low (X) to High (X) do
    SumSq := SumSq + Sqr (X [I] - Mean);
  Result := SumSq / (High (X) - Low (X))
end;

function PopulationVariance (const X: array of Extended): Extended;
// Returns the Population Variance for an Extended Array
var
  I: Integer;
  SumSq: Extended; // Sum of Squares of Difference
  Mean: Extended;
begin
  Mean := ESBMean (X); // Supplied Last Month
  SumSq := 0.0;
  for I := Low (X) to High (X) do
    SumSq := SumSq + Sqr (X [I] - Mean);
  Result := SumSq / (High (X) - Low (X) + 1)
end;
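To compare the two divisors on a small data set, here is a minimal self-contained sketch. DemoMean is a stand-in written for this example on the assumption that ESBMean returns the arithmetic mean; SumSqDiff and the data values are likewise hypothetical.

```pascal
program VarianceDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Stand-in for ESBMean from last month (assumed to be the arithmetic
// mean); given a different name to make clear it is hypothetical.
function DemoMean (const X: array of Extended): Extended;
var
  I: Integer;
begin
  Result := 0.0;
  for I := Low (X) to High (X) do
    Result := Result + X [I];
  Result := Result / (High (X) - Low (X) + 1);
end;

// Hypothetical helper: Sum of Squares of Difference from the mean,
// which both kinds of Variance share.
function SumSqDiff (const X: array of Extended): Extended;
var
  I: Integer;
  Mean: Extended;
begin
  Mean := DemoMean (X);
  Result := 0.0;
  for I := Low (X) to High (X) do
    Result := Result + Sqr (X [I] - Mean);
end;

const
  // Illustrative data only: the mean is 5 and the Sum of Squares is 32
  Data: array [0..7] of Extended = (2, 4, 4, 4, 5, 5, 7, 9);
var
  N: Integer;
begin
  N := High (Data) - Low (Data) + 1;
  // Population Variance divides by N, Sample Variance by N - 1
  WriteLn ('Population Variance = ', (SumSqDiff (Data) / N):0:4);
  WriteLn ('Sample Variance     = ', (SumSqDiff (Data) / (N - 1)):0:4);
end.
```

For this data the Population Variance works out to 32 / 8 = 4, while the Sample Variance is 32 / 7, a little over 4.57. Note that the Sample Variance needs at least two values, since a single value would give a division by zero.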
Notice that the Mean is calculated inside the Variance calculation. Normally we want both the Variance and the Mean, so to save calling Mean more than once, we can use the following Routines:
function SampleVarianceAndMean (const X: array of Extended;
  var Mean: Extended): Extended;
// Returns the Sample Variance & Mean for an Extended Array
var
  I: Integer;
  SumSq: Extended; // Sum of Squares of Difference
begin
  Mean := ESBMean (X); // Supplied Last Month
  SumSq := 0.0;
  for I := Low (X) to High (X) do
    SumSq := SumSq + Sqr (X [I] - Mean);
  Result := SumSq / (High (X) - Low (X))
end;

function PopulationVarianceAndMean (const X: array of Extended;
  var Mean: Extended): Extended;
// Returns the Population Variance & Mean for an Extended Array
var
  I: Integer;
  SumSq: Extended; // Sum of Squares of Difference
begin
  Mean := ESBMean (X); // Supplied Last Month
  SumSq := 0.0;
  for I := Low (X) to High (X) do
    SumSq := SumSq + Sqr (X [I] - Mean);
  Result := SumSq / (High (X) - Low (X) + 1)
end;
Standard Deviation
Whilst the Mean is in the same units as the data (i.e. if the data is in kg, so is the Mean), the Variance is in units squared, which can cause some problems: e.g. what is a square kilogram?
Standard Deviation is the positive Square Root of the Variance, which puts it back in the same units as the data:
$$\sigma = \sqrt{\sigma^2}, \qquad s = \sqrt{s^2}$$
Rather than write specialised routines to get the Standard Deviation, we can just use the Delphi Sqrt function on the results from our previous routines.
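A minimal sketch of that approach follows. DemoSampleVariance is a self-contained stand-in for the SampleVariance routine above, written this way only so the example compiles on its own; the data values are made up for illustration.

```pascal
program StdDevDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical stand-in for SampleVariance: computes the mean itself
// instead of calling ESBMean. Assumes at least two values in X.
function DemoSampleVariance (const X: array of Extended): Extended;
var
  I, N: Integer;
  Mean, SumSq: Extended;
begin
  N := High (X) - Low (X) + 1;
  Mean := 0.0;
  for I := Low (X) to High (X) do
    Mean := Mean + X [I];
  Mean := Mean / N;
  SumSq := 0.0;
  for I := Low (X) to High (X) do
    SumSq := SumSq + Sqr (X [I] - Mean);
  Result := SumSq / (N - 1);
end;

const
  // Illustrative data only
  Data: array [0..7] of Extended = (2, 4, 4, 4, 5, 5, 7, 9);
var
  Variance, StdDev: Extended;
begin
  Variance := DemoSampleVariance (Data);
  // Standard Deviation is the positive Square Root of the Variance
  StdDev := Sqrt (Variance);
  WriteLn ('Sample Variance           = ', Variance:0:4);
  WriteLn ('Sample Standard Deviation = ', StdDev:0:4);
end.
```

Keeping the Variance in its own variable means both measures are available without repeating the calculation.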
Conclusion
Next Issue we will continue our look at Statistics with measures such as Quartiles, IQR and the Coefficient of Variation.