Using Statistics in Delphi - Part III
This Issue we continue our multi-issue look at developing Statistical Routines to use in your Delphi Applications. These are designed to use Open Array Parameters where possible, so that you can use them with Standard Arrays or with the new Dynamic Arrays.
Quartiles
Just as the Median (see Part I) measures the middle, thus dividing the Data in half, Quartiles divide the Data into Quarters.
So the 2nd Quartile = the Median.
Now whilst most people agree on how Quartiles are to be calculated for continuous functions, there is some debate over the best way when we have a collection of Data (also called Discrete Data).
The First Quartile is any value such that at least 25% of the Data Values are less than or equal to it, and at least 75% of the Data Values are greater than or equal to it. Thus, for our sort of Data, the Quartile is not unique but lies within a range.
Similarly, the Third Quartile is any value such that at least 75% of the Data Values are less than or equal to it and at least 25% of the Data Values are greater than or equal to it.
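To see why the Quartile is a range rather than a single value, consider a small illustrative data set (the numbers are made up for this example and are not from any particular source):

```latex
\[
\text{Data: } 1,\ 2,\ 3,\ 4
\qquad\Longrightarrow\qquad
1 \le Q_1 \le 2, \qquad 3 \le Q_3 \le 4
\]
```

Taking 1.5 as the First Quartile, one of the four values (25%) is less than or equal to it and three (75%) are greater than or equal to it. The same holds for every value from 1 to 2, so any of them qualifies.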
I am going to stick with the way I was taught, which coincides with the method used by Leonard J Kazmier in Schaum's Outline Series: Theory and Problems of Business Statistics, published by McGraw-Hill. Please note that this gives slightly different answers than those supplied by Microsoft Excel, but it is pretty easy to implement.
For n Data Items:
$$ \text{Position of } Q_1 = \frac{n}{4} + \frac{1}{2} $$
That is, the First Quartile is at position n / 4 + 0.5, counting the sorted Data Values from 1.
If this is an Integer then we use the Data Value at that position. Otherwise, we take the integer portion, I, start from the Data Value at position I, and add to it the fractional portion multiplied by the difference between the Data Values at positions I and I + 1.
Similarly:
$$ \text{Position of } Q_3 = \frac{3n}{4} + \frac{1}{2} $$
That is, the Third Quartile is at position 3 * n / 4 + 0.5
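A worked example may help here. The nine sorted values below are invented purely for illustration; the arithmetic follows the position rules just described, with positions counted from 1.

```latex
\[
\text{Sorted data } (n = 9):\quad 3,\ 5,\ 7,\ 8,\ 12,\ 13,\ 14,\ 18,\ 21
\]
\[
\text{Position of } Q_1 = \frac{9}{4} + 0.5 = 2.75
\quad\Rightarrow\quad
Q_1 = 5 + 0.75 \times (7 - 5) = 6.5
\]
\[
\text{Position of } Q_3 = \frac{3 \times 9}{4} + 0.5 = 7.25
\quad\Rightarrow\quad
Q_3 = 14 + 0.25 \times (18 - 14) = 15
\]
```

For the First Quartile the integer portion is 2, so we start from the 2nd value (5) and move 0.75 of the way towards the 3rd value (7). For the Third Quartile we start from the 7th value (14) and move 0.25 of the way towards the 8th (18).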
This gives us the
following Delphi Procedure to calculate Quartiles, given
that we have a sorted array (see discussion in Stats Part I on Median):
procedure GetQuartiles (const SortedX: array of Extended;
  var Q1, Q3: Extended);
// Returns the 1st and 3rd Quartile
// Note: Assumes Array starts at 0 and ends at n-1
var
  J: Single;
  I: Integer;
begin
  if High (SortedX) < 0 then
    raise Exception.Create ('Array is Empty!')
  else if High (SortedX) = 0 then
  begin
    Q1 := SortedX [0];
    Q3 := SortedX [0];
  end
  else
  begin
    // Calculate 1st Quartile
    J := (High (SortedX) + 1) / 4 + 0.5;
    I := Trunc (J);
    J := Frac (J);
    if I - 1 < High (SortedX) then
      Q1 := SortedX [I - 1] + (SortedX [I] - SortedX [I - 1]) * J
    else // Take End Value
      Q1 := SortedX [I - 1];
    // Calculate 3rd Quartile
    J := 3 * (High (SortedX) + 1) / 4 + 0.5;
    I := Trunc (J);
    J := Frac (J);
    if I - 1 < High (SortedX) then
      Q3 := SortedX [I - 1] + (SortedX [I] - SortedX [I - 1]) * J
    else // Take End Value
      Q3 := SortedX [I - 1];
  end;
end;
Calculating the Inter-Quartile Range
Since the Range is easily affected by extreme outliers, many people use the IQR (Inter-Quartile Range) as a measure of dispersion, since it spans the middle 50% of the values.
Once we have calculated
the First and Third Quartile, the IQR is simply:
IQR := Q3 - Q1;
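The following console program is a minimal sketch of the whole calculation, from sorted Data to Quartiles to IQR. It is not the article's procedure: `ValueAtPosition` is a hypothetical helper introduced here to show the interpolation step on its own, and the data values are invented for the example.

```pascal
program QuartileDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not from the article): returns the value at a
// 1-based, possibly fractional, position in a sorted array, interpolating
// linearly between the two neighbouring values.
function ValueAtPosition(const SortedX: array of Extended;
  Position: Extended): Extended;
var
  I: Integer;
  Fraction: Extended;
begin
  if High(SortedX) < 0 then
    raise Exception.Create('Array is Empty!');
  I := Trunc(Position);
  Fraction := Frac(Position);
  if I < 1 then
    Result := SortedX[0]                  // before the start: take first value
  else if I - 1 >= High(SortedX) then
    Result := SortedX[High(SortedX)]      // at or past the end: take last value
  else
    Result := SortedX[I - 1] + (SortedX[I] - SortedX[I - 1]) * Fraction;
end;

const
  // Illustrative data, already sorted in ascending order
  Data: array[0..7] of Extended = (1, 2, 3, 4, 5, 6, 7, 8);
var
  N: Integer;
  Q1, Q3, IQR: Extended;
begin
  N := High(Data) + 1;
  Q1 := ValueAtPosition(Data, N / 4 + 0.5);      // position 2.5
  Q3 := ValueAtPosition(Data, 3 * N / 4 + 0.5);  // position 6.5
  IQR := Q3 - Q1;
  Writeln('Q1  = ', Q1:0:2);
  Writeln('Q3  = ', Q3:0:2);
  Writeln('IQR = ', IQR:0:2);
end.
```

With eight values the positions are 2.5 and 6.5, so the program reports Q1 as 2.50, Q3 as 6.50 and an IQR of 4.00. Keeping the interpolation in one helper is only a design choice for the sketch; it avoids repeating the same lines for each Quartile.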
Calculating the Coefficient of Variation
The Coefficient of Variation gives us information about the Standard Deviation relative to the Mean. Thus it can be thought of as the magnitude of the Standard Deviation measured against the size of the Mean. This is known as a measurement of Relative Dispersion.
It is calculated as follows:
$$ \text{Coefficient of Variation} = \frac{\text{Standard Deviation}}{\text{Mean}} $$
Which in Delphi translates to:
if FloatIsZero (Mean) then // See Article on Rounding
  raise Exception.Create ('Value does not Exist')
else
  CoeffVariation := StdDev / Mean;
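To try the fragment above on its own, it can be wrapped in a function, as in this sketch. The `FloatIsZero` shown here is only an assumed stand-in for the routine from the Rounding article (a simple tolerance test), and the `Tolerance` value and the sample figures are likewise assumptions made for the example.

```pascal
program CoeffVariationDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  Tolerance = 1E-12; // assumed value; pick one that suits your data

// Hypothetical stand-in for the FloatIsZero routine from the Rounding
// article: treats anything closer to zero than Tolerance as zero.
function FloatIsZero(const Value: Extended): Boolean;
begin
  Result := Abs(Value) < Tolerance;
end;

// Standard Deviation expressed relative to the Mean.
function CoeffVariation(const StdDev, Mean: Extended): Extended;
begin
  if FloatIsZero(Mean) then
    raise Exception.Create('Value does not Exist');
  Result := StdDev / Mean;
end;

begin
  // The same Standard Deviation is large relative to a Mean of 5 ...
  Writeln('CV = ', CoeffVariation(2.0, 5.0):0:2);
  // ... but small relative to a Mean of 50
  Writeln('CV = ', CoeffVariation(2.0, 50.0):0:2);
end.
```

The two calls give 0.40 and 0.04, which shows why the measure is called relative: the spread is identical, but it matters ten times less against the larger Mean.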
Quartile Coefficient of Variation
The Quartile Coefficient of Variation is another common measurement of Relative Dispersion, and is quite easy to calculate once we have the Quartiles.
It is calculated as follows:
$$ \text{Quartile Coefficient of Variation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} $$
which in Delphi translates to:
if FloatIsZero (Q3 + Q1) then // See Article on Rounding
  raise Exception.Create ('Value does not Exist')
else
  QCoeffVariation := (Q3 - Q1) / (Q3 + Q1);
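As a quick check of the arithmetic, suppose the Quartiles came out as Q1 = 2.5 and Q3 = 6.5 (illustrative values only, chosen for this example):

```latex
\[
\frac{Q_3 - Q_1}{Q_3 + Q_1}
= \frac{6.5 - 2.5}{6.5 + 2.5}
= \frac{4}{9}
\approx 0.444
\]
```

The zero test in the code matters mainly when the Data can be negative, since Q3 + Q1 can then cancel out to zero even though neither Quartile is zero.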
Stating your Methodology
Given that different people and different packages calculate Quartiles in different ways, most of which can be justified, you should start to see that it is important to state which methods you are using when you use Mathematical and Statistical routines. We have also seen that a Sample Standard Deviation is calculated differently from a Population Standard Deviation, though some texts just use the Population formula for both.
It is good practice to list all Mathematical Assumptions, Formulae and Techniques used within your Application, for example as a section in your Help File.
Conclusion
Next Issue we will continue our look at Statistics as we look at Normal Distributions.