Maths Corner

 

by Glenn Crouch

 

Using Statistics in Delphi - Part I

In this issue we begin a series, running over several issues, on developing Statistical Routines to use in your Delphi Applications. These will be designed to use Open Array Parameters where possible, so that you can use them with Standard Arrays or with the new Dynamic Arrays.

Double & Extended

Whilst the Delphi Math unit supplies some nice Statistical Routines, they tend to rely on arrays of Double, whereas I tend to keep things in Extended. I've always liked working to as many decimal places as possible and then rounding to the required number of decimal places as the last step. HOWEVER, Extended is normally slower than Double (though this could change with the advent of 64-bit Architectures).

So we are going to develop all our routines using Extended - though they could be easily adapted to Double or Single routines - or even nicely overloaded with Delphi 4!
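As a rough sketch of what such overloading could look like, the following declares the same routine name for arrays of Double and arrays of Extended using the overload directive introduced with Delphi 4. The name MeanOf and the sample data are illustrative assumptions, not routines from this series, and the sketch assumes the arrays are never empty.

```pascal
program OverloadDemo;
{$APPTYPE CONSOLE}

// Hypothetical name: MeanOf is used here only to illustrate overloading.
function MeanOf (const X: array of Double): Extended; overload;
var
  I: Integer;
begin
  Result := 0;
  for I := Low (X) to High (X) do
    Result := Result + X [I];
  Result := Result / (High (X) + 1); // assumes a non-empty array
end;

function MeanOf (const X: array of Extended): Extended; overload;
var
  I: Integer;
begin
  Result := 0;
  for I := Low (X) to High (X) do
    Result := Result + X [I];
  Result := Result / (High (X) + 1); // assumes a non-empty array
end;

const
  D: array [0..2] of Double = (1, 2, 6);
  E: array [0..2] of Extended = (1, 2, 6);
begin
  WriteLn (MeanOf (D):0:2); // compiler picks the Double version
  WriteLn (MeanOf (E):0:2); // compiler picks the Extended version
end.
```

The compiler chooses the version from the declared element type of the array passed in, so the calling code looks the same either way.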

Whilst our routines will be using Extended, please note that the accuracy of the results will depend upon the accuracy of the original data.

Measures of Central Tendency

These are Statistics that summarise data by giving us a single value that tells us something about the "middle" of the data. We use this concept frequently:

- Average Height
- Most Popular Film
- Average Intelligence

There are 3 common measurements: Mean, Median & Mode.

Calculating the Mean

Often when we use the term "average" we are referring to the Mean. Since "average" is an imprecise term in English usage it is normally avoided by Statisticians. Since there is also more than one type of Mean, strictly speaking we are calculating the Arithmetic Mean. However since Statisticians don't use the Geometric Mean or the Harmonic Mean very often, it is normally assumed that by Mean we imply the Arithmetic Mean.

Simply put, the Mean is the Sum of the Values divided by the Number of the Values.
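In the conventional notation (a reconstruction using the standard symbols, with n for the number of values in a sample and N for the number of values in a population), the two formulae would normally be written as:

```latex
\bar{x} = \frac{\sum x}{n} \qquad \text{(Mean of a sample)}

\mu = \frac{\sum x}{N} \qquad \text{(Mean of a population)}
```

For example, for the three values 2, 4 and 9 the Sum is 15, so the Mean is 15 / 3 = 5.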

The "funny" big symbol in the above formulae is the Capital Greek Letter Sigma, which is used as Mathematical Shorthand for "Sum All The Values".

Notice that though we use the same formula, we use a Greek Letter (mu, written μ) to indicate that the Statistic comes from the whole population rather than just a sample. In practice, we tend to work with samples more than populations.

Now to convert this to Delphi:

function SumEArray (const B: array of Extended): Extended;
// Returns the Sum of an Array of Extended
var
  I: Integer;
begin
  Result := B [Low (B)];
  for I := Low (B) + 1 to High (B) do
    Result := Result + B [I];
end;

function ESBMean (const X: array of Extended): Extended;
// Returns the Arithmetic Mean of an Array of Extended
begin
  if High (X) < 0 then
    raise Exception.Create ('Array is Empty!');
  Result := SumEArray (X) / (High (X) - Low (X) + 1);
end;
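To show how the Open Array Parameter lets one routine serve both kinds of array, here is a small console program that calls ESBMean with a Standard Array and with a Dynamic Array. The program name, the sample data and the variable names are made up for illustration; the two functions are repeated so that the program compiles on its own.

```pascal
program MeanDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils; // needed for Exception

function SumEArray (const B: array of Extended): Extended;
var
  I: Integer;
begin
  Result := B [Low (B)];
  for I := Low (B) + 1 to High (B) do
    Result := Result + B [I];
end;

function ESBMean (const X: array of Extended): Extended;
begin
  if High (X) < 0 then
    raise Exception.Create ('Array is Empty!');
  Result := SumEArray (X) / (High (X) - Low (X) + 1);
end;

const
  // Standard Array - inside the routine it is indexed from 0 regardless
  Heights: array [1..5] of Extended = (150, 160, 170, 180, 190);
var
  Scores: array of Extended; // Dynamic Array
begin
  SetLength (Scores, 3);
  Scores [0] := 2;
  Scores [1] := 4;
  Scores [2] := 9;

  WriteLn (ESBMean (Heights):0:2); // 850 / 5 = 170.00
  WriteLn (ESBMean (Scores):0:2);  // 15 / 3 = 5.00
end.
```

Note that within an Open Array Parameter Low is always 0, whatever bounds the original array was declared with.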

Calculating the Mode

When we use terms like "Most Popular" then we are in fact referring to the Mode. It is the most common value.

Unlike the Mean, not all Data has a Mode - you may find 2 values that are equally popular (i.e. Bimodal) or every value may be unique. This makes the Mode less useful than the Mean. But when it does exist, it indicates that there is grouping in the data - we will also be interested in cases where there is a significant difference between the Mean and the Mode.

To calculate the Mode, we need our array to be sorted. Rather than include a sort routine in the Mode calculation, we will leave that up to the user. This is risky, since the routine does not check that the array really is sorted, but sorting inside every routine would waste a lot of time, as many routines depend on a sorted array. There are many good sort algorithms available and you can use whatever best suits your needs.
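If you do not have a sort routine to hand, a simple insertion sort is enough for small arrays. The following is only a sketch (the name SortEArray is made up for illustration, and a faster algorithm such as Quicksort would be a better choice for large arrays):

```pascal
procedure SortEArray (var A: array of Extended);
// Sorts an Array of Extended into ascending order (insertion sort).
// Relies on Delphi's default short-circuit Boolean evaluation ({$B-})
// so that A [J] is never read when J has dropped below Low (A).
var
  I, J: Integer;
  Key: Extended;
begin
  for I := Low (A) + 1 to High (A) do
  begin
    Key := A [I];
    J := I - 1;
    while (J >= Low (A)) and (A [J] > Key) do
    begin
      A [J + 1] := A [J]; // shift larger values up one place
      Dec (J);
    end;
    A [J + 1] := Key;
  end;
end;
```

Because the parameter is declared var, the caller's array is sorted in place and can then be passed to the routines that expect sorted data.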

The following uses SameFloat from the Rounding Article.
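SameFloat itself is not listed in this article. As a minimal stand-in, assuming a simple fixed tolerance (the routine in the Rounding Article may well use a different tolerance or approach), something like this would do:

```pascal
const
  // Assumed tolerance for illustration only - not taken from the Rounding Article
  FloatTolerance = 1e-12;

function SameFloat (const X1, X2: Extended): Boolean;
// Returns True if the two values differ by less than FloatTolerance
begin
  Result := Abs (X1 - X2) < FloatTolerance;
end;
```

A fixed tolerance like this suits data of moderate size; for very large or very small values a relative comparison would be more appropriate.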

function GetMode (const SortedX: array of Extended;
                   var Mode: Extended): Boolean;
// Calculates the Mode of a Sorted Array of Extended and returns
// True if the Mode exists.
var
  I, Freq, HiFreq: Integer;
  Matched: Boolean;
begin
  if High (SortedX) < 0 then
    raise Exception.Create ('Array is Empty!')
  else if High (SortedX) = 0 then // Only a Single Value
  begin
    Mode := SortedX [0];
    Result := True;
  end
  else
  begin
    Mode := 0;
    Freq := 1; // Frequency of current Value
    HiFreq := 0; // Highest Frequency so far
    Matched := False; // If False HiFreq is Unique

    for I := 1 to High (SortedX) do
    begin
      if SameFloat (SortedX [I - 1], SortedX [I]) then
        Inc (Freq) // count the number of values
      else
      begin
        if Freq <> 1 then // now see if frequency is highest
        begin
          if Freq = HiFreq then // not unique
            Matched := True
          else if Freq > HiFreq then // new HiFreq
          begin
            Mode := SortedX [I - 1];
            HiFreq := Freq;
            Matched := False;
          end;
          Freq := 1;
        end;
      end;
    end;

    //Handle special End cases
    if HiFreq > 0 then // Last value might be HiFreq
    begin
      if Freq = HiFreq then
        Matched := True
      else if Freq > HiFreq then
      begin
        Mode := SortedX [High (SortedX)];
        Matched := False;
      end;
    end
    else if Freq > 1 then // Only the last run repeats (or all values identical)
    begin
      HiFreq := Freq;
      Mode := SortedX [High (SortedX)];
      Matched := False;
    end;
    // Mode exists if HiFreq is Unique
    Result := (HiFreq > 0) and not Matched;
  end;
end;
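A short usage sketch, assuming GetMode and a SameFloat routine are in scope (the procedure name and the sample data are made up for illustration):

```pascal
procedure ShowModes;
const
  // Both arrays are already sorted, as GetMode requires
  OneMode: array [0..5] of Extended = (1, 2, 2, 3, 3, 3);
  TwoModes: array [0..3] of Extended = (1, 1, 2, 2);
var
  Mode: Extended;
begin
  if GetMode (OneMode, Mode) then
    WriteLn ('Mode = ', Mode:0:2) // 3 occurs most often
  else
    WriteLn ('No single Mode');

  if GetMode (TwoModes, Mode) then
    WriteLn ('Mode = ', Mode:0:2)
  else
    WriteLn ('No single Mode'); // 1 and 2 are equally popular (Bimodal)
end;
```

Always check the Boolean result before using Mode, since the value left in Mode is not meaningful when the function returns False.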

Calculating the Median

The Median is a way of measuring the exact middle when the values are listed in sorted order.

So if the number of values is odd we take the middle value (e.g. with 11 values we take value 6, as there are 5 lower values and 5 higher values). If it is even we take the middle two values and find their Mean.

Like Mean, Median has the advantage of always existing for numerical data. Like Mode, Median requires the Array to be sorted.

function GetMedian (const SortedX: array of Extended): Extended;
// Returns the Median for a Sorted Array of Extended.
var
  N: Integer;
begin
  N := High (SortedX) + 1;
  if N <= 0 then
    raise Exception.Create ('Array is Empty!')
  else if N = 1 then // Only a Single Value
    Result := SortedX [0]
  else if Odd (N) then // Handle Odd Number of Values
    Result := SortedX [N div 2]
  else // Handle Even Number of Values
    Result := (SortedX [N div 2 - 1] + SortedX [N div 2]) / 2;
end;
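To see both the odd and the even case at work, here is a small console program (the program name and sample data are made up for illustration; GetMedian is repeated so that the program compiles on its own):

```pascal
program MedianDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils; // needed for Exception

function GetMedian (const SortedX: array of Extended): Extended;
var
  N: Integer;
begin
  N := High (SortedX) + 1;
  if N <= 0 then
    raise Exception.Create ('Array is Empty!')
  else if N = 1 then
    Result := SortedX [0]
  else if Odd (N) then
    Result := SortedX [N div 2]
  else
    Result := (SortedX [N div 2 - 1] + SortedX [N div 2]) / 2;
end;

const
  OddData: array [0..4] of Extended = (1, 3, 5, 7, 9);
  EvenData: array [0..3] of Extended = (1, 3, 5, 7);
begin
  WriteLn (GetMedian (OddData):0:2);  // middle value: 5.00
  WriteLn (GetMedian (EvenData):0:2); // (3 + 5) / 2 = 4.00
end.
```

With 5 values, N div 2 is 2, which is the third (middle) element since the array is indexed from 0.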

Conclusion

Next Issue we will continue our look at Statistics as we look at measures of Dispersion such as Standard Deviation and Variance.

 


 Copyright © 2001 Australian Delphi User Group and respective copyright owners.
All Rights Reserved