|
The Melbourne Sub Committee of ADUG is running a programming
competition open to ADUG members in all states. The aim of the
competition is to produce a reasonably efficient piece of code which
counts all occurrences of any give word in a text file containing a
piece of English text up to 1 megabytes in size.
The results will be assessed on accuracy, speed and code style as
follows.
If the solution offered does not return the correct number of
occurrences in any one of a series of test cases it will be rejected
as inaccurate.
Solutions which take longer than four times the time taken by the
fastest accurate solution averaged over a series of tests will be
rejected as too inefficient.
The code of all accurate, efficient solutions will be perused by a
select committee of eminent judges and graded according to their
views of neat, well structured code. The judges' decision is final and no
correspondence will be entered into.
Competition
Specification
1 The solution code must
offer a single interface
The solution code must offer a single interface of "function
FindWordInFile(const Filename, FindWord: string): Integer;" in
a unit named "ADUGStringInFile"
For Example
unit ADUGStringInFile;
interface
function FindWordInFile(const Filename, FindWord: string): Integer;
implementation
...
2 All code must be contained
in that unit file.
All solution code must be contained in the unit file. The solution
can "Use" any files provided with Delphi professional
releases but not third party code.
Only Delphi code can be used in the submission. In line assembler is
not permitted.
3 Parameters
FileName may be either a fully specified path or file referenced
from the current directory. The code must handle both.
FindWord may be any combination of [a..z] and or [A..Z] of length 1
to 45
Result is the count of occurrences.
4 Test File
The test file to be used will be limited to 1 Megabytes but can
contain any valid ASCII Characters (Char(0) to
Char(127)).
5 Definition of a Word Match
The function result must return the count of the whole words of
FindWord in that file. A whole word is defined as any case
insensitive match both preceded by and followed by any character
outside those specified in the parameters section as valid in
FindWord. That is, if it is not an alpha, it is a word separator. So a
search for ROGER should count Roger's but not count Rogers. The
actual start of file and end of file will also be considered as word
boundaries.
Submissions
All submissions should be emailed to Richard
King after 31 July 2003 and before the closing time of 6:00 pm
on the 19 August 2003. We expect results to be announced at the
September ADUG meeting. The judges' justification may be some
"long" time later.
Submissions should be attached to an email Titled "ADUG String
Find Entry From ." The body of the email should
provide your name email address and, if you are an ADUG Member, your
ADUG Membership number.
Ideally the attachment should be zipped but must include only the
single .pas unit. Any entries including sample 1MB text files and/or
compiled exe's will be disqualified!
Testing Environment
Entries will be compiled in D7 Professional and run under Windows
(XP). This should make no difference to submissions as any version
of Delphi can be expected to compile an run in this environment.
Prizes
We were only doing this for fun. One prize only to the overall
best entry as judged by the judges: A signed copy of Danny Thorpe's
excellent and now out of print "Delphi Component Design"
for ADUG memebers or an ADUG membership for non-ADUG members.
Hints
Preliminary investigation for this topic yielded the fast word
search algorithm "Boyer-Moore" implemented in Pascal at http://www.bsdg.org/swag/STRINGS/0138.PAS.html
you are welcome to use the Pascal implementation of this code but it
will need modification to meet the specification and I am not sure
about the judges' views on the neat well structured test.
The file size has been limited to 1 Megabyte to enable the buffer
required for an implementation similar to the
"Boyer-Moore" code above to be allocated on the stack
without the added complexity of handling multiple reads.
Eg.
var Buffer: array[1.. 1000000] of char;
begin
ThisFile := TFileStream.Create(FileName, OpenMode);
BufCount := ThisFile.Read(Buffer, 1000000);
However better solutions which enable larger files to be handled
with indications as to the maximum limits supported may be favorably
considered.
A big thank you to Roger Connell for the
content of this page and for providing the initial testing and
timing framework. |