Board index » delphi » Newbie - Sorting Files

Newbie - Sorting Files

I have a file directory on a NT Server with ~ 70K database application
transaction files whcih I need to summarize and produce a variety of
information from (total number of files, # of files per user, oldest file,
oldest file per user, etc.). Filename structure is transact_num.userid.

My initial thought is to write a routine that  loops through the directory
and then reads parsed info (filename, userid, file createdate, info, etc.)
into a database table for sorting and grouping.

For simplicify sake, I would rather not use an external database to
accomplish this, but seems like using something like a dynamic array would
consume too much memory / system resources -- although this approach might
permit faster operations. Right?

Any thoughts / ideas on the best approach to this application are greatly
appreciated.

 

Re:Newbie - Sorting Files


Hi Kevin,

There are a lot of search algorithms you could use. Like bubblesort,
quicksort, insertion sort etc. The algorithms can be used onto fixed length
array's, But if you want to sort undefined length array's use a linked list
algorithm like linked list bucket sort. Check out the internet to get some
sources.
What you also can do is to put the data into a listbox and put the sorted
property to true.

Good luck, Darius Blaszijk

P.S. If you still experience difficulties with it, post again.

Quote
"Kevin Hourihane" <khourih...@hotmail.com> wrote in message

news:9bhi4o$e9r$1@bob.news.rcn.net...
Quote
> I have a file directory on a NT Server with ~ 70K database application
> transaction files whcih I need to summarize and produce a variety of
> information from (total number of files, # of files per user, oldest file,
> oldest file per user, etc.). Filename structure is transact_num.userid.

> My initial thought is to write a routine that  loops through the directory
> and then reads parsed info (filename, userid, file createdate, info, etc.)
> into a database table for sorting and grouping.

> For simplicify sake, I would rather not use an external database to
> accomplish this, but seems like using something like a dynamic array would
> consume too much memory / system resources -- although this approach might
> permit faster operations. Right?

> Any thoughts / ideas on the best approach to this application are greatly
> appreciated.

Re:Newbie - Sorting Files


Quote
"Kevin Hourihane" <khourih...@hotmail.com> wrote in message

news:9bhi4o$e9r$1@bob.news.rcn.net...

Quote
> I have a file directory on a NT Server with ~ 70K database application
> transaction files whcih I need to summarize and produce a variety of
> information from (total number of files, # of files per user, oldest file,
> oldest file per user, etc.). Filename structure is transact_num.userid.

> My initial thought is to write a routine that  loops through the directory
> and then reads parsed info (filename, userid, file createdate, info, etc.)
> into a database table for sorting and grouping.

> For simplicify sake, I would rather not use an external database to
> accomplish this, but seems like using something like a dynamic array would
> consume too much memory / system resources -- although this approach might
> permit faster operations. Right?

> Any thoughts / ideas on the best approach to this application are greatly
> appreciated.

One solution would be to define a record or class that holds the required
information and build a tList of entries. You can then use the list's Sort
method to arrange the list in the desired order(s). Since only pointers are
being moved around this will be relatively fast.

70K entries, if I understand the post correctly, is significant but not
outrageous. You might want to work out how much memory this is going to
require. If its enough to force the workstation to start swapping
significant chunks to the vm store you might find it just as efficient to
use a db table - this really depends on the workstation's configuration and
loading.

Re:Newbie - Sorting Files


Thanks for the pointers. Sounds like there's much research to do, but I
suspect I'll need to use the linked list bucket sort mentioned since this is
not a fixed length array....
- Kevin

Re:Newbie - Sorting Files


Thanks for the reply, Bruce.  You're understanding is correct - there are
approx 70,000 files that need loading and sorting.

I'd really like utilize a dynamic array of records to avoid having to
distribute & maintain a physical database, but target workstation platform
and resources vary greatly per installation (i.e., sometimes may be Win98 PC
with 32MB RAM other times may be WinNT with 512MB RAM).

I have no real world experience implementing large arrays of records so I
don't have a clue as how to go about determining the memory requirements
short of building the app and then taking before, during and after memory
measurements. Does design standard or general rule of thumb exist that could
be used?

Re:Newbie - Sorting Files


Quote
"Kevin Hourihane" <khourih...@hotmail.com> wrote in message

news:9bi6pc$a9k$1@bob.news.rcn.net...

Quote
> Thanks for the reply, Bruce.  You're understanding is correct - there are
> approx 70,000 files that need loading and sorting.

> I'd really like utilize a dynamic array of records to avoid having to
> distribute & maintain a physical database, but target workstation platform
> and resources vary greatly per installation (i.e., sometimes may be Win98
PC
> with 32MB RAM other times may be WinNT with 512MB RAM).

> I have no real world experience implementing large arrays of records so I
> don't have a clue as how to go about determining the memory requirements
> short of building the app and then taking before, during and after memory
> measurements. Does design standard or general rule of thumb exist that
could
> be used?

As a rough rule of thumb use the size of the record * the number of entries.
For the record size simply sum the size of each field, using 4 bytes for
each string field. Then add the "average" expected length of each of the
string fields. (I usually then round it up to the nearest power of 2.) Its
rough but will give you a fairly good working number.

If you are going to use a dynamic array instead of a tList, give some
thought to sorting pointers to the entries rather than the entries
themselves. As fast as machines are these days moving all that data around
will still take a measurable amount of time.

Something else you might consider is using an in-memory table. Some versions
of Delphi support this as do many databases, including DBISAM, and there are
a number of 3rd party components that provide the functionality.

One thing that I do wonder about is how static is the data. If a significant
number of the files that have to be processed will be around across several
runs then you really should give serious thought to maintaining a
non-volatile table and simply update it on each run. This could
significantly improve the performance of your app.

Re:Newbie - Sorting Files


Thanks for the advice. Is a in-memory table simply a pre-configured version
of the array of records we've discussed? I'm sure there's a bit of value-add
built into these components so I'll plan to take a look...

For the most part, these transaction files are static (this is essentially
the primary catalyst behind this app -- ideally there would be no lingering
transaction files because the tranasction file owner (userid) would have
connected, copied, processed, and then deleted the associated file upon
disconnect. However, this often isn't the case and therefore we need a
mechanism to easily identify those users that cannot or are not using the
system properly).

I see your point about maintaining a non-volatile table and simply updating
it on each run in order to optimize performance.  This should greatly reduce
processing time following an initial load. Thanks!

Quote
Bruce Roberts wrote in message <6H1D6.20303$TW.92...@tor-nn1.netcom.ca>...

>"Kevin Hourihane" <khourih...@hotmail.com> wrote in message
>news:9bi6pc$a9k$1@bob.news.rcn.net...
>> Thanks for the reply, Bruce.  You're understanding is correct - there are
>> approx 70,000 files that need loading and sorting.

>> I'd really like utilize a dynamic array of records to avoid having to
>> distribute & maintain a physical database, but target workstation
platform
>> and resources vary greatly per installation (i.e., sometimes may be Win98
>PC
>> with 32MB RAM other times may be WinNT with 512MB RAM).

>> I have no real world experience implementing large arrays of records so I
>> don't have a clue as how to go about determining the memory requirements
>> short of building the app and then taking before, during and after memory
>> measurements. Does design standard or general rule of thumb exist that
>could
>> be used?

>As a rough rule of thumb use the size of the record * the number of
entries.
>For the record size simply sum the size of each field, using 4 bytes for
>each string field. Then add the "average" expected length of each of the
>string fields. (I usually then round it up to the nearest power of 2.) Its
>rough but will give you a fairly good working number.

>If you are going to use a dynamic array instead of a tList, give some
>thought to sorting pointers to the entries rather than the entries
>themselves. As fast as machines are these days moving all that data around
>will still take a measurable amount of time.

>Something else you might consider is using an in-memory table. Some
versions
>of Delphi support this as do many databases, including DBISAM, and there
are
>a number of 3rd party components that provide the functionality.

>One thing that I do wonder about is how static is the data. If a
significant
>number of the files that have to be processed will be around across several
>runs then you really should give serious thought to maintaining a
>non-volatile table and simply update it on each run. This could
>significantly improve the performance of your app.

Re:Newbie - Sorting Files


Quote
"Kevin Hourihane" <khourih...@hotmail.com> wrote in message

news:9biboh$3sr$1@bob.news.rcn.net...

Quote
> Thanks for the advice. Is a in-memory table simply a pre-configured
version
> of the array of records we've discussed? I'm sure there's a bit of
value-add
> built into these components so I'll plan to take a look...

An in-memory table is simply a tDataset descendant the is maintained
entirely in memory. It offers db-like access to the data.

Other Threads