Board index » delphi » Looking for an algorithm...

Looking for an algorithm...

This is asking for a C++ solution in a Pascal sub...off topic
definitely...but since I'm a nice guy...

Quote
ro...@ix.netcom.com wrote:
> Here is the scenario:
>    A huge amount of data is being transferred  (in the form of string
> records). This data must be recorded in special files, with the
> condition that all records must be unique  (the records transferred
> are not unique records).
>   What I am looking for, is a way to check that a given record is
> unique before recording (this should happen run time). One bad
> solution, is before recording the string, go through the file
> comparing it to all the others. This is unacceptable, because it would
> take too long.
>[even more unpurposeful checksum solution snipped]

Why not, when dealing with data, try using a data structures solution,
instead of relying on something like checksums which may or may not be
accurate?  Why not try a binary sorted  tree (BST) approach?  It does
what you were wanting to do in the second paragraph, but amazingly, it
doesn't go through comparing all the others....AMAZING!!!

For your benefit and mine (advertising! <grin>) here's part 19 of the
Pascal tutorial in the URL below.
====================================================
                        Turbo Pascal for DOS Tutorial
                            by Glenn Grotzinger
                           Part 19: Binary Trees
                 copyright (c) 1995-96 by Glenn Grotzinger

OK.  I stated no real problem that needed solving earlier, so we will
start getting into some new material.  This is an extension to part
18, in a small way, since it involves pointered structures similar to
the linked lists, but the procedures for handling them are much
different, so I am covering them in a seperate section.

What is a Binary Tree?
======================
A binary tree is a set of nodes with 2 possible paths for each node,
left or right.  Hence, a binary tree node would look something like
this:

type
  nodeptr = ^node;
  node = record
    ourinfo: integer;
    left, right: nodeptr;                                            
  end;

Conceptually (my best low ASCII rendering, trust me!), a binary tree
would look something like this.  We can see from below that the
possible paths of those two directions are any numerous.  As with the
linked lists, nil ends the path (knowing, it is probably garbled,
but....).

                                NODE
                   _________/  \_________
                  /                               \
               NODE                           NODE
           ____/  \____                  ____/  \____
          /                  \                /                  \
       NODE            nil             nil             NODE
   ____/  \____                                    ____/  \____
  /                  \                                  /            \
nil           nil                                 NODE           nil
                                               ____/  \____
                                              /                  \
                                          NODE            nil
                                    ____/  \____
                                   /                  \
                                  nil                nil

Rules of a Binary Sorted Tree
======================
Now that we see what the basic idea of a binary tree is, we will now
study the rules for it.  The basic rules of a binary tree is the
following for any node in a binary tree commonly used for the purpose
that they are used for a lot of times -- as a binary search tree or
BST.  Any program code in this section will make use of BSTs...:

1) values smaller than the value currently stored in a questioned
   node goes to the left.
2) values larger than the value currently stored in a questioned
   node goes the right.
3) values which are equal to other values in the tree are not placed
   into the tree.

Let us see this by manually examining how we might build a small
binary tree.

Example of a Binary Tree Manual Build
=====================================
Let us examine how we would build a binary tree for a set of values
and how a program would traverse the tree with a very short example.

1) Let us start examining a tree build with a value set data of 27,
54, 32, 18, and 5.
   a) 27 is naturally the first or root node of the tree.  There is no
      issue there.
   b) 54 > 27 so 54 would go to the right.  Our resultant tree looks
now like this:
                                 27
                                   \
                                    54

   c) 32 > 27.  So we would move to the right.  Now we see 54 as the
main node  32 < 54, so we would go to the left.  So the resultant tree
      after this addition looks like this:

                                 27
                                   \
                                    54
                                   /
                                 32

   d) 18 < 27.  So we would move to the left.  So our tree looks like
this:

                                 27
                                /  \
                              18    54
                                   /
                                 32

   e) 5 < 27.  So we would move to the left.  Now we see 18 as the
current node.  5 < 18.  Then we would move to the left again.  So the
final binary tree for those 5 values looks like this:

                                 27
                                /  \
                              18    54
                             /     /
                           5     32

This is a reasonable binary tree structure.  Any paths not used are
set to nil in the process.  We should see now how binary trees are
constructed.

2) Now lets study traversal of the tree.  It would depend on if we
visit the actual node first or last.  (I will give a detailed
explanation of that last statement later.)  Normally, as a standard,
it's good to use the left side first.  Therefore, lets look at the
tree built in #1.

The order the numbers would come up in is 5-18-27-32-54 on a basis
that we handle or visit the head node of the tree last in moving
through the binary tree. If you remember the discussion of the array
binary search, we had to sort the data to accomplish it.  Here it is
taken care of already for us.

Basic Idea of Programming for Binary Trees
==========================================
You may have figured out already from our preliminary discussions,
that recursion is a _major_ staple of binary tree programming (read
ALWAYS use it!).  Note that a binary tree can be divided up into
smaller and smaller trees, hence our ability to use recursion.  Any
functions involved almost always ends up using some order of the
following pseudocode, assuming a name of PROCEDURE for the procedure:

PERFORM PROCEDURE ON LEFT SIDE OF TREE
PERFORM ACTION ON ROOT NODE (the top node of the tree)
PERFORM PROCEDURE ON RIGHT SIDE OF TREE

We will see this idea be paramount in all the code we use.  If you
have noticed in your previous work about recursion, there is always a
control statement present.  Here, it will always be searching for the
nil nodes. *ALWAYS* remember to replace the nil nodes if they come
up....binary search trees are the perferred method generally, for
setting up searches.

program binary_tree_example;

  type
    nodeptr = ^node;
    node = record
      somedata: integer;
      left, right: nodeptr;
    end;

  var
    tree: nodeptr;
    afile: text;
    i: integer;

  procedure deletetree(var tree: nodeptr);
     { This procedure is a function designed to remove a binary tree
       from memory }

    begin
      { Given the background information, and other procedures as
examples, you should be able to come up with this one }
    end;

  procedure inserttree(anumber: integer; var tree: nodeptr);
     { This procedure is a function designed to create a binary tree
       in memory by inserting a node in the proper position in the
tree }

    begin
      if tree = nil then
        begin
          new(tree);
          tree^.somedata := anumber;
          tree^.left := nil;
          tree^.right := nil;
        end
      else
        begin
          if anumber > tree^.somedata then
            inserttree(anumber, tree^.right);
          if anumber < tree^.somedata then
            inserttree(anumber, tree^.left);
        end;
    end;

  procedure searchthrutree(anumber: integer; tree: nodeptr);
    { This procedure is designed to search through a binary tree for a
      particular value given in the variable anumber. }
    begin
      { Given the background information and other procedures as
examples, you should be able to come up with this one. }
    end;

  procedure writeouttree(tree: nodeptr);
    { This procedure outputs a binary tree. }
    begin
      if tree <> nil then
        begin
          writeouttree(tree^.left);
          write(afile, tree^.somedata, ',');
          writeouttree(tree^.right);
        end;
    end;

  begin
    randomize;
    tree := nil;
    { the last statement is required for the nil markers set up via
the  BST procedures listed above }

    assign(afile, 'BTEST.TXT');
    rewrite(afile);
    writeln(memavail, ' bytes available before tree creation.');
    for i := 1 to 10 do
      inserttree(random(5)+1, tree);  {This creates the tree. }
    writeln(memavail, ' bytes available after tree created.');
    for i := 1 to 10 do
      searchthrutree(random(5)+1, tree);
    writeln(afile);
    writeln(afile);
    writeln(afile, 'I will now output the binary tree for the
example.');
    writeouttree(tree);
    writeln;
    close(afile);
    writeln('Now deleting tree out of memory');
    deletetree(tree);
    writeln(memavail, ' bytes available after tree deletion.');
  end.

This is a whole program for working with a binary tree.  I purposely
left out 2 of the procedures, because the background information I
gave earlier, plus pointer logic ...

read more »

 

Re:Looking for an algorithm...


Here is the scenario:
   A huge amount of data is being transferred  (in the form of string
records). This data must be recorded in special files, with the
condition that all records must be unique  (the records transferred
are not unique records).
  What I am looking for, is a way to check that a given record is
unique before recording (this should happen run time). One bad
solution, is before recording the string, go through the file
comparing it to all the others. This is unacceptable, because it would
take too long.
  A better solution is as follows:
    Allocate a large chunk of memory - let's call it STORAGE. (We will
use this STORAGE as a set of bits.) As soon as a new record is
transferred, by some function a number (KEY) will be generated for
this string. This KEY is going to be the bit number in our STORAGE. We
will go to that bit in STORAGE and see if it is 0. If so, that means
that our string is unique (and we can easily record it). If the bit is
1, that would mean that this string was transferred before, or another
string generated the same KEY.

What I am looking for is this function that would take in a string
(about 20 characters) and generate a number between 0 and 4 mill.
Obviously, with the hope that it would be very unlikely for two
different strings to produce the same KEY.

Any suggestions?

Thank you very much in advance.

P.S. this is for a C++ program.

Re:Looking for an algorithm...


Do you want to have a big income? If so, you have your big chance now,
by selling quality low cost mouse pad. Choose between those colours
red, blue, black, gray or green. There are discount if you buy more
than 49, 99, 499, 999 or 2,499 pieces. Get some extra money in your
free time or at work, coz all over they need those quality mouse pads,
so get started to earn easy money. Its your choise!!!!
Mail for more information to: Kim_Stein...@online.pol.dk
Wellcome to a business where the money is.

Kim

Re:Looking for an algorithm...


Quote
ro...@ix.netcom.com wrote:

> Here is the scenario:
>    A huge amount of data is being transferred  (in the form of string
> records). This data must be recorded in special files, with the
> condition that all records must be unique  (the records transferred
> are not unique records).
>   What I am looking for, is a way to check that a given record is
> unique before recording (this should happen run time). One bad
> solution, is before recording the string, go through the file
> comparing it to all the others. This is unacceptable, because it would
> take too long.
>   A better solution is as follows:
>     Allocate a large chunk of memory - let's call it STORAGE. (We will
> use this STORAGE as a set of bits.) As soon as a new record is
> transferred, by some function a number (KEY) will be generated for
> this string. This KEY is going to be the bit number in our STORAGE. We
> will go to that bit in STORAGE and see if it is 0. If so, that means
> that our string is unique (and we can easily record it). If the bit is
> 1, that would mean that this string was transferred before, or another
> string generated the same KEY.

> What I am looking for is this function that would take in a string
> (about 20 characters) and generate a number between 0 and 4 mill.
> Obviously, with the hope that it would be very unlikely for two
> different strings to produce the same KEY.

> Any suggestions?

> Thank you very much in advance.

> P.S. this is for a C++ program.

You might want to try using a CRC-32 as a "hashing" function.  
I have some source code in C++ or Delphi if you'd like.

--
Earl F. Glynn          EarlGl...@WorldNet.att.net
EFG Software              913/859-9557  Voice/Fax
   Scientific/Engineering/Medical Applications
             Overland Park, KS  USA

Re:Looking for an algorithm...


This is spam get the {*word*30} out of this news group!  People like you eat
bandwidth! Go with all your other {*word*30}ing friends in
alt.spam.spam.spam or maybe in alt.get.a.{*word*30}ing.life

Quote
>Do you want to have a big income? If so, you have your big chance now,
>by selling quality low cost mouse pad. Choose between those colours
>red, blue, black, gray or green. There are discount if you buy more
>than 49, 99, 499, 999 or 2,499 pieces. Get some extra money in your
>free time or at work, coz all over they need those quality mouse pads,
>so get started to earn easy money. Its your choise!!!!
>Mail for more information to: Kim_Stein...@online.pol.dk
>Wellcome to a business where the money is.
>Kim

********************************************************************************
                              Brin's Humor List
SUBSCRIBE!: If you're not on then get signed up!                    
     >>>>>Send an EMail to:  b...@star.net<<<<<                  
     leaving the body with lots of funny comments and
     where you heard about us and Subscribe in the Subject.          
 DISTRIBUTE!: Send this to all your friends, enemies, and        
     to someone very special who you havn't seen in a long time
COME AND SEE:The archives                                              
     http://www.star.net/people/~brin                                        
CONTRIBUTE!: Send me funny stuff.                                    
DISCLAIMER: Please read the disclaimer at:
  http://www.star.net/people/~Brin/Humor/disclaim.htm
**********************************************************************************

Re:Looking for an algorithm...


In article <32dedf7a.357...@news.internetland.net> of Fri, 17 Jan 1997
02:26:42 in comp.lang.pascal.misc, Programmer Dude

Quote
<ggr...@internetland.net> wrote:
>ro...@ix.netcom.com wrote:

>> Here is the scenario:
>>    A huge amount of data is being transferred  (in the form of string
>> records). This data must be recorded in special files, with the
>> condition that all records must be unique  (the records transferred
>> are not unique records).
>>   What I am looking for, is a way to check that a given record is
>> unique before recording (this should happen run time). One bad
>> solution, is before recording the string, go through the file
>> comparing it to all the others. This is unacceptable, because it would
>> take too long.
>>[even more unpurposeful checksum solution snipped]

>Why not, when dealing with data, try using a data structures solution,
>instead of relying on something like checksums which may or may not be
>accurate?  Why not try a binary sorted  tree (BST) approach?  

A plain binary tree will fail on depth-of-recursion if the data is huge
(MinHuge in TP/BP must be <~16K) and the data is already more-or-less
sorted.  Not insuperable; but beware.

--
John Stockton, Surrey, UK.  j...@merlyn.demon.co.uk  Turnpike v1.12  MIME
    http://www.merlyn.demon.co.uk/
    My news-service had variable input backlog, 0-48 hours.

Other Threads