Indy - Getting default page e.g index.htm, default.htm

I have made a simple proxy using Indy and would like to cache the files
that I have visited. To do this I was going to create a folder for every
site, e.g.

c:\cache\www.borland.com

then store the page within that folder

c:\cache\www.borland.com\index.htm
c:\cache\www.borland.com\images\image1.jpg
etc

But how do I get the default page name? It is not always index.htm!

IE retrieves the name in its cache folder, and so do programs like WebWhacker!

Thanks

Ian Groves

P.S Have tried Deja but with no success!

 

Re:Indy - Getting default page e.g index.htm, default.htm


You shouldn't use the page's name to do your caching but the page's URL.
The best way to implement that is some sort of hash table scheme.

In short, you store the file data and the file's original URL (the full URL
including parameters) together, either in the same database record (if
you're using a database) or in two parts of the same file. In the second
case you can store the file's data at a fixed offset within the file, which
allows for speedy data retrieval. You then use the hash value of the URL as
the file name.

That way, when you get a hit, you just hash the URL, see if a file exists
with that name and check whether the URL stored in it is the same as the one
you are looking for.
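To make the store and lookup steps concrete, here is a minimal sketch in Python (not Delphi/Indy code). The SHA-1 hash, the 2048-byte URL header and the names cache_path, store and lookup are assumptions chosen for illustration; any hash and any fixed offset would do. Collisions are simply treated as a miss here.

```python
import hashlib
import os

URL_FIELD = 2048  # assumed fixed header size; page data starts at this offset


def cache_path(cache_dir, url):
    """File name is the SHA-1 hex digest of the full URL (parameters included)."""
    return os.path.join(cache_dir, hashlib.sha1(url.encode("utf-8")).hexdigest())


def store(cache_dir, url, data):
    """Write the URL into the fixed-size header, then the page data after it."""
    raw = url.encode("utf-8")
    if len(raw) > URL_FIELD:
        raise ValueError("URL too long for the fixed header")
    with open(cache_path(cache_dir, url), "wb") as f:
        f.write(raw.ljust(URL_FIELD, b"\0"))  # pad the URL with NUL bytes
        f.write(data)


def lookup(cache_dir, url):
    """Return the cached bytes, or None on a miss or if the file holds another URL."""
    path = cache_path(cache_dir, url)
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        stored_url = f.read(URL_FIELD).rstrip(b"\0").decode("utf-8")
        if stored_url != url:
            return None  # same hash, different URL: treat as a miss
        return f.read()
```

With this layout a request for http://www.borland.com/ is cached under the hash of that URL, so the proxy never needs to know whether the server's default page is index.htm or default.htm.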

The only problem is handling collisions. A simple way is to send the hash
value through the hash function again and store the file under that value,
repeating until you find a free slot. The one thing to watch for, in that
case, is that when you delete a file, you must first relink all "multiple
hash" files that were displaced by it. To do that, compute the hash of the
deleted file's hash value, h(h(URL)), and see if there is a file there. If
there is, check whether that file's name is the hash of its own URL or the
hash of the name of the file you're deleting; in the second case it has to
be moved up. (Am I making any sense here?)
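A rough sketch of the rehash-on-collision idea, again in Python rather than Delphi. Assumptions: a dict stands in for the files in the cache folder, the class name RehashCache and the max_probes limit are made up for the example, and deletion uses a simpler variant of the relinking step: instead of checking each file individually, every file further along the chain is taken out and stored again.

```python
import hashlib


def sha1_hex(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class RehashCache:
    """Slots named by hash; a dict stands in for the files in the cache folder."""

    def __init__(self, hash_fn=sha1_hex, max_probes=64):
        self.hash_fn = hash_fn
        self.max_probes = max_probes  # assumed guard against endless chains
        self.slots = {}  # slot name -> (url, data)

    def _chain(self, start):
        """Yield h(start), h(h(start)), ... up to max_probes names."""
        name = start
        for _ in range(self.max_probes):
            name = self.hash_fn(name)
            yield name

    def _find(self, url):
        """Return (slot holding url or None, first free slot on its chain or None)."""
        for name in self._chain(url):
            entry = self.slots.get(name)
            if entry is None:
                return None, name
            if entry[0] == url:
                return name, None
        return None, None

    def store(self, url, data):
        found, free = self._find(url)
        if found is None and free is None:
            raise RuntimeError("no free slot within max_probes")
        self.slots[found if found is not None else free] = (url, data)

    def lookup(self, url):
        found, _ = self._find(url)
        return self.slots[found][1] if found is not None else None

    def delete(self, url):
        found, _ = self._find(url)
        if found is None:
            return False
        del self.slots[found]
        # Entries further along the chain may have been pushed there by the
        # deleted one; take them out and store them again so lookups reach them.
        displaced = []
        for name in self._chain(found):
            if name not in self.slots:
                break
            displaced.append(self.slots.pop(name))
        for other_url, other_data in displaced:
            self.store(other_url, other_data)
        return True
```

Re-storing the whole run costs a few extra writes per delete, but it avoids having to work out which of the following files were displaced and which were sitting in their own first-choice slot.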

Well, anyway, that's the idea: store the URLs and use a hash table to
identify the files themselves.

Good luck,
Stephane

On 21 Jan 2002, "Ian Groves" <i...@REMOVEigroves.freeserve.co.uk> wrote in
news:3c4c973b_1@dnews:

> I have made a simple proxy using Indy and would like to be able to
> cache the files that I have visited to do this I was going to create a
> folder for every site e.g

> c:\cache\www.borland.com

> then store the page within that folder

> c:\cache\www.borland.com\index.htm
> c:\cache\www.borland.com\images\image1.jpg
> etc

> But how do I get the default page name as it is not always index.htm!

> In my IE cache folder it retrieves the name so do programs like
> WebWhacker!

> Thanks

> Ian Groves

> P.S Have tried Deja but with no success!
