IE "View Source" with Indy

I am building an application that captures the source code of a web page. I have previously used TWebBrowser, but it is too resource-intensive, so I am trying to switch to Indy.

My problem is with dynamic pages that contain JavaScript functions which build content during the download. TWebBrowser can run the corresponding JavaScript file on the fly and build up the full client-side version of the page source (which is what I want) before indicating that it is done.
Indy (TIdHTTP.Get) just gives me the raw code without calling the function. Is there any way to get it to act more like IE in this respect?

Thanks

 

Re:IE "View Source" with Indy


Well, there is, but it's quite a bit of work.

First, you have to get Windows Scripting Host. This scripting tool will
allow you to run scripts (in several languages, actually: VBScript and
JScript are standard; others, such as Perl, can be added) through an
automation interface.

Now, once you've got and installed it (and played a bit with it), you have
to do the following:

Write a COM object that exposes the same methods/properties as the
JavaScript objects you want to emulate (mostly, the document container).

Have the implementation modify the source code when the corresponding
method is called by JavaScript.

Add the object to the WSH environment (there is a method for that, I don't
recall it right now but it's pretty easy to find it from the help).

Read the result from your object.
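
The steps above can be sketched in a few lines. This is only an illustration of the idea, written in TypeScript rather than as a COM object hosted by a Windows scripting engine, and it makes several assumptions: the class name FakeDocument, the function expandInlineScripts and the sample page are all hypothetical, only document.write/writeln are emulated, and only inline script blocks are handled (a script loaded from a separate file would have to be downloaded first).

```typescript
// Hypothetical stand-in for the browser's `document` object. It implements
// only write/writeln, which is enough for scripts that emit markup while
// the page loads.
class FakeDocument {
  private buffer: string[] = [];

  write(...parts: unknown[]): void {
    this.buffer.push(parts.map(String).join(""));
  }

  writeln(...parts: unknown[]): void {
    this.write(...parts, "\n");
  }

  // Returns everything written so far and clears the buffer.
  take(): string {
    const out = this.buffer.join("");
    this.buffer = [];
    return out;
  }
}

// Runs each inline <script> block against the fake document and splices the
// generated markup into the page at the position of the script element.
function expandInlineScripts(rawHtml: string): string {
  const doc = new FakeDocument();
  const scriptBlock = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  return rawHtml.replace(scriptBlock, (_whole: string, code: string) => {
    // Expose the stub under the name the page script expects ("document").
    const run = new Function("document", code);
    run(doc);
    return doc.take();
  });
}

// Assumed sample page: the raw source as an HTTP GET would return it.
const raw =
  "<html><body><h1>Prices</h1>" +
  "<script>for (var i = 1; i <= 3; i++) document.write('<p>Item ' + i + '</p>');</script>" +
  "</body></html>";

const expanded = expandInlineScripts(raw);
console.log(expanded);
```

Each script block runs in its own function scope here, so a function defined in one block is not visible to the next, and the page's code runs with the full rights of the host program. A real page will also touch many more objects than document (window, navigator, forms and so on), which is where most of the work lies.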

Good luck,
Stephane

On 18 Jun 2001, nick.tul...@natwestnospam.com (Nick Tulett) wrote in
<3b2dd255$2_1@dnews>:

Quote

>I am building an application that captures the source code of a web
>page. I have previously used TWebBrowser, but it is too
>resource-intensive, so I am trying to switch to Indy.

>My problem is that, given a dynamic page that contains JavaScript
>functions that build content during the download, TWebBrowser can
>interrogate the corresponding JavaScript file on the fly and build up
>the full client version of the page source (which is what I want) before
>indicating that it is done. Indy (TIdHTTP.Get) just gives me the raw code
>without calling the function - is there any way to get it to act more
>like IE in this respect?

>Thanks

Re:IE "View Source" with Indy


Hey Nick
I am currently trying to use TWebBrowser to download web pages.  I
wrote a program using Indy, but the site that I want to download from
uses cookies and I am having trouble with them.  I am sorry
that I cannot help you, but maybe you can help me.  Is there a way
to save the page using TWebBrowser?  I am not really interested in
displaying it, but I don't mind if it is displayed as it downloads.
Thanks
M5

PS Good luck with your question.

Re:IE "View Source" with Indy


Thanks for your swift reply, Stephane.

I won't pretend to understand everything you have said,
but I'm guessing I wouldn't be able to do all this in the
next week, so I'll have to stick with the IE + 500MB page file
solution.

Quote
grob...@fulgan.com (Stephane Grobety) wrote:
>Well, there is, but it's quite a bit of work.

>First, you have to get Windows Scripting Host. This scripting tool will
>allow you to run scripts (in several languages, actually: VBScript and
>JScript are standard; others, such as Perl, can be added) through an
>automation interface.

>Now, once you've got and installed it (and played a bit with it), you have
>to do the following:

>Write a COM object that exposes the same methods/properties as the
>JavaScript objects you want to emulate (mostly, the document container).

>Have the implementation modify the source code when the corresponding
>method is called by JavaScript.

>Add the object to the WSH environment (there is a method for that, I don't
>recall it right now but it's pretty easy to find it from the help).

>Read the result from your object.

>Good luck,
>Stephane

Re:IE "View Source" with Indy


Quote
M5 <M...@nowhere.com> wrote:
>Hey Nick
>I am currently trying to use TWebBrowser to download web pages.  I
>wrote a program using Indy, but the site that I want to download from
>uses cookies and I am having trouble with them.  I am sorry
>that I cannot help you, but maybe you can help me.  Is there a way
>to save the page using TWebBrowser?  I am not really interested in
>displaying it, but I don't mind if it is displayed as it downloads.
>Thanks
>M5

>PS Good luck with your question.

I'd like to post a full reply, but I'm at work at the minute. Your best bet would be to take a look in the oleautomation newsgroup; there's plenty on TWebBrowser in there.

Re:IE "View Source" with Indy


Quote
>Thanks for your swift reply, Stephane.

NP.

Quote
>I won't pretend to understand everything you have said,
>but I'm guessing I wouldn't be able to do all this in the
>next week, so I'll have to stick with the IE + 500MB page file
>solution.

I think you guessed right. Basically, it means interpreting the code and
reinventing the web browser interface to do so.

There isn't really any solution for that, except maybe using a site that
is not as badly written as this one (or flaming the webmaster?)

Good luck,
Stephane
