
***Gintaras*** · 07-14-2009, 10:44 AM

HtmlDoc can get the HTML or text of a specified tag. To get what is inside it, use string functions, e.g. findrx. The html element functions can also be used.

Macro Macro1090

str s=
;<body>
;<div class='text'>
;<b>Covers</b><br/>
;http://xxxyyyyzzzz.com/somefile.html <- i want that
;</div>
;</body>
HtmlDoc d.InitFromText(s) ;;parse the HTML
;str s2=d.GetHtml("div" 0)
str s2=d.GetText("div" 0) ;;text of the first div
;out s2
str s3
if(findrx(s2 "\bhttp:\S+" 0 1 s3)<0) ret ;;no URL found
out s3

Often you can easily extract the required strings from the whole page HTML using findrx alone. Use HtmlDoc only when that is too difficult. HtmlDoc uses the IE HTML parsing engine to parse the page HTML into smaller elements. You then find the required elements and work with their text or HTML using string functions.
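For a page as simple as the one above, the findrx-only approach might look like the sketch below. It reuses the same regular expression and findrx arguments as the macro above, but applies them to the whole HTML instead of to text extracted by HtmlDoc. The variable name url is only illustrative, and the trailing note has been dropped from the sample HTML.

```
str s=
;<body>
;<div class='text'>
;<b>Covers</b><br/>
;http://xxxyyyyzzzz.com/somefile.html
;</div>
;</body>
str url
if(findrx(s "\bhttp:\S+" 0 1 url)<0) ret ;;no URL in the HTML
out url
```

This is enough when the URL is the only http: string of interest on the page. If the page contains many links, narrowing the search to one element with HtmlDoc first is likely to be more reliable.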

containerTag is an HTML tag name, like div. To find the first div, use d.GetText("div" 0); to find the next div, use d.GetText("div" 1), and so on.
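As a small sketch of the index argument, assuming a made-up page with two div elements (the HTML and the variable names t0 and t1 are only for illustration):

```
str s=
;<body>
;<div>first block</div>
;<div>second block</div>
;</body>
HtmlDoc d.InitFromText(s)
str t0=d.GetText("div" 0) ;;text of the first div
str t1=d.GetText("div" 1) ;;text of the next div
out t0
out t1
```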

HtmlDoc.d and HtmlDoc.d3 are variables of type IHTMLDocument2 and IHTMLDocument3. Both can be used to access the MSHTML DOM, which is documented in the MSDN library.
