http://www.experts-exchange.com/Web_Development/Components/ActiveX/Q_24168307.html
http://radio.javaranch.com/balajidl/2006/01/18/1137606354980.html
http://msdn.microsoft.com/en-us/library/Aa290341
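Method 1:
The code is in VB.NET syntax and can be ported to C#. Note that createDocumentFromUrl belongs to IHTMLDocument4, so the call must go through that interface.
Unfortunately, this approach cannot create a document from a string.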
Make sure you add a reference to Microsoft.mshtml from the .NET components list and add "Imports System.Runtime.InteropServices".
'We use HTMLDocument to open and load a remote web page into an IHTMLDocument2.
'We can't reuse the same HTMLDocument for the result, as it is needed for persistence (IPersistStream).
'We also can't use the IHTMLDocument2 object for DOM queries, as it does not have the DOM interface features enabled; we use IHTMLDocument3 instead.
'Note: this code assumes IPersistStreamInit has been declared separately as a ComImport interface.
Dim url As String = "http://java.sun.com"
Dim objMSHTML As New mshtml.HTMLDocument
Dim objMSHTML2 As mshtml.IHTMLDocument2
Dim objMSHTML3 As mshtml.IHTMLDocument3
Dim x As Integer = 10 'a dummy variable
Dim objIPS As IPersistStreamInit 'here is the whole trick
objIPS = DirectCast(objMSHTML, IPersistStreamInit)
objIPS.InitNew() 'you have to do it, if not you will always have readyState as "loading"
objMSHTML2 = DirectCast(objMSHTML, mshtml.IHTMLDocument4).createDocumentFromUrl(url, vbNullString) 'called through IHTMLDocument4
Do Until objMSHTML2.readyState = "complete"
x = x + 1
Application.DoEvents() 'Suggested by John
Loop
objMSHTML3 = DirectCast(objMSHTML2, mshtml.IHTMLDocument3)
Now you can start using DOM methods such as getElementById(), getElementsByTagName(..), etc. on objMSHTML3.
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Method 2:
It turns out that a string can be written directly into an IHTMLDocument2.
// Another alternative is to use the built-in mshtml engine:
using mshtml;
...
object[] oPageText = { html }; // html is the string holding the markup to parse
HTMLDocument doc = new HTMLDocumentClass();
IHTMLDocument2 doc2 = (IHTMLDocument2)doc;
doc2.write(oPageText);
doc2.close(); // close the output stream opened by write()
// This allows you to use JavaScript-like functions such as getElementById()
-------------------------------------------------------------------------------------------------------------------------------------
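Method 3:
Fizzler is an HTML parsing library built on top of HTMLAgilityPack. Its selectors work the same way as jQuery's, which makes it very convenient to use.
It is far better than Winista.HtmlParser, which I used before. Recommended.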
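// Plain HTMLAgilityPack: rewrite the href of every link in a file
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// Note: SelectNodes returns null when nothing matches
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att); // FixLink is a user-supplied helper
}
doc.Save("file.htm");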
// Load the document using HTMLAgilityPack as normal
// (the extension method below needs: using Fizzler.Systems.HtmlAgilityPack;)
var html = new HtmlDocument();
html.LoadHtml(@"
  <html>
    <head></head>
    <body>
      <div>
        <p class='content'>Fizzler</p>
        <p>CSS Selector Engine</p>
      </div>
    </body>
  </html>");
// Fizzler for HtmlAgilityPack is implemented as the
// QuerySelectorAll extension method on HtmlNode
var document = html.DocumentNode;
// yields: [<p class="content">Fizzler</p>]
document.QuerySelectorAll(".content");
// yields: [<p class="content">Fizzler</p>,<p>CSS Selector Engine</p>]
document.QuerySelectorAll("p");
// yields empty sequence
document.QuerySelectorAll("body>p");
// yields: [<p class="content">Fizzler</p>,<p>CSS Selector Engine</p>]
document.QuerySelectorAll("body p");
// yields: [<p class="content">Fizzler</p>]
document.QuerySelectorAll("p:first-child");