Navigating MSHTML from C# without a WebBrowser control

 http://www.experts-exchange.com/Web_Development/Components/ActiveX/Q_24168307.html

http://radio.javaranch.com/balajidl/2006/01/18/1137606354980.html

http://msdn.microsoft.com/en-us/library/Aa290341

 

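 Method 1:

The code below is VB.NET, but it can be ported to C#. Note that createDocumentFromUrl must be called through IHTMLDocument4, not on HTMLDocument directly.

Unfortunately, this approach cannot create a document from a string.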

 

Make sure you add a reference to Microsoft.mshtml from the .NET components list and add "Imports System.Runtime.InteropServices" at the top of the file.

 

 

'We use HTMLDocument to open and load the remote web page into an IHTMLDocument2.
'We can't reuse the same HTMLDocument for the result, as it is needed for persistence (IPersistStream).
'We also can't stop at the IHTMLDocument2 object, as it does not have the DOM interface features enabled; we will use IHTMLDocument3.
Dim url As String = "http://java.sun.com"
Dim objMSHTML As New mshtml.HTMLDocument
Dim objMSHTML2 As mshtml.IHTMLDocument2
Dim objMSHTML3 As mshtml.IHTMLDocument3

Dim objIPS As IPersistStreamInit 'here is the whole trick
objIPS = DirectCast(objMSHTML, IPersistStreamInit)
objIPS.InitNew() 'required; otherwise readyState always stays at "loading"
'createDocumentFromUrl is declared on IHTMLDocument4, so cast before calling it
objMSHTML2 = DirectCast(objMSHTML, mshtml.IHTMLDocument4).createDocumentFromUrl(url, vbNullString)
Do Until objMSHTML2.readyState = "complete"
  Application.DoEvents() 'Suggested by John; pumps messages while the page loads
Loop
objMSHTML3 = DirectCast(objMSHTML2, mshtml.IHTMLDocument3)






 

Now you can start using DOM methods such as getElementById() and getElementsByTagName() through objMSHTML3.
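
Since Method 1 is described as portable to C#, here is a rough sketch of what that port might look like. It is not taken from the linked pages. The class name MshtmlLoader, the method name LoadFromUrl and the timeout parameter are made up for illustration, and the IPersistStreamInit declaration is written by hand on the assumption that your project references do not already provide one. It also assumes a reference to System.Windows.Forms (for Application.DoEvents) and a caller running on an STA thread.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;
using System.Windows.Forms;
using mshtml;

// Hand-written COM import (an assumption: declared here in case no
// referenced assembly exposes it). Method order follows the COM vtable.
[ComImport]
[Guid("7FD52380-4E07-101B-AE2D-08002B2EC713")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IPersistStreamInit
{
    void GetClassID(out Guid pClassID);
    [PreserveSig] int IsDirty();
    void Load(IStream pStm);
    void Save(IStream pStm, [MarshalAs(UnmanagedType.Bool)] bool fClearDirty);
    void GetSizeMax(out long pcbSize);
    void InitNew();
}

// Hypothetical helper; the names are illustrative only.
public static class MshtmlLoader
{
    public static IHTMLDocument3 LoadFromUrl(string url, TimeSpan timeout)
    {
        var host = new HTMLDocumentClass();

        // Same trick as the VB.NET code: without InitNew the new
        // document's readyState stays at "loading".
        ((IPersistStreamInit)host).InitNew();

        // createDocumentFromUrl is declared on IHTMLDocument4.
        // null mirrors vbNullString in the VB.NET code.
        IHTMLDocument2 loaded =
            ((IHTMLDocument4)host).createDocumentFromUrl(url, null);

        // Pump messages until the load completes or the timeout elapses.
        var clock = Stopwatch.StartNew();
        while (loaded.readyState != "complete")
        {
            if (clock.Elapsed > timeout)
                throw new TimeoutException("Page did not finish loading: " + url);
            Application.DoEvents();
        }

        // IHTMLDocument3 exposes getElementById and getElementsByTagName.
        return (IHTMLDocument3)loaded;
    }
}
```

A call might look like MshtmlLoader.LoadFromUrl("http://java.sun.com", TimeSpan.FromSeconds(30)). The timeout is an addition that the VB.NET loop does not have; it only guards against spinning forever when a page never reaches "complete".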
 
 
 
 
 -------------------------------------------------------------------------------------------------------------------------------------------------------------
 Method 2:
 It turns out that IHTMLDocument2 can be written to directly from a string.
  
 
 
 
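//Another alternative would be to use the built-in engine, mshtml:

using mshtml; 
... 
object[] oPageText = { html }; // html is the string holding the page markup
HTMLDocument doc = new HTMLDocumentClass(); 
IHTMLDocument2 doc2 = (IHTMLDocument2)doc; 
doc2.write(oPageText); 
doc2.close(); // close the write stream so the document can finish loading

//This allows you to use JavaScript-like functions such as getElementById(), which is exposed on IHTMLDocument3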
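
To make the getElementById() step concrete, here is a minimal sketch of a lookup on a document built from a string. The class name MshtmlStringDemo and the method name InnerTextById are hypothetical. The sketch assumes the Microsoft.mshtml reference has "Embed Interop Types" set to False (otherwise HTMLDocumentClass cannot be instantiated) and that the code runs on an STA thread.

```csharp
using System;
using mshtml;

// Hypothetical helper; the names are illustrative only.
public static class MshtmlStringDemo
{
    // Parses an HTML string with MSHTML and returns the inner text of
    // the element with the given id, or null when no element matches.
    public static string InnerTextById(string html, string id)
    {
        var doc = new HTMLDocumentClass();

        // IHTMLDocument2 provides write/close for feeding in markup.
        var doc2 = (IHTMLDocument2)doc;
        doc2.write(new object[] { html });
        doc2.close();

        // getElementById lives on IHTMLDocument3, so cast the same object.
        var doc3 = (IHTMLDocument3)doc;
        IHTMLElement element = doc3.getElementById(id);
        return element == null ? null : element.innerText;
    }
}
```

For example, MshtmlStringDemo.InnerTextById("<html><body><p id='greeting'>Hello</p></body></html>", "greeting") would look up the paragraph by its id. Returning null for a missing element is a choice made for this sketch, not something the original snippet specifies.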

 
 -------------------------------------------------------------------------------------------------------------------------------------
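Method 3:
Fizzler is an HTML parsing library built on top of HtmlAgilityPack. Its selectors work the same way as jQuery's, which makes it very convenient to use.
It is a big improvement over Winista.HtmlParser, which I used before. Recommended.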
 
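 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // XPath: every <a> element that has an href attribute.
 // Note that SelectNodes returns null when nothing matches.
 foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
 {
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att); // FixLink is your own link-rewriting function
 }
 doc.Save("file.htm");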

 
// Load the document using HtmlAgilityPack as normal
// (needs: using HtmlAgilityPack; using Fizzler.Systems.HtmlAgilityPack;)
var html = new HtmlDocument();
html.LoadHtml(@"
  <html>
      <head></head>
      <body>
        <div>
          <p class='content'>Fizzler</p>
          <p>CSS Selector Engine</p></div>
      </body>
  </html>");

// Fizzler for HtmlAgilityPack is implemented as the
// QuerySelectorAll extension method on HtmlNode

var document = html.DocumentNode;

// yields: [<p class="content">Fizzler</p>]
document.QuerySelectorAll(".content");

// yields: [<p class="content">Fizzler</p>,<p>CSS Selector Engine</p>]
document.QuerySelectorAll("p");

// yields empty sequence
document.QuerySelectorAll("body>p");

// yields [<p class="content">Fizzler</p>,<p>CSS Selector Engine</p>]
document.QuerySelectorAll("body p");

// yields [<p class="content">Fizzler</p>]
document.QuerySelectorAll("p:first-child");
