Three ways to get a remote page's source in WebForms with WebRequest, WebClient and WebBrowser (downmoon)

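  A small requirement: fetch the source of a remote page, mainly for scraping data. It had been working fine, but recently it suddenly stopped returning the page source, even though the page still displayed normally in a browser. (A source download is attached at the end of the post. ^_^)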

  After some analysis: this is the code I had been using:

```csharp
// Original version: no browser-like request headers are set.
// Url and encoding are parameters of the enclosing method.
StreamReader sreader = null;
HttpWebResponse httpWebResponse = null;
string result = string.Empty;
try
{
    HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(Url);
    //httpWebRequest.Timeout = 20;
    httpWebRequest.KeepAlive = false;
    httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse();
    if (httpWebResponse.StatusCode == HttpStatusCode.OK)
    {
        sreader = new StreamReader(httpWebResponse.GetResponseStream(), encoding);
        result = sreader.ReadToEnd();
    }
    return result;
}
catch (WebException)
{
    return null;
}
finally
{
    // Close the reader and the response on every path, including non-OK responses.
    if (sreader != null) { sreader.Close(); }
    if (httpWebResponse != null) { httpWebResponse.Close(); }
}
```

 After some research, it turned out that a few request parameters had to be added:

```csharp
#region Key parameters; without them nothing is returned
// The UserAgent property holds only the header value, without the "User-Agent:" name.
httpWebRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";
httpWebRequest.Accept = "*/*";
httpWebRequest.KeepAlive = true;
httpWebRequest.Headers.Add("Accept-Language", "zh-cn,en-us;q=0.5");
#endregion
```
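Creating an HttpWebRequest does not contact the server (only GetResponse does), so the header settings above can be checked on an unsent request. A minimal sketch that wraps them in a helper; `BrowserHeaders` and `Apply` are hypothetical names, and the User-Agent value is the one from the snippet, set without a "User-Agent:" prefix.

```csharp
using System;
using System.Net;

// Hypothetical helper: applies the same browser-like headers as the snippet above.
public static class BrowserHeaders
{
    public const string UserAgent =
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";

    public static void Apply(HttpWebRequest request)
    {
        request.UserAgent = UserAgent;   // header value only
        request.Accept = "*/*";
        request.KeepAlive = true;
        request.Headers.Add("Accept-Language", "zh-cn,en-us;q=0.5");
    }
}
```

Usage: `var req = (HttpWebRequest)WebRequest.Create(url); BrowserHeaders.Apply(req);` before calling `GetResponse()`.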

The corrected code:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

#region Read page content
/// <summary>
/// Reads the page content.
/// </summary>
/// <param name="Url">The URL to read</param>
/// <param name="encoding">The encoding used to decode the response</param>
/// <returns>The page source, or null on failure</returns>
public static string GetStringByUrl(string Url, Encoding encoding)
{
    if (Url.Equals("about:blank")) return null;
    if (!Url.StartsWith("http://") && !Url.StartsWith("https://")) { Url = "http://" + Url; }
    int dialCount = 0;
    while (true)
    {
        try
        {
            HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(Url);
            //httpWebRequest.Timeout = 20;
            #region Key parameters; without them nothing is returned
            httpWebRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";
            httpWebRequest.Accept = "*/*";
            httpWebRequest.KeepAlive = true;
            httpWebRequest.Headers.Add("Accept-Language", "zh-cn,en-us;q=0.5");
            #endregion

            using (HttpWebResponse httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse())
            {
                if (httpWebResponse.StatusCode != HttpStatusCode.OK) return string.Empty;
                using (StreamReader sreader = new StreamReader(httpWebResponse.GetResponseStream(), encoding))
                {
                    // Read the body in 256-character chunks and append them to the result.
                    StringBuilder result = new StringBuilder();
                    char[] cCont = new char[256];
                    int count;
                    while ((count = sreader.Read(cCont, 0, 256)) > 0)
                    {
                        result.Append(cCont, 0, count);
                    }
                    return result.ToString();
                }
            }
        }
        catch (WebException e)
        {
            // Only connection failures are retried: redial, then try again (at most 5 times).
            if (e.Status != WebExceptionStatus.ConnectFailure) return null;
            dialCount++;
            ReDial();
            if (dialCount >= 5) return null;
        }
    }
}
#endregion

// Redials the ADSL connection. The body is commented out; RASDisplay appears to be
// a RAS dialing class from the author's own project (CSDNWebTest namespace).
public static void ReDial()
{
    //int res = 1;
    //while (res != 0)
    //{
    //    CSDNWebTest.RASDisplay ras = new RASDisplay();
    //    ras.Disconnect();
    //    res = ras.Connect("asdl");
    //    System.Threading.Thread.Sleep(TimeSpan.FromSeconds(10));
    //}
}
```
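The retry-on-connection-failure logic above can also be factored into a small reusable helper, which can then be tested without a network by throwing a simulated WebException. A minimal sketch under that assumption; `ConnectRetry` and `Run` are hypothetical names, and the `onConnectFailure` callback is where a method such as ReDial would be passed.

```csharp
using System;
using System.Net;

public static class ConnectRetry
{
    // Hypothetical helper: calls `action` up to `maxAttempts` times. Only a
    // WebException with status ConnectFailure triggers a retry (after calling
    // onConnectFailure); other exceptions, and the last failure, propagate.
    public static T Run<T>(Func<T> action, int maxAttempts, Action onConnectFailure)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (WebException e) when (e.Status == WebExceptionStatus.ConnectFailure
                                         && attempt < maxAttempts)
            {
                onConnectFailure();
            }
        }
    }
}
```

For example `ConnectRetry.Run(() => FetchOnce(url, encoding), 5, ReDial)`, where `FetchOnce` is a hypothetical single-attempt version of GetStringByUrl. Letting the final WebException propagate, instead of returning null, keeps the reason for the failure visible to the caller.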

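 That solved the problem. Thinking about it later, another approach would be to use WebClient to download the page into a local temporary file first and then read its text.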

The code:

```csharp
private string GetPageByWebClient(string url)
{
    if (url.Equals("about:blank")) return null;
    if (!url.StartsWith("http://") && !url.StartsWith("https://")) { url = "http://" + url; }
    string filename = RandomKey(1111, 9999) + ".txt";
    DownloadOneFileByURLWithWebClient(filename, url, "C:\\");
    StreamReader sr = null;
    try
    {
        // Open the reader inside try so a missing or locked file also returns null.
        sr = new StreamReader("C:\\" + filename, System.Text.Encoding.Default);
        return sr.ReadToEnd();
    }
    catch { return null; }
    finally
    {
        if (sr != null) { sr.Close(); }
    }
}

// Time stamp plus a random number, used as a temporary file name.
private string RandomKey(int b, int e)
{
    return DateTime.Now.ToString("yyyyMMdd-HHmmss-fff-") + this.getRandomID(b, e);
}

private int getRandomID(int minValue, int maxValue)
{
    Random ri = new Random(unchecked((int)DateTime.Now.Ticks));
    int k = ri.Next(minValue, maxValue);
    return k;
}

private string GuidString
{
    get { return Guid.NewGuid().ToString(); }
}

/// <summary>
/// WebClient method, only for small pictures.
/// Downloads url + fileName and saves it as localPath + fileName.
/// </summary>
/// <param name="fileName"></param>
/// <param name="url"></param>
/// <param name="localPath"></param>
public static void DownloadOneFileByURLWithWebClient(string fileName, string url, string localPath)
{
    using (System.Net.WebClient wc = new System.Net.WebClient())
    {
        if (File.Exists(localPath + fileName)) { File.Delete(localPath + fileName); }
        if (Directory.Exists(localPath) == false) { Directory.CreateDirectory(localPath); }
        wc.DownloadFile(url + fileName, localPath + fileName);
    }
}
```


This did not retrieve the page source. The error was:

 [Screenshot: the error message (WebBrowser3.png)]
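Two things distinguish this WebClient attempt from the working HttpWebRequest version: DownloadOneFileByURLWithWebClient requests `url + fileName` rather than the page URL itself, and no browser-like headers are sent. Whether either one causes the error above is not confirmed here. A minimal sketch of the download-to-temp-file idea that downloads the address as given, can optionally send the same User-Agent, and uses `Path.GetTempFileName` instead of a fixed `C:\` path; `TempFileFetcher` and `DownloadViaTempFile` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class TempFileFetcher
{
    // Hypothetical helper: downloads `address` into a system temp file, reads it
    // back with `encoding`, and always deletes the temp file afterwards.
    public static string DownloadViaTempFile(Uri address, Encoding encoding, string userAgent = null)
    {
        string tempPath = Path.GetTempFileName();
        try
        {
            using (var wc = new WebClient())
            {
                if (userAgent != null)
                {
                    // Browser-like User-Agent, as in the HttpWebRequest fix.
                    wc.Headers[HttpRequestHeader.UserAgent] = userAgent;
                }
                wc.DownloadFile(address, tempPath);
            }
            return File.ReadAllText(tempPath, encoding);
        }
        finally
        {
            File.Delete(tempPath);
        }
    }
}
```

If only the text is needed, the temporary file can be skipped entirely: `WebClient.DownloadString` returns the body directly, decoded with the `WebClient.Encoding` property.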

 Thinking about it again, there is also the WebBrowser control. In WinForms it works as long as the main thread's entry point is marked with [STAThread].
 

```csharp
// [STAThread] only takes effect on the program's entry point (Main); WinForms
// projects already mark Main with it, so on this method the attribute has no effect.
[STAThread]
public void GetURLContentByWebBrowser()
{
    try
    {
        string url = txtUrl.Text.Trim();
        string result = null;
        if (String.IsNullOrEmpty(url)) return;
        if (url.Equals("about:blank")) return;
        if (!url.StartsWith("http://") && !url.StartsWith("https://")) { url = "http://" + url; }

        WebBrowser wb = new WebBrowser();
        //if (wb != null){ wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted); }
        try
        {
            wb.Navigate(new Uri(url));
            // Navigate is asynchronous: pump messages until the document has loaded,
            // otherwise DocumentText does not yet contain the page.
            while (wb.ReadyState != WebBrowserReadyState.Complete)
            {
                Application.DoEvents();
            }
            result = wb.DocumentText;
            lbResult.Text = result;
        }
        catch (System.UriFormatException)
        {
            // Invalid URL: ignore.
        }
    }
    catch (Exception)
    {
        //WriteLog.Writelog("Error while fetching the page's full HTML: " + url, ex);
        throw; // rethrow without resetting the stack trace
    }
}
```

 
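In WebForms it is more troublesome. This error occurs: "ActiveX control '8856f961-340a-11d0-a96b-00c04fd705a2' cannot be instantiated because the current thread is not in a single-threaded apartment."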

[Screenshot: the STA error (WebBrowser2.png)]

 The code:

```csharp
private string GetPageStringbyWebBrowser(string url)
{
    if (url.Equals("about:blank")) return null;
    if (!url.StartsWith("http://") && !url.StartsWith("https://")) { url = "http://" + url; }

    WebBrowser myWB = new WebBrowser();
    myWB.ScrollBarsEnabled = false;
    myWB.Navigate(url);
    // Pump Windows messages until the document has finished loading.
    while (myWB.ReadyState != WebBrowserReadyState.Complete)
    {
        System.Windows.Forms.Application.DoEvents();
    }

    System.IO.StreamReader getReader = null;
    try
    {
        // Decode the raw document stream with the page's own encoding.
        getReader = new System.IO.StreamReader(myWB.DocumentStream, System.Text.Encoding.GetEncoding(myWB.Document.Encoding));
        string gethtml = getReader.ReadToEnd();
        return gethtml;
    }
    catch { return null; }
    finally
    {
        if (getReader != null) { getReader.Close(); }
        myWB.Dispose();
    }
}
```
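The while loop in this method calls Application.DoEvents until ReadyState reaches Complete, with no upper bound, so a page that never finishes loading keeps the request spinning indefinitely. A minimal sketch of a bounded version; `BoundedWait` and `PumpUntil` are hypothetical names, and the `pump` delegate stands in for Application.DoEvents so the helper itself does not depend on WinForms.

```csharp
using System;
using System.Diagnostics;

public static class BoundedWait
{
    // Hypothetical helper: calls `pump` until `isDone` returns true or `timeout`
    // elapses. Returns true if the condition was met, false on timeout.
    public static bool PumpUntil(Func<bool> isDone, Action pump, TimeSpan timeout)
    {
        Stopwatch sw = Stopwatch.StartNew();
        while (!isDone())
        {
            if (sw.Elapsed >= timeout) return false;
            pump();
        }
        return true;
    }
}
```

In GetPageStringbyWebBrowser this could replace the loop as `if (!BoundedWait.PumpUntil(() => myWB.ReadyState == WebBrowserReadyState.Complete, System.Windows.Forms.Application.DoEvents, TimeSpan.FromSeconds(30))) { myWB.Dispose(); return null; }`; the 30-second limit is an arbitrary choice.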

 
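After searching for N hours (N >= 5), I finally found a workable solution: add AspCompat="true" to the @Page directive at the top of the page,

i.e. <%@ Page Language="C#"  AspCompat="true" ****** %>

MSDN's explanation:
Include the compatibility attribute aspcompat=true in the <%@Page> tag of the ASP.NET page, for example <%@Page aspcompat=true Language=VB%>. This attribute forces the page to execute in STA mode, which ensures that your components continue to work correctly. If you try to use an STA component without specifying this tag, an exception occurs at run time.

Setting this attribute to true also allows the page to call COM+ 1.0 components that require access to the unmanaged ASP built-in objects, which are accessible through the ObjectContext object.

Setting this tag to true causes a slight performance penalty, so it is recommended only when it is really needed.


It finally works! Is there a better way?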

[Screenshot: the page source retrieved successfully (WebBrowser5.png)]
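AspCompat switches the whole page to STA. A commonly suggested alternative is to create a dedicated thread, set it to STA, and run only the WebBrowser work on it. This has not been tested against the project in this post; a minimal sketch, where `StaRunner` and `Run` are hypothetical names. STA threads exist only on Windows; elsewhere `TrySetApartmentState` returns false and the sketch throws.

```csharp
using System;
using System.Threading;

public static class StaRunner
{
    // Hypothetical helper: runs `work` on a new STA thread and waits for it.
    // An exception thrown by `work` is re-thrown on the caller's thread,
    // wrapped in InvalidOperationException.
    public static T Run<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;
        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }
        });
        if (!thread.TrySetApartmentState(ApartmentState.STA))
        {
            throw new PlatformNotSupportedException("STA threads are not available on this platform.");
        }
        thread.Start();
        thread.Join();
        if (error != null) throw new InvalidOperationException("STA work failed.", error);
        return result;
    }
}
```

In the page this would be `string html = StaRunner.Run(() => GetPageStringbyWebBrowser(url));`. The DoEvents loop inside GetPageStringbyWebBrowser then pumps messages on that STA thread, which the control presumably needs while it loads.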

 
Attachment: source code download.

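Author's note (邀月):

If the code does not work in your tests, check whether you are in a domain (Active Directory) environment. If so, make sure the proxy and firewall are configured correctly.
See:
http://dev.csdn.net/article/83914.shtm

http://blog.csdn.net/downmoon/archive/2006/04/14/663337.aspx

http://www.cnblogs.com/downmoon/archive/2007/12/29/1019701.html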

Reposted from: https://www.cnblogs.com/downmoon/archive/2009/07/01/1514519.html
