Scraping HTML with .NET: fetching web pages on the .NET Framework

Latest recommended article published 2023-05-05 15:04:53

weixin_39826809

Tags: .NET, HTML scraping

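Below is one way to fetch a web page on the .NET Framework. The method goes inside a class, and the using directives go at the top of the file:

using System;
using System.IO;
using System.Net;
using System.Text;

// Request the web page and return its HTML as a string.
// Returns an empty string if the request fails for any reason.
private static string GetWholeHtmlCode(string url)
{
    string strHtml = string.Empty;
    StreamReader strReader = null;
    HttpWebResponse wrpContent = null;
    try
    {
        HttpWebRequest wrqContent = (HttpWebRequest)WebRequest.Create(url);
        wrqContent.Timeout = 300000; // in milliseconds, i.e. 5 minutes
        wrpContent = (HttpWebResponse)wrqContent.GetResponse();
        // Read the body only when the server answered 200 OK.
        if (wrpContent.StatusCode == HttpStatusCode.OK)
        {
            // The encoding is hard-coded to gb2312; change it if the target page uses another charset.
            strReader = new StreamReader(wrpContent.GetResponseStream(), Encoding.GetEncoding("gb2312"));
            strHtml = strReader.ReadToEnd();
        }
    }
    catch (Exception)
    {
        // Network errors, timeouts and 4xx/5xx responses (raised as WebException) all end up here.
        strHtml = string.Empty;
    }
    finally
    {
        if (strReader != null)
        {
            strReader.Close();
        }
        if (wrpContent != null)
        {
            wrpContent.Close();
        }
    }
    return strHtml;
}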
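The method above hard-codes gb2312, which garbles pages served in any other charset. As a rough sketch of one alternative, the helper below picks the encoding from the response's Content-Type header and falls back to a default when no usable charset is declared. The names CharsetHelper and ResolveEncoding are made up for this illustration and do not come from the original code.

```csharp
using System;
using System.Text;

public static class CharsetHelper
{
    // Hypothetical helper (not part of the original article): chooses an encoding
    // from a Content-Type header value such as "text/html; charset=utf-8".
    public static Encoding ResolveEncoding(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType))
        {
            return fallback;
        }

        foreach (string part in contentType.Split(';'))
        {
            string trimmed = part.Trim();
            if (trimmed.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                // Strip the "charset=" prefix and any surrounding quotes.
                string name = trimmed.Substring("charset=".Length).Trim().Trim('"', '\'');
                try
                {
                    return Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    // Unknown or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }

        // No charset parameter in the header.
        return fallback;
    }
}
```

Inside GetWholeHtmlCode this could be called as CharsetHelper.ResolveEncoding(wrpContent.ContentType, Encoding.GetEncoding("gb2312")) and the result passed to the StreamReader. It only looks at the HTTP header, so a charset declared solely in a meta tag inside the page is not detected.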

Calling this method returns the HTML of the requested page, for example:

// Fetch the page and keep its HTML locally
string strWholeHtml = GetWholeHtmlCode(SiteMapUrl);
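The comment on the call mentions saving the fetched page locally, but the article does not show that step. A minimal sketch of it follows, assuming the HTML should be written to a file as UTF-8. PageSaver and SaveHtml are hypothetical names, and the empty-string check relies on GetWholeHtmlCode returning an empty string on failure.

```csharp
using System;
using System.IO;
using System.Text;

public static class PageSaver
{
    // Hypothetical helper (not part of the original article): writes fetched
    // HTML to disk and reports whether anything was written.
    public static bool SaveHtml(string html, string path)
    {
        // An empty string signals a failed fetch, so there is nothing to save.
        if (string.IsNullOrEmpty(html))
        {
            return false;
        }

        // Create the target directory if it does not exist yet.
        string directory = Path.GetDirectoryName(Path.GetFullPath(path));
        if (!string.IsNullOrEmpty(directory))
        {
            Directory.CreateDirectory(directory);
        }

        File.WriteAllText(path, html, Encoding.UTF8);
        return true;
    }
}
```

A call might look like PageSaver.SaveHtml(strWholeHtml, "sitemap.html"), where the file name is only an example. The file is written as UTF-8 whatever charset the page itself declares.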

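strWholeHtml now holds the raw HTML, but that alone is not enough: we still need to parse the content we want out of it. For that, add the following directive:

using HtmlAgilityPack;

Html Agility Pack is a third-party HTML DOM library. HtmlAgilityPack.dll is easy to find with a search, so no download link is given here.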

A simple parsing example:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(strWholeHtml);
// SelectNodes returns null, not an empty collection, when nothing matches.
HtmlNodeCollection divlistCollection = doc.DocumentNode.SelectNodes("html[1]/body[1]/div[@class='link']");

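This gives us every div with class "link" that sits directly under body. To go on and read their contents:

if (divlistCollection != null) // null when no node matched the XPath
{
    foreach (HtmlNode node in divlistCollection)
    {
        string innerHtml = node.InnerHtml; // the HTML inside the element
        string outerHtml = node.OuterHtml; // the HTML including the element's own tag
        string innerText = node.InnerText; // the plain text inside the element
    }
}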
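Building on the loop above, a common next step is to pull the links out of each matched div. The sketch below assumes the div elements contain a elements with href attributes, which the article does not state. LinkExtractor and ExtractLinks are hypothetical names. It uses //div so that matching elements are found at any depth, not only directly under body.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper (not part of the original article): returns
    // (href, link text) pairs for every <a href> inside <div class="link">.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var results = new List<KeyValuePair<string, string>>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // @class='link' matches only when the class attribute is exactly "link".
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div[@class='link']");
        if (divs == null)
        {
            return results; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode div in divs)
        {
            // ".//" keeps the search inside the current div.
            HtmlNodeCollection anchors = div.SelectNodes(".//a[@href]");
            if (anchors == null)
            {
                continue;
            }

            foreach (HtmlNode anchor in anchors)
            {
                string href = anchor.GetAttributeValue("href", string.Empty);
                // InnerText keeps entities such as &amp; so decode them.
                string text = HtmlEntity.DeEntitize(anchor.InnerText).Trim();
                results.Add(new KeyValuePair<string, string>(href, text));
            }
        }

        return results;
    }
}
```

It could be called as LinkExtractor.ExtractLinks(strWholeHtml). The href values come back exactly as written in the page, so relative URLs still need to be resolved against the page address before they can be requested.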

That is all there is to it.
