C# Web Page Content Scraping

nnsword

Original link: https://blog.csdn.net/nnsword/article/details/2005559

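To meet the requirements above, we need to simulate a browser visiting the web page: fetch the page data, analyze it, and finally write the result of that analysis, i.e. the cleaned-up data, into a database. The approach is therefore:

1. Send an HTTP request (HttpWebRequest).

2. Receive the HTTP response (HttpWebResponse) and read the HTML source of the target page from it.

3. Extract the part of the source that contains the data.

4. Build an HtmlDocument from that HTML source and loop over it to pull out the data.

5. Write the data to the database.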
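Steps 3 and 4 can be sketched with the standard library alone. This is a minimal sketch, not the author's code: it assumes the data sits between two known marker strings (the hypothetical `startMarker`/`endMarker` arguments) and that each record is a row of a simple HTML table. It swaps in regular expressions for the HtmlDocument mentioned in step 4, which only works for simple, non-nested `<tr>`/`<td>` markup; a real HTML parser is more robust. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helpers illustrating steps 3 and 4; the names are not from the original post.
public static class ScrapeSketch
{
    // Step 3: return the text between startMarker and endMarker (markers excluded),
    // or an empty string if either marker is missing.
    public static string ExtractSection(string html, string startMarker, string endMarker)
    {
        int start = html.IndexOf(startMarker, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return "";
        start += startMarker.Length;
        int end = html.IndexOf(endMarker, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return "";
        return html.Substring(start, end - start);
    }

    // Step 4: split a table fragment into rows of cell texts.
    // Regex stands in for an HTML parser and assumes simple, non-nested markup.
    public static List<string[]> ParseRows(string tableHtml)
    {
        var rows = new List<string[]>();
        var rowRegex = new Regex(@"<tr[^>]*>(.*?)</tr>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
        var cellRegex = new Regex(@"<t[dh][^>]*>(.*?)</t[dh]>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
        var tagRegex = new Regex(@"<[^>]+>");

        foreach (Match row in rowRegex.Matches(tableHtml))
        {
            var cells = new List<string>();
            foreach (Match cell in cellRegex.Matches(row.Groups[1].Value))
            {
                // Strip inner tags such as <b>, decode entities such as &amp;, trim whitespace
                string text = tagRegex.Replace(cell.Groups[1].Value, "");
                cells.Add(WebUtility.HtmlDecode(text).Trim());
            }
            if (cells.Count > 0) rows.Add(cells.ToArray());
        }
        return rows;
    }
}
```

For step 5, each parsed row would then typically be written with a parameterized INSERT command rather than by concatenating cell text into SQL.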

The code is as follows:

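```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Windows.Forms;

// Get the HTML source of a web page from its URL
// (a method of a Windows Forms class, since errors are reported with MessageBox)
private string GetWebContent(string Url)
{
    string strResult = "";
    try
    {
        // Create an HttpWebRequest for the URL
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
        // Timeout in milliseconds (30 s)
        request.Timeout = 30000;
        // Ask caches along the way not to return a stored copy
        request.Headers.Set("Pragma", "no-cache");

        // The page is assumed to be GB2312-encoded. On .NET Core / .NET 5+, call
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) once at startup first.
        Encoding encoding = Encoding.GetEncoding("GB2312");

        // "using" closes the response, stream and reader even if reading fails
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream streamReceive = response.GetResponseStream())
        using (StreamReader streamReader = new StreamReader(streamReceive, encoding))
        {
            strResult = streamReader.ReadToEnd();
        }
    }
    catch (Exception ex)
    {
        // Report the actual cause instead of a bare "error"
        MessageBox.Show("Error: " + ex.Message);
    }
    return strResult;
}
```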
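Hard-coding GB2312 garbles any page served in another encoding. One alternative, sketched here under the assumption that the page declares its charset in the Content-Type header or in a `<meta>` tag, is to read the raw bytes, look for a declared charset, and decode with it, falling back to a default. `CharsetSketch`, `DetectCharset` and `DecodeHtml` are hypothetical names, and the sketch targets .NET Core 3.0+ / .NET 5+, where legacy code pages such as GB2312 become available only after registering `CodePagesEncodingProvider`.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: pick a decoding from the declared charset instead of hard-coding GB2312.
public static class CharsetSketch
{
    static readonly Regex CharsetRegex =
        new Regex(@"charset\s*=\s*[""']?([\w-]+)", RegexOptions.IgnoreCase);

    // Make legacy code pages (GB2312, GBK, ...) available on .NET Core / .NET 5+
    static CharsetSketch()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Return the charset named in a Content-Type value, or null if none is present.
    public static string DetectCharset(string contentType)
    {
        if (string.IsNullOrEmpty(contentType)) return null;
        Match m = CharsetRegex.Match(contentType);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Decode raw response bytes: prefer the header charset, then a <meta> charset
    // found in the first 1024 bytes, then the supplied fallback encoding.
    public static string DecodeHtml(byte[] body, string contentType, Encoding fallback)
    {
        string name = DetectCharset(contentType);
        if (name == null)
        {
            // Meta tags are ASCII, so an ASCII preview is enough to find the declaration
            string head = Encoding.ASCII.GetString(body, 0, Math.Min(body.Length, 1024));
            Match m = CharsetRegex.Match(head);
            if (m.Success) name = m.Groups[1].Value;
        }

        Encoding encoding = fallback;
        if (name != null)
        {
            try { encoding = Encoding.GetEncoding(name); }
            catch (ArgumentException) { /* unknown charset name: keep the fallback */ }
        }
        return encoding.GetString(body);
    }
}
```

With the HttpWebRequest code, `body` could be the response stream copied into a `MemoryStream`, and `contentType` would be `response.ContentType`.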
