Fixing garbled text in downloaded web pages

wushuai1346

Category: Data Mining. Tags: stream, string, regular expressions, windows, url, 2010

Article link: https://blog.csdn.net/wushuai1346/article/details/7336833

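I have seen many people run into garbled text (mojibake) when downloading web pages. Many fixes have been proposed, but most of them do not strike me as very proper; a common one is to pull the charset out of the page with a regular expression. In fact, we can do this with WebRequest and its response (HttpWebRequest and HttpWebResponse). The code is as follows: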

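using System.IO;
using System.Net;
using System.Text;

private static string DownloadHtml(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Timeout = 600000; // milliseconds, i.e. 10 minutes
    request.AllowAutoRedirect = true;
    // ContentType describes a request body; a plain GET has none, so this is optional.
    request.ContentType = "application/x-www-form-urlencoded";
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2";

    // using blocks dispose the response, stream, and reader even if reading throws.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    {
        // CharacterSet is taken from the Content-Type response header and can be
        // empty; falling back to UTF-8 here is an assumption, not part of the API.
        string charset = response.CharacterSet;
        Encoding encoding = string.IsNullOrEmpty(charset)
            ? Encoding.UTF8
            : Encoding.GetEncoding(charset);
        using (StreamReader srHtml = new StreamReader(stream, encoding))
        {
            return srHtml.ReadToEnd();
        }
    }
}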

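In fact, the page's encoding, as declared in the Content-Type response header, is available in response.CharacterSet, so there is no need to extract it with a regular expression.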
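
One thing to watch for: Encoding.GetEncoding throws an ArgumentException when the charset name is empty or not recognized by the runtime, and not every server sends a usable value. The following is a minimal sketch of a defensive lookup, not a definitive implementation. The names CharsetHelper and ResolveEncoding are hypothetical, and both the UTF-8 fallback and the stripping of surrounding quotes are assumptions made for illustration.

```csharp
using System;
using System.Text;

public static class CharsetHelper
{
    // Maps a charset name (for example the value of response.CharacterSet)
    // to an Encoding. Falls back to UTF-8 (an assumption) when the name is
    // missing or unknown to the runtime.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return Encoding.UTF8;
        }

        // Precaution: remove whitespace and any quotes around the value.
        string name = charset.Trim().Trim('"', '\'');

        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            // The name is not a supported encoding on this runtime.
            return Encoding.UTF8;
        }
    }
}
```

With this in place, the reader in DownloadHtml could be constructed as new StreamReader(stream, CharsetHelper.ResolveEncoding(response.CharacterSet)). On .NET Core and later, legacy code pages such as gb2312 are only available after registering the code pages encoding provider, so without that registration this sketch would silently fall back to UTF-8 for such pages.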