[Repost] C# (ASP.NET): Fetching Content from Other Websites and Extracting the Useful Information

Latest recommended article published on 2015-11-25 14:38:00

weixin_34279061

This article is reposted from: http://www.cnblogs.com/henw/archive/2011/09/23/2186387.html

1. Namespaces to import

    using System.Net;
    using System.IO;
    using System.Text;
    using System.Text.RegularExpressions;

2. Key code for fetching the content of a page on another website

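    // "http://目标网址.com/" is a placeholder (目标网址 = target URL); substitute the page to fetch
    WebRequest request = WebRequest.Create("http://目标网址.com/");
    WebResponse response = request.GetResponse();
    // Decode the response as gb2312; change this if the target page uses another encoding
    StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312"));
    // reader.ReadToEnd() returns the page's HTML source
    TextBox1.Text = reader.ReadToEnd();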
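
The snippet above never closes the response or the reader. As a minimal sketch of one way to tidy that up, the same calls can be wrapped in `using` blocks so both are disposed once the source has been read. The class name `PageFetcher` and the methods `ReadAll` and `FetchHtml` are hypothetical names introduced here for illustration; they are not part of the original article. The sketch assumes the .NET Framework, where the `gb2312` encoding is available by default.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, not from the original article.
public static class PageFetcher
{
    // Decodes the whole stream with the named encoding, then disposes the reader.
    public static string ReadAll(Stream stream, string encodingName)
    {
        using (StreamReader reader = new StreamReader(stream, Encoding.GetEncoding(encodingName)))
        {
            return reader.ReadToEnd();
        }
    }

    // Downloads a page and returns its source as one string.
    // Disposing the response releases the underlying connection.
    public static string FetchHtml(string url, string encodingName)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        {
            return ReadAll(response.GetResponseStream(), encodingName);
        }
    }
}
```

With a helper like this, the page source could be fetched once, for example `string html = PageFetcher.FetchHtml(url, "gb2312");`, and the resulting string reused both for display and for the matching in step 3.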

3. After fetching the page source, filter out the useful information with a regular expression

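    // ReadToEnd() consumes the reader: a second call returns an empty string.
    // If the source was already read in step 2, use that string here instead.
    string html = reader.ReadToEnd();
    string s = "";
    MatchCollection TitleMatchs = Regex.Matches(html,
        @"发表评论</a></p></div><div class=""body"">([\s\S]*?)</div><div class=""share"">",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);
    foreach (Match NextMatch in TitleMatchs)
    {
        // Groups[1] holds the text captured between the two marker fragments
        s += "<br>" + NextMatch.Groups[1].Value;
        TextBox1.Text += "\n" + NextMatch.Groups[1].Value;
    }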
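
To see what the pattern captures without touching the network, the matching step can be tried on a string held in memory. The sketch below reuses the article's pattern unchanged; `BodyExtractor` and `Extract` are hypothetical names added for illustration, and any sample HTML passed in is assumed to contain the same marker fragments as the target page (`发表评论` is the literal "post a comment" link text the pattern anchors on).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not from the original article.
public static class BodyExtractor
{
    // Same pattern as above: lazily capture everything between the
    // "body" div opening and the following "share" div.
    private const string Pattern =
        @"发表评论</a></p></div><div class=""body"">([\s\S]*?)</div><div class=""share"">";

    // Returns the captured group of every match, in document order.
    public static List<string> Extract(string html)
    {
        List<string> bodies = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern, RegexOptions.IgnoreCase))
        {
            bodies.Add(m.Groups[1].Value);
        }
        return bodies;
    }
}
```

Because `[\s\S]*?` is lazy, each match stops at the first closing marker after its opening marker, so a page containing several posts yields one entry per post rather than a single span running from the first post to the last.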