xNet, an open-source crawler library written by a Russian expert

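xNet is an open-source tool written by a Russian developer. What makes it so powerful is that it implements the low-level parts of the HTTP protocol itself. Why does that matter? Anyone who writes crawlers eventually hits the same maddening problem: your HTTP request headers look identical to the browser's, yet you still cannot get the data you want. With HttpWebRequest you can only debug as far as GetResponse; the underlying byte stream is out of reach. So you need a lower-level component that you can actually step into while debugging. The xNet source is here: https://github.com/X-rus/xNet
Quick start.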

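Let's start with an example that reads the cnblogs home page. The previous post showed how to do this with HttpWebRequest; here is how it looks with xNet:

using (var request = new xNet.HttpRequest())
{
    var html = request.Get("http://www.cnblogs.com").ToString();
}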

Note: for the standard HTTP headers, it is best to set them through properties, for example KeepAlive, Referer and UserAgent.

Extension headers such as Upgrade-Insecure-Requests can be set with the AddHeader method, for example:

using (var request = new xNet.HttpRequest())
{
    request.AddHeader("Upgrade-Insecure-Requests", "1");
    var html = request.Get("http://www.cnblogs.com").ToString();
}

For some headers, calling AddHeader and setting the property are equivalent. Each of the following lines sets the User-Agent header:

    request.AddHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0");

    request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0";

    request.UserAgent = xNet.Http.FirefoxUserAgent();

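Not every header can be set with AddHeader, though. Content-Type, for instance, describes the type of the data sent in a POST, and setting it with AddHeader raises an error. If you are not sure which headers can be set manually and which cannot, refer to the values of the xNet.HttpHeader enum: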

public enum HttpHeader
{
    Accept = 0,
    AcceptCharset = 1,
    AcceptLanguage = 2,
    AcceptDatetime = 3,
    CacheControl = 4,
    ContentType = 5,
    Date = 6,
    Expect = 7,
    From = 8,
    IfMatch = 9,
    IfModifiedSince = 10,
    IfNoneMatch = 11,
    IfRange = 12,
    IfUnmodifiedSince = 13,
    MaxForwards = 14,
    Pragma = 15,
    Range = 16,
    Referer = 17,
    Upgrade = 18,
    UserAgent = 19,
    Via = 20,
    Warning = 21,
    DNT = 22,
    AccessControlAllowOrigin = 23,
    AcceptRanges = 24,
    Age = 25,
    Allow = 26,
    ContentEncoding = 27,
    ContentLanguage = 28,
    ContentLength = 29,
    ContentLocation = 30,
    ContentMD5 = 31,
    ContentDisposition = 32,
    ContentRange = 33,
    ETag = 34,
    Expires = 35,
    LastModified = 36,
    Link = 37,
    Location = 38,
    P3P = 39,
    Refresh = 40,
    RetryAfter = 41,
    Server = 42,
    TransferEncoding = 43,
}
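
As a rough illustration of the Content-Type point above, the sketch below sends a POST body and passes the content type as an argument instead of calling AddHeader. It assumes the Post(address, body, contentType) overload as I recall it from the xNet source, and the xNet namespace used in this article (some older builds use xNet.Net), so check both against the version you use. The URL and the form fields are placeholders that do not come from the original post.

```csharp
using System;
using xNet;

class PostExample
{
    static void Main()
    {
        using (var request = new HttpRequest())
        {
            request.UserAgent = Http.FirefoxUserAgent();

            // Hypothetical form fields, already URL-encoded.
            string body = "user=demo&page=1";

            // Assumed overload: Post(address, body, contentType).
            // The content type travels as an argument; according to the
            // article, request.AddHeader("Content-Type", ...) is rejected.
            string html = request.Post(
                "http://example.com/search", // placeholder URL, replace with a real endpoint
                body,
                "application/x-www-form-urlencoded").ToString();

            Console.WriteLine(html.Length);
        }
    }
}
```

The point of the design is that the library owns the headers it has to compute or keep consistent with the body, so the caller describes the body and lets the request object emit Content-Type itself.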

xNet also supports Socks4 and Socks5 proxies, and the benefits of proxies need no explanation.
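
A minimal sketch of sending a request through a SOCKS5 proxy, assuming the Proxy property on HttpRequest and the static Socks5ProxyClient.Parse method as they appear in the xNet source (verify against your version). The address 127.0.0.1:1080 is a placeholder for a proxy you run yourself.

```csharp
using System;
using xNet;

class ProxyExample
{
    static void Main()
    {
        using (var request = new HttpRequest())
        {
            // Placeholder address: replace with a reachable SOCKS5 proxy ("host:port").
            request.Proxy = Socks5ProxyClient.Parse("127.0.0.1:1080");

            // The request is now tunneled through the proxy.
            string html = request.Get("http://www.cnblogs.com").ToString();
            Console.WriteLine(html.Length);
        }
    }
}
```

For a SOCKS4 proxy, the source has a matching Socks4ProxyClient class that can be assigned to the same property.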

Reposted from: https://my.oschina.net/lichaoqiang/blog/881264
