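Fixing the "Cookie rejected" warning in a Sina Weibo crawler

I recently wrote a Sina Weibo crawler using httpclient-4.3.3. The program runs fine, but it keeps logging "Cookie rejected" warnings. The log looks like this: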


2014-06-05 10:27:17.417 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.ef542aa2.538fd58b.ec8a8e2c", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:23 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:17.422 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.ef632aa2.538fd58b.c6dd669e", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
Login succeeded, nickname: 佩佩菜_52350
2014-06-05 10:27:20.019 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.75d37a79.538fd58d.077976a4", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:25 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:20.019 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.75e37a79.538fd58d.575a338c", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
Login succeeded, nickname: 通吃一条街呵呵
2014-06-05 10:27:29.119 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.9fcc12df.538fd597.fcf0e3af", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:35 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:29.120 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.9fd812df.538fd597.e804e263", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
Login succeeded, nickname: dxedf
log4j:WARN No appenders could be found for logger (com.mchange.v2.log.MLog).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - Reading system configuration: D:\Workspaces\eurlanda\DAP_EurlandaSpider\WebRoot\WEB-INF\classes\config.properties
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.weibo.dely=12
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.task.saveDely=1
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.task.dely=168
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.core.socket.retryCount=3
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.work_thread_num=10
2014-06-05 10:27:30.248 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.core.socket.readTimeout=5
2014-06-05 10:27:30.248 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.core.socket.serverPort=7077
2014-06-05 10:27:30.248 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.core.socket.connectTimeout=5
2014-06-05 10:27:30.248 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.work.schedule=* * 18-9 ? * 1-5|* * * ? * 1,7|* * * * * ?
2014-06-05 10:27:30.254 [Thread-0] INFO  c.e.s.c.sina_weibo.SinaWeiBoCrawler - ----------- Fetching data for date 2010-02-23 00:00:00 -----------
2014-06-05 10:27:30.869 [18721437752] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.ae2f61ad.538fd599.2711e9ab", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.870 [18721437752] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.ae3b61ad.538fd599.cec3bfae", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.881 [zjweii@qq.com] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.18d93dd.538fd599.add86b40", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.882 [zjweii@qq.com] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.18ee3dd.538fd599.d7522db2", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:31.089 [18721437752] INFO  c.e.s.c.sina_weibo.SinaWeiBoClient - No search results.
2014-06-05 10:27:31.280 [pbz201402@126.com] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.39486d50.538fd599.66e98262", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:31.280 [pbz201402@126.com] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.395a6d50.538fd599.84218ee8", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"

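Today I finally got fed up with it and went searching online. Most of what I found was outdated or written for other versions, but after trying various approaches I found a fix. The problem lies in the cookie policy (cookie spec): as the log shows, the default spec rejects a cookie whose domain attribute (.sina.com.cn) does not match the host that sent it (passport.weibo.com or s.weibo.com). Overriding the default spec's validation is enough to make the warnings go away.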

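import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.cookie.Cookie;
import org.apache.http.cookie.CookieOrigin;
import org.apache.http.cookie.CookieSpec;
import org.apache.http.cookie.CookieSpecProvider;
import org.apache.http.cookie.MalformedCookieException;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.cookie.BestMatchSpecFactory;
import org.apache.http.impl.cookie.BrowserCompatSpec;
import org.apache.http.impl.cookie.BrowserCompatSpecFactory;
import org.apache.http.protocol.HttpContext;

// The statements below belong inside a method of your crawler class.

// Cookie spec provider whose validate() accepts every cookie.
CookieSpecProvider easySpecProvider = new CookieSpecProvider() {
    @Override
    public CookieSpec create(HttpContext context) {
        return new BrowserCompatSpec() {
            @Override
            public void validate(Cookie cookie, CookieOrigin origin)
                    throws MalformedCookieException {
                // Intentionally empty: skip validation so no cookie is rejected.
            }
        };
    }
};

// Register the lenient spec under the name "mySpec" alongside the standard specs.
Registry<CookieSpecProvider> reg = RegistryBuilder.<CookieSpecProvider>create()
        .register(CookieSpecs.BEST_MATCH, new BestMatchSpecFactory())
        .register(CookieSpecs.BROWSER_COMPATIBILITY, new BrowserCompatSpecFactory())
        .register("mySpec", easySpecProvider)
        .build();

// Make "mySpec" the default cookie spec for every request.
RequestConfig requestConfig = RequestConfig.custom()
        .setCookieSpec("mySpec")
        .build();

// Build the client with the custom registry and request config.
CloseableHttpClient httpclient = HttpClients.custom()
        .setDefaultCookieSpecRegistry(reg)
        .setDefaultRequestConfig(requestConfig)
        .build();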
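
A note on why these warnings appear: judging from the log message, the cookie is rejected because its domain attribute (.sina.com.cn) does not cover the host that sent it (passport.weibo.com or s.weibo.com). The sketch below illustrates that kind of domain check in plain Java. It is a simplified illustration, not HttpClient's actual code; the class CookieDomainCheck and the method domainMatches are hypothetical names, and the .weibo.com cookie domain in the last call is a made-up example.

```java
import java.util.Locale;

// Simplified illustration only; this is not HttpClient's actual implementation.
final class CookieDomainCheck {

    // Returns true when the cookie's Domain attribute covers the host that sent it.
    static boolean domainMatches(String cookieDomain, String originHost) {
        String domain = cookieDomain.toLowerCase(Locale.ROOT);
        String host = originHost.toLowerCase(Locale.ROOT);
        // A leading dot is ignored for the comparison.
        if (domain.startsWith(".")) {
            domain = domain.substring(1);
        }
        // Either the host equals the domain, or it is a subdomain of it.
        return host.equals(domain) || host.endsWith("." + domain);
    }

    public static void main(String[] args) {
        // Values taken from the warnings in the log above.
        System.out.println(domainMatches(".sina.com.cn", "passport.weibo.com")); // false
        System.out.println(domainMatches(".sina.com.cn", "s.weibo.com"));        // false
        // Hypothetical cookie domain that would pass the check.
        System.out.println(domainMatches(".weibo.com", "passport.weibo.com"));   // true
    }
}
```

With the empty validate() override shown earlier, a check of this kind is never applied, which is why the U_TRS1 and U_TRS2 cookies stop triggering the warning.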

