I recently built a Sina Weibo crawler with httpclient-4.3.3. It runs fine, but it keeps logging "Cookie rejected" warnings. Here is the log:
2014-06-05 10:27:17.417 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.ef542aa2.538fd58b.ec8a8e2c", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:23 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:17.422 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.ef632aa2.538fd58b.c6dd669e", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
Login succeeded, nickname: 佩佩菜_52350
2014-06-05 10:27:20.019 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.75d37a79.538fd58d.077976a4", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:25 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:20.019 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.75e37a79.538fd58d.575a338c", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
Login succeeded, nickname: 通吃一条街呵呵
2014-06-05 10:27:29.119 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.9fcc12df.538fd597.fcf0e3af", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:35 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:29.120 [main] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.9fd812df.538fd597.e804e263", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
Login succeeded, nickname: dxedf
log4j:WARN No appenders could be found for logger (com.mchange.v2.log.MLog).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - Reading system configuration: D:\Workspaces\eurlanda\DAP_EurlandaSpider\WebRoot\WEB-INF\classes\config.properties
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.weibo.dely=12
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.task.saveDely=1
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.task.dely=168
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.retryCount=3
2014-06-05 10:27:30.247 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.work_thread_num=10
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.readTimeout=5
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.serverPort=7077
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.core.socket.connectTimeout=5
2014-06-05 10:27:30.248 [Thread-0] INFO com.eurlanda.spider.global.Global - jspider.work.schedule=* * 18-9 ? * 1-5|* * * ? * 1,7|* * * * * ?
2014-06-05 10:27:30.254 [Thread-0] INFO c.e.s.c.sina_weibo.SinaWeiBoCrawler - ----------- Crawling data for date 2010-02-23 00:00:00 -----------
2014-06-05 10:27:30.869 [18721437752] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.ae2f61ad.538fd599.2711e9ab", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.870 [18721437752] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.ae3b61ad.538fd599.cec3bfae", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.881 [zjweii@qq.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.18d93dd.538fd599.add86b40", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.882 [zjweii@qq.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.18ee3dd.538fd599.d7522db2", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:31.089 [18721437752] INFO c.e.s.c.sina_weibo.SinaWeiBoClient - Search returned no results.
2014-06-05 10:27:31.280 [pbz201402@126.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.39486d50.538fd599.66e98262", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:31.280 [pbz201402@126.com] WARN o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.395a6d50.538fd599.84218ee8", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
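Today I finally got fed up with it and went digging online. Most of what I found was outdated or written for older HttpClient versions. After trying things out and sorting through it all, I found the fix: it is a cookie-policy issue. The responses from passport.weibo.com and s.weibo.com set cookies (U_TRS1, U_TRS2) with domain .sina.com.cn, which does not match the host that sent them, so the default cookie spec's validate() rejects them and ResponseProcessCookies logs the warning. Overriding the default spec's validation makes the warnings go away: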
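import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.cookie.Cookie;
import org.apache.http.cookie.CookieOrigin;
import org.apache.http.cookie.CookieSpec;
import org.apache.http.cookie.CookieSpecProvider;
import org.apache.http.cookie.MalformedCookieException;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.cookie.BestMatchSpecFactory;
import org.apache.http.impl.cookie.BrowserCompatSpec;
import org.apache.http.impl.cookie.BrowserCompatSpecFactory;
import org.apache.http.protocol.HttpContext;

// A cookie spec that behaves like BrowserCompatSpec but skips validation,
// so cookies whose domain does not match the origin host are accepted.
CookieSpecProvider easySpecProvider = new CookieSpecProvider() {
    public CookieSpec create(HttpContext context) {
        return new BrowserCompatSpec() {
            @Override
            public void validate(Cookie cookie, CookieOrigin origin)
                    throws MalformedCookieException {
                // Oh, I am easy: accept every cookie without checking it
            }
        };
    }
};
// Register the standard specs plus the custom one under the name "mySpec"
Registry<CookieSpecProvider> reg = RegistryBuilder.<CookieSpecProvider>create()
        .register(CookieSpecs.BEST_MATCH, new BestMatchSpecFactory())
        .register(CookieSpecs.BROWSER_COMPATIBILITY, new BrowserCompatSpecFactory())
        .register("mySpec", easySpecProvider)
        .build();
// Make "mySpec" the default cookie spec for every request
RequestConfig requestConfig = RequestConfig.custom()
        .setCookieSpec("mySpec")
        .build();
CloseableHttpClient httpclient = HttpClients.custom()
        .setDefaultCookieSpecRegistry(reg)
        .setDefaultRequestConfig(requestConfig)
        .build();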
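The rejection itself comes from a domain-match check. The sketch below is a simplified, self-contained illustration modeled on the domain-matching rule in RFC 6265 (section 5.1.3). It is not HttpClient's actual code, it omits the IP-address and public-suffix checks a real implementation needs, and the names CookieDomainCheck and domainMatches are made up for this example. It shows why passport.weibo.com and s.weibo.com are not allowed to set a cookie for .sina.com.cn:

```java
import java.util.Locale;

/**
 * Hypothetical, simplified domain-match check modeled on RFC 6265 section 5.1.3.
 * Not HttpClient's implementation; IP-address and public-suffix checks are omitted.
 */
public class CookieDomainCheck {

    /** Returns true if a response from originHost may set a cookie for cookieDomain. */
    public static boolean domainMatches(String originHost, String cookieDomain) {
        String host = originHost.toLowerCase(Locale.ROOT);
        String domain = cookieDomain.toLowerCase(Locale.ROOT);
        // A leading dot in the Domain attribute is ignored (RFC 6265 section 5.2.3)
        if (domain.startsWith(".")) {
            domain = domain.substring(1);
        }
        if (host.equals(domain)) {
            return true;
        }
        // Otherwise the host must be a subdomain, i.e. end with "." + domain
        return host.endsWith("." + domain);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"passport.weibo.com", ".sina.com.cn"},
            {"s.weibo.com", ".sina.com.cn"},
            {"passport.weibo.com", ".weibo.com"},
        };
        for (String[] c : cases) {
            System.out.printf("%s sets Domain=%s -> %s%n",
                    c[0], c[1], domainMatches(c[0], c[1]) ? "accepted" : "rejected");
        }
    }
}
```

Running main prints "rejected" for both weibo.com hosts with Domain=.sina.com.cn and "accepted" for Domain=.weibo.com. The overridden validate() in the fix skips this kind of check entirely, which is why it also accepts cookies for unrelated domains.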