Scrapy
hitman.banker
Thinking in Architecture and Art
Site Analysis Note 19
1. Static Resource HTTP Response Header
cache-control:public, max-age=30758400
cf-cache-status:HIT
cf-ray:1afc29518836124f-HKG
content-encoding:gzip
content-type:text/css
date:Wed, 28 Jan 2015 09:28:…
Original · 2015-01-28 17:52:17 · 914 views · 0 comments
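The `max-age` directive in the `cache-control` header of this excerpt is expressed in seconds. As a rough sketch (the helper name `parse_max_age` is hypothetical and not part of the original note), the value can be extracted and converted to days like this:

```python
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age directive in seconds, or None if it is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return int(value)
    return None


# Header value copied from the excerpt; parse_max_age is an illustrative helper.
seconds = parse_max_age("public, max-age=30758400")
days = seconds / 86400  # 86400 seconds in one day
print(seconds, days)
```

A lifetime this long suggests the stylesheet is treated as effectively immutable, so a crawler would not need to refetch it on each visit.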
One Cause of java.net.SocketTimeoutException: Read timed out
When I tried to fetch a document from a website using jsoup, the call hung for several seconds and then failed with this error:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Met…
Original · 2015-03-27 16:28:45 · 1655 views · 0 comments
Example for Simple Login
1. Open Chrome and its developer tools.
2. Perform the login in Chrome and record the network traffic.
3. Analyze the HTTP request and HTTP response.
Request Headers
POST /member/xlogin.ph…
Original · 2015-03-27 19:05:05 · 715 views · 0 comments
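Once the login request has been captured in the browser, it can be rebuilt in code. The following is only a minimal sketch using Python's standard library: the URL, the form field names, and the helper `build_login_request` are all hypothetical placeholders, since the real values come from the request recorded in the developer tools.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: the real URL and field names come from the
# request recorded in the browser's Network tab.
LOGIN_URL = "https://example.com/login"
form_fields = {"username": "alice", "password": "secret"}


def build_login_request(url: str, fields: dict) -> Request:
    """Build (but do not send) a form-encoded POST like a browser login."""
    body = urlencode(fields).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": "Mozilla/5.0",
    }
    return Request(url, data=body, headers=headers, method="POST")


request = build_login_request(LOGIN_URL, form_fields)
print(request.get_method(), request.full_url)
print(request.data)
```

To actually send it, the request could be opened through `urllib.request.build_opener(urllib.request.HTTPCookieProcessor())`, so that any session cookie set by the login response is kept for later requests.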
Great toolset for Web Scraping
www.freeformatter.com is very handy for web scraping work. At the moment I'm using its XPath tester to validate my XPath expressions: http://www.freeformatter.com/xpath-tester.html
Original · 2015-12-31 14:58:00 · 468 views · 0 comments
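An XPath expression can also be checked locally before it goes into a scraper. This is a small sketch under stated assumptions: the sample markup is made up, and it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed input (real-world HTML usually needs a more forgiving parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup standing in for a scraped page.
SAMPLE = """
<html>
  <body>
    <ul>
      <li class="post"><a href="/a">First</a></li>
      <li class="post"><a href="/b">Second</a></li>
      <li class="ad"><a href="/c">Sponsored</a></li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)
# Select the links inside list items whose class attribute is "post".
links = root.findall(".//li[@class='post']/a")
titles = [a.text for a in links]
hrefs = [a.get("href") for a in links]
print(titles, hrefs)
```

If the expression returns an empty list here, it would most likely fail in the scraper too, which makes this a quick way to iterate on a selector.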
Python Scraping Tools
scrapy: application framework for web scraping and crawling
beautifulsoup: library for parsing HTML
mechanize
lxml
Original · 2016-01-05 06:52:01 · 860 views · 0 comments