scrapy startproject <projectname>
scrapy genspider -t crawl sohu2 sohu.com
scrapy crawl sis001
scrapy crawl sis001bot -o xxx.json -t json
Debugging statements
from scrapy.shell import inspect_response
inspect_response(response, self)
Logging
import logging
self.log('No item received for %s' % response.url, level=logging.WARNING)
$x('XPath expression')
To watch
1. Crawl only 10 pages
2. Filter URLs further
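A minimal sketch of the two points above in plain Python, independent of Scrapy. The allow/deny patterns and the names MAX_PAGES, keep and select_urls are hypothetical, and "10 pages" is assumed to mean 10 distinct URLs. Inside Scrapy itself, the CLOSESPIDER_PAGECOUNT setting and the allow/deny arguments of LinkExtractor cover the same ground.

```python
import re
from itertools import islice

MAX_PAGES = 10  # assumption: stop after 10 distinct URLs

# Hypothetical patterns, for illustration only.
ALLOW = [re.compile(r"/forum/thread-")]
DENY = [re.compile(r"\.(jpg|png|gif)$")]


def keep(url):
    """Return True if the URL passes the deny list and matches the allow list."""
    if any(p.search(url) for p in DENY):
        return False
    return any(p.search(url) for p in ALLOW)


def select_urls(urls, limit=MAX_PAGES):
    """Return at most `limit` unique URLs that pass the filter, in input order."""
    seen = set()

    def unique_kept():
        for url in urls:
            if url not in seen and keep(url):
                seen.add(url)
                yield url

    return list(islice(unique_kept(), limit))
```

The deny list is checked first, so a URL that matches both lists is dropped.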
Django
django-admin startproject blog
python manage.py startapp sblog or django-admin startapp sblog
Edit the three files settings.py, urls.py and views.py with a text editor
http://38.103.161.185/forum/thread-4437321-1-7.html
thread-\d{5,10}-1-\d{1,2}
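A quick check of the thread pattern against the sample URL above, assuming the intended regex is thread-\d{5,10}-1-\d{1,2} (digit classes written with a backslash, ranges with a comma). The names THREAD_RE and is_thread_url are hypothetical.

```python
import re

# Thread id of 5 to 10 digits, a literal "-1-", then a 1 to 2 digit page number.
THREAD_RE = re.compile(r"thread-\d{5,10}-1-\d{1,2}")


def is_thread_url(url):
    """Return True if the URL contains a thread path of the expected shape."""
    return THREAD_RE.search(url) is not None


print(is_thread_url("http://38.103.161.185/forum/thread-4437321-1-7.html"))
```

The pattern is unanchored, so it also matches when more digits follow the page number; append \.html$ if that matters.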