The code is as follows:
Press Win+R, type cmd to open a terminal, and enter the following commands:
cd desktop
scrapy startproject TXmovies
cd TXmovies
scrapy genspider txms v.qq.com
Edit the settings file (settings.py):
ROBOTSTXT_OBEY=False
DOWNLOAD_DELAY=1
DEFAULT_REQUEST_HEADERS={
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language':'en',
    'User-Agent':'Mozilla/5.0'
}
ITEM_PIPELINES={'TXmovies.pipelines.TxmoviesPipeline':300,}
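The number assigned to each entry in ITEM_PIPELINES sets the order in which pipelines run: items pass through the lower-valued classes first, and values are customarily kept in the 0-1000 range. The sketch below only illustrates that ordering rule in plain Python. The second pipeline, ExamplePipeline, and the helper pipeline_order are hypothetical names added for illustration and are not part of the original project.

```python
# Illustrative copy of the setting. Only TxmoviesPipeline comes from the text;
# ExamplePipeline is a hypothetical second entry used to show the ordering.
ITEM_PIPELINES = {
    'TXmovies.pipelines.ExamplePipeline': 800,   # hypothetical
    'TXmovies.pipelines.TxmoviesPipeline': 300,
}


def pipeline_order(pipelines):
    """Return the pipeline paths sorted by their value, lowest first.

    This mirrors the rule that lower-valued pipelines process an item
    before higher-valued ones; it is not Scrapy's own code.
    """
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


if __name__ == '__main__':
    for path in pipeline_order(ITEM_PIPELINES):
        print(path)
```

With a single pipeline, as in this project, the value 300 has no practical effect; it matters only once more pipelines are added.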
Define the data to extract, i.e. the item fields (items.py):
import scrapy
class TxmoviesItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    name=scrapy.Field()
    description=scrapy.Field()
Write the spider:
import scrapy
from ..items import TxmoviesItem
class TxmsSpider(scrapy.Spider