Daily Operations
1. Create a project: scrapy startproject pac  (pac = project name)
2. Create a spider: scrapy genspider qsbk "qiushibaike.com"  (spider name, domain to crawl)
3. Settings (settings.py):
ROBOTSTXT_OBEY = False
DOWNLOAD_DELAY = 3
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
}
# Store scraped items
ITEM_PIPELINES = {
    'pac.pipelines.PacPipeline': 300,
}
DOWNLOAD_DELAY = 1  -- download delay (seconds)
------ Middleware (anti-anti-crawler logic goes in class A) ------
DOWNLOADER_MIDDLEWARES = {
    'fangz.middlewares.A': 543,
}
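The entry above registers a downloader middleware class named A from fangz/middlewares.py. A common use for such a middleware is rotating the User-Agent on every outgoing request. A minimal sketch, assuming class A is a User-Agent rotator (the agent list is an assumption; `process_request` is the hook Scrapy actually calls):

```python
import random

# Hypothetical agent pool -- swap in whatever browsers you want to imitate.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
]

class A:
    """Downloader middleware: Scrapy calls process_request for every
    request before it is downloaded."""

    def process_request(self, request, spider):
        # Pick a random User-Agent for this request.
        request.headers['User-Agent'] = random.choice(USER_AGENTS)
        return None  # None = continue processing the request normally
```

Returning None lets the request continue through the download chain; returning a Response or Request object instead would short-circuit it.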
4. In the spider file:
(1) contains() -- XPath "attribute contains" match
lis = response.xpath("//div[contains(@class,'nl_con')]/ul/li")
(2) yield scrapy.Request(url, callback, meta) -- pass data between callbacks
def parse_page1(self, response):
    a = "你好"
    b = "不好"
    url = "www.example....."
    yield scrapy.Request(url, callback=self.parse_page2, meta={'item': (a, b)})

def parse_page2(self, response):
    item = response.meta.get('item')
    # scrape the content
    lis = response.xpath("//div[contains(@class,'nl_con')]/ul/li")
    yield scrapy.Request(url, callback=self.parse_esf, meta={"info": (province, city)})  # second-hand housing

def parse_esf(self, response):
    ...
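The key idea above is that whatever dict you attach as a Request's meta shows up on the Response handed to the callback, so data flows from one parse method to the next. A minimal plain-Python simulation of that handoff (FakeResponse is a stand-in for Scrapy's Response, not a real Scrapy API):

```python
class FakeResponse:
    """Stand-in for scrapy.http.Response: only carries the meta dict."""
    def __init__(self, meta):
        self.meta = meta

def parse_page1():
    a, b = "你好", "不好"
    # In Scrapy this would be:
    #   yield scrapy.Request(url, callback=self.parse_page2, meta={'item': (a, b)})
    return {'item': (a, b)}

def parse_page2(response):
    # The next callback recovers exactly what the previous one attached.
    return response.meta.get('item')

meta = parse_page1()
item = parse_page2(FakeResponse(meta))
# item is now ("你好", "不好")
```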
5. Files
>> pipelines.py --> data storage
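A pipeline such as the pac.pipelines.PacPipeline enabled in ITEM_PIPELINES is a plain class with open_spider / process_item / close_spider hooks that Scrapy calls automatically. A minimal sketch that writes each item out as one JSON line (the filename items.json is an assumption):

```python
import json

class PacPipeline:
    # Scrapy calls these hooks once the pipeline is enabled in ITEM_PIPELINES.

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.fp = open('items.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # Called once per scraped item: write it as one JSON line.
        self.fp.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item  # return the item so later pipelines can also see it

    def close_spider(self, spider):
        # Called once when the spider finishes: close the file.
        self.fp.close()
```

The number 300 in ITEM_PIPELINES is the priority: lower numbers run first when several pipelines are enabled.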
------------ Deploying to a server ------------
1. cmd: pip freeze > requirements.txt
   (the generated txt lists every package that needs to be installed)
2. Send it to the server: rz  -- select the txt file
3. pip install -r requirements.txt
------------- Creating a virtual environment: pip install virtualenvwrapper -------------
1. mkvirtualenv -p /usr/bin/python3 minzi  -- create a Python 3 virtualenv named minzi
2. pip install -r requirements.txt
---------- Distributed crawling with scrapy-redis ----------
1. Install: pip install scrapy-redis
>> pac.py:
from scrapy_redis.spiders import RedisSpider
Details: https://www.cnblogs.com/zhangyangcheng/articles/8150483.html
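Besides inheriting from RedisSpider, scrapy-redis needs to be wired into settings.py so that the request queue and duplicate filter are shared through Redis across all workers. A typical settings fragment (the Redis URL is an assumption for a local setup):

```python
# settings.py additions for scrapy-redis
SCHEDULER = "scrapy_redis.scheduler.Scheduler"              # queue requests in Redis
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"  # dedupe across all workers
SCHEDULER_PERSIST = True                                    # keep the queue between runs
REDIS_URL = "redis://127.0.0.1:6379"                        # assumption: local Redis instance
```

In the spider itself, start_urls is replaced by a redis_key (e.g. redis_key = "qsbk:start_urls"); pushing a URL onto that Redis list with LPUSH kicks off all connected workers.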