Contents
1. Create the project
scrapy startproject pipboss
2. Enter the project directory
cd .\pipboss\
3. Create the spider file
scrapy genspider bosstext www.boss.com
Directory structure
settings.py configuration
Add:
LOG_LEVEL = "ERROR"
Change True to False:
ROBOTSTXT_OBEY = False
Enable USER_AGENT (uncomment the line and set a browser User-Agent string):
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
4. Write the spider file
Get the URL
start_urls = ["https://www.zhipin.com/web/geek/job?query=python"]
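The start URL above is the search page with a query parameter. If you want to search for other keywords, a minimal sketch like the following builds the same URL with the standard library. The helper name build_search_url and the constant BASE_URL are hypothetical, introduced here only for illustration; the base address and the query parameter are taken from the start URL above.

```python
from urllib.parse import urlencode

# Base search page, taken from the start URL used in this walkthrough.
BASE_URL = "https://www.zhipin.com/web/geek/job"


def build_search_url(keyword: str) -> str:
    """Return the search URL for a keyword (hypothetical helper)."""
    # urlencode escapes spaces and non-ASCII characters in the keyword.
    query = urlencode({"query": keyword})
    return f"{BASE_URL}?{query}"


start_urls = [build_search_url("python")]
print(start_urls[0])
```

The resulting list can be assigned to start_urls in the spider class in place of the hard-coded string.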
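import scrapy


class BosstextSpider(scrapy.Spider):
    name = "bosstext"
    # allowed_domains = ["www.boss.com"]
    start_urls = ["https://www.zhipin.com/web/geek/job?query=python"]

    # Parse the response and extract data
    def parse(self, response):
        # For now, print the page body to check that the request worked
        print(response.text)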
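The domain passed to genspider (www.boss.com) does not match the host in start_urls (www.zhipin.com), which is presumably why allowed_domains is commented out in the spider. Scrapy uses allowed_domains to filter out requests to other domains, and the exact behaviour depends on the Scrapy version. The following is only a simplified sketch of that kind of check using the standard library; is_allowed is a hypothetical helper, not Scrapy's actual implementation.

```python
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Simplified check: the host equals an allowed domain or is a subdomain of it."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


url = "https://www.zhipin.com/web/geek/job?query=python"
print(is_allowed(url, ["www.boss.com"]))  # False: the generated domain does not match
print(is_allowed(url, ["zhipin.com"]))    # True: www.zhipin.com is a subdomain
```

Under this assumption, an alternative to commenting the line out would be to set allowed_domains = ["zhipin.com"].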