BOSS Zhipin (BOSS直聘): https://www.zhipin.com/
Create the Scrapy project:
scrapy startproject scrapyProject
Create the spider file:
scrapy genspider s_boss zhipin.com
Contents
1. Find the endpoint URL
The value passed after page= is the page number:
https://www.zhipin.com/c101010100/?query=python&page={}&ka=page-next
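As a rough sketch of how the page number slots into that URL, the snippet below fills the `{}` placeholder for a range of pages. The helper name `build_page_urls` and its default range of pages 1 to 10 are assumptions made for illustration (the range mirrors the `range(1, 11)` loop in the spider); they are not part of the original project.

```python
# Sketch only: build the paginated search URLs from the template above.
# build_page_urls is a hypothetical helper, not part of the original spider.
URL_TEMPLATE = "https://www.zhipin.com/c101010100/?query=python&page={}&ka=page-next"


def build_page_urls(first_page=1, last_page=10):
    """Return one search URL per page number, both ends inclusive."""
    return [URL_TEMPLATE.format(page) for page in range(first_page, last_page + 1)]


if __name__ == "__main__":
    # Print the first three page URLs to check the substitution by eye.
    for url in build_page_urls(1, 3):
        print(url)
```

A list built this way could be assigned directly to a spider's `start_urls`.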
2. s_boss.py
# -*- coding: utf-8 -*-
import scrapy
from scrapyProject.items import BossItem
from lxml import etree


class SBossSpider(scrapy.Spider):
    name = 's_boss'
    allowed_domains = ['zhipin.com']
    start_urls = []
    for page in range(1, 11):
        url = 'https://www.zhipin.com/c1010101