First, get the Douyu API address. I found a third-party Douyu API document for this.
- Create the crawler project
cd /d D:\workspaces\python\scrapy
python3 -m scrapy startproject douyu
cd douyu
python3 -m scrapy genspider douyutv douyu.com
- Write the spider script
D:\workspaces\python\scrapy\douyu\douyu\spiders\douyutv.py
# -*- coding: utf-8 -*-
import json

import scrapy


class DouyutvSpider(scrapy.Spider):
    name = 'douyutv'
    allowed_domains = ['douyucdn.cn']
    # limit is the page size; offset is the index of the first room on the page
    baseURL = 'http://open.douyucdn.cn/api/RoomApi/live?limit=30&offset='
    offset = 0
    start_urls = [baseURL + str(offset)]

    def parse(self, response):
        data_list = json.loads(response.body.decode('utf-8'))['data']
        # An empty page means there are no more rooms, so stop paginating
        if not data_list:
            return
        for data in data_list:
            room_id = data['room_id']
            owner_uid = data['owner_uid']
            nickname = data['nickname']
            print(room_id, owner_uid, nickname)
        # Advance by one full page (must match limit=30 in baseURL) so pages do not overlap
        self.offset += 30
        yield scrapy.Request(self.baseURL + str(self.offset), callback=self.parse)
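The pagination logic in parse can be tried out without hitting the network. Below is a minimal sketch, assuming the response body is JSON with a 'data' list whose entries carry room_id, owner_uid and nickname, as in the spider above. The helper name parse_page, the PAGE_SIZE constant and the sample payload are made up for illustration and are not part of the Douyu API or of Scrapy.

```python
import json

PAGE_SIZE = 30  # assumption: matches the limit= value in the request URL


def parse_page(body, offset, page_size=PAGE_SIZE):
    """Return (rooms, next_offset); next_offset is None when the page is empty."""
    data_list = json.loads(body).get('data') or []
    if not data_list:
        # Empty page: nothing left to crawl
        return [], None
    rooms = [(d['room_id'], d['owner_uid'], d['nickname']) for d in data_list]
    # The next page starts right after the last room of this page
    return rooms, offset + page_size


# Made-up sample payload, for illustration only
sample = json.dumps({'error': 0, 'data': [
    {'room_id': 1, 'owner_uid': 10, 'nickname': 'alice'},
    {'room_id': 2, 'owner_uid': 20, 'nickname': 'bob'},
]})
rooms, next_offset = parse_page(sample, offset=0)
print(rooms, next_offset)
```

Keeping the offset step equal to the page size is the point here: a smaller step would fetch some rooms twice, and a larger one would skip rooms.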
- Run the spider to test it
python3 -m scrapy crawl douyutv
The spider ran successfully, but unfortunately Douyu detected it and banned my IP.
Looks like I need to study anti-anti-crawler techniques next. qvq
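As a first step in that direction, slowing the crawl down is usually the cheapest thing to try. The fragment below is a sketch of what could go into the project's settings.py (douyu/settings.py in the layout created above). The setting names are standard Scrapy settings, but the specific values and the User-Agent string are my assumptions, and I have not verified that they are enough to avoid a ban from Douyu.

```python
# Sketch for douyu/settings.py; all values are assumptions to be tuned

# Wait between requests to the same site instead of sending them back to back
DOWNLOAD_DELAY = 2
# Vary the wait (0.5x to 1.5x of DOWNLOAD_DELAY) so the timing looks less mechanical
RANDOMIZE_DOWNLOAD_DELAY = True

# Send only one request at a time to a given domain
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adapt the delay to how quickly the server responds
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2
AUTOTHROTTLE_MAX_DELAY = 30

# Placeholder browser-like User-Agent (hypothetical value, replace with a real one)
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
```

With the spider above, a 2 second delay means roughly 30 rooms every 2 seconds, which is far slower than the unthrottled run but much less likely to look like a flood of requests.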
Note
The third-party Douyu API document and the project source code are available through my WeChat official account.