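I have been playing online murder-mystery games (剧本杀) lately and wanted to try writing a crawler that dumps the script data into an Excel file, so I can see which scripts exist and pick among them. The process and the code are described below.
I noticed that the game has a share feature that sends a script's link to WeChat. Copying the link in WeChat gives something like https://m.mszmapp.com/dm/playbook_detail?id=397 (a script in DM-hosted mode) or https://m.mszmapp.com/store/bookdetail/1882 (a regular-mode script).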
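The two link shapes carry the script id in different places: as an `id` query parameter in one and as the last path segment in the other. A minimal sketch of pulling the id out of either form with the standard library follows; the helper name `script_id_from_share_link` is hypothetical, and whether DM-mode ids and regular ids share one numbering is not something this post establishes.

```python
from urllib.parse import urlparse, parse_qs


def script_id_from_share_link(link):
    """Return the numeric script id found in a share link (hypothetical helper)."""
    parsed = urlparse(link)
    # DM-mode links look like .../dm/playbook_detail?id=397
    query_id = parse_qs(parsed.query).get("id")
    if query_id:
        return int(query_id[0])
    # Regular links look like .../store/bookdetail/1882
    last_segment = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    return int(last_segment)


dm_id = script_id_from_share_link("https://m.mszmapp.com/dm/playbook_detail?id=397")
store_id = script_id_from_share_link("https://m.mszmapp.com/store/bookdetail/1882")
print(dm_id, store_id)
```

A link matching neither shape raises `ValueError` from `int()`, which seems preferable to silently returning a wrong id.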
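Take https://m.mszmapp.com/store/bookdetail/1882 as an example. Right-click on the page and choose Inspect from the context menu. Open the Network tab and refresh the page; a long list of requests appears. Filter by Fetch/XHR and click the request named detail to view its Response, which is the JSON data listed below. It contains most of the information shown on the page, including the script id ("id"), name ("name"), price ("cost"), number of players ("num_players"), poster image ("image"), and so on.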
id "1882"
name "盖茨比庄园迷案"
series "无"
series_id 0
image "https://static.mszmapp.com/images/Fi9Q3psYwlDQnbMQKLDpbhJGs3v1.jpg"
club_id 1880
estimated_time "4.0"
story_text_length "13000"
background_html "<div>这是一个关于爵士乐时代的故事,就像菲茨杰拉德小说里描绘的一样。香烟、酒精、音乐、狐步舞、飞女郎的细高跟和羽毛披肩、别墅里一场接着一场上演的流动的盛宴、上流社会千金一掷的纸醉金迷。</div><div>而这一切都不过是爵士时代最为浅显的外表,真正能够定义爵士时代的,是在爵士乐停止之后那无尽的寂静和空旷。</div><div>此刻盖茨比庄园正在办一场不眠的宴会,在这里我们诚挚地邀请您一起入局。</div><div><br></div><div>开局必读:</div><div>剧本文本量较大,一共五幕请留足时间。</div><div>游戏中有两对CP(菲茨和泽尔达、威廉和阿丽塔),并设有亲密的互动环节,建议可熟人组队。</div><div>警告阿婆老粉,阿加莎味浓厚!</div>"
background "这是一个关于爵士乐时代的故事,就像菲茨杰拉德小说里描绘的一样。香烟、酒精、音乐、狐步舞、飞女郎的细高跟和羽毛披肩、别墅里一场接着一场上演的流动的盛宴、上流社会千金一掷的纸醉金迷。而这一切都不过是爵士时代最为浅显的外表,真正能够定义爵士时代的,是在爵士乐停止之后那无尽的寂静和空旷。此刻盖茨比庄园正在办一场不眠的宴会,在这里我们诚挚地邀请您一起入局。\n开局必读:剧本文本量较大,一共五幕请留足时间。游戏中有两对CP(菲茨和泽尔达、威廉和阿丽塔),并设有亲密的互动环节,建议可熟人组队。警告阿婆老粉,阿加莎味浓厚!"
num_players 5
max_player 5
min_player 5
editor_rec ""
author_rec ""
updated_time "2023-08-11 23:06:52"
time "西方"
style "现实"
level "困难"
price "999999.00"
ori_price "999999.00"
cost 29
ori_cost 39
share_cost 139
ori_share_cost 199
onsale 5
share_price "999999.00"
effect_at null
chatroom_id "5188372305"
single_mode 0
user_level 0
mark "6.9"
mark_cnt 134
publish_date "2023-08-11"
age_level 0
has_truth true
parent_playbook_id 0
chapter_id 1
chapter_name ""
chapter_image ""
has_previou_story 0
isbn ""
price_info Object { cost: 29, ori_cost: 39, share_cost: 139, … }
pay_type 1
discount "7.4"
vip_free 0
presell 0
series_name "无"
series_uri ""
authors [ {…} ]
author_id 310
author "ZNJ"
signed 1
characters [ {…}, {…}, {…}, {…}, {…} ]
custom_tag ""
adult_only false
gift 0
unlock_free_enable 1
unlock_free_cost 0
read_progress 0
share 5
own 0
played 0
share_total_cost 139
purchase 0
playbook_id "1882"
comment null
error_description null
room_count "2"
So all we need to do is iterate over every script by id, fetch this data, and store it in an Excel sheet.
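One detail in the response above: several numeric-looking fields ("mark", "story_text_length", "estimated_time") arrive as quoted strings, while "cost" and "num_players" are real numbers. If numeric cells are wanted in the spreadsheet (for sorting, say), the strings could be converted first. The sketch below is one way to do that; `to_number` and `NUMERIC_KEYS` are illustrative names, not part of the script in this post, and the sample record is a trimmed copy of the response above.

```python
# Keys that look numeric in the response but are delivered as strings
NUMERIC_KEYS = ("mark", "story_text_length", "estimated_time")


def to_number(value):
    """Return value as int or float when it is a numeric string, else unchanged."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return value


sample = {
    "name": "盖茨比庄园迷案",
    "cost": 29,
    "mark": "6.9",
    "story_text_length": "13000",
    "estimated_time": "4.0",
}
converted = dict(sample)
for key in NUMERIC_KEYS:
    converted[key] = to_number(converted[key])
print(converted)
```

Only the listed keys are converted, so a script whose name happens to look like a number is left alone.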
Environment: Python 3 with the HTTP client package urllib3 and the Excel writing library xlwt installed.
Code:
# -*- coding: utf-8 -*-
import json
import random
import time

import urllib3
import xlwt

# (column header, JSON key) pairs, in spreadsheet column order
COLUMNS = [
    ("剧本名", "name"),                 # script name
    ("价格", "cost"),                   # price
    ("难度等级", "level"),              # difficulty
    ("评分", "mark"),                   # rating
    ("风格", "style"),                  # style
    ("发生时代", "time"),               # era
    ("人数", "num_players"),            # number of players
    ("剧本字数", "story_text_length"),  # word count
    ("预计时间(h)", "estimated_time"),  # estimated duration in hours
]


def main():
    http = urllib3.PoolManager()
    # Create a new workbook (that is, a new Excel file)
    workbook = xlwt.Workbook(encoding='utf-8')
    # Create a new sheet
    worksheet = workbook.add_sheet("百变大侦探全剧本数据")
    # Header row; the last column holds the link to the detail page
    for col, (header, _) in enumerate(COLUMNS):
        worksheet.write(0, col, header)
    worksheet.write(0, len(COLUMNS), "链接")

    row = 1
    for book_id in range(1, 3000):
        url = "https://m.mszmapp.com/api/playbook/" + str(book_id) + "/detail"
        r = http.request('GET', url)
        print(r.data)
        print(book_id)
        d = json.loads(r.data.decode('utf-8'))
        # Check all fields before writing anything: a half-written row would
        # make xlwt raise when the next script overwrites the same cells
        if any(key not in d for _, key in COLUMNS):
            continue
        for col, (_, key) in enumerate(COLUMNS):
            worksheet.write(row, col, d[key])
        detail_url = "https://m.mszmapp.com/store/bookdetail/" + str(book_id)
        worksheet.write(row, len(COLUMNS), detail_url)
        row = row + 1
        # time.sleep(random.randint(10, 30))
    workbook.save("百变大侦探全剧本数据.xls")


if __name__ == '__main__':
    main()
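The script above sends up to 2,999 requests back to back (the sleep is commented out) and assumes every response is valid JSON. A rough sketch of a more defensive loop follows: it pauses a random interval between requests and skips any id whose body does not parse to a JSON object. The function `crawl` and its `fetch` parameter are hypothetical; in the script, `fetch` would wrap `http.request('GET', url).data`, while here a stand-in `fake_fetch` is used so the example runs without network access.

```python
import json
import random
import time


def crawl(ids, fetch, min_delay=0.0, max_delay=0.0):
    """Yield (book_id, record) for each id whose response is a JSON object.

    fetch is any callable that takes an id and returns the raw response bytes.
    """
    for book_id in ids:
        try:
            record = json.loads(fetch(book_id).decode("utf-8"))
        except ValueError:
            # Body is not valid UTF-8 JSON (for example an HTML error page)
            record = None
        if isinstance(record, dict):
            yield book_id, record
        if max_delay > 0:
            # Random pause so requests are not sent in a tight burst
            time.sleep(random.uniform(min_delay, max_delay))


def fake_fetch(book_id):
    """Stand-in for the real HTTP call, used only for this demonstration."""
    bodies = {1: b'{"name": "demo"}', 2: b"<html>not json</html>", 3: b"[]"}
    return bodies[book_id]


results = list(crawl([1, 2, 3], fake_fetch))
print(results)
```

Passing `min_delay=10, max_delay=30` would roughly match the interval in the commented-out `time.sleep(random.randint(10, 30))` line.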
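When the run finishes, open 百变大侦探全剧本数据.xls. Part of the result is shown below; the columns are script name, price, difficulty, rating, style, era, number of players, word count, estimated time (h), and link: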
剧本名 | 价格 | 难度等级 | 评分 | 风格 | 发生时代 | 人数 | 剧本字数 | 预计时间(h) | 链接 |
福威镖局 | 59 | 困难 | 6.2 | 现实 | 古代 | 7 | 5000 | 3.0 | https://m.mszmapp.com/store/bookdetail/1 |
江湖 | 29 | 困难 | 7.3 | 现实 | 古代 | 8 | 5000 | 2.5 | https://m.mszmapp.com/store/bookdetail/2 |
救赎之城 | 99 | 烧脑 | 6.6 | 魔幻 | 架空 | 7 | 3000 | 3.0 | https://m.mszmapp.com/store/bookdetail/3 |
狼人之血 | 59 | 困难 | 7.6 | 魔幻 | 古代 | 7 | 5000 | 2.5 | https://m.mszmapp.com/store/bookdetail/4 |
太空谋杀案 | 59 | 烧脑 | 6.5 | 科幻 | 架空 | 7 | 2000 | 3.0 | https://m.mszmapp.com/store/bookdetail/5 |
庙堂(江湖续集) | 59 | 困难 | 6.2 | 奇幻 | 古代 | 8 | 4000 | 3.0 | https://m.mszmapp.com/store/bookdetail/6 |
血色南宫 | 59 | 烧脑 | 6.0 | 武侠 | 古代 | 8 | 3000 | 2.5 | https://m.mszmapp.com/store/bookdetail/7 |
三国·率土之滨 | 69 | 烧脑 | 6.7 | 现实 | 古代 | 7 | 3000 | 3.5 | https://m.mszmapp.com/store/bookdetail/8 |
腥火燎园 | 59 | 烧脑 | 6.3 | 现实 | 古代 | 8 | 3000 | 2.5 | https://m.mszmapp.com/store/bookdetail/9 |
弈剑诀 | 39 | 困难 | 6.2 | 武侠 | 古代 | 7 | 1500 | 3.0 | https://m.mszmapp.com/store/bookdetail/10 |
七宗罪 | 99 | 困难 | 4.6 | 现实 | 现代 | 8 | 1500 | 2.0 | https://m.mszmapp.com/store/bookdetail/11 |
幽凝(血目续集) | 59 | 困难 | 5.9 | 奇幻 | 古代 | 6 | 2000 | 2.5 | https://m.mszmapp.com/store/bookdetail/12 |
血目 | 59 | 困难 | 6.6 | 奇幻 | 古代 | 6 | 2000 | 2.5 | https://m.mszmapp.com/store/bookdetail/13 |
剧本101 | 9999 | 入门 | 6.0 | 现实 | 古代 | 6 | 0 | 0.1 | https://m.mszmapp.com/store/bookdetail/14 |
待定 | | 入门 | 10.0 | 现实 | 古代 | 0 | 0 | 0.0 | https://m.mszmapp.com/store/bookdetail/15 |
暗影计划 | 39 | 困难 | 6.1 | 武侠 | 古代 | 6 | 0 | 2.0 | https://m.mszmapp.com/store/bookdetail/16 |
凤求凰 | 59 | 困难 | 6.3 | 奇幻 | 古代 | 7 | 3000 | 3.0 | https://m.mszmapp.com/store/bookdetail/17 |
消失的制作人 | 0 | 简单 | 7.8 | 现实 | 现代 | 6 | 1000 | 1.2 | https://m.mszmapp.com/store/bookdetail/18 |
猎狼 | | 入门 | 10.0 | 现实 | 古代 | 0 | | | https://m.mszmapp.com/store/bookdetail/19 |
船长号的裁决(内测) | 99 | 困难 | 5.7 | 现实 | 现代 | 7 | 3000 | 3.0 | https://m.mszmapp.com/store/bookdetail/20 |
测试脚本3 | | 入门 | 10.0 | 现实 | 古代 | -1 | 200 | 1.0 | https://m.mszmapp.com/store/bookdetail/22 |
测试脚本4 | | 入门 | 10.0 | 现实 | 古代 | -1 | 11 | 0.0 | https://m.mszmapp.com/store/bookdetail/23 |
酒吧杀人计划 | 0 | 入门 | 7.2 | 现实 | 现代 | 5 | 1000 | 1.0 | https://m.mszmapp.com/store/bookdetail/24 |
四大名捕之铁公鸡(迷你) | 0 | 入门 | 8.2 | 武侠 | 古代 | 4 | 200 | 0.5 | https://m.mszmapp.com/store/bookdetail/26 |
孤儿房 | | 入门 | 10.0 | 现实 | 古代 | 0 | | | https://m.mszmapp.com/store/bookdetail/27 |
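This reveals a problem: some entries are not real scripts but the vendor's own test data. So filter once more, dropping every script whose player count is less than 1 or whose word count is 0: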
# -*- coding: utf-8 -*-
import json
import random
import time

import urllib3
import xlwt

# (column header, JSON key) pairs, in spreadsheet column order
COLUMNS = [
    ("剧本名", "name"),                 # script name
    ("价格", "cost"),                   # price
    ("难度等级", "level"),              # difficulty
    ("评分", "mark"),                   # rating
    ("风格", "style"),                  # style
    ("发生时代", "time"),               # era
    ("人数", "num_players"),            # number of players
    ("剧本字数", "story_text_length"),  # word count
    ("预计时间(h)", "estimated_time"),  # estimated duration in hours
]


# Returns True if the script looks valid, False for test or placeholder data
def check(d):
    if 'name' not in d:
        return False
    for key in ('story_text_length', 'num_players'):
        try:
            if int(d[key]) < 1:
                return False
        except (KeyError, TypeError, ValueError):
            # Field is missing, null, or not a number
            return False
    return True


def main():
    http = urllib3.PoolManager()
    # Create a new workbook (that is, a new Excel file)
    workbook = xlwt.Workbook(encoding='utf-8')
    # Create a new sheet
    worksheet = workbook.add_sheet("百变大侦探全剧本数据")
    # Header row; the last column holds the link to the detail page
    for col, (header, _) in enumerate(COLUMNS):
        worksheet.write(0, col, header)
    worksheet.write(0, len(COLUMNS), "链接")

    row = 1
    for book_id in range(1, 3000):
        url = "https://m.mszmapp.com/api/playbook/" + str(book_id) + "/detail"
        r = http.request('GET', url)
        print(r.data)
        print(book_id)
        d = json.loads(r.data.decode('utf-8'))
        if not check(d):
            continue
        # Check all fields before writing anything: a half-written row would
        # make xlwt raise when the next script overwrites the same cells
        if any(key not in d for _, key in COLUMNS):
            continue
        for col, (_, key) in enumerate(COLUMNS):
            worksheet.write(row, col, d[key])
        detail_url = "https://m.mszmapp.com/store/bookdetail/" + str(book_id)
        worksheet.write(row, len(COLUMNS), detail_url)
        row = row + 1
        # time.sleep(random.randint(10, 30))
    workbook.save("百变大侦探全剧本数据.xls")


if __name__ == '__main__':
    main()
Now the results look normal:
剧本名 | 价格 | 难度等级 | 评分 | 风格 | 发生时代 | 人数 | 剧本字数 | 预计时间(h) | 链接 |
福威镖局 | 59 | 困难 | 6.2 | 现实 | 古代 | 7 | 5000 | 3.0 | https://m.mszmapp.com/store/bookdetail/1 |
江湖 | 29 | 困难 | 7.3 | 现实 | 古代 | 8 | 5000 | 2.5 | https://m.mszmapp.com/store/bookdetail/2 |
救赎之城 | 99 | 烧脑 | 6.6 | 魔幻 | 架空 | 7 | 3000 | 3.0 | https://m.mszmapp.com/store/bookdetail/3 |
狼人之血 | 59 | 困难 | 7.6 | 魔幻 | 古代 | 7 | 5000 | 2.5 | https://m.mszmapp.com/store/bookdetail/4 |
太空谋杀案 | 59 | 烧脑 | 6.5 | 科幻 | 架空 | 7 | 2000 | 3.0 | https://m.mszmapp.com/store/bookdetail/5 |
庙堂(江湖续集) | 59 | 困难 | 6.2 | 奇幻 | 古代 | 8 | 4000 | 3.0 | https://m.mszmapp.com/store/bookdetail/6 |
血色南宫 | 59 | 烧脑 | 6.0 | 武侠 | 古代 | 8 | 3000 | 2.5 | https://m.mszmapp.com/store/bookdetail/7 |
三国·率土之滨 | 69 | 烧脑 | 6.7 | 现实 | 古代 | 7 | 3000 | 3.5 | https://m.mszmapp.com/store/bookdetail/8 |
腥火燎园 | 59 | 烧脑 | 6.3 | 现实 | 古代 | 8 | 3000 | 2.5 | https://m.mszmapp.com/store/bookdetail/9 |
弈剑诀 | 39 | 困难 | 6.2 | 武侠 | 古代 | 7 | 1500 | 3.0 | https://m.mszmapp.com/store/bookdetail/10 |
七宗罪 | 99 | 困难 | 4.6 | 现实 | 现代 | 8 | 1500 | 2.0 | https://m.mszmapp.com/store/bookdetail/11 |
幽凝(血目续集) | 59 | 困难 | 5.9 | 奇幻 | 古代 | 6 | 2000 | 2.5 | https://m.mszmapp.com/store/bookdetail/12 |
血目 | 59 | 困难 | 6.6 | 奇幻 | 古代 | 6 | 2000 | 2.5 | https://m.mszmapp.com/store/bookdetail/13 |
凤求凰 | 59 | 困难 | 6.3 | 奇幻 | 古代 | 7 | 3000 | 3.0 | https://m.mszmapp.com/store/bookdetail/17 |
消失的制作人 | 0 | 简单 | 7.8 | 现实 | 现代 | 6 | 1000 | 1.2 | https://m.mszmapp.com/store/bookdetail/18 |
船长号的裁决(内测) | 99 | 困难 | 5.7 | 现实 | 现代 | 7 | 3000 | 3.0 | https://m.mszmapp.com/store/bookdetail/20 |
酒吧杀人计划 | 0 | 入门 | 7.2 | 现实 | 现代 | 5 | 1000 | 1.0 | https://m.mszmapp.com/store/bookdetail/24 |
四大名捕之铁公鸡(迷你) | 0 | 入门 | 8.2 | 武侠 | 古代 | 4 | 200 | 0.5 | https://m.mszmapp.com/store/bookdetail/26 |
张府悬案 | | 简单 | 7.9 | 现实 | 古代 | 5 | 800 | 1.0 | https://m.mszmapp.com/store/bookdetail/28 |
南越王陵 | 69 | 烧脑 | 6.7 | 现实 | 现代 | 7 | 3000 | 3.0 | https://m.mszmapp.com/store/bookdetail/29 |
消失的爱人 | 0 | 简单 | 7.9 | 现实 | 现代 | 7 | 1500 | 2.0 | https://m.mszmapp.com/store/bookdetail/31 |
流水线惨案(迷你) | | 入门 | 8.1 | 现实 | 现代 | 4 | 300 | 0.5 | https://m.mszmapp.com/store/bookdetail/35 |
罪恶班级 | 0 | 简单 | 8.0 | 现实 | 现代 | 5 | 500 | 1.0 | https://m.mszmapp.com/store/bookdetail/37 |
死亡数字 | 0 | 简单 | 7.5 | 现实 | 现代 | 6 | 500 | 1.5 | https://m.mszmapp.com/store/bookdetail/38 |
消失的客人 | 0 | 简单 | 7.4 | 现实 | 现代 | 5 | 500 | 1.0 | https://m.mszmapp.com/store/bookdetail/39 |
环彩·命 | 0 | 简单 | 7.4 | 现实 | 古代 | 4 | 1000 | 1.5 | https://m.mszmapp.com/store/bookdetail/42 |
米夏尔庄园 | 0 | 简单 | 7.1 | 现实 | 现代 | 5 | 800 | 1.0 | https://m.mszmapp.com/store/bookdetail/43 |
疯狂博士 | 0 | 困难 | 7.7 | 科幻 | 架空 | 4 | 500 | 1.0 | https://m.mszmapp.com/store/bookdetail/45 |
平凡客栈 | 0 | 简单 | 7.4 | 武侠 | 古代 | 6 | 1800 | 1.2 | https://m.mszmapp.com/store/bookdetail/46 |
商海忍法帖 | 0 | 简单 | 7.4 | 现实 | 现代 | 5 | 1000 | 1.5 | https://m.mszmapp.com/store/bookdetail/47 |
恩不爱 | 0 | 简单 | 7.6 | 现实 | 现代 | 5 | 2000 | 1.0 | https://m.mszmapp.com/store/bookdetail/48 |
宫心计 | 59 | 烧脑 | 6.8 | 现实 | 古代 | 9 | 5000 | 3.5 | https://m.mszmapp.com/store/bookdetail/49 |
百变山庄(新手教学) | 0 | 入门 | 8.2 | 现实 | 现代 | 1 | 1000 | 0.2 | https://m.mszmapp.com/store/bookdetail/50 |
酒店谋杀案 | | 简单 | 8.0 | 现实 | 现代 | 5 | 800 | 0.5 | https://m.mszmapp.com/store/bookdetail/51 |
谍影重重 | | 简单 | 7.2 | 现实 | 近代 | 7 | 2000 | 1.0 | https://m.mszmapp.com/store/bookdetail/52 |
我不是医神 | 0 | 简单 | 8.1 | 现实 | 古代 | 5 | 1000 | 1.5 | https://m.mszmapp.com/store/bookdetail/53 |
血溅天香楼 | 0 | 简单 | 7.2 | 现实 | 古代 | 6 | 4000 | 1.5 | https://m.mszmapp.com/store/bookdetail/54 |
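To see why particular rows disappeared between the two tables, the filter rule can be run on a few records by hand. The sketch below uses four entries from the first table; `is_valid` is an illustrative stand-in for the filter, written to tolerate empty values. The empty string for 猎狼's word count is an assumption (its cell is blank in the table, and the raw API value is not shown in this post).

```python
def is_valid(record):
    """True if the record has a name, at least 1 player and a word count of at least 1."""
    if "name" not in record:
        return False
    for key in ("story_text_length", "num_players"):
        try:
            if int(record[key]) < 1:
                return False
        except (KeyError, TypeError, ValueError):
            # Missing, null, or non-numeric value counts as invalid
            return False
    return True


records = [
    {"name": "福威镖局", "num_players": 7, "story_text_length": "5000"},
    {"name": "暗影计划", "num_players": 6, "story_text_length": "0"},
    {"name": "测试脚本3", "num_players": -1, "story_text_length": "200"},
    {"name": "猎狼", "num_players": 0, "story_text_length": ""},  # assumed raw value
]
kept = [r["name"] for r in records if is_valid(r)]
print(kept)
```

Only 福威镖局 survives: 暗影计划 has a word count of 0, 测试脚本3 has a player count of -1, and 猎狼 fails on both counts.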