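CSDN Project: A First Try at Web Scraping
Project Background
As luck would have it, I was idly chatting in a WeChat group when a member mentioned wanting to scrape some LOL (League of Legends) match data, so I asked for the link. Link: PentaQ official site
As an LPL fan, I naturally took the LPL section as the template. Without further ado, let's get to the point.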
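Overall approach:
1. Inspecting the site shows that its data is organized into modules by region/season/event and by game patch. In other words, once one region and season can be scraped for one patch, the same approach works for the other patches, regions, and seasons.
2. Taking 2019 LPL Summer All Patches as the example, scrape the data of three modules: Overview, Team Stats, and Player Stats.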
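Because the data is split by tournament and patch, the request URL can be assembled from those two values instead of being hard-coded. The sketch below is only an illustration: the endpoint is the one used in the Player Stats code of this post, while the helper name `build_player_stats_url` and the idea that `tour` and `patch` are the only query parameters are assumptions, not something the site documents.

```python
from urllib.parse import urlencode

# Endpoint copied from the Player Stats code in this post. Treating `tour`
# and `patch` as the only query parameters is an assumption.
BASE_URL = "https://data.pentaq.com/business_api/2018may/tournament_player_duty_data"


def build_player_stats_url(tour, patch=""):
    """Return the Player Stats URL for one tournament id and one patch.

    An empty patch mirrors the URL used in the post, which ends in `patch=`.
    """
    return BASE_URL + "?" + urlencode({"tour": tour, "patch": patch})


if __name__ == "__main__":
    print(build_player_stats_url(59))
    print(build_player_stats_url(59, "9.6.1"))
```

Looping over several tournament ids or patch strings then only means calling the helper with different arguments; which ids and patches actually exist has to be read off the site itself.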
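Scraping language: Python. For readers without a Python environment, Anaconda is recommended (Anaconda download page).
IDE: Spyder, which ships with Anaconda; readers can pick whatever they are used to. The next step is to install the Python libraries and packages needed for this project. Because the author uses Spyder, the required modules are installed from Anaconda Prompt.
Modules are usually installed with: pip install module_name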
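Before running pip, it can help to see which of the third-party modules are actually missing. The following is a minimal sketch using only the standard library; the list `REQUIRED` is taken from the imports used in this post, and the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Third-party import names used by the script in this post
# (standard-library modules such as json and time are omitted).
REQUIRED = ["pandas", "requests", "sqlalchemy", "pymysql", "numpy"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        # Print the install command rather than running it.
        print("pip install " + " ".join(missing))
    else:
        print("All required modules are installed.")
```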
# Python module imports
import json
import pandas as pd
import time
import requests
import datetime
from sqlalchemy import create_engine
import pymysql
import random
from numpy import *  # star import kept from the original; it shadows builtins such as sum and max
import re
import urllib.request
# from celery_app import app  # scheduler
Player Stats module: the author stores the data in a MySQL database on Alibaba Cloud; readers can adapt the code to their own storage.
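For readers who also store into MySQL, the storage side might look roughly like the sketch below. It is not the author's storage code: the helper names `mysql_url` and `save_rows`, the table name `lpl_player_stats`, and the `utf8mb4` charset are all assumptions for illustration. It relies on `create_engine` from SQLAlchemy and `DataFrame.to_sql` from pandas.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, host, port, database):
    """Build a SQLAlchemy connection URL for the PyMySQL driver.

    The password is URL-escaped so characters such as '@' or '/' are safe.
    """
    return (
        f"mysql+pymysql://{user}:{quote_plus(password)}"
        f"@{host}:{port}/{database}?charset=utf8mb4"
    )


def save_rows(rows, url, table="lpl_player_stats"):
    """Append a list of dicts to `table` (the table name is hypothetical)."""
    # Imported here so that mysql_url() works without these packages installed.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(url)
    pd.DataFrame(rows).to_sql(table, con=engine, if_exists="append", index=False)
```

With `if_exists="append"`, repeated runs add rows to the same table, so deduplication is left to the reader.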
class LPL():
    # Player stats
    def lpl_player_stats(self):
        tour = 59  # tournament id (no trailing comma, which would make it a tuple)
        url = 'https://data.pentaq.com/business_api/2018may/tournament_player_duty_data?tour=59&patch='
        patch = '9.6.1'  # a specific patch can be requested for a finer breakdown
        headers = {
            'Cookie': 'paste your own cookie here',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
            'Host': 'data.pentaq.com',
            'Referer': 'https://data.pentaq.com/PlayerStats?tour=59',
            'Sec-Fetch-Dest': 'empty',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Site': 'same-origin',
        }
        # Pass headers by keyword: the second positional argument of requests.get is params
        res = requests.get(url, headers=headers)
        # print(res)
        js = json.loads(res.text)['data']['players_data']
        # Remember to document these fields
        player_name=[];team_name=[];team_full_name=[];psr=[];player_id=[];team_id=[];appear=[];win=[];win_rate=[];kill=[];dead=[];assist=[];kda=[];battle_rate_per_game=[];
        solo_kill=[];solo_dead=[];ten_minutes_gold_offset_per_game=[];ten_minutes_creeps_per_game=[];
        exp_diff_10m_per_game=[];damage_per_minute=[];damage_percent=[];damage_efficiency=[];tank_per_minute=[];tank_percent=[];
        tank_efficiency=[];put_eye_per_minute=[];destroy_eye_per_minute=[];buy_true_eye_per_minute=[];duty_id=[];lose=[];insert_time=[]
        for data in js:
            # print(data['player_name'])
            player_name.append(data['player_name'])
            team_name.append(data['team_name'])
            team_full_name.append(data['team_full_name'])
            psr.append(data['psr'])
            player_id.append(data['player_id'])
            team_id.append(data['team_id'])
            appear.append(data['appear'])
            win.append(data['win'])
            win_rate.append(data['win_rate'])
            kill.append(data['kill'])
            dead.append(data['dead'])
            assist.append(data['assist'])
            kda.append(data['kda'])
            battle_rate_per_game.append(data['battle_rate_per_game'])
            solo_kill.append(data['solo_kill'])
            solo_dead.append(data['solo_dead'])
            ten_minutes_gold_offset_per_game.append(data['ten_minutes_gold_offset_per_game'])
            ten_minutes_creeps_per_game.append(data['ten_minutes_creeps_per_game'])
            exp_diff_10m_per_game.append(data['exp_diff_10m_per_game'])
            damage_per_minute.append(data['damage_per_minute'])
            damage_percent.append(data['damage_percent'])
            damage_efficiency.append(data['damage_efficiency'])
            tank_per_minute.append(data['tank_per_minute'])
            tank_percent.append(data['tank_percent'])
            tank_efficiency.append(data['tank_efficiency'])
            put_eye_per_minute.append(data['put_eye_per_minute'])
            buy_true_eye_per_minute.append(data['buy_true_eye_per_minute'])
            duty_id.append(data['duty_id'])
            lose.append(data['lose'