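Predicting League of Legends Matches (3): Data Collection Implemented in Java

Series contents

Predicting League of Legends Matches (1): Feature Analysis
Predicting League of Legends Matches (2): Analysis of the Data Collection API
Predicting League of Legends Matches (3): Data Collection Implemented in Java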


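Disclaimer

The crawling approach described in this article is for learning and research only and must not be used for commercial purposes.

Preface


The previous articles covered the factors that influence LoL match outcomes and the public API that riotgames provides. This article shows how to build on those two articles to implement a working automated crawler.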

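I. Basic Framework

The project is built on SpringBoot, supported by apache httpclient, apache commons-io, and google guava. JSON parsing is handled by alibaba fastjson, and lombok is enabled through the IDEA plugin.

II. Basic Configuration

1. Maven dependency configuration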

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.4</version>
        </dependency>

        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.13</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.16</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>30.1-jre</version>
        </dependency>
        
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.75</version>
        </dependency>

2. Properties configuration

# The API key issued by riotgames: https://developer.riotgames.com/
spider.riotgames.apikey=RGAPI-74967532-db62-4129-a292-a3f0d5713f88

spider.riotgames.lol.base-url=https://developer.riotgames.com
# The URLs below start with br1 because the data region is br1; other regions such as kr and na1 also exist.
spider.riotgames.lol.summoner.byname.url=https://br1.api.riotgames.com/tft/summoner/v1/summoners/by-name/%s

# The URLs below fetch LoL match data
spider.riotgames.lol.matchlist.byaccount.url=https://br1.api.riotgames.com/lol/match/v4/matchlists/by-account/%s?beginIndex=%d
spider.riotgames.lol.match.detail.url=https://br1.api.riotgames.com/lol/match/v4/matches/%s
spider.riotgames.lol.match.timelines.url=https://br1.api.riotgames.com/lol/match/v4/timelines/by-match/%s
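
The `%s` and `%d` placeholders in these URLs are filled in with `String.format`. The following is a minimal sketch of that step, assuming a hypothetical helper class `RiotUrls` that is not part of the original project. It also percent-encodes the summoner name, which the original code does not do; that is an addition here, on the assumption that names may contain spaces or non-ASCII characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; the URL templates are copied from the properties above.
class RiotUrls {
    static final String SUMMONER_BY_NAME =
            "https://br1.api.riotgames.com/tft/summoner/v1/summoners/by-name/%s";
    static final String MATCH_LIST =
            "https://br1.api.riotgames.com/lol/match/v4/matchlists/by-account/%s?beginIndex=%d";

    /** Percent-encodes a value so that it is safe inside a URL path segment. */
    static String encodePathSegment(String value) {
        try {
            // URLEncoder targets form encoding and emits '+' for spaces; a path needs %20.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String summonerByName(String name) {
        return String.format(SUMMONER_BY_NAME, encodePathSegment(name));
    }

    static String matchList(String accountId, int beginIndex) {
        return String.format(MATCH_LIST, encodePathSegment(accountId), beginIndex);
    }
}
```

For example, `RiotUrls.matchList(accountId, 100)` yields the URL for the page of matches that starts at index 100.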

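III. Module Details

The project is split into six parts:

  • client
    Wraps HttpClient and adds some custom settings
  • config
    Configuration for riotgames
  • entity
    Basic entity classes for reading and storing data
  • pipeline
    Defines two intermediate queues that act much like a message queue
  • scheduler
    The scheduled job
  • service
    The main crawler logic

The collected data is not stored in a database; it is saved locally as JSON files.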

1. client

The client module contains two classes.

  • RiotGamesClient

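import com.google.common.util.concurrent.RateLimiter;
import org.apache.http.Header;
import org.apache.http.HttpEntityEnclosingRequest;
import org.apache.http.HttpHeaders;
import org.apache.http.HttpRequest;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.HttpRequestRetryHandler;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.message.BasicHeader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import javax.net.ssl.SSLException;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/**
 * @date 2021/1/16 20:34
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Component
public class RiotGamesClient {
    // RateLimiter.create takes permits per second: 100 requests per 2 minutes is 100/120 per second.
    private static final RateLimiter rateLimiterMin = RateLimiter.create(100.0 / 120.0);
    private static final RateLimiter rateLimiterSec = RateLimiter.create(20);
    // Built once on first use and then shared by every request.
    private HttpClient httpClient;

    private static final String UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36";
    private static final String AP = "*/*";
    private static final String AL = "zh-CN,zh;q=0.9";

    @Autowired
    private RiotGamesConfig riotGamesConfig;


    private synchronized HttpClient getClient() {
        if (httpClient == null) {
            RequestConfig requestConfig = RequestConfig.custom()
                    .setConnectionRequestTimeout(5000)
                    .setConnectTimeout(5000).build();

            List<Header> headerList = new ArrayList<>(5);
            headerList.add(new BasicHeader(HttpHeaders.USER_AGENT, UA));
            headerList.add(new BasicHeader(HttpHeaders.ACCEPT, AP));
            headerList.add(new BasicHeader(HttpHeaders.ACCEPT_LANGUAGE, AL));
            headerList.add(new BasicHeader("Origin", riotGamesConfig.getLolBaseUrl()));
            headerList.add(new BasicHeader("X-Riot-Token", riotGamesConfig.getApiKey()));

            httpClient = HttpClientBuilder.create()
                    .setDefaultRequestConfig(requestConfig)
                    .setDefaultHeaders(headerList)
                    .setRetryHandler(retryHandler())
                    .build();
        }
        return httpClient;
    }

    public HttpResponse execute(HttpUriRequest httpRequest) throws IOException {
        // Block until both the per-two-minute and the per-second quota allow another request.
        rateLimiterMin.acquire(1);
        rateLimiterSec.acquire(1);
        return this.getClient().execute(httpRequest);
    }

    private static HttpRequestRetryHandler retryHandler(){
        return (exception, executionCount, context) -> {

            System.out.println("riotGames api retry request: " + executionCount);
            if (executionCount >= 5) {
                // Do not retry if over max retry count
                return false;
            }
            // SocketTimeoutException extends InterruptedIOException, so it must be tested first.
            if (exception instanceof SocketTimeoutException) {
                return true;
            }
            if (exception instanceof InterruptedIOException) {
                // Timeout
                return false;
            }
            if (exception instanceof UnknownHostException) {
                // Unknown host
                return false;
            }
            if (exception instanceof SSLException) {
                // SSL handshake exception
                return false;
            }
            HttpClientContext clientContext = HttpClientContext.adapt(context);
            HttpRequest request = clientContext.getRequest();
            // Retry only requests without a body, which are considered idempotent
            return !(request instanceof HttpEntityEnclosingRequest);
        };
    }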


}
  • RiotGamesRetryHandler

/**
 * @date 2021/1/17 17:48
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
public class RiotGamesRetryHandler implements HttpRequestRetryHandler {
    @Override
    public boolean retryRequest(IOException e, int i, HttpContext httpContext) {
        return false;
    }
}
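
One detail worth checking in a retry handler like the one in RiotGamesClient is the order of the `instanceof` tests: `SocketTimeoutException` is a subclass of `InterruptedIOException`, so the more specific test only has an effect when it comes first. The following is a minimal JDK-only sketch of that decision, assuming a hypothetical `RetryPolicy` class and a cap of five attempts. It leaves out the idempotency check, which needs the HttpClient request context.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import javax.net.ssl.SSLException;

// Hypothetical class for illustration; not part of the original project.
class RetryPolicy {
    static final int MAX_ATTEMPTS = 5;

    /** Decides whether a failed request should be sent again. */
    static boolean shouldRetry(IOException exception, int executionCount) {
        if (executionCount >= MAX_ATTEMPTS) {
            return false; // give up once the attempt limit is reached
        }
        if (exception instanceof SocketTimeoutException) {
            return true; // most specific type first: a read timeout is worth retrying
        }
        if (exception instanceof InterruptedIOException) {
            return false; // other interrupted I/O, such as a connect timeout
        }
        if (exception instanceof UnknownHostException) {
            return false; // DNS failure will not fix itself on retry
        }
        if (exception instanceof SSLException) {
            return false; // handshake problems are not transient
        }
        return true; // any other I/O error is treated as transient
    }
}
```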

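RiotGamesClient defines two RateLimiters because Riot limits how fast its API may be called: at most 20 requests per second and at most 100 requests every two minutes. We therefore have to cap the number of HTTP requests sent within a given period, and here we use guava's RateLimiter for that.

A side note: Google's guava is a powerful complement to the JDK and well worth learning.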
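
When configuring guava's RateLimiter, note that `RateLimiter.create` takes a rate in permits per second, so a quota of 100 requests per two minutes corresponds to 100/120 permits per second. To make the two quotas themselves concrete, here is a JDK-only sketch of a sliding-window limiter that enforces several limits at once. The class `MultiWindowLimiter` is hypothetical; it is not guava's algorithm and not part of the original project, and the clock is passed in as a parameter only to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical JDK-only limiter for illustration.
class MultiWindowLimiter {
    private final int[] limits;
    private final long[] windowsMillis;
    private final long longestWindowMillis;
    // Send times of recent requests, oldest first.
    private final Deque<Long> sent = new ArrayDeque<>();

    MultiWindowLimiter(int[] limits, long[] windowsMillis) {
        if (limits.length != windowsMillis.length) {
            throw new IllegalArgumentException("each limit needs a window");
        }
        this.limits = limits.clone();
        this.windowsMillis = windowsMillis.clone();
        long longest = 0;
        for (long window : windowsMillis) {
            longest = Math.max(longest, window);
        }
        this.longestWindowMillis = longest;
    }

    /** Records and allows the request only if every window still has room at nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Forget requests that are older than the longest window.
        while (!sent.isEmpty() && nowMillis - sent.peekFirst() >= longestWindowMillis) {
            sent.pollFirst();
        }
        for (int i = 0; i < limits.length; i++) {
            int inWindow = 0;
            for (long time : sent) {
                if (nowMillis - time < windowsMillis[i]) {
                    inWindow++;
                }
            }
            if (inWindow >= limits[i]) {
                return false; // this window is full, so nothing is recorded
            }
        }
        sent.addLast(nowMillis);
        return true;
    }
}
```

Under these assumptions, Riot's two quotas would be expressed as `new MultiWindowLimiter(new int[]{20, 100}, new long[]{1_000, 120_000})`, with `System.currentTimeMillis()` as the clock.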

2. config

The config module has a single configuration class that supplies configuration parameters to the other modules.

/**
 * @date 2021/1/17 1:30
 * @auth jixiang.ma@transwarp.io
 * @copyright copyright © 2021 www.jixiang.ma all right reserved.
 **/
@Configuration
@Getter
public class RiotGamesConfig {
    @Value("${spider.riotgames.apikey}")
    private String apiKey;
    
    @Value("${spider.riotgames.lol.base-url}")
    private String lolBaseUrl;
    
    @Value("${spider.riotgames.lol.summoner.byname.url}")
    private String lolSummonerUrl;
    
    @Value("${spider.riotgames.lol.matchlist.byaccount.url}")
    private String lolMatchListByAccountUrl;
    
    @Value("${spider.riotgames.lol.match.detail.url}")
    private String lolMatchDetailUrl;
    
    @Value("${spider.riotgames.lol.match.timelines.url}")
    private String lolMatchTimeLinesUrl;
}

3. entity

This module is the foundation of the whole crawler: all data is exchanged through the entity classes defined here.

  • Ban
/**
 * @date 2021/1/17 13:38
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
public class Ban {
    private Integer championId;
    private Integer pickTurn;
}
  • Frame
/**
 * @date 2021/1/17 14:59
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
@ToString
public class Frame {
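    // The timeline JSON keys participantFrames by participant id ("1" to "10"), so it maps to a java.util.Map.
    private Map<String, ParticipantFrame> participantFrames;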
    private Long timestamp;
    private List<Event> events;
}
  • Match

/**
 * @date 2021/1/17 13:11
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
public class Match {
    private String platformId;
    private Long gameId;
    private Integer champion;
    private Integer queue;
    private Integer season;
    private Long timestamp;
    private String role;
    private String lane;
}
  • MatchDetail
/**
 * @date 2021/1/17 13:26
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
@ToString
public class MatchDetail {
    private Long gameId;
    private String platformId;
    private Long gameCreation;
    private Long gameDuration;
    private Integer queueId;
    private Integer mapId;
    private Integer seasonId;
    private String gameVersion;
    private String gameMode;
    private String gameType;
    private List<Team> teams;
    private List<Participants> participants;
    private List<ParticipantIdentities> participantIdentities;
    private List<Ban> bans;
}
  • MatchList

/**
 * @date 2021/1/17 14:48
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Data
public class MatchList {
    private List<Match> matches;
    private Integer startIndex;
    private Integer endIndex;
    private Integer totalGames;
}
  • ParticipantFrame
/**
 * @date 2021/1/17 14:27
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Setter
@Getter
@ToString
public class ParticipantFrame {
    private Integer participantId;
    private Integer currentGold;
    private Integer totalGold;
    private Integer level;
    private Integer xp;
    private Integer minionsKilled;
    private Integer jungleMinionsKilled;
    private Integer dominionScore;
    private Integer teamScore;
}
  • ParticipantIdentities
/**
 * @date 2021/1/17 13:31
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
@ToString
public class ParticipantIdentities {
    private Integer participantId;
    private Player player;
}
  • Participants
/**
 * @date 2021/1/17 13:30
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
public class Participants {
    private Integer participantId;
    private Integer teamId;
    private Integer championId;
    private Integer spell1Id;
    private Integer spell2Id;
}
  • Player
/**
 * @date 2021/1/17 15:56
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
@ToString
public class Player {
    private String platformId;
    private String accountId;
    private String summonerName;
    private String summonerId;
    private String currentAccountId;
    private String currentPlatformId;
    private String matchHistoryUri;
}
  • Stats
/**
 * @date 2021/1/17 13:42
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
public class Stats {
    private Integer participantId;
    private Boolean win;
    private Integer item0;
    private Integer item1;
    private Integer item2;
    private Integer item3;
    private Integer item4;
    private Integer item5;
    private Integer item6;
    private Integer kills;
    private Integer deaths;
    private Integer assists;
    private Integer largestKillingSpree;
    private Integer largestMultiKill;
    private Integer killingSprees;
    private Integer longestTimeSpentLiving;
    private Integer doubleKills;
    private Integer tripleKills;
    private Integer quadraKills;
    private Integer pentaKills;
    private Integer unrealKills;
    private Integer totalDamageDealt;
    private Integer magicDamageDealt;
    private Integer physicalDamageDealt;
    private Integer trueDamageDealt;
    // largest critical strike damage
    private Integer largestCriticalStrike;
    private Integer totalDamageDealtToChampions;
    private Integer magicDamageDealtToChampions;
    private Integer physicalDamageDealtToChampions;
    private Integer trueDamageDealtToChampions;
    private Integer totalHeal;
    private Integer totalUnitsHealed;
    private Integer damageSelfMitigated;
    private Integer damageDealtToObjectives;
    private Integer damageDealtToTurrets;
    private Integer visionScore;
    private Integer timeCCingOthers;
    private Integer totalDamageTaken;
    private Integer magicalDamageTaken;
    private Integer physicalDamageTaken;
    private Integer trueDamageTaken;
    private Integer goldEarned;
    private Integer goldSpent;
    private Integer turretKills;
    private Integer inhibitorKills;
    private Integer totalMinionsKilled;
    private Integer neutralMinionsKilled;
    private Integer neutralMinionsKilledTeamJungle;
    private Integer neutralMinionsKilledEnemyJungle;
    private Integer totalTimeCrowdControlDealt;
    private Integer champLevel;
    private Integer visionWardsBoughtInGame;
    private Integer sightWardsBoughtInGame;
    private Integer wardsPlaced;
    private Integer wardsKilled;
    private Boolean firstBloodKill;
    private Boolean firstBloodAssist;
    private Boolean firstTowerKill;
    private Boolean firstTowerAssist;
    private Boolean firstInhibitorKill;
    private Boolean firstInhibitorAssist;
}
  • Summoner
/**
 * @date 2021/1/16 20:22
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
@ToString
public class Summoner {
    private String id;
    private String accountId;
    private String puuid;
    private String name;
    private Integer profileIconId;
    private Long revisionDate;
    private Integer summonerLevel;
}
  • Team

/**
 * @date 2021/1/17 13:30
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
public class Team {
    private Integer teamId;
    private Win win;
    private Boolean firstBlood;
    private Boolean firstTower;
    private Boolean firstInhibitor;
    private Boolean firstBaron;
    private Boolean firstDragon;
    private Boolean firstRiftHerald;
    private Integer towerKills;
    private Integer inhibitorKills;
    private Integer baronKills;
    private Integer dragonKills;
    private Integer vilemawKills;
    private Integer riftHeraldKills;
    private Integer dominionVictoryScore;
}

  • TimeLine

/**
 * @date 2021/1/17 14:39
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Getter
@Setter
@ToString
public class TimeLine {
    private List<ParticipantFrame> frames;
    private Long frameInterval;
}

  • Win
/**
 * @date 2021/1/17 13:33
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
public enum Win {
    Fail, Win
}

4. pipeline

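This module implements two queue-like structures that work much like a message queue: every summoner accountId and gameId is taken from this module. To avoid duplicate accountIds and gameIds, a Bloom filter checks each id before it is queued. A Bloom filter has no false negatives, so an id that has already been queued is never queued again. The cost is a small false-positive rate (0.00001 as configured here), which means a new id is occasionally mistaken for a seen one and skipped.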

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import org.springframework.stereotype.Component;

import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * @date 2021/1/17 16:02
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Component
public class RiotGamesPipeLine {
    // FIFO queues of ids that still have to be crawled.
    private final Queue<String> summoners = new ArrayDeque<>(16);
    private final Queue<String> gameIds = new ArrayDeque<>(16);
    // The Bloom filters remember every id that was ever queued, so nothing is queued twice.
    private final BloomFilter<String> summonerBloomFilter =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 10000000, 0.00001);
    private final BloomFilter<String> gameIdBloomFilter =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 10000000, 0.00001);

    public synchronized void addSummoner(String accountId) {
        if (accountId != null && !summonerBloomFilter.mightContain(accountId)) {
            summonerBloomFilter.put(accountId);
            summoners.add(accountId);
        }
    }

    /** Returns the next pending accountId, or null when the queue is empty. */
    public synchronized String getSummoner() {
        return summoners.poll();
    }

    public synchronized void addGameId(String gameId) {
        if (gameId != null && !gameIdBloomFilter.mightContain(gameId)) {
            gameIdBloomFilter.put(gameId);
            gameIds.add(gameId);
        }
    }

    /** Returns the next pending gameId, or null when the queue is empty. */
    public synchronized String getGameId() {
        return gameIds.poll();
    }
}
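
To show what the Bloom filter check in the pipeline relies on, here is a minimal JDK-only sketch of a Bloom filter. It is a simplified illustration under stated assumptions, not guava's implementation: the class `TinyBloomFilter`, the bit-array size, and the two base hashes (`Arrays.hashCode` and FNV-1a, combined by double hashing) are all choices made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical, simplified Bloom filter for illustration only.
class TinyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    TinyBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.size = size;
        this.hashCount = hashCount;
        this.bits = new BitSet(size);
    }

    /** 32-bit FNV-1a, used as the second of two base hashes. */
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    /** Derives the i-th bit position from the two base hashes (double hashing). */
    private int index(byte[] data, int i) {
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, size);
    }

    void put(String value) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(data, i));
        }
    }

    /** false means "definitely never added"; true means "probably added". */
    boolean mightContain(String value) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(data, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Because `put` sets every bit that `mightContain` later checks, a value that was added is always reported as present. Only values that were never added can be misreported, which is why the pipeline may skip a new id but never queues an old one twice.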

5. scheduler

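This module handles job scheduling. The cron expression triggers the job every minute, but if the previous run has not finished, the current run is skipped.

The scheduled job relies on SpringBoot's built-in scheduling, so the @EnableScheduling annotation must be added to the application's main class.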


import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/**
 * @date 2021/1/16 20:13
 * @auth jixiang.ma
 * @copyright copyright © 2020 jixiang.ma all right reserved.
 **/
@Slf4j
@Component
public class WebCrawlerJob {
    @Autowired
    private LolService lolService;

    @Autowired
    private RiotGamesPipeLine pipeLine;

    private static final Lock lock = new ReentrantLock();

    @Scheduled(cron = "0 0/1 * * * ?")
    public void crawlerJob() {
        // Skip this run if the previous one still holds the lock.
        if (lock.tryLock()) {
            try {
                doExecute();
            } finally {
                lock.unlock();
            }
        } else {
            log.warn("the previous job has not finished, so this job is canceled");
        }
    }

    private void doExecute() {
        // Seed the crawl with one known summoner name.
        lolService.getSummonerDetails("abc");
        String summonerAccountId = pipeLine.getSummoner();

        while (summonerAccountId != null) {
            // The first page reports totalGames, which drives the requests for the remaining pages.
            MatchList matchList = lolService.getMatchList(summonerAccountId, 0);
            if (matchList != null && matchList.getEndIndex() != null && matchList.getTotalGames() != null) {
                for (int i = matchList.getEndIndex(); i < matchList.getTotalGames(); i += 100) {
                    lolService.getMatchList(summonerAccountId, i);
                }
            }
            String gameId = pipeLine.getGameId();
            while (gameId != null) {
                log.info("gameId = {}", gameId);
                lolService.getMatchDetail(gameId);
                lolService.getGameTimeLine(gameId);
                gameId = pipeLine.getGameId();
            }
            summonerAccountId = pipeLine.getSummoner();
        }
    }
}
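
The paging loop in doExecute can be isolated into a small pure function, which makes it easier to see which `beginIndex` values are requested. This is a sketch under assumptions: the class `MatchListPaging` is hypothetical, and the page size of 100 is taken from the step used in the loop above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original project.
class MatchListPaging {
    /**
     * Returns the beginIndex values still to be requested after the first page,
     * given the endIndex and totalGames reported by that first page.
     */
    static List<Integer> remainingBeginIndexes(int endIndex, int totalGames, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Integer> result = new ArrayList<>();
        for (int i = endIndex; i < totalGames; i += pageSize) {
            result.add(i);
        }
        return result;
    }
}
```

For a summoner with 350 games whose first page ended at index 100, the remaining requests start at 100, 200, and 300.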

6. service

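This module is the core of the project and holds all the business logic. The method names are self-explanatory, so they are not described in detail.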


import com.alibaba.fastjson.JSONObject;
import lombok.AllArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.FileUtils;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.util.EntityUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * @date 2021/1/16 20:21
 * @auth jixiang.ma
 * @copyright copyright © 2021 jixiang.ma all right reserved.
 **/
@Slf4j
@Service
@AllArgsConstructor(onConstructor_ = {@Autowired})
public class LolService {
    // Output directory for the JSON files; adjust it to the local machine.
    private static final String DATA_DIR = "E:\\riotgames\\data\\";

    private RiotGamesClient riotGamesClient;
    private RiotGamesConfig riotGamesConfig;
    private RiotGamesPipeLine riotGamesPipeLine;

    public Summoner getSummonerDetails(String summonerName) {
        HttpGet httpGet = new HttpGet(String.format(riotGamesConfig.getLolSummonerUrl(), summonerName));
        Summoner summoner = null;
        try {
            String result = fetch(httpGet);
            if (result != null) {
                summoner = JSONObject.parseObject(result, Summoner.class);
            }
        } catch (IOException e) {
            log.error("when get summoner details, it has error taken place!", e);
        }
        if (summoner != null && summoner.getAccountId() != null) {
            riotGamesPipeLine.addSummoner(summoner.getAccountId());
        }
        return summoner;
    }

    public MatchList getMatchList(String accountId, Integer beginIndex) {
        HttpGet httpGet = new HttpGet(String.format(riotGamesConfig.getLolMatchListByAccountUrl(), accountId, beginIndex));
        MatchList matches = null;
        try {
            String result = fetch(httpGet);
            if (result != null) {
                matches = JSONObject.parseObject(result, MatchList.class);
            }
        } catch (IOException e) {
            log.error("when get match list, it has error taken place, accountId is {}", accountId, e);
        }
        // A failed request leaves matches null, so guard before queueing the game ids.
        if (matches != null && matches.getMatches() != null) {
            matches.getMatches().forEach(match -> riotGamesPipeLine.addGameId(String.valueOf(match.getGameId())));
        }
        return matches;
    }

    public MatchDetail getMatchDetail(String matchId) {
        HttpGet httpGet = new HttpGet(String.format(riotGamesConfig.getLolMatchDetailUrl(), matchId));
        MatchDetail matchDetail = null;
        String result = null;
        try {
            result = fetch(httpGet);
            if (result != null) {
                matchDetail = JSONObject.parseObject(result, MatchDetail.class);
            }
        } catch (IOException e) {
            log.error("when get match detail, something error has taken place, matchId is {}", matchId, e);
        }
        if (matchDetail != null && matchDetail.getParticipantIdentities() != null) {
            // Queue every participant so the crawl can spread beyond the seed summoner.
            for (ParticipantIdentities identity : matchDetail.getParticipantIdentities()) {
                Player player = identity.getPlayer();
                if (player != null && player.getAccountId() != null) {
                    riotGamesPipeLine.addSummoner(player.getAccountId());
                }
            }
        }
        saveJsonFile(result, DATA_DIR + matchId + "_detail.json");
        return matchDetail;
    }

    public TimeLine getGameTimeLine(String matchId) {
        HttpGet httpGet = new HttpGet(String.format(riotGamesConfig.getLolMatchTimeLinesUrl(), matchId));
        TimeLine timeLine = null;
        String result = null;
        try {
            result = fetch(httpGet);
            if (result != null) {
                timeLine = JSONObject.parseObject(result, TimeLine.class);
            }
        } catch (IOException e) {
            log.error("when get match timeline, something error taken place, matchId is {}", matchId, e);
        }
        saveJsonFile(result, DATA_DIR + matchId + "_timeline.json");
        return timeLine;
    }

    /** Executes the request and returns the body, or null when the status is not 200. */
    private String fetch(HttpGet httpGet) throws IOException {
        HttpResponse response = riotGamesClient.execute(httpGet);
        String body = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
        int status = response.getStatusLine().getStatusCode();
        if (status != HttpStatus.SC_OK) {
            log.warn("riotGames api returned status {} for {}", status, httpGet.getURI());
            return null;
        }
        return body;
    }

    private void saveJsonFile(String data, String path) {
        if (data != null) {
            File file = new File(path);
            if (!file.exists()) {
                try {
                    // FileUtils.write creates the file and any missing parent directories.
                    FileUtils.write(file, data, StandardCharsets.UTF_8, false);
                } catch (IOException e) {
                    log.error("write match info to file failed.", e);
                }
            }
        }
    }

}
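
The write-once behavior of saveJsonFile can also be expressed with the JDK alone. The sketch below is one possible variant, not the project's implementation: the class `JsonFileStore` is hypothetical, and it takes the output directory as a parameter instead of using a fixed path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical JDK-only store for illustration; not part of the original project.
class JsonFileStore {
    private final Path baseDir;

    JsonFileStore(Path baseDir) {
        this.baseDir = baseDir;
    }

    /** Writes the JSON once; returns false when the file already exists. */
    boolean saveIfAbsent(String fileName, String json) {
        try {
            Files.createDirectories(baseDir);
            // CREATE_NEW fails if the file exists, so an earlier download is never overwritten.
            Files.write(baseDir.resolve(fileName), json.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a stored file back as a UTF-8 string. */
    String read(String fileName) {
        try {
            return new String(Files.readAllBytes(baseDir.resolve(fileName)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```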


After a run, each game produces two JSON files: one with the match detail (_detail.json) and one with the timeline (_timeline.json).


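Summary

This article walked through a complete automated crawler for Riot data. It uses RateLimiter for rate limiting and BloomFilter for deduplication, along with four methods that fetch the data.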

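Open Issues

Restarts waste resources

When the program restarts, it begins crawling from abc again, which wastes work. To keep this project as lightweight as possible, this article assumes the program simply keeps running.

Of course, there are two ways to avoid this:

  • Change the name manually
    Edit the seed name (the "abc" passed to getSummonerDetails in WebCrawlerJob) each time the process restarts
  • Persist summoner data to a database
    Each time the scheduled job starts, read the most recent summoner name or accountId that has not been crawled yet from the database.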
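
As a lighter-weight variant of the second option, the pending accountIds could be checkpointed to a plain text file rather than a database. The sketch below swaps in that simpler technique under stated assumptions: the class `CrawlCheckpoint` and the one-id-per-line file format are hypothetical and not part of the original project.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical file-based checkpoint for illustration.
class CrawlCheckpoint {
    private final Path file;

    CrawlCheckpoint(Path file) {
        this.file = file;
    }

    /** Overwrites the checkpoint with the accountIds that are still waiting to be crawled. */
    void save(Collection<String> pendingAccountIds) {
        try {
            Files.write(file, pendingAccountIds, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the saved accountIds, or an empty list when no checkpoint exists yet. */
    List<String> load() {
        if (!Files.exists(file)) {
            return new ArrayList<>();
        }
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On startup the job would load the list and queue those ids before falling back to the seed name.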

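Slow speed

Riot allows at most 100 requests every two minutes, which is 72,000 requests per day. Since each game needs two requests (one for the game detail and one for the timeline), that comes to roughly 36,000 games per day.
A distributed crawler or multiple API keys can be used to work around this.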
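
The estimate above can be checked with a few lines of arithmetic. This is a small worked example with a hypothetical class name; it ignores the match-list requests, as the estimate in the text does.

```java
// Hypothetical helper for illustration; it only reproduces the arithmetic of the estimate.
class CrawlThroughput {
    /** Number of games per day when every game costs requestsPerGame requests. */
    static long gamesPerDay(int requestsPerWindow, int windowSeconds, int requestsPerGame) {
        long secondsPerDay = 24L * 60 * 60;              // 86,400 seconds
        long windowsPerDay = secondsPerDay / windowSeconds; // 720 two-minute windows
        long requestsPerDay = windowsPerDay * requestsPerWindow;
        return requestsPerDay / requestsPerGame;
    }
}
```

With 100 requests per 120 seconds and 2 requests per game, `CrawlThroughput.gamesPerDay(100, 120, 2)` gives 36,000.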

API key lifetime

Note: the API key issued by Riot is valid for only 24 hours, so I recommend storing the key in a database and updating it every 23 hours.
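
A minimal sketch of the age tracking this implies, assuming a hypothetical `ApiKeyHolder` class. How a fresh key is obtained and where it is stored are outside this sketch; it only answers whether the current key has reached the 23-hour mark.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical holder for illustration; not part of the original project.
class ApiKeyHolder {
    static final Duration REFRESH_AFTER = Duration.ofHours(23);

    private String key;
    private Instant issuedAt;

    ApiKeyHolder(String key, Instant issuedAt) {
        this.key = key;
        this.issuedAt = issuedAt;
    }

    /** True once the key is at least 23 hours old and should be replaced. */
    synchronized boolean needsRefresh(Instant now) {
        return !now.isBefore(issuedAt.plus(REFRESH_AFTER));
    }

    /** Stores a newly issued key together with its issue time. */
    synchronized void update(String newKey, Instant now) {
        this.key = newKey;
        this.issuedAt = now;
    }

    synchronized String key() {
        return key;
    }
}
```

The scheduled job could call `needsRefresh(Instant.now())` before each run and load the replacement key when it returns true.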

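Incomplete data

The configured endpoints only return data for a single region. The crawler will most likely end up collecting all of that region's data, but data from other regions, such as KR (where the LCK is played) or NA, stays out of reach. Fetching it requires changing the URLs in the configuration file; the URLs for other regions are identical apart from the subdomain.

Solution: add an enum that lists all region IDs, and have the service build the URL for each region when it creates the HTTP request.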
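
A minimal sketch of such an enum, assuming the platform IDs br1, eun1, euw1, jp1, kr, la1, la2, na1, oc1, tr1, and ru, and reusing the match detail path from the properties file. The enum name `RiotRegion` and its methods are hypothetical, and the list of IDs should be checked against the Riot developer portal before use.

```java
import java.util.Locale;

// Hypothetical enum for illustration; not part of the original project.
enum RiotRegion {
    BR1, EUN1, EUW1, JP1, KR, LA1, LA2, NA1, OC1, TR1, RU;

    /** The API host for this region; only the subdomain differs between regions. */
    String host() {
        return name().toLowerCase(Locale.ROOT) + ".api.riotgames.com";
    }

    /** Builds the match detail URL for this region. */
    String matchDetailUrl(String matchId) {
        return String.format("https://%s/lol/match/v4/matches/%s", host(), matchId);
    }
}
```

The service could then loop over `RiotRegion.values()` and build one request URL per region instead of reading a single fixed URL from the configuration.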

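All of the shortcomings and solutions above will be addressed in the next version, so stay tuned.


Writing takes effort, so please credit the source when reposting. My knowledge is limited and mistakes are inevitable; corrections are welcome. Likes and comments are much appreciated, thank you!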