Introduction to Sports Analytics with Pandas

weixin_26739165

Original article: https://towardsdatascience.com/introduction-to-sports-analytics-with-pandas-ad6303db9e11

Sports analytics is a major subfield of data science. Advances in data collection and data analysis have made it more appealing for teams to adapt their strategies based on data analytics.

Data analytics provides valuable insight into both team performance and player performance. Used wisely and systematically, it is likely to put a team ahead of its competitors.

Some clubs have an entire team dedicated to data analytics. Liverpool is a pioneer in using data analytics, which I think is an important part of their success. At the time of writing, they are the most recent Premier League champions, and they won the Champions League in 2019.

In this post, we will use pandas to draw meaningful results from German Bundesliga matches in the 2017–18 season. The datasets can be downloaded from figshare [2]. We will use a part of the datasets introduced in the paper “A public data set of spatio-temporal match events in soccer competitions” [1].

The datasets are saved in JSON format which can easily be read into pandas dataframes.

import numpy as np
import pandas as pd

events = pd.read_json("/content/events_Germany.json")
matches = pd.read_json("/content/matches_Germany.json")
teams = pd.read_json("/content/teams.json")
players = pd.read_json("/content/players.json")

events.head()

The events dataframe contains details of events that occurred in matches. For instance, the first line tells us that player 15231 made a “simple pass” from the location (50,50) to (50,48) in the third second of the match 2516739.

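To make the record layout concrete, here is a minimal sketch that rebuilds a single event like the one described above and loads it with pandas. The columns `playerId`, `matchId`, `eventName`, and `positions` are the ones used in this post; the `subEventName` and `eventSec` columns and the `eventSec` value are assumptions for illustration, not a copy of the real file.

```python
import io

import pandas as pd

# A hand-written, one-row stand-in for events_Germany.json (illustrative only).
sample_json = """
[
  {
    "playerId": 15231,
    "matchId": 2516739,
    "eventName": "Pass",
    "subEventName": "Simple pass",
    "eventSec": 2.4,
    "positions": [{"x": 50, "y": 50}, {"x": 50, "y": 48}]
  }
]
"""

sample_events = pd.read_json(io.StringIO(sample_json))

# Each cell of "positions" holds a list of two dicts: start and end location.
start, end = sample_events.loc[0, "positions"]
print(start, end)
```

The point to notice is that `positions` is a nested column, so the coordinates have to be pulled out of each cell before any arithmetic.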

The events dataframe includes player and team IDs but not the player and team names. We will add them from the teams and players dataframes using the merge function.

The IDs are stored in the “wyId” column in the teams and players dataframes.

# merge with teams
events = pd.merge(
    events, teams[['name', 'wyId']], left_on='teamId', right_on='wyId'
)
events.rename(columns={'name': 'teamName'}, inplace=True)
events.drop('wyId', axis=1, inplace=True)

# merge with players
events = pd.merge(
    events, players[['wyId', 'shortName', 'firstName']],
    left_on='playerId', right_on='wyId'
)
events.rename(columns={'shortName': 'playerName', 'firstName': 'playerFName'}, inplace=True)
events.drop('wyId', axis=1, inplace=True)

We merged the dataframes on the columns that contain the IDs and then renamed the new columns. Finally, we dropped the “wyId” column because the IDs are already stored in the events dataframe.

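The same merge pattern can be tried on toy data. This is a minimal sketch, not the code used above: the frames, IDs, and names are made up, and it uses `how="left"` plus `validate="many_to_one"` (both standard `merge` parameters), whereas the merges in this post use the default inner join.

```python
import pandas as pd

# Toy frames; the IDs and names are invented for illustration.
toy_events = pd.DataFrame({
    "teamId": [1, 2, 1],
    "eventName": ["Pass", "Shot", "Pass"],
})
toy_teams = pd.DataFrame({
    "wyId": [1, 2],
    "name": ["Team A", "Team B"],
})

# validate="many_to_one" raises an error if a team ID appears twice in
# toy_teams, which would otherwise silently duplicate event rows.
merged = (
    toy_events
    .merge(toy_teams[["name", "wyId"]], left_on="teamId", right_on="wyId",
           how="left", validate="many_to_one")
    .rename(columns={"name": "teamName"})
    .drop(columns="wyId")
)

print(merged)
```

A left join keeps events whose ID has no match in the lookup table, while the default inner join drops them, so the row count after the merge is worth checking either way.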

Average Number of Passes per Match

Teams that dominate the game usually make more passes. In general, they are more likely to win the match. There are, of course, some exceptions.

Let’s check the average number of passes per match for each team. We will first create a dataframe that contains the team name, the match ID, and the number of passes made in that match.

# Count the pass events of each team in each match
pass_per_match = (
    events[events.eventName == 'Pass'][['teamName', 'matchId', 'eventName']]
    .groupby(['teamName', 'matchId']).count()
    .reset_index().rename(columns={'eventName': 'numberofPasses'})
)

Augsburg made 471 passes in match 2516745. Here is the list of top 5 teams in terms of the number of passes per match.

pass_per_match[['teamName','numberofPasses']]\
.groupby('teamName').mean()\
.sort_values(by='numberofPasses', ascending=False).round(1)[:5]

It is no surprise that Bayern Munich has the highest number of passes per match. They have been dominating the Bundesliga in recent years.

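The two-step aggregation (count per match, then average per team) is easy to verify on a small example. The sketch below uses made-up teams and matches, and `groupby(...).size()` instead of `count()`, which is an alternative way to count rows rather than the approach used above.

```python
import pandas as pd

# Toy events: team A plays matches 1 and 2, and so does team B.
toy = pd.DataFrame({
    "teamName": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "matchId": [1, 1, 1, 2, 2, 1, 2, 2],
    "eventName": ["Pass", "Pass", "Pass", "Pass", "Shot",
                  "Pass", "Pass", "Foul"],
})

# Step 1: number of passes per team per match.
per_match = (
    toy[toy["eventName"] == "Pass"]
    .groupby(["teamName", "matchId"])
    .size()
    .reset_index(name="numberofPasses")
)

# Step 2: average of those per-match counts for each team.
avg_passes = (
    per_match.groupby("teamName")["numberofPasses"]
    .mean()
    .sort_values(ascending=False)
)

print(avg_passes)
```

Team A makes 3 passes in one match and 1 in the other, so its average is 2.0; team B makes 1 pass in each match, so its average is 1.0.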

Average Pass Length of Players

A pass can be evaluated based on many things. Some passes are so successful that they make it extremely easy to score.

We will focus on one quantifiable property of a pass: its length. Some players are very good at long passes.

The positions column contains the initial and final location of the ball in terms of x and y coordinates. We can calculate the length based on these coordinates. Let’s first create a dataframe that only contains the passes.

passes = events[events.eventName=='Pass'].reset_index(drop=True)

We can now calculate the length.

pass_length = []
for i in range(len(passes)):
    start, end = passes.positions[i][0], passes.positions[i][1]
    # Euclidean distance between the start and end points of the pass
    length = np.sqrt((start['x'] - end['x'])**2 + (start['y'] - end['y'])**2)
    pass_length.append(length)

passes['pass_length'] = pass_length
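The length here is the Euclidean distance sqrt((x0 - x1)^2 + (y0 - y1)^2) between the start and end points. As a sketch of an alternative to the row-by-row loop, the coordinates can be collected into a NumPy array and the distance computed in one call with `np.hypot`. The toy frame below is made up. As far as I understand the dataset description, the coordinates are percentages of the pitch dimensions rather than metres, so the result is in those relative units.

```python
import numpy as np
import pandas as pd

# Toy passes with the same nested "positions" layout (values are illustrative).
toy_passes = pd.DataFrame({
    "positions": [
        [{"x": 50, "y": 50}, {"x": 50, "y": 48}],
        [{"x": 10, "y": 20}, {"x": 13, "y": 24}],
    ]
})

# Pull start and end coordinates into a plain numeric array:
# columns are x_start, y_start, x_end, y_end.
coords = np.array(
    [[p[0]["x"], p[0]["y"], p[1]["x"], p[1]["y"]]
     for p in toy_passes["positions"]],
    dtype=float,
)

# Euclidean distance between the start and end point of each pass.
toy_passes["pass_length"] = np.hypot(coords[:, 2] - coords[:, 0],
                                     coords[:, 3] - coords[:, 1])

print(toy_passes["pass_length"].tolist())
```

The first toy pass moves 2 units along y and the second is a 3-4-5 triangle, so the lengths are 2.0 and 5.0.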

The groupby function can be used to calculate the average pass length for each player.

passes[['playerName','pass_length']].groupby('playerName')\
.agg(['mean','count']).\
sort_values(by=('pass_length','mean'), ascending=False).round(1)[:5]

We have listed the top 5 players by average pass length, along with the number of passes each of them made. The number of passes is important because an average over only 3 passes does not mean much. Thus, we can filter out the players who made fewer than a certain number of passes.

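The filter described above could look like the following sketch. The toy data and the `MIN_PASSES` threshold are hypothetical; a sensible cutoff for a full season would be much higher and depends on the data.

```python
import pandas as pd

# Toy pass lengths for three made-up players.
toy_passes = pd.DataFrame({
    "playerName": ["P1", "P1", "P1", "P2", "P3", "P3"],
    "pass_length": [10.0, 20.0, 30.0, 60.0, 25.0, 35.0],
})

MIN_PASSES = 2  # hypothetical threshold; choose one that suits the data

# Mean pass length and number of passes per player.
stats = toy_passes.groupby("playerName")["pass_length"].agg(["mean", "count"])

# Keep only players with enough passes for the mean to be meaningful.
top_players = (
    stats[stats["count"] >= MIN_PASSES]
    .sort_values(by="mean", ascending=False)
    .round(1)
)

print(top_players)
```

P2 has the longest average but only a single pass, so it is dropped, leaving P3 ahead of P1.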

Average Number of Passes for Win and Not-Win

Let’s compare the average number of passes between matches that were won and matches that were not. I will use the matches of B. Leverkusen as an example.

We first need to add the winner of the match from the “matches” dataframe.

events = pd.merge(events, matches[['wyId','winner']], left_on='matchId', right_on='wyId')
events.drop('wyId', axis=1, inplace=True)

We can now create a dataframe that only contains events whose team ID is 2446 (the ID of B. Leverkusen).

leverkusen = events[events.teamId == 2446]

The winner is B. Leverkusen if the value in the “winner” column is equal to 2446. In order to calculate the average number of passes in the matches that B. Leverkusen won, we need to filter the dataframe based on the winner and eventName columns. We will then apply groupby and count to see the number of passes per match.

passes_in_win = leverkusen[(leverkusen.winner == 2446) & (leverkusen.eventName == 'Pass')][['matchId','eventName']].groupby('matchId').count()
passes_in_notwin = leverkusen[(leverkusen.winner != 2446) & (leverkusen.eventName == 'Pass')][['matchId','eventName']].groupby('matchId').count()

We can easily get the average number of passes by applying the mean function.

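One way to take that last step is sketched below on toy data. The team ID, match IDs, and the use of 0 in the `winner` column for a draw are assumptions of this example. Instead of building two separate dataframes, it labels each match as a win or not and averages in a single `groupby`.

```python
import pandas as pd

TEAM_ID = 1  # hypothetical team ID, used only in this toy example

# Toy events of one team across three matches.
# winner == 0 stands for a draw here (an assumption of this toy data).
toy = pd.DataFrame({
    "matchId": [10, 10, 10, 11, 11, 12, 12, 12, 12],
    "winner": [1, 1, 1, 0, 0, 2, 2, 2, 2],
    "eventName": ["Pass", "Pass", "Shot", "Pass", "Pass",
                  "Pass", "Pass", "Pass", "Foul"],
})

# Number of passes per match, keeping the winner alongside.
per_match = (
    toy[toy["eventName"] == "Pass"]
    .groupby(["matchId", "winner"])
    .size()
    .reset_index(name="n_passes")
)

# Label each match and average the per-match pass counts per label.
per_match["result"] = (per_match["winner"] == TEAM_ID).map(
    {True: "win", False: "not_win"}
)
avg_by_result = per_match.groupby("result")["n_passes"].mean()

print(avg_by_result)
```

With the two dataframes built earlier, calling `.mean()` on `passes_in_win` and on `passes_in_notwin` gives the same kind of comparison.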

Although making more passes does not mean a certain win, it will help you in dominating the game and increasing your chances to score.

The scope of sports analytics extends far beyond what we have done in this post. However, without getting familiar with the basics, it will be harder to grasp the knowledge of more advanced techniques.

Data visualization is also fundamental in sports analytics. How teams and players manage the pitch, the locations of shots and passes, and areas of the pitch that are covered the most provide valuable insight.

I will also write posts about how certain events can be visualized on the pitch. Thank you for reading. Please let me know if you have any feedback.

[1] Pappalardo et al. (2019). A public data set of spatio-temporal match events in soccer competitions. Nature Scientific Data 6:236. https://www.nature.com/articles/s41597-019-0247-7

[2] https://figshare.com/articles/Events/7770599