Visualizing football statistics with Python: Pass Sonar

Overview

Using open data from Wyscout, I extracted the data and formatted it into CSV files. For more details, please see my article below:

Now that the play-by-play data is prepared, I'm going to visualize football data with Python. This time I visualize each player's average passing position together with the direction and distance of their passes. This is called a "Pass Sonar".

I use Google Colab, so you don't need to set up any environment on your laptop.

1. Read the CSV files and look into the data

Firstly, read the csv files.


import pandas as pd

pd.set_option("display.max_columns", 100)  # show up to 100 columns

matches = pd.read_csv("csv/matches.csv")
matches_member = pd.read_csv("csv/matches_member.csv")
events = pd.read_csv("csv/events.csv")
event_kinds = pd.read_csv("csv/eventKinds.csv")
sub_event_kinds = pd.read_csv("csv/subEventKinds.csv")
players = pd.read_csv("csv/players.csv")
teams = pd.read_csv("csv/teams.csv")

I want to visualize Spain v. Russia, the match in which Spain recorded the most passes in World Cup history, so I search for Spain and Russia in the teams dataframe.


matches[matches.teamId == 1598]  # matches involving teamId 1598 (Spain)
events[events.matchId == 2058004]  # all events of the Spain v. Russia match

It looks like one match has 2000+ records. This match went to a penalty shoot-out, but we only need Spain's passing plays from the 90 minutes of regular time. The documentation explains the meaning of matchPeriod, so we narrow down the events with this field.

- matchPeriod: the period of the match. It can be “1H” (first half of the match), “2H” (second half of the match), “E1” (first extra time), “E2” (second extra time) or “P” (penalties time);

events = events[(events.matchId == 2058004) & (events.matchPeriod != "E1") & (events.matchPeriod != "E2") & (events.matchPeriod != "P") & (events.teamId == 1598)]
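
The same filter can be written a little more compactly with `Series.isin`. Here is a minimal sketch on a tiny made-up table (the rows and the teamId 9999 are invented for illustration; only the column names follow `events`). Because matchPeriod can only take the five values listed above, keeping "1H" and "2H" is equivalent to excluding "E1", "E2" and "P".

```python
import pandas as pd

# Toy stand-in for the events table; the rows are made up for illustration.
toy_events = pd.DataFrame({
    "matchId":     [2058004, 2058004, 2058004, 2058004, 1111111],
    "teamId":      [1598,    1598,    1598,    9999,    1598],
    "matchPeriod": ["1H",    "2H",    "E1",    "1H",    "1H"],
})

regular_time = ["1H", "2H"]  # keep only the two regular halves
mask = (
    (toy_events.matchId == 2058004)
    & (toy_events.teamId == 1598)
    & toy_events.matchPeriod.isin(regular_time)
)
filtered = toy_events[mask]
print(filtered)
```

Only the first two rows survive: the extra-time row, the other team's row and the other match's row are all dropped.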

We cannot sensibly visualize more than 11 players on the pitch (technically we can, but it looks strange), so we also narrow the events down to the players who started the game.


member_spain = matches_member[(matches_member.matchId == 2058004) & (matches_member.teamId == 1598) & (matches_member.startingF == 1)]
events = events[events.playerId.isin(member_spain.playerId)]

2. Calculate the average position of each player

I found the position specification in the documentation, so we can get the average position of each player.


- positions: the origin and destination positions associated with the event. Each position is a pair of coordinates (x, y). The x and y coordinates are always in the range [0, 100] and indicate the percentage of the field from the perspective of the attacking team. In particular, the value of the x coordinate indicates the event’s nearness (in percentage) to the opponent’s goal, while the value of the y coordinates indicates the event’s nearness (in percentage) to the right side of the field;

However, there is one tricky point: x and y are percentages. Let's look at some corner kicks, whose subEventId is 30.

events[["fromX", "fromY"]][events["subEventId"] == 30]

We can see that x = 0 is Spain's goal line and x = 100 is the opponent's goal line. (According to the documentation, y = 100 is the right side for Spain.)

I want to draw the pitch vertically rather than horizontally, so I swap x and y. In addition, percentages are awkward to work with, so I convert them into meters (assuming a pitch size of 105 x 68 m).

events["fromXm"] = round((events["fromY"]*68/100),1)
events["fromYm"] = round((events["fromX"]*105/100),1)
events["toXm"] = round((events["toY"]*68/100),1)
events["toYm"] = round((events["toX"]*105/100),1)
events[["fromX", "fromY", "fromXm", "fromYm"]]

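To make the axis swap explicit, here is the same conversion as a small standalone function. This is only a sketch: the function and constant names are mine, and the 105 x 68 m pitch is the same assumption as above.

```python
PITCH_WIDTH_M = 68.0    # assumed pitch width (touchline to touchline)
PITCH_LENGTH_M = 105.0  # assumed pitch length (goal line to goal line)


def to_vertical_meters(x_pct, y_pct):
    """Convert percentage coordinates to meters on a vertical pitch.

    The percentage y (side to side) becomes the horizontal axis and the
    percentage x (own goal to opponent's goal) becomes the vertical axis.
    """
    x_m = round(y_pct * PITCH_WIDTH_M / 100, 1)
    y_m = round(x_pct * PITCH_LENGTH_M / 100, 1)
    return x_m, y_m


print(to_vertical_meters(100, 0))  # a corner at the opponent's end -> (0.0, 105.0)
print(to_vertical_meters(50, 50))  # the center spot -> (34.0, 52.5)
```
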
We are ready to calculate the average position of each player. Narrow the events down to passes, group them by player, and calculate the mean of x and y.

events.to_csv("csv/spain_passing_events.csv", index=False)  # Save...

pass_events = events[events.eventId == 8]  # eventId 8 = pass
pass_position = pass_events.groupby(["playerId"], as_index=False)
pass_position = pass_position.agg({"fromXm": "mean", "fromYm": "mean"})

In addition, we merge the result with the player data to get the player names.


pass_position = pd.merge(pass_position, players, on="playerId")
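
If the group-then-merge step is hard to picture, this small sketch runs it on made-up data. The coordinates, the player names and the `name` column are all invented for illustration; the real `players` table may use different columns.

```python
import pandas as pd

# Made-up pass events and player names, for illustration only.
toy_passes = pd.DataFrame({
    "playerId": [1, 1, 2],
    "fromXm": [10.0, 20.0, 40.0],
    "fromYm": [30.0, 50.0, 60.0],
})
toy_players = pd.DataFrame({"playerId": [1, 2], "name": ["Player A", "Player B"]})

avg_position = (
    toy_passes.groupby("playerId", as_index=False)
    .agg({"fromXm": "mean", "fromYm": "mean"})  # one row per player
    .merge(toy_players, on="playerId")          # attach the player name
)
print(avg_position)
```

Player 1 ends up at the mean of their two passes, (15.0, 40.0), and each row now carries a name.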

3. Summarize pass events for the Pass Sonar

We calculated the average position from which each player passed. For the Pass Sonar we also need the distance and direction of each accurate pass, so I'm going to calculate these next.

First, narrow the events down to accurate passes.


# .copy() lets us add new columns later without a SettingWithCopyWarning
accurate_pass_events = events[(events.eventId == 8) & (events.accurateF == 1)].copy()

Calculate distance using Pythagoras’ theorem.


import numpy as np

accurate_pass_events["distance"] = np.sqrt(
    (
        abs(accurate_pass_events["toXm"] - accurate_pass_events["fromXm"]) ** 2
        + abs(accurate_pass_events["toYm"] - accurate_pass_events["fromYm"]) ** 2
    ).values
)
accurate_pass_events[["fromXm", "toXm", "fromYm", "toYm", "distance"]]

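As a side note, NumPy has `np.hypot`, which computes sqrt(dx**2 + dy**2) element-wise, so the same distance can be written in one line. A minimal sketch on made-up coordinates:

```python
import numpy as np
import pandas as pd

# Made-up passes in meters, for illustration only.
toy = pd.DataFrame({
    "fromXm": [0.0, 10.0],
    "fromYm": [0.0, 10.0],
    "toXm": [3.0, 10.0],
    "toYm": [4.0, 30.0],
})

# np.hypot(dx, dy) is the length of the vector (dx, dy).
toy["distance"] = np.hypot(toy["toXm"] - toy["fromXm"], toy["toYm"] - toy["fromYm"])
print(toy["distance"].tolist())  # [5.0, 20.0]
```
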
We also calculate the angle. We define the angle as 0 degrees when a pass goes straight forward.

from numpy import linalg as LA

def calc_degree(fromX, fromY, toX, toY):
    u = np.array([fromX - fromX, 105 - fromY])  # straight forward, towards the opponent's goal line
    v = np.array([toX - fromX, toY - fromY])  # the actual pass
    i = np.inner(u, v)
    n = LA.norm(u) * LA.norm(v)
    c = i / n
    a = np.rad2deg(np.arccos(np.clip(c, -1.0, 1.0)))
    if toX - fromX < 0:
        a = 360 - a  # passes to the left map to 180-360 degrees
    return a

def calc_pass_theta(row):
    return round(
        calc_degree(row["fromXm"], row["fromYm"], row["toXm"], row["toYm"]),
        1,  # one-decimal rounding is an assumption
    )

# Apply the function to each row
accurate_pass_events["angle"] = accurate_pass_events.apply(
    calc_pass_theta, axis=1
)
accurate_pass_events[["fromXm", "toXm", "fromYm", "toYm", "angle"]]
You can find a 0-degree pass in the second row.
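
For comparison, the same clockwise-from-forward angle can also be obtained with `atan2` instead of a dot product. This is an alternative sketch, not the method used above, and `pass_angle` is a name I made up. One difference to be aware of: for a zero-length pass this version returns 0, whereas the dot-product version divides by zero.

```python
import math


def pass_angle(from_x, from_y, to_x, to_y):
    """Clockwise angle in degrees from 'straight forward' (0 <= angle < 360)."""
    dx = to_x - from_x  # positive = towards the right side
    dy = to_y - from_y  # positive = towards the opponent's goal
    # atan2(dx, dy) measures from the forward axis; % 360 maps left passes to 180-360.
    return math.degrees(math.atan2(dx, dy)) % 360


print(pass_angle(34, 50, 34, 60))  # straight forward
print(pass_angle(34, 50, 44, 50))  # square pass to the right
print(pass_angle(34, 50, 34, 40))  # straight back
print(pass_angle(34, 50, 24, 50))  # square pass to the left
```

The four calls give roughly 0, 90, 180 and 270 degrees, which matches the convention used in this article.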


Besides, we need to divide the passes into 8 directions (any number is OK) by angle. For example, if the angle is between 337.5 and 360 or between 0 and 22.5, I define it as direction 1 (meaning forward).

def divide(angle, divisions):
    degree = 360 / divisions
    # Shift by half a sector so that direction 1 is centered on 0 degrees.
    # Angles in the last half sector (337.5-360 for 8 divisions) come out as divisions + 1.
    division = ((angle + (degree / 2)) // degree) + 1
    return division

def divide_pass_direction(row):
    return divide(row["angle"], 8)

accurate_pass_events["direction"] = accurate_pass_events.apply(
    divide_pass_direction, axis=1
)
accurate_pass_events[["angle", "direction"]]
Oops. Can you see direction 9? It covers angles from 337.5 to 360, which really belong to direction 1, so we replace it.

accurate_pass_events = accurate_pass_events.replace({"direction": {9: 1}})
accurate_pass_events[["angle", "direction"]]

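Another way to avoid the extra bucket is to wrap the shifted angle with a modulo before dividing, so the replace step is not needed. This is only a sketch of that idea; `direction_bucket` is a hypothetical helper, not part of the code above.

```python
def direction_bucket(angle, divisions=8):
    """Map an angle in degrees to a direction 1..divisions, centered on 0 degrees."""
    width = 360 / divisions
    # Shift by half a sector so direction 1 straddles 0 degrees, then wrap at 360.
    return int(((angle + width / 2) % 360) // width) + 1


angles = (0, 22.4, 22.5, 180, 337.4, 337.5, 359.9)
print([direction_bucket(a) for a in angles])  # [1, 1, 2, 5, 8, 1, 1]
```

Angles from 337.5 upwards now land in direction 1 directly, and the result is an integer rather than a float.
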
Finally, group the accurate pass events by player and direction, then calculate the average pass distance and the number of passes.


pass_sonar = accurate_pass_events.groupby(["playerId", "direction"], as_index=False)
pass_sonar = pass_sonar.agg({"distance": "mean", "eventId": "count"})
pass_sonar = pass_sonar.rename(columns={"eventId": "amount"})
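
The aggregate-then-rename step can also be written with pandas named aggregation, which names the output columns directly. A small sketch on made-up rows (the numbers are invented for illustration):

```python
import pandas as pd

# Made-up accurate passes, for illustration only.
toy = pd.DataFrame({
    "playerId": [1, 1, 1, 2],
    "direction": [1, 1, 2, 1],
    "distance": [10.0, 20.0, 30.0, 8.0],
})

sonar = toy.groupby(["playerId", "direction"], as_index=False).agg(
    distance=("distance", "mean"),  # average pass length per wedge
    amount=("distance", "count"),   # number of passes per wedge
)
print(sonar)
```

Each row of `sonar` corresponds to one wedge of one player's sonar: how many passes went that way and how long they were on average.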

We have finally finished the data preprocessing. Let's move on to the visualization.

4. Visualization

I’m going to use matplotlib for visualization.


%matplotlib inline
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7, 11), facecolor="white")
ax = fig.add_subplot(111, facecolor="white")
ax.set_xlim(0, 68)  # Horizontal pitch size
ax.set_ylim(0, 105)  # Vertical pitch size

We draw each player's pass sonar at that player's average position, so we loop over the pass sonar data inside a loop over the average position data.


import matplotlib.patches as pat

for _, player in pass_position.iterrows():
    for _, pass_detail in pass_sonar[pass_sonar.playerId == player.playerId].iterrows():
        # Start degree of direction 1
        theta_left_start = 112.5
        # Color coding by distance
        color = "darkred"
        if pass_detail.distance < 15:
            color = "gold"
        elif pass_detail.distance < 25:
            color = "darkorange"
        # Calculate degree in matplotlib figure
        theta_left = theta_left_start - (360 / 8) * (pass_detail.direction - 1)
        theta_right = theta_left - (360 / 8)
        pass_wedge = pat.Wedge(
            center=(player.fromXm, player.fromYm),
            r=pass_detail.amount * 0.2,  # radius from pass count; the 0.2 scale is an assumed value
            theta1=theta_right,
            theta2=theta_left,
            facecolor=color,
        )
        ax.add_patch(pass_wedge)
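
The angle arithmetic is the part that is easiest to get wrong, so here it is pulled out into two small pure functions that can be checked without drawing anything. This is a sketch with names of my own choosing. matplotlib measures wedge angles counter-clockwise from the positive x-axis, so "forward" (up the pitch) is 90 degrees, and the directions run clockwise from there.

```python
def wedge_angles(direction, divisions=8):
    """Return (theta1, theta2) in matplotlib degrees for a 1-based direction."""
    width = 360 / divisions
    # Direction 1 is centered on 90 degrees (straight up the pitch).
    theta_left = 90 + width / 2 - width * (direction - 1)
    theta_right = theta_left - width
    return theta_right, theta_left


def distance_color(distance):
    """Same thresholds as the loop above: short, medium, long passes."""
    if distance < 15:
        return "gold"
    if distance < 25:
        return "darkorange"
    return "darkred"


print(wedge_angles(1))  # forward wedge -> (67.5, 112.5)
print(wedge_angles(3))  # wedge pointing right -> (-22.5, 22.5)
print(distance_color(10), distance_color(20), distance_color(30))
```

The returned pair can be passed as `theta1` and `theta2` of `matplotlib.patches.Wedge`, which sweeps counter-clockwise from the first angle to the second.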

This covers only the 90 minutes of regular time, but we can see that the Spanish players' average positions are very close together. We can also find a short-passing network between Jordi and Ramos, while Ramos also plays long passes to the right side (maybe to Nacho?). Surprisingly, Silva and Busquets have fewer passes.

5. Make it look better

Do you want to make this look better? We can do this using PIL.

pip install pillow

from PIL import Image

# Convert the matplotlib figure into a PIL.Image
fig.canvas.draw()  # make sure the figure has been rendered
pass_sonar_img = np.array(fig.canvas.buffer_rgba())
pass_sonar_img = Image.fromarray(pass_sonar_img)

field_image = Image.open("image/field.png")
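
The step that combines the two images is not shown here, so the following is only one possible way to do it, not necessarily what I used: Pillow's `Image.alpha_composite` lays one RGBA image over another of the same size. The sketch uses two synthetic images instead of the real chart and `image/field.png`.

```python
from PIL import Image

# Stand-ins for the two images: a green "pitch" and a transparent chart layer
# with one opaque red square. Both are made up for illustration.
size = (68, 105)
field = Image.new("RGBA", size, (0, 128, 0, 255))
chart = Image.new("RGBA", size, (0, 0, 0, 0))
chart.paste((255, 0, 0, 255), (30, 50, 38, 58))  # fill a box with opaque red

# alpha_composite needs two RGBA images of identical size.
combined = Image.alpha_composite(field, chart)
print(combined.getpixel((34, 54)))  # inside the red square
print(combined.getpixel((0, 0)))    # transparent area, so the pitch shows through
```

With a real chart, the figure background would need to be transparent (and the chart resized to the field image) for the pitch to show through in the same way.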

The result looks better, and it is easier to understand on a football field.


That's all! What do you think of this? Thank you for reading.


