Data Analysis on IPL Data

As a cricket lover, I had been waiting for IPL 2020 to start; for many fans it is the best tournament in the world. So I decided to try my hand at IPL data analysis, using IPL match data that I found on Kaggle.

What’s within?

This data set contains IPL matches and their details up to season 10. The analysis covers the following:

  1. The number of matches per season
  2. The team that won by the most runs
  3. The team that won by the most wickets
  4. The top cities where matches were held
  5. The team with the most wins
  6. Whether the toss winner is also the match winner
  7. The team with the most toss wins
  8. The player with the most Man of the Match awards
  9. A visual representation of the number of matches won by runs with respect to the toss winner

So, I will try to answer these questions by analyzing the IPL match data.

First, I opened a Jupyter notebook (Google Colab works too), imported the pandas, numpy, matplotlib, and seaborn libraries, and loaded the data set into a variable named details. This creates an in-memory copy of the whole data set and leaves the original file unchanged.

Here, details.head() shows the first five rows of the data frame, and details.tail() retrieves the last five; both default to 5 rows. There is a column named id, which has been used as the index of our data frame.
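
The loading and preview steps can be sketched as follows. This is a minimal sketch, not the original notebook: the rows are a made-up excerpt read from an in-memory string so the example runs without the Kaggle file, and the file name mentioned in the comment is an assumption.

```python
import io
import pandas as pd

# Hypothetical excerpt standing in for the Kaggle file. With the real data this
# would be something like pd.read_csv("matches.csv", index_col="id");
# the file name is an assumption.
sample_csv = io.StringIO(
    "id,season,city,team1,team2\n"
    "1,2017,Hyderabad,Team A,Team B\n"
    "2,2017,Pune,Team C,Team D\n"
    "3,2017,Rajkot,Team E,Team F\n"
    "4,2017,Indore,Team D,Team G\n"
    "5,2017,Bangalore,Team B,Team H\n"
    "6,2017,Hyderabad,Team E,Team A\n"
    "7,2017,Mumbai,Team F,Team C\n"
)

# index_col="id" makes the id column the index of the data frame
details = pd.read_csv(sample_csv, index_col="id")

first_rows = details.head()   # default n=5
last_rows = details.tail(2)   # pass n to change the number of rows

print(first_rows)
print(last_rows)
```

Both head() and tail() accept a row count, so details.head(10) would show the first ten rows.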

We can check the shape of our data set with details.shape (an attribute, so it takes no parentheses): we have 636 rows and 18 columns. We can also print summary information with details.info(). As the snapshot below shows, it is a pandas DataFrame with 636 entries indexed 0 to 635 and 18 columns, listed with their data types and non-null counts, and it occupies 89.6 KB of memory.

Fig. 2: Shape and information
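
A small sketch of the shape and info checks, using a made-up four-row frame in place of the real 636-row data set. Writing info() into a buffer is only done here so the text can be inspected; in a notebook, calling details.info() directly prints the same summary.

```python
import io
import pandas as pd

# Hypothetical miniature frame; the real data set has 636 rows and 18 columns
details = pd.DataFrame({
    "season": [2008, 2008, 2009, 2010],
    "city": ["Mumbai", "Delhi", None, "Mumbai"],
    "win_by_runs": [10, 0, 25, 0],
})

# shape is an attribute holding (rows, columns); details.shape() would raise a TypeError
rows, cols = details.shape
print(rows, cols)

# info() reports the index range, column dtypes, non-null counts and memory usage
buffer = io.StringIO()
details.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```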

We can describe the data frame to check the count, minimum, maximum, standard deviation, and the 25%, 50%, and 75% quartile values. Here this has been done in two ways: first with the default summary, and second with all columns included. In the second output, NaN marks a statistic that does not apply: there is no mean or standard deviation for umpire3 because it is categorical (qualitative) data, and the same holds for city, date, team1, team2, and so on. The top row of that output also tells us that the most matches were played in Mumbai, the most frequent toss winner is Mumbai Indians, the most common toss decision is to field first, and most results are normal. It also shows that the Duckworth-Lewis (D/L) method was not applied in most matches.

Fig. 3: Describe the data frame
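
The two ways of describing a data frame can be sketched like this. The frame is a made-up miniature, and the win_by_runs column name is an assumption about the Kaggle file.

```python
import pandas as pd

# Hypothetical miniature frame mixing numeric and categorical columns
details = pd.DataFrame({
    "season": [2008, 2008, 2009, 2010],
    "win_by_runs": [140, 0, 33, 0],
    "city": ["Mumbai", "Mumbai", "Delhi", "Mumbai"],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max)
numeric_summary = details.describe()

# include="all": adds unique, top and freq rows for categorical columns;
# statistics that do not apply to a column show up as NaN
full_summary = details.describe(include="all")

print(numeric_summary)
print(full_summary)
```

In the second table, the top row holds the most frequent value of each categorical column and freq holds how often it occurs.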

We can also look at the standard deviation, since outliers inflate it. Next, compare the mean and the median (50%) of each column. If the mean and median are equal or nearly equal, the distribution is roughly symmetric and large outliers are unlikely. If mean > median, the distribution is positively skewed; if mean < median, it is negatively skewed. We can also use the quartiles to check for skewness and outliers:

IQR (interquartile range) = Q3 - Q1 = 75% - 25%
Lower limit = Q1 - 1.5 * IQR
Upper limit = Q3 + 1.5 * IQR

Any value beyond these limits is an outlier. We could also compare the differences 25% - min, 50% - 25%, 75% - 50%, and max - 75% to understand the symmetry of the distribution.
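
As a worked example of the quartile check, here is a minimal sketch on made-up victory margins. It uses the usual convention of Q1 - 1.5 * IQR as the lower limit and Q3 + 1.5 * IQR as the upper limit; the values and the win_by_runs name are illustrative assumptions.

```python
import pandas as pd

# Hypothetical victory margins with one unusually large value
runs = pd.Series([0, 0, 5, 10, 15, 20, 25, 146], name="win_by_runs")

q1 = runs.quantile(0.25)
q3 = runs.quantile(0.75)
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside [lower_limit, upper_limit] are flagged as outliers
outliers = runs[(runs < lower_limit) | (runs > upper_limit)]

# mean > median points to positive (right) skew
is_positively_skewed = runs.mean() > runs.median()

print(lower_limit, upper_limit)
print(outliers.tolist())
print(is_positively_skewed)
```

The same steps can be applied column by column to the numeric columns of the real data frame.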

As we have seen, there are some null values. To locate them we can use details.isnull(), and to count the null values in each column we can use details.isnull().sum(). We can also visualize them with a heatmap, which is an excellent way to inspect missing data in large data sets.

Fig. 4: Sum of null values and its visualization
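
A minimal sketch of the null checks on a made-up frame. The heatmap call is left as a comment so the example runs without a plotting backend; it assumes seaborn is imported as sns.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with some missing values
details = pd.DataFrame({
    "city": ["Mumbai", None, "Delhi", "Pune"],
    "winner": ["Team A", "Team B", "Team C", "Team A"],
    "umpire3": [np.nan, np.nan, np.nan, np.nan],
})

null_mask = details.isnull()      # True wherever a value is missing
null_counts = null_mask.sum()     # number of missing values per column

print(null_counts)

# With seaborn available, the mask can be drawn as a heatmap, for example:
# sns.heatmap(null_mask, cbar=False)
```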

The umpire3 column has the most null values, so for analysis purposes we can remove it by executing details.drop('umpire3', axis=1, inplace=True). Here inplace=True modifies the data frame itself instead of returning a copy, and axis=1 refers to columns, meaning we want to delete the umpire3 column. Similarly, we can delete the rows that contain null values by executing details.dropna(axis=0, inplace=True), which is useful for our data analysis.

Fig. 5: Removed the umpire3 column and the rows with null values
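
The order of the two clean-up steps matters, as this sketch on a made-up frame shows: if the rows were dropped while the all-null umpire3 column was still present, every row would be removed.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame; umpire3 is entirely null, as in the example above
details = pd.DataFrame({
    "city": ["Mumbai", None, "Delhi", "Pune"],
    "winner": ["Team A", "Team B", None, "Team C"],
    "umpire3": [np.nan, np.nan, np.nan, np.nan],
})

# Step 1: drop the mostly empty column (axis=1 means columns)
details.drop("umpire3", axis=1, inplace=True)

# Step 2: drop the rows that still contain a null value (axis=0 means rows)
details.dropna(axis=0, inplace=True)

print(details)
```

Only the rows with both a city and a winner remain.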

To check the number of matches played per season, we can use details['season'].value_counts(). We can also sort the values as needed for interpretation, and the same result is shown as a graph. Because we have analysed a single column (season) here, this kind of analysis is called univariate analysis.

Fig. 6: Number of matches played per season
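
A minimal sketch of counting matches per season, with made-up season values. The plotting call is shown as a comment and assumes matplotlib is installed.

```python
import pandas as pd

# Hypothetical season column; each row stands for one match
details = pd.DataFrame({"season": [2008, 2008, 2008, 2009, 2009, 2010]})

# value_counts() sorts by frequency, most common first
by_count = details["season"].value_counts()

# sort_index() puts the seasons back in chronological order
by_season = by_count.sort_index()

print(by_count)
print(by_season)

# A bar chart of the same counts could be drawn with:
# by_season.plot(kind="bar")
```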

Next, we find the team that won by the most runs and the team that won by the most wickets, along with the season in which each happened.

Fig. 7: Team that won by maximum runs
Fig. 8: Teams that won by maximum wickets
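
One way to find these rows is sketched below. The column names win_by_runs and win_by_wickets are assumptions about the Kaggle file, and the teams and margins are invented for illustration. Because a win by wickets is capped at 10, ties are likely, so the sketch lists every row that reaches the maximum instead of only the first one.

```python
import pandas as pd

# Hypothetical results; team names and margins are invented
details = pd.DataFrame({
    "season": [2015, 2016, 2017, 2017],
    "winner": ["Team A", "Team B", "Team C", "Team A"],
    "win_by_runs": [0, 144, 0, 21],
    "win_by_wickets": [10, 0, 10, 0],
})

# idxmax() returns the index label of the first row holding the maximum
biggest_runs_win = details.loc[details["win_by_runs"].idxmax()]

# Keep all rows that share the maximum wicket margin
max_wickets = details["win_by_wickets"].max()
top_wicket_wins = details[details["win_by_wickets"] == max_wickets]

print(biggest_runs_win[["season", "winner", "win_by_runs"]])
print(top_wicket_wins[["season", "winner", "win_by_wickets"]])
```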

The number of matches played in each city:

Fig. 9: Number of matches played in different cities

The most wins across all seasons up to 2017:

Fig. 10: Most wins

Now we check whether the toss winner is also the match winner by comparing the toss winner with the winner in the data set.

Fig. 11: Percentage of matches in which the toss winner is also the match winner
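
The comparison can be sketched as a row-wise equality test. The column names toss_winner and winner are assumptions about the Kaggle file, and the rows are invented.

```python
import pandas as pd

# Hypothetical toss and match results
details = pd.DataFrame({
    "toss_winner": ["Team A", "Team B", "Team C", "Team A", "Team B"],
    "winner":      ["Team A", "Team C", "Team C", "Team B", "Team B"],
})

# True where the team that won the toss also won the match
toss_won_match = details["toss_winner"] == details["winner"]

# The mean of a boolean Series is the fraction of True values
pct_toss_winner_won = toss_won_match.mean() * 100

print(round(pct_toss_winner_won, 1))
```

Three of the five sample rows match, so the sketch reports 60.0 percent for this toy data.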

The teams with the most toss wins:

Fig. 12: Maximum toss winners

The players with the most Man of the Match awards:

Fig. 13: Maximum MoM awards
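
The counts for cities, wins, toss winners, and Man of the Match awards all follow the same frequency-count pattern, which can be sketched with one small helper. The helper name top_counts and the player_of_match column name are assumptions, and the rows are invented.

```python
import pandas as pd

def top_counts(frame, column, n=3):
    """Return the n most frequent values of a column with their counts."""
    return frame[column].value_counts().head(n)

# Hypothetical match records
details = pd.DataFrame({
    "city": ["Mumbai", "Mumbai", "Mumbai", "Delhi", "Delhi", "Pune"],
    "winner": ["Team A", "Team A", "Team B", "Team A", "Team C", "Team B"],
    "toss_winner": ["Team B", "Team B", "Team B", "Team A", "Team C", "Team B"],
    "player_of_match": ["Player X", "Player Y", "Player X",
                        "Player X", "Player Z", "Player Y"],
})

top_cities = top_counts(details, "city", n=2)
top_winner = top_counts(details, "winner", n=1)
top_toss_winner = top_counts(details, "toss_winner", n=1)
top_player = top_counts(details, "player_of_match", n=1)

print(top_cities)
print(top_winner)
print(top_toss_winner)
print(top_player)
```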

Finally, we analyse the data for one team, Kolkata Knight Riders: the number of matches it won by runs, broken down by toss winner, represented in different plots.

Fig. 14: Different types of plots
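
One possible reading of this step is sketched below: keep the matches that Kolkata Knight Riders won by runs, then count them by whether the team had also won the toss. The column names and the rows are assumptions, and the original notebook may have grouped the data differently.

```python
import pandas as pd

team = "Kolkata Knight Riders"

# Hypothetical match records; opponents and margins are invented
details = pd.DataFrame({
    "toss_winner": [team, "Team X", team, "Team Y", "Team X"],
    "winner": [team, team, team, team, "Team X"],
    "win_by_runs": [12, 30, 0, 5, 40],
    "win_by_wickets": [0, 0, 6, 0, 0],
})

# Matches the team won by runs (a positive win_by_runs margin)
kkr_run_wins = details[(details["winner"] == team) & (details["win_by_runs"] > 0)]

# Split those wins by whether the team had won the toss
toss_label = kkr_run_wins["toss_winner"].eq(team).map(
    {True: "won toss", False: "lost toss"}
)
wins_by_toss = toss_label.value_counts()

print(wins_by_toss)

# With seaborn imported as sns, the same split could be plotted with:
# sns.countplot(x="toss_winner", data=kkr_run_wins)
```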

Conclusion:

We have analysed the IPL match data with the help of the explanations and visualizations above, and can conclude that Mumbai Indians have done a great job so far. This kind of analysis can be useful to cricket statisticians and to cricket lovers alike.

Reference:

  1. https://www.kaggle.com/manasgarg/ipl

Translated from: https://medium.com/@prolaybanik/data-analysis-on-ipl-data-acd319a313d9
