Ten Exercise Sets That Teach You Data Analysis with Pandas (06)


Exercise 6 - Statistics

Exploring wind speed data

Step 1: Import the necessary libraries

Run the following code

import pandas as pd
import datetime
Step 2: Import the data from the following path

Run the following code

path6 = "…/input/pandas_exercise/pandas_exercise/exercise_data/wind.data" # wind.data
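Step 3: Store the data in a variable and combine the first three columns into a single date column (it becomes the index in Step 5)

Run the following code

data = pd.read_table(path6, sep=r"\s+", parse_dates=[[0, 1, 2]])
data.head()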
Yr_Mo_Dy RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL
0 2061-01-01 15.04 14.96 13.17 9.29 NaN 9.87 13.67 10.25 10.83 12.58 18.50 15.04
1 2061-01-02 14.71 NaN 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83
2 2061-01-03 18.50 16.88 12.33 10.13 11.17 6.17 11.25 NaN 8.50 7.67 12.75 12.71
3 2061-01-04 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88
4 2061-01-05 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83
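The nested-list form of parse_dates used above is deprecated in recent pandas releases, so it may warn or fail depending on the installed version. As a minimal sketch of an alternative, the date column can be built explicitly with pd.to_datetime. The two sample rows are made up to mimic the file layout, and adding 1900 to the two-digit year is an assumption that every record belongs to the 1900s.

```python
import io

import pandas as pd

# Two made-up rows in the same layout as wind.data (header: Yr Mo Dy <stations...>).
sample = """Yr Mo Dy RPT VAL
61 1 1 15.04 14.96
61 1 2 14.71 NaN
"""

raw = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Assumption: every two-digit year belongs to the 1900s.
parts = pd.DataFrame({
    "year": raw["Yr"] + 1900,
    "month": raw["Mo"],
    "day": raw["Dy"],
})

# Replace the three component columns with one datetime column.
wind = raw.drop(columns=["Yr", "Mo", "Dy"])
wind.insert(0, "Yr_Mo_Dy", pd.to_datetime(parts))
print(wind)
```

Built this way, the dates are never parsed as 2061, at the cost of hard-coding the century.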
Step 4: The year 2061? Do we really have data from that year? Write a function to fix this bug

Run the following code

def fix_century(x):
    # any year after 1989 was parsed into the wrong century, so shift it back by 100
    year = x.year - 100 if x.year > 1989 else x.year
    return datetime.date(year, x.month, x.day)

# apply fix_century to the column and replace the values with the corrected ones
data["Yr_Mo_Dy"] = data["Yr_Mo_Dy"].apply(fix_century)

data.info()

data.head()
Yr_Mo_Dy RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL
0 1961-01-01 15.04 14.96 13.17 9.29 NaN 9.87 13.67 10.25 10.83 12.58 18.50 15.04
1 1961-01-02 14.71 NaN 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83
2 1961-01-03 18.50 16.88 12.33 10.13 11.17 6.17 11.25 NaN 8.50 7.67 12.75 12.71
3 1961-01-04 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88
4 1961-01-05 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83
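The same correction can also be written without a Python-level apply. The sketch below is one possible vectorized variant, not the tutorial's method: it uses Series.where and pd.DateOffset on a small made-up Series, keeps the datetime64 dtype, and reuses the 1989 cutoff from fix_century.

```python
import pandas as pd

# Made-up dates: two parsed into the wrong century, one already correct.
dates = pd.Series(pd.to_datetime(["2061-01-01", "2078-12-31", "1975-06-15"]))

# Keep dates whose year is <= 1989; shift the rest back by 100 years.
fixed = dates.where(dates.dt.year <= 1989, dates - pd.DateOffset(years=100))
print(fixed)
```

Because the result is already datetime64, no later conversion with pd.to_datetime is needed in this variant.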
Step 5: Set the date as the index. Mind the data type: it should be datetime64[ns]

Run the following code

# convert Yr_Mo_Dy to the datetime64 type
data["Yr_Mo_Dy"] = pd.to_datetime(data["Yr_Mo_Dy"])

# set "Yr_Mo_Dy" as the index
data = data.set_index("Yr_Mo_Dy")

data.head()

data.info()

RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL
Yr_Mo_Dy
1961-01-01 15.04 14.96 13.17 9.29 NaN 9.87 13.67 10.25 10.83 12.58 18.50 15.04
1961-01-02 14.71 NaN 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83
1961-01-03 18.50 16.88 12.33 10.13 11.17 6.17 11.25 NaN 8.50 7.67 12.75 12.71
1961-01-04 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88
1961-01-05 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83
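To confirm the index type programmatically rather than reading it off data.info(), a check along the following lines may help. It is a sketch on a made-up two-row frame. The exact unit reported in the dtype (ns or otherwise) may depend on the pandas version, so the check only asserts that the index is some datetime64 type.

```python
import datetime

import pandas as pd

# Made-up frame holding plain datetime.date objects, as produced by fix_century.
toy = pd.DataFrame({
    "Yr_Mo_Dy": [datetime.date(1961, 1, 1), datetime.date(1961, 1, 2)],
    "RPT": [15.04, 14.71],
})

toy["Yr_Mo_Dy"] = pd.to_datetime(toy["Yr_Mo_Dy"])
toy = toy.set_index("Yr_Mo_Dy")

# True when the index holds datetime64 values of any unit.
is_datetime_index = pd.api.types.is_datetime64_any_dtype(toy.index)
print(toy.index.dtype, is_datetime_index)
```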
Step 6: How many values are missing for each location?

Run the following code

data.isnull().sum()
RPT 6
VAL 3
ROS 2
KIL 5
SHA 2
BIR 0
DUB 3
CLA 2
MUL 3
CLO 1
BEL 0
MAL 4
dtype: int64
Step 7: How many non-missing values are there for each location?

Run the following code

data.shape[0] - data.isnull().sum()
RPT 6568
VAL 6571
ROS 6572
KIL 6569
SHA 6572
BIR 6574
DUB 6571
CLA 6572
MUL 6571
CLO 6573
BEL 6574
MAL 6570
dtype: int64
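The subtraction above is equivalent to counting the non-missing entries directly with DataFrame.count. A small sketch on made-up values shows the two approaches agreeing:

```python
import numpy as np
import pandas as pd

# Made-up frame with a few missing readings.
toy = pd.DataFrame({
    "RPT": [15.04, 14.71, np.nan],
    "VAL": [14.96, np.nan, np.nan],
})

via_subtraction = toy.shape[0] - toy.isnull().sum()
via_count = toy.count()  # counts non-NaN entries per column

print(via_count)
```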
Step 8: Compute the mean wind speed over the entire dataset

Run the following code

data.mean().mean()
10.227982360836924
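Note that data.mean().mean() is the mean of the per-location means, which weights every location equally. When locations have different numbers of missing values, this can differ from the mean over all individual readings. The made-up example below exaggerates the effect; np.nanmean over the raw values is one way to get the pooled figure.

```python
import numpy as np
import pandas as pd

# Made-up data: column B has only one valid reading.
toy = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [10.0, np.nan, np.nan],
})

mean_of_means = toy.mean().mean()        # (2.0 + 10.0) / 2
pooled_mean = np.nanmean(toy.to_numpy())  # (1 + 2 + 3 + 10) / 4

print(mean_of_means, pooled_mean)
```

With only a handful of missing values per column, as in the wind data, the two numbers should be very close.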
Step 9: Create a DataFrame named loc_stats that stores the minimum, maximum, mean, and standard deviation of the wind speed for each location

Run the following code

loc_stats = pd.DataFrame()

loc_stats["min"] = data.min() # min
loc_stats["max"] = data.max() # max
loc_stats["mean"] = data.mean() # mean
loc_stats["std"] = data.std() # standard deviation

loc_stats
min max mean std
RPT 0.67 35.80 12.362987 5.618413
VAL 0.21 33.37 10.644314 5.267356
ROS 1.50 33.84 11.660526 5.008450
KIL 0.00 28.46 6.306468 3.605811
SHA 0.13 37.54 10.455834 4.936125
BIR 0.00 26.16 7.092254 3.968683
DUB 0.00 30.37 9.797343 4.977555
CLA 0.00 31.08 8.495053 4.499449
MUL 0.00 25.88 8.493590 4.166872
CLO 0.04 28.21 8.707332 4.503954
BEL 0.13 42.38 13.121007 5.835037
MAL 0.67 42.54 15.599079 6.699794
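The four column assignments can also be collapsed into a single DataFrame.agg call. This is a sketch of that alternative on made-up numbers. The transpose puts the locations on the rows, matching the layout of loc_stats above.

```python
import pandas as pd

# Made-up readings for two locations.
toy = pd.DataFrame({
    "RPT": [1.0, 2.0, 3.0],
    "VAL": [2.0, 4.0, 6.0],
})

# agg returns one row per statistic; transpose so each location is a row.
loc_stats = toy.agg(["min", "max", "mean", "std"]).T
print(loc_stats)
```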
Step 10: Create a DataFrame named day_stats that stores the minimum, maximum, mean, and standard deviation of the wind speed across all locations for each day

Run the following code

# create the DataFrame
day_stats = pd.DataFrame()

# this time axis=1, so each statistic is computed across the columns of every row
day_stats["min"] = data.min(axis=1) # min
day_stats["max"] = data.max(axis=1) # max
day_stats["mean"] = data.mean(axis=1) # mean
day_stats["std"] = data.std(axis=1) # standard deviation

day_stats.head()
min max mean std
Yr_Mo_Dy
1961-01-01 9.29 18.50 13.018182 2.808875
1961-01-02 6.50 17.54 11.336364 3.188994
1961-01-03 6.17 18.50 11.641818 3.681912
1961-01-04 1.79 11.75 6.619167 3.198126
1961-01-05 6.17 13.33 10.630000 2.445356
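Row-wise statistics skip missing readings by default, so a day with a NaN at one location is averaged over the remaining locations only. The made-up frame below illustrates this for axis=1:

```python
import numpy as np
import pandas as pd

# Made-up readings for two days; VAL is missing on the second day.
toy = pd.DataFrame(
    {
        "RPT": [15.04, 14.71],
        "VAL": [14.96, np.nan],
        "ROS": [13.17, 10.83],
    },
    index=pd.to_datetime(["1961-01-01", "1961-01-02"]),
)

day_stats = pd.DataFrame({
    "min": toy.min(axis=1),
    "max": toy.max(axis=1),
    "mean": toy.mean(axis=1),  # NaN values are skipped
    "std": toy.std(axis=1),
})
print(day_stats)
```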
Step 11: Compute the average wind speed in January for each location
Note: the code below treats the Januaries of all years (1961, 1962, and so on) as a single January and averages over all of them

Run the following code

# create a new column "date" from the index values
data["date"] = data.index

# create one column for each component of the date
data["month"] = data["date"].apply(lambda date: date.month)
data["year"] = data["date"].apply(lambda date: date.year)
data["day"] = data["date"].apply(lambda date: date.day)

# select all rows from month 1 and assign them to january_winds
january_winds = data.query("month == 1")

# take the mean of january_winds, using .loc to leave out the date, month, year and day columns
january_winds.loc[:, "RPT":"MAL"].mean()
RPT 14.847325
VAL 12.914560
ROS 13.299624
KIL 7.199498
SHA 11.667734
BIR 8.054839
DUB 11.819355
CLA 9.512047
MUL 9.543208
CLO 10.053566
BEL 14.550520
MAL 18.028763
dtype: float64
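The helper columns are not strictly required, because a DatetimeIndex exposes month and year directly. The sketch below uses made-up values to show both the pooled January mean and, for comparison, a per-year January mean obtained by grouping on the index year.

```python
import pandas as pd

# Made-up readings: three January days across two years, one February day.
idx = pd.to_datetime(["1961-01-01", "1961-01-02", "1961-02-01", "1962-01-01"])
toy = pd.DataFrame({"RPT": [10.0, 20.0, 99.0, 30.0]}, index=idx)

january = toy[toy.index.month == 1]

pooled = january.mean()                                 # all Januaries together
per_year = january.groupby(january.index.year).mean()   # one row per year

print(pooled)
print(per_year)
```

Filtering on the index also leaves the frame free of the extra date, month, year and day columns.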
Step 12: Downsample the records to a yearly frequency

Run the following code

data.query("month == 1 and day == 1")
RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL date month year day
Yr_Mo_Dy
1961-01-01 15.04 14.96 13.17 9.29 NaN 9.87 13.67 10.25 10.83 12.58 18.50 15.04 1961-01-01 1 1961 1
1962-01-01 9.29 3.42 11.54 3.50 2.21 1.96 10.41 2.79 3.54 5.17 4.38 7.92 1962-01-01 1 1962 1
1963-01-01 15.59 13.62 19.79 8.38 12.25 10.00 23.45 15.71 13.59 14.37 17.58 34.13 1963-01-01 1 1963 1
1964-01-01 25.80 22.13 18.21 13.25 21.29 14.79 14.12 19.58 13.25 16.75 28.96 21.00 1964-01-01 1 1964 1
1965-01-01 9.54 11.92 9.00 4.38 6.08 5.21 10.25 6.08 5.71 8.63 12.04 17.41 1965-01-01 1 1965 1
1966-01-01 22.04 21.50 17.08 12.75 22.17 15.59 21.79 18.12 16.66 17.83 28.33 23.79 1966-01-01 1 1966 1
1967-01-01 6.46 4.46 6.50 3.21 6.67 3.79 11.38 3.83 7.71 9.08 10.67 20.91 1967-01-01 1 1967 1
1968-01-01 30.04 17.88 16.25 16.25 21.79 12.54 18.16 16.62 18.75 17.62 22.25 27.29 1968-01-01 1 1968 1
1969-01-01 6.13 1.63 5.41 1.08 2.54 1.00 8.50 2.42 4.58 6.34 9.17 16.71 1969-01-01 1 1969 1
1970-01-01 9.59 2.96 11.79 3.42 6.13 4.08 9.00 4.46 7.29 3.50 7.33 13.00 1970-01-01 1 1970 1
1971-01-01 3.71 0.79 4.71 0.17 1.42 1.04 4.63 0.75 1.54 1.08 4.21 9.54 1971-01-01 1 1971 1
1972-01-01 9.29 3.63 14.54 4.25 6.75 4.42 13.00 5.33 10.04 8.54 8.71 19.17 1972-01-01 1 1972 1
1973-01-01 16.50 15.92 14.62 7.41 8.29 11.21 13.54 7.79 10.46 10.79 13.37 9.71 1973-01-01 1 1973 1
1974-01-01 23.21 16.54 16.08 9.75 15.83 11.46 9.54 13.54 13.83 16.66 17.21 25.29 1974-01-01 1 1974 1
1975-01-01 14.04 13.54 11.29 5.46 12.58 5.58 8.12 8.96 9.29 5.17 7.71 11.63 1975-01-01 1 1975 1
1976-01-01 18.34 17.67 14.83 8.00 16.62 10.13 13.17 9.04 13.13 5.75 11.38 14.96 1976-01-01 1 1976 1
1977-01-01 20.04 11.92 20.25 9.13 9.29 8.04 10.75 5.88 9.00 9.00 14.88 25.70 1977-01-01 1 1977 1
1978-01-01 8.33 7.12 7.71 3.54 8.50 7.50 14.71 10.00 11.83 10.00 15.09 20.46 1978-01-01 1 1978 1
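The query above keeps the reading from 1 January of each year. With a DatetimeIndex, DataFrame.resample offers another route to yearly frequency. In the sketch below, on made-up daily data, first() picks the first record of each year, while mean() aggregates the whole year. The "YS" (year start) alias is used on the assumption that the installed pandas accepts it.

```python
import numpy as np
import pandas as pd

# Made-up daily series: constant 1.0 in 1961 and 3.0 in 1962.
idx = pd.date_range("1961-01-01", "1962-12-31", freq="D")
toy = pd.DataFrame({"RPT": np.where(idx.year == 1961, 1.0, 3.0)}, index=idx)

yearly_first = toy.resample("YS").first()  # first record of each year
yearly_mean = toy.resample("YS").mean()    # average over each year

print(yearly_mean)
```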
Step 13: Downsample the records to a monthly frequency

Run the following code

data.query("day == 1")
RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL date month year day
Yr_Mo_Dy
1961-01-01 15.04 14.96 13.17 9.29 NaN 9.87 13.67 10.25 10.83 12.58 18.50 15.04 1961-01-01 1 1961 1
1961-02-01 14.25 15.12 9.04 5.88 12.08 7.17 10.17 3.63 6.50 5.50 9.17 8.00 1961-02-01 2 1961 1
1961-03-01 12.67 13.13 11.79 6.42 9.79 8.54 10.25 13.29 NaN 12.21 20.62 NaN 1961-03-01 3 1961 1
1961-04-01 8.38 6.34 8.33 6.75 9.33 9.54 11.67 8.21 11.21 6.46 11.96 7.17 1961-04-01 4 1961 1
1961-05-01 15.87 13.88 15.37 9.79 13.46 10.17 9.96 14.04 9.75 9.92 18.63 11.12 1961-05-01 5 1961 1
1961-06-01 15.92 9.59 12.04 8.79 11.54 6.04 9.75 8.29 9.33 10.34 10.67 12.12 1961-06-01 6 1961 1
1961-07-01 7.21 6.83 7.71 4.42 8.46 4.79 6.71 6.00 5.79 7.96 6.96 8.71 1961-07-01 7 1961 1
1961-08-01 9.59 5.09 5.54 4.63 8.29 5.25 4.21 5.25 5.37 5.41 8.38 9.08 1961-08-01 8 1961 1
1961-09-01 5.58 1.13 4.96 3.04 4.25 2.25 4.63 2.71 3.67 6.00 4.79 5.41 1961-09-01 9 1961 1
1961-10-01 14.25 12.87 7.87 8.00 13.00 7.75 5.83 9.00 7.08 5.29 11.79 4.04 1961-10-01 10 1961 1
1961-11-01 13.21 13.13 14.33 8.54 12.17 10.21 13.08 12.17 10.92 13.54 20.17 20.04 1961-11-01 11 1961 1
1961-12-01 9.67 7.75 8.00 3.96 6.00 2.75 7.25 2.50 5.58 5.58 7.79 11.17 1961-12-01 12 1961 1
1962-01-01 9.29 3.42 11.54 3.50 2.21 1.96 10.41 2.79 3.54 5.17 4.38 7.92 1962-01-01 1 1962 1
1962-02-01 19.12 13.96 12.21 10.58 15.71 10.63 15.71 11.08 13.17 12.62 17.67 22.71 1962-02-01 2 1962 1
1962-03-01 8.21 4.83 9.00 4.83 6.00 2.21 7.96 1.87 4.08 3.92 4.08 5.41 1962-03-01 3 1962 1
1962-04-01 14.33 12.25 11.87 10.37 14.92 11.00 19.79 11.67 14.09 15.46 16.62 23.58 1962-04-01 4 1962 1
1962-05-01 9.62 9.54 3.58 3.33 8.75 3.75 2.25 2.58 1.67 2.37 7.29 3.25 1962-05-01 5 1962 1
1962-06-01 5.88 6.29 8.67 5.21 5.00 4.25 5.91 5.41 4.79 9.25 5.25 10.71 1962-06-01 6 1962 1
1962-07-01 8.67 4.17 6.92 6.71 8.17 5.66 11.17 9.38 8.75 11.12 10.25 17.08 1962-07-01 7 1962 1
1962-08-01 4.58 5.37 6.04 2.29 7.87 3.71 4.46 2.58 4.00 4.79 7.21 7.46 1962-08-01 8 1962 1
1962-09-01 10.00 12.08 10.96 9.25 9.29 7.62 7.41 8.75 7.67 9.62 14.58 11.92 1962-09-01 9 1962 1
1962-10-01 14.58 7.83 19.21 10.08 11.54 8.38 13.29 10.63 8.21 12.92 18.05 18.12 1962-10-01 10 1962 1
1962-11-01 16.88 13.25 16.00 8.96 13.46 11.46 10.46 10.17 10.37 13.21 14.83 15.16 1962-11-01 11 1962 1
1962-12-01 18.38 15.41 11.75 6.79 12.21 8.04 8.42 10.83 5.66 9.08 11.50 11.50 1962-12-01 12 1962 1
1963-01-01 15.59 13.62 19.79 8.38 12.25 10.00 23.45 15.71 13.59 14.37 17.58 34.13 1963-01-01 1 1963 1
1963-02-01 15.41 7.62 24.67 11.42 9.21 8.17 14.04 7.54 7.54 10.08 10.17 17.67 1963-02-01 2 1963 1
1963-03-01 16.75 19.67 17.67 8.87 19.08 15.37 16.21 14.29 11.29 9.21 19.92 19.79 1963-03-01 3 1963 1
1963-04-01 10.54 9.59 12.46 7.33 9.46 9.59 11.79 11.87 9.79 10.71 13.37 18.21 1963-04-01 4 1963 1
1963-05-01 18.79 14.17 13.59 11.63 14.17 11.96 14.46 12.46 12.87 13.96 15.29 21.62 1963-05-01 5 1963 1
1963-06-01 13.37 6.87 12.00 8.50 10.04 9.42 10.92 12.96 11.79 11.04 10.92 13.67 1963-06-01 6 1963 1
… … … … … … … … … … … … … … … … …
1976-07-01 8.50 1.75 6.58 2.13 2.75 2.21 5.37 2.04 5.88 4.50 4.96 10.63 1976-07-01 7 1976 1
1976-08-01 13.00 8.38 8.63 5.83 12.92 8.25 13.00 9.42 10.58 11.34 14.21 20.25 1976-08-01 8 1976 1
1976-09-01 11.87 11.00 7.38 6.87 7.75 8.33 10.34 6.46 10.17 9.29 12.75 19.55 1976-09-01 9 1976 1
1976-10-01 10.96 6.71 10.41 4.63 7.58 5.04 5.04 5.54 6.50 3.92 6.79 5.00 1976-10-01 10 1976 1
1976-11-01 13.96 15.67 10.29 6.46 12.79 9.08 10.00 9.67 10.21 11.63 23.09 21.96 1976-11-01 11 1976 1
1976-12-01 13.46 16.42 9.21 4.54 10.75 8.67 10.88 4.83 8.79 5.91 8.83 13.67 1976-12-01 12 1976 1
1977-01-01 20.04 11.92 20.25 9.13 9.29 8.04 10.75 5.88 9.00 9.00 14.88 25.70 1977-01-01 1 1977 1
1977-02-01 11.83 9.71 11.00 4.25 8.58 8.71 6.17 5.66 8.29 7.58 11.71 16.50 1977-02-01 2 1977 1
1977-03-01 8.63 14.83 10.29 3.75 6.63 8.79 5.00 8.12 7.87 6.42 13.54 13.67 1977-03-01 3 1977 1
1977-04-01 21.67 16.00 17.33 13.59 20.83 15.96 25.62 17.62 19.41 20.67 24.37 30.09 1977-04-01 4 1977 1
1977-05-01 6.42 7.12 8.67 3.58 4.58 4.00 6.75 6.13 3.33 4.50 19.21 12.38 1977-05-01 5 1977 1
1977-06-01 7.08 5.25 9.71 2.83 2.21 3.50 5.29 1.42 2.00 0.92 5.21 5.63 1977-06-01 6 1977 1
1977-07-01 15.41 16.29 17.08 6.25 11.83 11.83 12.29 10.58 10.41 7.21 17.37 7.83 1977-07-01 7 1977 1
1977-08-01 4.33 2.96 4.42 2.33 0.96 1.08 4.96 1.87 2.33 2.04 10.50 9.83 1977-08-01 8 1977 1
1977-09-01 17.37 16.33 16.83 8.58 14.46 11.83 15.09 13.92 13.29 13.88 23.29 25.17 1977-09-01 9 1977 1
1977-10-01 16.75 15.34 12.25 9.42 16.38 11.38 18.50 13.92 14.09 14.46 22.34 29.67 1977-10-01 10 1977 1
1977-11-01 16.71 11.54 12.17 4.17 8.54 7.17 11.12 6.46 8.25 6.21 11.04 15.63 1977-11-01 11 1977 1
1977-12-01 13.37 10.92 12.42 2.37 5.79 6.13 8.96 7.38 6.29 5.71 8.54 12.42 1977-12-01 12 1977 1
1978-01-01 8.33 7.12 7.71 3.54 8.50 7.50 14.71 10.00 11.83 10.00 15.09 20.46 1978-01-01 1 1978 1
1978-02-01 27.25 24.21 18.16 17.46 27.54 18.05 20.96 25.04 20.04 17.50 27.71 21.12 1978-02-01 2 1978 1
1978-03-01 15.04 6.21 16.04 7.87 6.42 6.67 12.29 8.00 10.58 9.33 5.41 17.00 1978-03-01 3 1978 1
1978-04-01 3.42 7.58 2.71 1.38 3.46 2.08 2.67 4.75 4.83 1.67 7.33 13.67 1978-04-01 4 1978 1
1978-05-01 10.54 12.21 9.08 5.29 11.00 10.08 11.17 13.75 11.87 11.79 12.87 27.16 1978-05-01 5 1978 1
1978-06-01 10.37 11.42 6.46 6.04 11.25 7.50 6.46 5.96 7.79 5.46 5.50 10.41 1978-06-01 6 1978 1
1978-07-01 12.46 10.63 11.17 6.75 12.92 9.04 12.42 9.62 12.08 8.04 14.04 16.17 1978-07-01 7 1978 1
1978-08-01 19.33 15.09 20.17 8.83 12.62 10.41 9.33 12.33 9.50 9.92 15.75 18.00 1978-08-01 8 1978 1
1978-09-01 8.42 6.13 9.87 5.25 3.21 5.71 7.25 3.50 7.33 6.50 7.62 15.96 1978-09-01 9 1978 1
1978-10-01 9.50 6.83 10.50 3.88 6.13 4.58 4.21 6.50 6.38 6.54 10.63 14.09 1978-10-01 10 1978 1
1978-11-01 13.59 16.75 11.25 7.08 11.04 8.33 8.17 11.29 10.75 11.25 23.13 25.00 1978-11-01 11 1978 1
1978-12-01 21.29 16.29 24.04 12.79 18.21 19.29 21.54 17.21 16.71 17.83 17.75 25.70 1978-12-01 12 1978 1
216 rows × 16 columns
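As with the yearly case, resample can produce the monthly view. A minimal sketch on made-up daily data, using the "MS" (month start) alias: first() gives the reading from the first day of each month, and mean() gives a monthly average instead.

```python
import pandas as pd

# Made-up daily series whose value equals the day of the month.
idx = pd.date_range("1961-01-01", "1961-03-31", freq="D")
toy = pd.DataFrame({"RPT": idx.day.to_numpy(dtype=float)}, index=idx)

monthly_first = toy.resample("MS").first()  # first record of each month
monthly_mean = toy.resample("MS").mean()    # average over each month

print(monthly_mean)
```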
