Time series
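- Creating a time series
- `start`, `end`, and `freq` together generate a set of time index values within the range from `start` to `end`, spaced at frequency `freq`
  `pd.date_range(start=None, end=None, freq='10D')`: given a start and an end time, take values at the given frequency
- `start`, `periods`, and `freq` together generate `periods` time index values beginning at `start`, spaced at frequency `freq`
  `pd.date_range(start=None, periods=None, freq='M')`: given a start time and a count, take that many values at the given frequency
- Common frequency abbreviations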
![Common frequency abbreviations](https://img-blog.csdnimg.cn/20210318222633278.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDQ1NDg3Mg==,size_16,color_FFFFFF,t_70)
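As a rough illustration of the two `pd.date_range` call patterns above, here is a minimal sketch. The dates and the names `by_range`, `by_count`, and `by_month` are made up for the example. The month-start alias `'MS'` is used instead of `'M'` only because newer pandas releases warn about the bare `'M'` alias.

```python
import pandas as pd

# start + end + freq: every 10th day between the two dates (both ends included).
by_range = pd.date_range(start="2021-01-01", end="2021-01-31", freq="10D")
print(by_range)

# start + periods + freq: exactly 5 daily timestamps beginning at start.
by_count = pd.date_range(start="2021-01-01", periods=5, freq="D")
print(by_count)

# start + periods + freq with a monthly alias: the first day of 3 consecutive months.
by_month = pd.date_range(start="2021-01-01", periods=3, freq="MS")
print(by_month)
```

In the first call the number of entries follows from the range. In the other two it is fixed by `periods`, and the end date follows from the frequency.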
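- Converting string timestamps to a datetime type
  `df["timeStamp"] = pd.to_datetime(df["timeStamp"], format="")`
  The `format` argument (left empty here as a placeholder) can be omitted in most cases. It is needed when pandas cannot work out the layout of the time strings by itself, for example when they contain Chinese characters.
- Resampling
  Resampling is the process of converting a time series from one frequency to another. Converting high-frequency data to a lower frequency is downsampling; converting low-frequency data to a higher frequency is upsampling.
  `df.resample('M')`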
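To make the `format` argument and the two resampling directions concrete, here is a small sketch on made-up data. The frame `raw`, its values, and the names `series`, `daily`, and `half_daily` are all assumptions for the example and do not come from the 911 or PM2.5 files.

```python
import pandas as pd

# Hypothetical sample data: timestamps written with Chinese date characters.
raw = pd.DataFrame({
    "timeStamp": ["2021年03月18日 08:00", "2021年03月18日 20:00",
                  "2021年03月19日 08:00", "2021年03月20日 08:00"],
    "value": [1, 2, 3, 4],
})

# The literal characters in the strings are written directly into the format.
raw["timeStamp"] = pd.to_datetime(raw["timeStamp"], format="%Y年%m月%d日 %H:%M")
series = raw.set_index("timeStamp")["value"]

# Downsampling: irregular readings -> one sum per calendar day.
daily = series.resample("D").sum()
print(daily)

# Upsampling: daily -> every 12 hours; the new slots are forward-filled.
half_daily = daily.resample("12h").ffill()
print(half_daily)
```

`resample` only defines the bins. An aggregation such as `sum()`, `mean()`, or `count()` (downsampling) or a fill method such as `ffill()` (upsampling) has to follow it to produce values.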
pandas exercises
- Count how the number of calls of each type changes from month to month in the 911 data

      import pandas as pd
      from matplotlib import pyplot as plt

      pd.set_option('display.max_columns', None)  # show every column when printing
      file_path = './911.csv'
      phone = pd.read_csv(file_path)
      phone.info()

      # Parse the timestamps, then take the text before ':' in the title as the call type
      phone['timeStamp'] = pd.to_datetime(phone['timeStamp'])
      phone['type'] = phone['title'].str.split(':').str[0]
      phone.set_index('timeStamp', inplace=True)

      plt.figure(figsize=(15, 5), dpi=80)
      # One line per call type: count the calls in each month
      for group, data in phone.groupby(by='type'):
          # 'M' is the month-end alias; newer pandas releases spell it 'ME'
          count_m = data.resample('M').count()['title']
          x = count_m.index
          y = count_m.values
          x_label = [i.strftime('%Y%m%d') for i in x]
          plt.plot(range(len(x)), y, label=group)
      # Label every second month on the x axis (uses the months of the last group)
      plt.xticks(range(len(x))[::2], x_label[::2], rotation=45)
      plt.legend(loc='best')
      plt.show()
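The same per-month, per-type counts can also be built as a single table instead of looping over the groups. This is only a sketch of an alternative, not the approach used in the exercise above. The tiny `calls` frame is a made-up stand-in for `911.csv`, and `monthly` is a name chosen for the example.

```python
import pandas as pd

# Hypothetical stand-in for 911.csv: only the two columns the exercise uses.
calls = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2016-01-05 10:00", "2016-01-20 11:00", "2016-01-25 09:30",
        "2016-02-03 14:00", "2016-02-15 16:45",
    ]),
    "title": ["EMS: BACK PAINS", "Fire: GAS-ODOR", "EMS: FALL VICTIM",
              "Traffic: VEHICLE ACCIDENT", "EMS: HEAD INJURY"],
})
calls["type"] = calls["title"].str.split(":").str[0]

# Group by calendar month and by type at once, then pivot the types into columns.
monthly = (
    calls.groupby([calls["timeStamp"].dt.to_period("M"), "type"])
    .size()
    .unstack(fill_value=0)
)
print(monthly)
```

Each row of `monthly` is one month and each column one call type, so a month in which a type never occurs shows up as 0 rather than being absent. Calling `monthly.plot()` would then draw one line per type.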
- Plot how Beijing's PM2.5 changes over time

      import pandas as pd
      from matplotlib import pyplot as plt

      file_path = './PM2.5/BeijingPM20100101_20151231.csv'
      data_bj = pd.read_csv(file_path)
      data_bj.info()

      # Combine the separate year/month/day/hour columns into one hourly period index
      # (newer pandas releases prefer PeriodIndex.from_fields and the alias 'h')
      period = pd.PeriodIndex(year=data_bj['year'], month=data_bj['month'], day=data_bj['day'], hour=data_bj['hour'], freq='H')
      data_bj['datetime'] = period
      data_bj.set_index('datetime', inplace=True)

      # Downsample both stations to 7-day means; mean() skips missing readings
      bj_plot = data_bj['PM_US Post'].resample('7D').mean()
      bj_plot_c = data_bj['PM_Dongsihuan'].resample('7D').mean()

      plt.figure(figsize=(30, 10), dpi=80)
      x = bj_plot.index
      x_c = bj_plot_c.index
      y = bj_plot.values
      y_c = bj_plot_c.values
      plt.plot(range(len(x)), y, label='us_post')
      plt.plot(range(len(x_c)), y_c, label='cn_post')
      # Label every tenth week, formatted as YYYYMMDD
      x_label = [i.strftime('%Y%m%d') for i in x]
      plt.xticks(range(0, len(x), 10), x_label[::10], rotation=45)
      plt.legend(loc='best')
      plt.show()
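Another way to combine separate year/month/day/hour columns is to pass them to `pd.to_datetime`, which accepts a DataFrame whose columns carry those names. The sketch below uses that call in place of `pd.PeriodIndex`, so it is a swapped-in technique and not the one used in the exercise. The 14 days of synthetic readings in `sample` are made up; only the column names mirror the Beijing file.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Beijing file: 14 days of hourly readings.
hours = pd.date_range("2010-01-01", periods=14 * 24, freq="h")
sample = pd.DataFrame({
    "year": hours.year, "month": hours.month,
    "day": hours.day, "hour": hours.hour,
    "PM_US Post": np.arange(len(hours), dtype=float),
})
sample.loc[5, "PM_US Post"] = np.nan  # one missing reading

# to_datetime assembles timestamps from columns named year/month/day/hour.
sample["datetime"] = pd.to_datetime(sample[["year", "month", "day", "hour"]])

# Downsample the hourly series to 7-day means; NaN readings are skipped.
weekly = sample.set_index("datetime")["PM_US Post"].resample("7D").mean()
print(weekly)
```

The result has a `DatetimeIndex` rather than a `PeriodIndex`, so the tick labels can still be produced with `strftime` as in the plots above.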
Summary
![Summary](https://img-blog.csdnimg.cn/20210319230158530.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDQ1NDg3Mg==,size_16,color_FFFFFF,t_70)