Data Analysis-day06-pandas-DataFrame case study 4: using PeriodIndex to combine discrete time fields into a pandas time series, then charting the PM trend at two locations over time

Author: 健康平安的活着

Category: Data Analysis

Article link: https://blog.csdn.net/u011066470/article/details/103878696

Dataset: BeijingPM20100101_20151231.csv (the file read by the code below)

Code:

# -*- coding: utf-8 -*-

# @File    : pandas_dataframe_periodIndex_demo.py
# @Date    :  2020-01-07 15:59
# @Author  : admin
import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_csv("../../data/BeijingPM20100101_20151231.csv")
# df = df.head(1000)
print(df.head(5))
print(df.info())
print("===== Wrap the discrete time fields into a pandas time series with PeriodIndex =====")
# Note: on pandas 2.2+ the keyword form below is deprecated;
# pd.PeriodIndex.from_fields(...) accepts the same fields.
d = pd.PeriodIndex(year=df["year"], month=df["month"], day=df["day"], hour=df["hour"], freq="H")
# print(d)
# Add the periods as a new column
df["time"] = d
# Use the new column as the index
df.set_index("time", inplace=True)
print(df)
# Downsample: average the hourly rows into 7-day bins.
# numeric_only=True skips any non-numeric columns (needed on pandas 2.x if the file has text columns).
df = df.resample("7D").mean(numeric_only=True)
print(df.head())
# Handle missing data: drop the NaN values from each series
us_data = df["PM_US Post"].dropna()
cn_data = df["PM_Nongzhanguan"].dropna()
# us_data = df["PM_US Post"]
# cn_data = df["PM_Nongzhanguan"]
# Format the index values as date strings for the x-axis labels
x_us = us_data.index
x_us = [m.strftime("%Y%m%d") for m in x_us]
y_us = us_data.values
x_cn = cn_data.index
x_cn = [m.strftime("%Y%m%d") for m in x_cn]
y_cn = cn_data.values
print(len(x_us), len(x_cn))
print("##### Plot #####")
plt.figure(figsize=(20, 8), dpi=80)
# Both series are drawn by position (0, 1, 2, ...), not by date
plt.plot(range(len(x_us)), y_us, label="us", alpha=0.7)
plt.plot(range(len(x_cn)), y_cn, label="cn", alpha=0.7)
# plt.xticks(range(len(x_us)), x_us)
# Show only every 10th label so the x axis stays readable
plt.xticks(range(0, len(x_us), 10), list(x_us)[::10], rotation=45)
plt.legend(loc="best")
plt.show()
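
As a side note, the same "discrete columns to time index, then 7-day average" step can be sketched without PeriodIndex. The block below is only an illustration under assumptions: it swaps in `pd.to_datetime`, which assembles timestamps from columns named year/month/day/hour, and it uses a small synthetic frame (`toy`, with a made-up `pm` column) in place of the real CSV.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real CSV (assumption): 21 days of hourly rows.
stamps = pd.Timestamp("2010-01-01") + pd.to_timedelta(np.arange(21 * 24), unit="h")
toy = pd.DataFrame({
    "year": np.asarray(stamps.year),
    "month": np.asarray(stamps.month),
    "day": np.asarray(stamps.day),
    "hour": np.asarray(stamps.hour),
    "pm": np.arange(21 * 24, dtype=float),  # hypothetical readings 0, 1, 2, ...
})

# pd.to_datetime builds one timestamp per row from the discrete columns.
toy["time"] = pd.to_datetime(toy[["year", "month", "day", "hour"]])

# Downsample the hourly readings into 7-day bins and average each bin.
weekly = toy.set_index("time")["pm"].resample("7D").mean()
print(weekly)
```

With 21 days of hourly rows this yields three 7-day bins. A DatetimeIndex is used here instead of the article's PeriodIndex, so the index values are bin start timestamps rather than periods.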

Result:

If the x axis shows too many time labels, we can display only every n-th label by setting a step:

plt.xticks(range(0, len(x_us), 10), list(x_us)[::10], rotation=45)
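
The stepping logic can be looked at on its own. The sketch below uses a hypothetical helper, `thin_ticks`, and twelve made-up labels; it only shows that the tick positions and the tick labels must be sliced with the same step so that they stay paired.

```python
def thin_ticks(labels, step):
    """Return every step-th position together with its matching label."""
    positions = list(range(0, len(labels), step))
    return positions, [labels[i] for i in positions]


# Twelve hypothetical date labels in the same %Y%m%d format as the article.
labels = [f"2010{month:02d}01" for month in range(1, 13)]
positions, shown = thin_ticks(labels, 5)
print(positions)  # [0, 5, 10]
print(shown)      # ['20100101', '20100601', '20101101']
```

The two returned lists can then be passed to `plt.xticks(positions, shown, rotation=45)`, which is what the one-liner above does with a step of 10.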

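The plot shows values for the Chinese station from 2010 to 2013, even though its data for those years is missing. The cause is that the two series (PM_US Post and PM_Nongzhanguan) have different lengths after dropna(), and both are plotted by position with range(len(...)), so the shorter series is drawn starting from the left edge instead of from its real dates. The lengths printed by the script are: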

313 155
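
The effect can be reproduced on a tiny example. This is a sketch with made-up values (four weekly points, the first two missing for `cn`), not the article's data; it shows how plotting by position after `dropna()` shifts the shorter series to the left.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly index shared by both series.
idx = pd.to_datetime(["2010-01-01", "2010-01-08", "2010-01-15", "2010-01-22"])
us = pd.Series([10.0, 20.0, 30.0, 40.0], index=idx)
cn = pd.Series([np.nan, np.nan, 35.0, 45.0], index=idx)  # first two weeks missing

us_clean = us.dropna()
cn_clean = cn.dropna()
print(len(us_clean), len(cn_clean))  # 4 2

# Plotting by position puts the first valid cn value at x = 0 ...
wrong_x = list(range(len(cn_clean)))
# ... while its true place on the shared axis is position 2.
right_x = [idx.get_loc(t) for t in cn_clean.index]
print(wrong_x, right_x)  # [0, 1] [2, 3]
```

Plotting `cn_clean` against `right_x`, or against the dates themselves, would place the points where they belong.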

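Now suppose we do not filter out the NaN values. Change:

us_data = df["PM_US Post"].dropna()
cn_data = df["PM_Nongzhanguan"].dropna()

to:

us_data = df["PM_US Post"]
cn_data = df["PM_Nongzhanguan"]

Then run the script again. With the NaN values kept, both series have the same length and share the same x positions: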
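
A small sketch of why keeping the NaN values helps, again on made-up numbers rather than the real file: both columns keep the full shared index, so position i means the same week in both, and `first_valid_index()` reports where the Nongzhanguan data actually begins.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly frame with the same column names as the article.
idx = pd.to_datetime(["2010-01-01", "2010-01-08", "2010-01-15", "2010-01-22"])
weekly = pd.DataFrame(
    {
        "PM_US Post": [10.0, 20.0, 30.0, 40.0],
        "PM_Nongzhanguan": [np.nan, np.nan, 35.0, 45.0],
    },
    index=idx,
)

us_data = weekly["PM_US Post"]
cn_data = weekly["PM_Nongzhanguan"]

# Without dropna() the two series stay the same length and stay aligned.
same_length = len(us_data) == len(cn_data)
# First timestamp that has a real (non-NaN) value for the Chinese station.
cn_start = cn_data.first_valid_index()
# Number of weekly bins with no data at all.
missing_weeks = int(cn_data.isna().sum())
print(same_length, cn_start, missing_weeks)
```

When such a series is passed to `plt.plot`, matplotlib does not draw the NaN points, so the missing stretch appears as a gap in the line.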
