Measurements for Shenyang, Chengdu, Beijing, Guangzhou, and Shanghai
Data source: https://www.kaggle.com/uciml/pm25-data-for-five-chinese-cities
Beijing PM2.5 over time
Data columns
The data cover the period from Jan 1st, 2010 to Dec 31st, 2015. Missing data are denoted as NA.
- No: row number
- year: year of data in this row
- month: month of data in this row
- day: day of data in this row
- hour: hour of data in this row
- season: season of data in this row
- PM: PM2.5 concentration (ug/m^3)
- DEWP: Dew Point (Celsius Degree)
- TEMP: Temperature (Celsius Degree)
- HUMI: Humidity (%)
- PRES: Pressure (hPa)
- cbwd: Combined wind direction
- Iws: Cumulated wind speed (m/s)
- precipitation: hourly precipitation (mm)
- Iprec: Cumulated precipitation (mm)
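Since missing values are written as NA, it can help to check how many readings are actually missing before plotting. The sketch below uses a small made-up CSV (the three rows and the reduced column set are hypothetical, not taken from the dataset) and relies on `pd.read_csv` treating the string NA as missing by default.

```python
import io
import pandas as pd

# Hypothetical three-row sample using a subset of the columns listed above.
sample_csv = io.StringIO(
    "No,year,month,day,hour,PM,TEMP\n"
    "1,2010,1,1,0,NA,-5.0\n"
    "2,2010,1,1,1,129.0,-4.0\n"
    "3,2010,1,1,2,NA,-4.5\n"
)

# read_csv parses the literal string "NA" as a missing value (NaN) by default.
sample = pd.read_csv(sample_csv)

# Count missing readings per column, then keep only rows that have a PM value.
missing_per_column = sample.isna().sum()
valid_pm = sample.dropna(subset=["PM"])
print(missing_per_column["PM"], len(valid_pm))
```

The same two calls (`isna().sum()` and `dropna(subset=...)`) should apply unchanged to the full DataFrame once it is loaded.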
Combine the separate time fields in the data into a time series
period = pd.PeriodIndex(year=df['year'], month=df['month'], day=df['day'], hour=df['hour'], freq='H')
df['datetime'] = period
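As an alternative sketch, the same fields can be assembled into timestamps with `pd.to_datetime`, which accepts a DataFrame whose columns are named year, month, day, and hour. The rows below are hypothetical stand-ins for the real data, and the result is a datetime column rather than a period column.

```python
import pandas as pd

# Hypothetical rows with the same field names as the dataset.
fields = pd.DataFrame({
    "year": [2010, 2010, 2015],
    "month": [1, 1, 12],
    "day": [1, 1, 31],
    "hour": [0, 1, 23],
})

# to_datetime assembles timestamps from columns named year, month, day, hour.
timestamps = pd.to_datetime(fields[["year", "month", "day", "hour"]])
fields["datetime"] = timestamps
print(fields["datetime"].iloc[-1])
```

Either representation can serve as the index for the resampling step; which one is preferable depends on the pandas version in use.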
Time frequency (freq): 'H' here means hourly
Set datetime as the index
- inplace: True modifies the original DataFrame in place; the default, False, returns a new object
df.set_index('datetime', inplace=True)
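To make the effect of `inplace` concrete, here is a minimal sketch on a hypothetical two-row frame (the values are invented for illustration). It shows that the default call leaves the original untouched, while `inplace=True` changes it and returns None.

```python
import pandas as pd

# Hypothetical two-row frame with a datetime column.
frame = pd.DataFrame({
    "datetime": pd.to_datetime(["2010-01-01 00:00", "2010-01-01 01:00"]),
    "PM": [129.0, 148.0],
})

# Default (inplace=False): returns a new DataFrame; the original is unchanged.
indexed = frame.set_index("datetime")
original_still_has_column = "datetime" in frame.columns

# inplace=True: modifies the frame itself and returns None.
result = frame.set_index("datetime", inplace=True)
print(original_still_has_column, result, frame.index.name)
```

Assigning the return value (`df = df.set_index('datetime')`) is an equivalent way to get the same result without `inplace`.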
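There is a lot of data, so take the monthly mean (numeric_only=True skips the text column cbwd, which recent pandas versions cannot average)
df = df.resample('M').mean(numeric_only=True)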
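The monthly averaging can be tried on a small example first. This sketch uses hypothetical daily readings on a DatetimeIndex (not the hourly PeriodIndex used above) and the 'MS' alias, which labels each group by the first day of its month; `numeric_only=True` leaves out the text column.

```python
import pandas as pd

# Hypothetical daily readings covering January and February 2010.
index = pd.date_range("2010-01-01", "2010-02-28", freq="D")
daily = pd.DataFrame(
    {"PM": list(range(len(index))), "cbwd": ["NW"] * len(index)},
    index=index,
)

# 'MS' groups rows by calendar month, labelled by the month's first day.
# numeric_only=True skips the text column cbwd.
monthly = daily.resample("MS").mean(numeric_only=True)
print(monthly)
```

Here the 59 daily rows collapse to two monthly rows, which is the same reduction that makes the six-year hourly series readable on a single plot.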
Code
import pandas as pd
from matplotlib import pyplot as plt
file_path = './data/BeijingPM20100101_20151231.csv'
df = pd.read_csv(file_path)
# Combine the separate time fields in the data into a time series
period = pd.PeriodIndex(year=df['year'], month=df['month'], day=df['day'], hour=df['hour'], freq='H')
df['datetime'] = period
# Set datetime as the index
df.set_index