20191228_Python Language Course Project


# Read "pollution_us_5city_2006_2010_SO2.csv" with pandas and show the first five rows and the last two rows.
import pandas as pd
import matplotlib.pyplot as plt
test=pd.read_csv('pollution_us_5city_2006_2010_SO2.csv')
print(test.head(5))
print(test.tail(2))
   ID  State Code  County Code  Site Num                      Address  \
0   1           6           37      1103  1630 N MAIN ST, LOS ANGELES   
1   2           6           37      1103  1630 N MAIN ST, LOS ANGELES   
2   3           6           37      1103  1630 N MAIN ST, LOS ANGELES   
3   4           6           37      1103  1630 N MAIN ST, LOS ANGELES   
4   5           6           37      1103  1630 N MAIN ST, LOS ANGELES   

        State       County         City Date Local          SO2 Units  \
0  California  Los Angeles  Los Angeles   2006/1/1  Parts per billion   
1  California  Los Angeles  Los Angeles   2006/1/1  Parts per billion   
2  California  Los Angeles  Los Angeles   2006/1/1  Parts per billion   
3  California  Los Angeles  Los Angeles   2006/1/1  Parts per billion   
4  California  Los Angeles  Los Angeles   2006/1/2  Parts per billion   

   SO2 Mean  SO2 1st Max Value  SO2 1st Max Hour  SO2 AQI  
0  2.043478                3.0                 5      4.0  
1  2.043478                3.0                 5      4.0  
2  2.000000                2.0                 2      NaN  
3  2.000000                2.0                 2      NaN  
4  2.000000                2.0                 0      3.0  
          ID  State Code  County Code  Site Num  \
53218  53219          36           81       124   
53219  53220          36           81       124   

                                                 Address     State  County  \
53218  Queens College   65-30 Kissena Blvd  Parking L...  New York  Queens   
53219  Queens College   65-30 Kissena Blvd  Parking L...  New York  Queens   

           City  Date Local          SO2 Units  SO2 Mean  SO2 1st Max Value  \
53218  New York  2010/12/31  Parts per billion   14.8875               16.9   
53219  New York  2010/12/31  Parts per billion   14.8875               16.9   

       SO2 1st Max Hour  SO2 AQI  
53218                 5      NaN  
53219                 5      NaN  

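Using pandas preprocessing, fill each missing value with the mean of its column, drop the columns State Code, County Code, Site Num, and Address, and export the remaining columns to the Excel file
"pollution_us_5city_2006_2010_SO2.xlsx".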

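# Count missing values per column
test.isnull().sum()
# Fill the missing SO2 AQI values with that column's mean
mean_cols=test['SO2 AQI'].mean()
test['SO2 AQI'] = test['SO2 AQI'].fillna(mean_cols)
test1=test.drop(['State Code','County Code','Site Num','Address'],axis=1)
# index=False keeps the row index out of the exported columns
test1.to_excel('pollution_us_5city_2006_2010_SO2.xlsx',index=False)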
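
The step above fills a single column. As a minimal sketch of the more general case, the snippet below fills every numeric column with its own mean in one call. The toy DataFrame and its values are made up for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the pollution data (values are made up).
df = pd.DataFrame({
    "City": ["Los Angeles", "Los Angeles", "New York", "New York"],
    "SO2 Mean": [2.0, np.nan, 6.0, 7.0],
    "SO2 AQI": [4.0, np.nan, np.nan, 8.0],
})

# Count missing values per column before filling.
missing_before = df.isnull().sum()

# Means of the numeric columns only; NaN entries are skipped by default.
col_means = df.mean(numeric_only=True)

# Passing a Series to fillna fills each column with the value stored
# under that column's label; columns without an entry are left alone.
filled = df.fillna(col_means)

print(missing_before)
print(filled)
```

Text columns such as City have no entry in `col_means`, so they pass through unchanged.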

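Read the new dataset "pollution_us_5city_2006_2010_SO2.xlsx", select all records
where City == "New York", and export them to the text file "pollution_us_NewYork_2006_2010_SO2.txt". Values must be separated by spaces,
and each line must end with a newline character.

test=pd.read_excel('pollution_us_5city_2006_2010_SO2.xlsx')
test2=test.loc[test['City']=="New York"]
# sep=' ' gives the required space-separated output; fields that contain spaces are quoted
test2.to_csv('pollution_us_NewYork_2006_2010_SO2.txt',sep=' ',index=False)

Read the text file "pollution_us_NewYork_2006_2010_SO2.txt", select all records whose
Date Local lies in the interval [2007/1/1, 2009/12/31], and save them to the CSV file
"pollution_us_NewYork_2007_2009_SO2.csv".

# Read with the same separator that was used when writing
test=pd.read_csv('pollution_us_NewYork_2006_2010_SO2.txt',sep=' ')
test['Date Local'] = pd.to_datetime(test['Date Local'])
test = test.set_index('Date Local') # use the date as the index
# Label slicing assumes the rows are in chronological order
test=test.loc['2007-01-01':'2009-12-31']
test.to_csv('pollution_us_NewYork_2007_2009_SO2.csv')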
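
The two steps above can also be sketched without touching the disk. The snippet below is an illustration under assumptions: it uses an in-memory buffer and made-up rows, writes space-separated text, reads it back, and selects the date range with a boolean mask, which does not depend on the rows being sorted by date.

```python
import io
import pandas as pd

# Toy rows standing in for the New York subset (values are made up).
df = pd.DataFrame({
    "City": ["New York"] * 4,
    "Date Local": ["2006/12/31", "2007/1/1", "2009/12/31", "2010/1/1"],
    "SO2 Mean": [5.0, 6.5, 7.25, 8.0],
})

# Write space-separated text. "New York" contains the separator, so
# pandas quotes it under its default quoting mode (csv.QUOTE_MINIMAL).
buf = io.StringIO()
df.to_csv(buf, sep=" ", index=False)
text = buf.getvalue()

# Read it back with the same separator, then parse the dates explicitly.
back = pd.read_csv(io.StringIO(text), sep=" ")
back["Date Local"] = pd.to_datetime(back["Date Local"], format="%Y/%m/%d")

# Keep rows inside the closed interval [2007-01-01, 2009-12-31];
# Series.between includes both endpoints by default.
mask = back["Date Local"].between(
    pd.Timestamp("2007-01-01"), pd.Timestamp("2009-12-31")
)
selected = back[mask]
print(selected)
```

Because quoted fields survive the round trip, values such as "New York" and "Parts per billion" stay in one column even though the separator is a space.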

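Read the CSV file "pollution_us_NewYork_2007_2009_SO2.csv", compute the monthly means of
SO2 Mean, SO2 1st Max Hour, and SO2 AQI for each city (field City),
and visualize them with matplotlib. The plots must include a legend and a title, the x-axis ticks must be shown by year,
the y-axis must show tick values, and the curve must be red.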

test=pd.read_csv('pollution_us_NewYork_2007_2009_SO2.csv')          
test.head()
   Date Local     ID     State County      City          SO2 Units  SO2 Mean  \
0  2007-01-01  15225  New York  Bronx  New York  Parts per billion  6.583333   
1  2007-01-01  15226  New York  Bronx  New York  Parts per billion  6.583333   
2  2007-01-01  15227  New York  Bronx  New York  Parts per billion  6.562500   
3  2007-01-01  15228  New York  Bronx  New York  Parts per billion  6.562500   
4  2007-01-02  15229  New York  Bronx  New York  Parts per billion  7.909091   

   SO2 1st Max Value  SO2 1st Max Hour    SO2 AQI  
0               20.0                16  29.000000  
1               20.0                16  29.000000  
2               13.3                20  10.957132  
3               13.3                20  10.957132  
4               19.0                20  27.000000  
test['Date Local'] = pd.to_datetime(test['Date Local'])
# Year
test['年']=test['Date Local'].dt.year
# Month
test['月']=test['Date Local'].dt.month
test=test.drop(['ID','SO2 1st Max Value'],axis=1)
# numeric_only=True skips the text columns, which pandas 2.0+ would otherwise refuse to average
test_num=test.groupby(by=['年','月'],as_index=False).mean(numeric_only=True)
test_num.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 36 entries, 0 to 35
Data columns (total 5 columns):
年                   36 non-null int64
月                   36 non-null int64
SO2 Mean            36 non-null float64
SO2 1st Max Hour    36 non-null float64
SO2 AQI             36 non-null float64
dtypes: float64(3), int64(2)
memory usage: 1.7 KB
test_num['年']=test_num['年'].astype('str')
test_num['月']=test_num['月'].astype('str')
test_num['all']=test_num['年']+'/'+test_num['月']
test_num.columns
Index(['年', '月', 'SO2 Mean', 'SO2 1st Max Hour', 'SO2 AQI', 'all'], dtype='object')
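
As an alternative to separate year and month columns, the grouping can be sketched with a monthly period key that also keeps City in the group, as the task statement asks. This is a minimal sketch on made-up readings; the column name `Month` is introduced here for illustration only.

```python
import pandas as pd

# Toy daily readings (values are made up) spanning three calendar months.
df = pd.DataFrame({
    "Date Local": pd.to_datetime(
        ["2007-01-01", "2007-01-15", "2007-02-01", "2007-02-20", "2008-01-05"]
    ),
    "City": ["New York"] * 5,
    "SO2 Mean": [6.0, 8.0, 5.0, 7.0, 4.0],
    "SO2 AQI": [20.0, 30.0, 10.0, 14.0, 9.0],
})

# Map each date to its calendar month, e.g. 2007-01-15 becomes 2007-01.
month = df["Date Local"].dt.to_period("M").rename("Month")

# Group by city and month, then average only the selected numeric columns.
monthly = (
    df.groupby(["City", month])[["SO2 Mean", "SO2 AQI"]]
    .mean()
    .reset_index()
)
print(monthly)
```

Selecting the numeric columns before calling `mean()` avoids averaging text columns, and the period key sorts chronologically without building a "year/month" string.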
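x=test_num['all']
y=test_num['SO2 Mean']
plt.figure(figsize=(20,10))
plt.rcParams['font.sans-serif']=['SimHei'] # font setting so that Chinese labels render correctly
plt.rcParams['axes.unicode_minus'] = False
plt.plot(x,y, 'r', label='SO2 Mean')
plt.xticks(range(0,len(x),12), list(test_num['年'][::12])) # one tick per year
plt.xlabel('Year')
plt.ylabel('SO2 Mean (parts per billion)')
plt.title('Monthly mean of SO2 Mean, New York, 2007-2009')
plt.legend()
plt.show()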

[Figure: line plot of the monthly SO2 Mean values]

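x=test_num['all']
y=test_num['SO2 1st Max Hour']
plt.figure(figsize=(20,10))
plt.rcParams['font.sans-serif']=['SimHei'] # font setting so that Chinese labels render correctly
plt.rcParams['axes.unicode_minus'] = False
plt.plot(x,y, 'r', label='SO2 1st Max Hour')
plt.xticks(range(0,len(x),12), list(test_num['年'][::12])) # one tick per year
plt.xlabel('Year')
plt.ylabel('SO2 1st Max Hour')
plt.title('Monthly mean of SO2 1st Max Hour, New York, 2007-2009')
plt.legend()
plt.show()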

[Figure: line plot of the monthly SO2 1st Max Hour values]

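x=test_num['all']
y=test_num['SO2 AQI']
plt.figure(figsize=(20,10))
plt.rcParams['font.sans-serif']=['SimHei'] # font setting so that Chinese labels render correctly
plt.rcParams['axes.unicode_minus'] = False
plt.plot(x,y, 'r', label='SO2 AQI')
plt.xticks(range(0,len(x),12), list(test_num['年'][::12])) # one tick per year
plt.xlabel('Year')
plt.ylabel('SO2 AQI')
plt.title('Monthly mean of SO2 AQI, New York, 2007-2009')
plt.legend()
plt.show()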

[Figure: line plot of the monthly SO2 AQI values]
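
The three plots share the same x-axis, so they could also be drawn in one figure. The sketch below is one possible way to do that, not the approach used in this post: it assumes a DataFrame indexed by real month-start dates (filled here with synthetic random values) and uses `matplotlib.dates` to place one tick per year.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly means for 2007-2009 (random values, not the real data).
months = pd.date_range("2007-01-01", periods=36, freq="MS")
rng = np.random.default_rng(0)
monthly = pd.DataFrame({
    "SO2 Mean": rng.uniform(2, 12, 36),
    "SO2 1st Max Hour": rng.uniform(6, 14, 36),
    "SO2 AQI": rng.uniform(5, 30, 36),
}, index=months)

# One subplot per column, stacked vertically and sharing the x-axis.
fig, axes = plt.subplots(len(monthly.columns), 1, figsize=(12, 9), sharex=True)
for ax, col in zip(axes, monthly.columns):
    ax.plot(monthly.index, monthly[col], color="red", label=col)
    ax.set_ylabel(col)
    ax.legend(loc="upper right")
    # One major tick per year, labelled with the four-digit year.
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))

axes[0].set_title("Monthly means (synthetic data)")
axes[-1].set_xlabel("Year")
fig.tight_layout()
```

With a datetime x-axis the year ticks come from the dates themselves, so they stay correct even if a month is missing from the data.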
