Download data and analyze it visually. We work with two data formats: 1. CSV, processed with Python's csv module; 2. JSON, processed with the json module.
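For the second format, the json module converts between JSON text and Python objects. Here is a minimal round-trip sketch. The record and its fields (`date`, `high`, `low`) are hypothetical sample data, not taken from any file in this section.

```python
import json

# Hypothetical sample record, used only to illustrate the json module
record = {"date": "2014-07-01", "high": 64, "low": 56}

# json.dumps serializes a Python object to a JSON-formatted string
text = json.dumps(record)

# json.loads parses JSON text back into Python objects (here, a dict)
parsed = json.loads(text)
print(parsed["high"])  # 64
```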
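1. The CSV file format
A CSV file stores a series of values separated by commas, which makes the data easy for a program to extract. As an example, we analyze some weather data.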
import csv
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    print(header_row)
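Calling csv.reader(f) creates a reader object associated with the file. Passing that reader object to next() returns the next line of the file. Because nothing has been read yet, this is the first line, which holds the column headers.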
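To see the reader and next() at work without the weather file, the sketch below feeds csv.reader an in-memory string through io.StringIO. The rows and field names are made up for illustration. It also shows why the csv module is preferable to a plain split(','): a quoted field that contains a comma stays in one piece.

```python
import csv
import io

# Made-up in-memory CSV standing in for a real file; field names are illustrative
sample = io.StringIO(
    "Date,Station,High\n"
    '2014-07-01,"Sitka, AK",64\n'
    "2014-07-02,Sitka,71\n"
)

reader = csv.reader(sample)
header_row = next(reader)   # consumes only the first line: the header
data_rows = list(reader)    # iteration resumes at the second line

# A naive split breaks the quoted field "Sitka, AK" into two pieces
naive = '2014-07-01,"Sitka, AK",64'.split(',')
```

Every field comes back as a string, which is why the later code converts the temperatures with int().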
Print each column header together with its position:
import csv
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    for index, column_header in enumerate(header_row):
        print(index, column_header)
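The same enumerate() loop can build a lookup table from column name to index, so later code need not hard-code numbers such as row[1]. The header list below is an assumption chosen to match the indices used in this section (high at 1, low at 3); check the real names against the output of the code above.

```python
# Assumed header names; verify them against the printed header of your file
header_row = ['Date', 'Max TemperatureF', 'Mean TemperatureF', 'Min TemperatureF']

# Map each column name to its position in the row
column_index = {name: index for index, name in enumerate(header_row)}
print(column_index['Max TemperatureF'])  # 1
```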
Extract the data in column 1 (the daily high temperatures):
import csv
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    highs = []
    for row in reader:
        highs.append(row[1])
    print(highs)
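As an alternative to indexing by position, the csv module also provides csv.DictReader, which uses the first row as field names and yields one dict per data row. This is a swapped-in technique, not the one used above. The in-memory data and the column name `High` are hypothetical.

```python
import csv
import io

# Hypothetical in-memory data; the real file's column names may differ
sample = io.StringIO("Date,High,Low\n2014-07-01,64,56\n2014-07-02,71,56\n")

# DictReader consumes the header row itself, so no next() call is needed
reader = csv.DictReader(sample)
highs = [row['High'] for row in reader]
print(highs)  # ['64', '71']
```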
Convert the strings to numbers:
import csv
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    highs = []
    for row in reader:
        high = int(row[1])
        highs.append(high)
    print(highs)
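Why the conversion matters: strings compare character by character, so numeric operations on the raw CSV values can silently give wrong answers. The sample values below are made up.

```python
# Values as csv.reader returns them: strings (made-up sample)
highs_as_text = ['64', '71', '9']

# '9' > '7' as characters, so the string '9' is the "largest"
print(max(highs_as_text))   # 9

# After int() conversion the comparison is numeric
highs = [int(h) for h in highs_as_text]
print(max(highs))           # 71
```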
Data extraction is now complete; the next step is to process the data and visualize it.
Plot the temperature chart:
import csv
from matplotlib import pyplot as plt
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    highs = []
    for row in reader:
        high = int(row[1])
        highs.append(high)
    print(highs)
# Visualize
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(highs, c='red')
plt.title("Daily high temperatures, July 2014", fontsize=24)
plt.xlabel('', fontsize=14)
plt.ylabel("Temperature (F)", fontsize=14)
plt.tick_params(axis="both", which="major", labelsize=16)
plt.show()
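The x-axis above does not show dates yet. Import the datetime module, parse the date in column 0, and plot the highs against the dates; fig.autofmt_xdate() draws the date labels diagonally so they do not overlap: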
import csv
from datetime import datetime
from matplotlib import pyplot as plt
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    dates, highs = [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
    print(highs)
# Visualize
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red')
plt.title("Daily high temperatures, July 2014", fontsize=24)
plt.xlabel('', fontsize=14)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=14)
plt.tick_params(axis="both", which="major", labelsize=16)
plt.show()
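datetime.strptime() parses a string according to a format: in "%Y-%m-%d", %Y is a four-digit year and %m and %d are the month and day, and Python accepts them with or without a leading zero. The sketch below, using made-up dates, shows why parsing helps: as plain strings the dates do not sort chronologically, while datetime objects do.

```python
from datetime import datetime

# Month and day may be written with or without a leading zero
d1 = datetime.strptime("2014-7-9", "%Y-%m-%d")
d2 = datetime.strptime("2014-07-10", "%Y-%m-%d")

# As plain strings, "2014-7-10" sorts before "2014-7-9" (character '1' < '9')
print("2014-7-10" < "2014-7-9")   # True

# As datetime objects, the dates compare chronologically
print(d2 > d1)                    # True
print(d2.strftime("%Y-%m-%d"))    # 2014-07-10
```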
Cover a longer time span, the whole of 2014:
import csv
from datetime import datetime
from matplotlib import pyplot as plt
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    dates, highs = [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
    print(highs)
# Visualize
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red')
plt.title("Daily high temperatures, 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis="both", which="major", labelsize=16)
plt.show()
Plot a second data series, the daily lows from column 3:
import csv
from datetime import datetime
from matplotlib import pyplot as plt
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    dates, highs, lows = [], [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        low = int(row[3])
        lows.append(low)
        high = int(row[1])
        highs.append(high)
    print(highs)
# Visualize
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red')
plt.plot(dates, lows, c='blue')
plt.title("Daily high and low temperatures, 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis="both", which="major", labelsize=16)
plt.show()
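Shading an area of the chart: the method fill_between() takes one series of x values and two series of y values and fills the space between the two y series: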
import csv
from datetime import datetime
from matplotlib import pyplot as plt
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    dates, highs, lows = [], [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        low = int(row[3])
        lows.append(low)
        high = int(row[1])
        highs.append(high)
# Visualize
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red', alpha=0.5)
plt.plot(dates, lows, c='blue', alpha=0.5)
# Shade the area between the highs and lows at each date
plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)
plt.title("Daily high and low temperatures, 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis="both", which="major", labelsize=16)
plt.show()
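At each date, fill_between() shades the vertical interval from lows[i] up to highs[i], so the height of the shaded band is that day's temperature range. This plain-Python sketch computes that extent for a few made-up values standing in for the parsed lists; it does not draw anything.

```python
# Made-up sample values standing in for the parsed highs and lows
highs = [64, 71, 64]
lows = [56, 56, 55]

# Height of the shaded band at each x position: the daily temperature range
daily_range = [h - l for h, l in zip(highs, lows)]
print(daily_range)  # [8, 15, 9]
```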
alpha sets a color's transparency, from 0 to 1: 0 is fully transparent and 1 is fully opaque.
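A common way to model transparency is the standard "over" compositing formula, applied per RGB channel: result = alpha × foreground + (1 - alpha) × background. The sketch below assumes an opaque background and color components in the range 0-1. It illustrates the formula only, not matplotlib's rendering internals, and the function name `blend` is hypothetical.

```python
def blend(foreground, background, alpha):
    """Composite an RGB color over an opaque background ("over" operator).

    Colors are (r, g, b) tuples with components in the range 0-1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(foreground, background))


blue = (0.0, 0.0, 1.0)
white = (1.0, 1.0, 1.0)

# alpha=0.1, as in the fill above: mostly white with a faint blue tint
print(blend(blue, white, 0.1))
```

With alpha=0 the result equals the background and with alpha=1 it equals the foreground, matching the description of the two extremes.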
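Handling errors: int() cannot convert an empty string and raises a ValueError instead, so missing values can be handled with the try-except-else pattern covered earlier.
Several values in death_valley_2014.csv are blank: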
import csv
from datetime import datetime
from matplotlib import pyplot as plt
filename = 'death_valley_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            low = int(row[3])
            high = int(row[1])
        except ValueError:
            # Report the raw date string so this works even if parsing the date failed
            print(row[0], 'missing data')
        else:
            # Runs only when every conversion above succeeded
            dates.append(current_date)
            lows.append(low)
            highs.append(high)
# Visualize
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red', alpha=0.5)
plt.plot(dates, lows, c='blue', alpha=0.5)
plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)
plt.title("Daily high and low temperatures, 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis="both", which="major", labelsize=16)
plt.show()
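The same try-except-else pattern can be checked without the Death Valley file. The sketch below runs it on a made-up in-memory CSV whose second data row has blank fields, and collects the skipped dates in a list (a hypothetical `skipped` variable) instead of printing them.

```python
import csv
import io
from datetime import datetime

# Made-up sample: the 2014-02-16 row has blank high, mean and low fields
sample = io.StringIO(
    "Date,High,Mean,Low\n"
    "2014-02-15,70,60,50\n"
    "2014-02-16,,,\n"
    "2014-02-17,72,61,51\n"
)

reader = csv.reader(sample)
header_row = next(reader)
dates, highs, lows, skipped = [], [], [], []
for row in reader:
    try:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        high = int(row[1])
        low = int(row[3])
    except ValueError:
        # int('') raises ValueError, so rows with blank fields land here
        skipped.append(row[0])
    else:
        # Only rows where every conversion succeeded are kept
        dates.append(current_date)
        highs.append(high)
        lows.append(low)

print(highs, lows, skipped)  # [70, 72] [50, 51] ['2014-02-16']
```

Because the else branch appends all three values together, dates, highs and lows stay the same length, which plot() and fill_between() need.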