CSV
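Reading CSV files
Plot the daily high and low temperatures as a line chart. The result is shown below:
Read the CSV file, extract its data, and plot it: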
import csv
from datetime import datetime
from matplotlib import pyplot as plt
"""
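csv.reader() reads the CSV file and returns an iterator over its rows.
Calling enumerate() while iterating over a list yields two values per item: its index and its value.
highs stores the daily high temperatures.
The try-except-else structure handles rows with missing data.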
"""
filename = 'sitka_weather_2018_simple.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)  # consume the header line
    dates, highs, lows = [], [], []
    for row in reader:
        # Parse the date outside the try block so it is always defined
        # when the error message below is printed.
        current_date = datetime.strptime(row[2], "%Y-%m-%d")
        try:
            high = int(row[5])
            low = int(row[6])
        except ValueError:
            # int('') raises ValueError when a temperature is missing.
            print(current_date, 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
"""
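The seaborn style sets the chart theme.
alpha sets the opacity (0 is fully transparent, 1 is fully opaque).
fill_between fills the area between two curves with a color.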
"""
plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before Matplotlib 3.6
fig, ax = plt.subplots()
ax.plot(dates, highs, c='red', alpha=0.5)
ax.plot(dates, lows, c='blue', alpha=0.5)
ax.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)
ax.set_title("Daily high and low temperatures - 2018", fontsize=24)
ax.set_xlabel('', fontsize=16)
fig.autofmt_xdate()  # rotate the x-axis date labels so they do not overlap
ax.set_ylabel("Temperature (F)", fontsize=16)
ax.tick_params(axis='both', which='major', labelsize=16)
plt.show()
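The note about enumerate above can be made concrete. The following is a minimal sketch of how the column indices used in the script (2, 5, and 6) could be looked up from the header row. The two-line sample is an assumption made up for illustration and stands in for the real file; the column names are assumed as well.

```python
import csv
import io

# Hypothetical two-line sample standing in for the real weather file.
sample = io.StringIO(
    '"STATION","NAME","DATE","PRCP","TAVG","TMAX","TMIN"\n'
    '"USW00025333","SITKA AIRPORT, AK US","2018-01-01","0.45","","48","38"\n'
)

reader = csv.reader(sample)
header_row = next(reader)

# enumerate yields (index, value) pairs, so every column name
# is paired with its position in the row.
column_index = {name: index for index, name in enumerate(header_row)}
for name, index in column_index.items():
    print(index, name)

# Look up fields by name instead of hard-coding the positions.
first_row = next(reader)
print(first_row[column_index["DATE"]], first_row[column_index["TMAX"]])
```

Looking columns up by name keeps the script working if the column order in the file changes.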
The Datetime Module
Using the datetime module to parse and format dates and times
from datetime import datetime
first_date = datetime.strptime('2018-7-1', '%Y-%m-%d')
print(type(first_date))
print(first_date.strftime('%B %d %Y'))
print(first_date)
"""
<class 'datetime.datetime'>
July 01 2018
2018-07-01 00:00:00
"""
Format codes and their meanings:
%A Weekday name, such as Monday
%B Month name, such as January
%m Month, as a number (01 to 12)
%d Day of the month, as a number (01 to 31)
%Y Four-digit year, such as 2015
%y Two-digit year, such as 15
%H Hour, in 24-hour format (00 to 23)
%I Hour, in 12-hour format (01 to 12)
%p am or pm
%M Minutes (00 to 59)
%S Seconds (00 to 59)
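A short sketch of how several of these codes combine in strftime and strptime. The timestamp is an arbitrary value chosen for illustration. Only the numeric codes have their output shown, because the name-based codes (%A, %B, %p) depend on the locale.

```python
from datetime import datetime

# Arbitrary example timestamp: 1 July 2018, 15:05:09.
moment = datetime(2018, 7, 1, 15, 5, 9)

# Numeric codes are zero-padded.
stamp = moment.strftime("%Y-%m-%d %H:%M:%S")
print(stamp)  # 2018-07-01 15:05:09

# Two-digit year and 12-hour clock.
short = moment.strftime("%y/%m/%d %I:%M")
print(short)  # 18/07/01 03:05

# strptime with the same format string reverses strftime.
parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
print(parsed == moment)  # True

# Weekday name, month name, and AM/PM marker (locale-dependent text).
print(moment.strftime("%A %B %p"))
```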
JSON
Read market price data from a JSON file and plot it:
import json
import matplotlib.pyplot as plt
filename = 'btc_close_2017.json'
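with open(filename) as f:
    btc_data = json.load(f)
date, close, months = [], [], []
for btc_dict in btc_data:
    date.append(btc_dict['date'])
    months.append(int(btc_dict['month']))
    # Convert with float() first; int() then truncates to a whole number.
    close.append(int(float(btc_dict['close'])))
# Plot the closing price as a line chart.
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render the Chinese title
plt.style.use('ggplot')
# Combine a scatter plot with the line chart; the points are drawn over the line.
plt.plot(close, linewidth=0.5)
plt.scatter(date, close, s=5)
# date[::20] keeps every 20th date as a tick label.
plt.xticks(date[::20], rotation=45, fontsize=6)
plt.title('收盘价', fontsize=10)  # title text means "closing price"
plt.tight_layout()  # adjust the padding so the labels fit inside the figure
plt.savefig('收盘价折线图.jpg', dpi=300)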
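The loop above assumes that every record is a dictionary with 'date', 'month', and 'close' fields stored as strings. A minimal sketch of that parsing step on its own follows. The two records and their values are made up for illustration and are not taken from the real file.

```python
import json

# Hypothetical records; the values are invented for this example.
raw = '''
[
  {"date": "2017-01-01", "month": "01", "close": "6928.6492"},
  {"date": "2017-01-02", "month": "01", "close": "7070.2554"}
]
'''

# json.loads parses a string; json.load does the same for an open file.
records = json.loads(raw)

dates = [record["date"] for record in records]
months = [int(record["month"]) for record in records]

# int("6928.6492") would raise ValueError, so convert through float first.
closes = [int(float(record["close"])) for record in records]

print(dates)
print(months)
print(closes)
```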
Take the logarithm of the closing price:
import matplotlib.pyplot as plt
import math
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.style.use('ggplot')
# Build the list of closing prices after taking the base-10 logarithm.
close_log = [math.log10(xx) for xx in close]
plt.plot(close_log, linewidth=0.5)
plt.scatter(date,close_log, s=5)
plt.xticks(date[::20],rotation=45,fontsize=6)
plt.title('收盘价',fontsize=10)
plt.tight_layout()
plt.savefig('收盘价对数变换折线图.jpg',dpi=300)
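The groupby() Function
groupby() bundles consecutive elements of the data that share the same key. The first argument is the data to process, and the second is a function that computes the key used to decide which elements belong together.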
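Because only consecutive elements are grouped, equal keys that are separated in the input end up in separate groups. If all elements with the same key should land in one group, a common approach is to sort by that key first. A minimal sketch with a made-up list of words:

```python
from itertools import groupby

# Made-up sample data; the lengths are 4, 3, 4, 4, 3.
words = ['kiwi', 'fig', 'plum', 'pear', 'yam']

# Without sorting, 'kiwi' and 'plum' are not adjacent, so length 4 appears twice.
unsorted_groups = [(key, list(items)) for key, items in groupby(words, len)]
print(unsorted_groups)

# sorted() is stable, so words of equal length keep their original order.
sorted_groups = {
    key: list(items)
    for key, items in groupby(sorted(words, key=len), key=len)
}
print(sorted_groups)  # {3: ['fig', 'yam'], 4: ['kiwi', 'plum', 'pear']}
```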
from itertools import groupby
data = ['a', 'bb', 'ccc', 'dd', 'eee', 'f']
for key, value_iter in groupby(data, len):
    print(key, ':', list(value_iter))
"""
1 : ['a']
2 : ['bb']
3 : ['ccc']
2 : ['dd']
3 : ['eee']
1 : ['f']
"""
Now change data so that elements of equal length sit next to each other:
from itertools import groupby
data = ['a', 'bb', 'cc', 'ddd', 'eee', 'f']
for key, value_iter in groupby(data, len):
    print(key, ':', list(value_iter))
"""
1 : ['a']
2 : ['bb', 'cc']
3 : ['ddd', 'eee']
1 : ['f']
"""
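The zip() Function
zip() takes iterables as arguments, packs their corresponding elements into tuples, and returns an object made up of those tuples. The return value is an iterator.
The * operator unpacks an iterable into separate arguments, so zip(*zipped) reverses the zip.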
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> c = [4,5,6,7,8]
>>> zipped = zip(a,b)
>>> list(zip(*zipped))
[(1, 2, 3), (4, 5, 6)]
>>> zipped = zip(a,b)
>>> x,y = zip(*zipped)
>>> print(x,y)
(1, 2, 3) (4, 5, 6)
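The session above defines c but never uses it, and it recreates zipped before the second call. A short sketch of the two behaviors this hints at: zip stops at the shortest input, and a zip object can be consumed only once.

```python
a = [1, 2, 3]
b = [4, 5, 6]
c = [4, 5, 6, 7, 8]

# zip stops at the shortest input, so the extra items of c are dropped.
pairs = list(zip(a, c))
print(pairs)  # [(1, 4), (2, 5), (3, 6)]

# A zip object is an iterator: after one full pass it is exhausted.
zipped = zip(a, b)
first_pass = list(zipped)
second_pass = list(zipped)
print(first_pass)   # [(1, 4), (2, 5), (3, 6)]
print(second_pass)  # []

# zip(*...) transposes the pairs back into the original columns, as tuples.
x, y = zip(*first_pass)
print(x, y)  # (1, 2, 3) (4, 5, 6)
```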