14. Python3: Processing Weather Data Stored in CSV (Comma-Separated Values) Format with the csv Module

逆流者blog


Category: Python3  Tags: python

Article link: https://blog.csdn.net/weixin_45847167/article/details/121482309


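This article shows how to use Python's csv and datetime modules to parse dates in CSV files and how to handle missing or invalid data. It plots daily high and low temperatures with matplotlib, analyzes weather data from several CSV files, and demonstrates error handling that keeps the visualization from failing.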


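The CSV file format

CSV (Comma-Separated Values) is a file format that stores tabular data (numbers and text) as plain text. It is sometimes also called character-separated values, because the delimiter does not have to be a comma.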
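To illustrate the point about delimiters, here is a minimal sketch that parses the same small table once with commas and once with semicolons. The sample text and the parse_table helper are invented for this example; only csv.reader and its delimiter parameter come from the standard library.

```python
import csv
import io


def parse_table(text, delimiter=','):
    """Return the rows of delimiter-separated text as a list of lists."""
    # newline='' leaves line endings untouched so the csv module can handle them
    return list(csv.reader(io.StringIO(text, newline=''), delimiter=delimiter))


# Made-up sample data, not taken from the weather files
comma_text = "date,max_temp\n2014-07-01,64\n"
semicolon_text = "date;max_temp\n2014-07-01;64\n"

comma_rows = parse_table(comma_text)
semicolon_rows = parse_table(semicolon_text, delimiter=';')
print(comma_rows)
print(semicolon_rows)
```

Both calls yield the same rows, and every cell comes back as a string, which is why the scripts later in this article convert values with int() and strptime().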

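The datetime module

Because the CSV files contain dates, we use the datetime module to parse them.

The strptime function in the datetime module parses a date string according to the format string passed as its second argument.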

from datetime import datetime

first_date = datetime.strptime('2020-10-22 18:32:30', '%Y-%m-%d %H:%M:%S')
print(first_date)

Output:

2020-10-22 18:32:30

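Date and time format arguments in the datetime module

Argument	Meaning
%A	Weekday name, such as Monday
%B	Month name, such as January
%m	Month as a number (01~12)
%d	Day of the month as a number (01~31)
%Y	Four-digit year, such as 2020
%y	Two-digit year, such as 20
%H	Hour in 24-hour format (00~23)
%I	Hour in 12-hour format (01~12)
%p	AM or PM
%M	Minutes (00~59)
%S	Seconds (00~59)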
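As a quick check of a few codes from the table, the sketch below parses a 12-hour timestamp and formats it back in other layouts. The timestamp is an arbitrary example, and parsing %p as "PM" assumes an English (or default C) locale.

```python
from datetime import datetime

# Parse a 12-hour timestamp: %I is the hour, %M the minutes, %p the AM/PM marker
moment = datetime.strptime('2020-10-22 06:32 PM', '%Y-%m-%d %I:%M %p')

# strftime() goes the other way (datetime -> text) and uses the same codes
as_24_hour = moment.strftime('%H:%M')
short_date = moment.strftime('%d/%m/%y')
print(as_24_hour)
print(short_date)
```

Codes such as %A and %B work the same way, but the names they produce depend on the active locale.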

Processing files with the csv module

The code uses several CSV files; download links are listed at the end of this article.

Analyzing sitka_weather_07-2014.csv

highs_lows.py

import csv
from datetime import datetime

from matplotlib import pyplot as plt

# Read the daily high temperatures from the file
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
	reader = csv.reader(f)
	# next(reader) returns the reader's next row; on the first call that is the header row
	header_row = next(reader)
	# header_row is a list; uncomment the next line to print it
	# print(header_row)
	# enumerate() yields each element's index together with its value
	for index, column_header in enumerate(header_row):
		print(index, column_header)

	# Column 0 holds the date; column 1 (the second column) holds the daily high
	dates, highs = [], []
	for row in reader:
		current_date = datetime.strptime(row[0], '%Y-%m-%d')
		print(current_date)
		dates.append(current_date)
		high = int(row[1])
		highs.append(high)

	print(dates)
	print(highs)

	# Plot the data
	fig = plt.figure(dpi=128, figsize=(10, 6))
	plt.plot(dates, highs, c='red')

	# Format the chart
	plt.title("Daily high temperatures, July 2014", fontsize=24)
	plt.xlabel("", fontsize=16)
	# Draw the date labels at an angle so they do not overlap
	fig.autofmt_xdate()
	plt.ylabel("Temperature (F)", fontsize=16)
	plt.tick_params(axis='both', which='major', labelsize=16)

	plt.show()

The chart generated from the data in the file:

[Figure: line chart of daily high temperatures, July 2014 (image unavailable)]
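The script above prints each header with its index so that the right column number can be read off by eye. A possible alternative, sketched below, looks the positions up by name with list.index(). The header names and values in this sample are invented for illustration and may not match the real file.

```python
import csv
import io

# Made-up sample standing in for the first lines of a weather file
sample = io.StringIO(
    "date,max_temp,mean_temp,min_temp\n"
    "2014-07-01,64,56,50\n"
    "2014-07-02,71,62,55\n",
    newline='',
)

reader = csv.reader(sample)
header_row = next(reader)

# Look the positions up once instead of hard-coding row[0] and row[1]
date_col = header_row.index('date')
high_col = header_row.index('max_temp')

highs = [int(row[high_col]) for row in reader]
print(date_col, high_col, highs)
```

If a column is renamed or moved, index() either finds the new position or raises a ValueError immediately, instead of silently reading the wrong column.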

That covers one month of weather data. Next, we plot a full year, with both highs and lows.

Analyzing sitka_weather_2014.csv

highs_lows2.py

import csv
from datetime import datetime

from matplotlib import pyplot as plt

# Read the daily high and low temperatures from the file
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
	reader = csv.reader(f)
	# next(reader) returns the reader's next row; on the first call that is the header row
	header_row = next(reader)
	# header_row is a list; uncomment the next line to print it
	# print(header_row)
	# enumerate() yields each element's index together with its value
	for index, column_header in enumerate(header_row):
		print(index, column_header)

	# Column 0 holds the date, column 1 the daily high, column 3 the daily low
	dates, highs, lows = [], [], []
	for row in reader:
		current_date = datetime.strptime(row[0], '%Y-%m-%d')
		print(current_date)
		dates.append(current_date)

		high = int(row[1])
		highs.append(high)

		low = int(row[3])
		lows.append(low)

	print(dates)
	print(highs)

	# Plot the data
	fig = plt.figure(dpi=128, figsize=(10, 6))
	plt.plot(dates, highs, c='red')
	plt.plot(dates, lows, c='blue')

	# facecolor sets the color of the filled area between the two lines
	plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)
	# Format the chart
	plt.title("Daily high and low temperatures - 2014", fontsize=24)
	plt.xlabel("", fontsize=16)
	# Draw the date labels at an angle so they do not overlap
	fig.autofmt_xdate()
	plt.ylabel("Temperature (F)", fontsize=16)
	plt.tick_params(axis='both', which='major', labelsize=16)

	plt.show()

The chart generated from the data in the file:

[Figure: daily high and low temperatures, 2014, with the area between the two lines shaded (image unavailable)]
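With three columns in play, numeric indexes such as row[1] and row[3] become harder to follow. One option is csv.DictReader, which uses the header row as dictionary keys. The sketch below assumes invented header names and values, not the real contents of sitka_weather_2014.csv.

```python
import csv
import io
from datetime import datetime

# Made-up sample; the real file has different header names and more columns
sample = io.StringIO(
    "date,max_temp,mean_temp,min_temp\n"
    "2014-01-01,46,42,37\n"
    "2014-01-02,41,38,35\n",
    newline='',
)

dates, highs, lows = [], [], []
for record in csv.DictReader(sample):
    # Each record maps a header name to that row's cell (still a string)
    dates.append(datetime.strptime(record['date'], '%Y-%m-%d'))
    highs.append(int(record['max_temp']))
    lows.append(int(record['min_temp']))

# Daily temperature range: the height of the shaded band in the chart
ranges = [high - low for high, low in zip(highs, lows)]
print(ranges)
```

The three lists have the same shape as in highs_lows2.py, so they could be passed to plt.plot() and plt.fill_between() unchanged.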

The code above contains the line high = int(row[1]).
If row[1] is empty or holds text that is not a valid integer, int() raises a ValueError such as the following:

Traceback (most recent call last):
  File "/Users/wushanghui/Documents/code/codechina/python3-learn/csv/highs_lows3.py", line 25, in <module>
    high = int(row[1])
ValueError: invalid literal for int() with base 10: ''
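The error can be reproduced without any file. In this small sketch the cell values are made up: an empty string matches the traceback above, and 'N/A' stands in for any other non-numeric text.

```python
# Hypothetical cell values as they might come out of csv.reader
cells = ['64', '', 'N/A']

errors = []
for cell in cells:
    try:
        int(cell)
    except ValueError as err:
        # Collect the message instead of letting the script stop
        errors.append(str(err))

for message in errors:
    print(message)
```

Only '64' converts cleanly. The other two cells each produce an "invalid literal for int()" message.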

The next script shows how to avoid this problem.

Analyzing death_valley_2014.csv

highs_lows3.py

import csv
from datetime import datetime

from matplotlib import pyplot as plt

# Read the daily high and low temperatures from the file
filename = 'death_valley_2014.csv'
with open(filename) as f:
	reader = csv.reader(f)
	# next(reader) returns the reader's next row; on the first call that is the header row
	header_row = next(reader)
	# header_row is a list; uncomment the next line to print it
	# print(header_row)
	# enumerate() yields each element's index together with its value
	for index, column_header in enumerate(header_row):
		print(index, column_header)

	# Column 0 holds the date, column 1 the daily high, column 3 the daily low
	dates, highs, lows = [], [], []
	for row in reader:
		# Parse the date before the try block so the except branch can always print it
		current_date = datetime.strptime(row[0], '%Y-%m-%d')
		try:
			high = int(row[1])
			low = int(row[3])
		except ValueError:
			# A reading is missing or not a number: report the date and skip the row
			print(current_date, 'bad data')
		else:
			dates.append(current_date)
			highs.append(high)
			lows.append(low)

	# Plot the data
	fig = plt.figure(dpi=128, figsize=(10, 6))
	plt.plot(dates, highs, c='red')
	plt.plot(dates, lows, c='blue')

	# facecolor sets the color of the filled area between the two lines
	plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)
	# Format the chart
	title = "Daily high and low temperatures - 2014\nDeath Valley CA"
	plt.title(title, fontsize=20)
	plt.xlabel("", fontsize=16)
	# Draw the date labels at an angle so they do not overlap
	fig.autofmt_xdate()
	plt.ylabel("Temperature (F)", fontsize=16)
	plt.tick_params(axis='both', which='major', labelsize=16)

	plt.show()

The chart generated from the data in the file:

[Figure: daily high and low temperatures, 2014, Death Valley CA (image unavailable)]

death_valley_2014.csv contains the row 2014-2-16,,,,,,,,,,,,,,,,,,,0.00,,,-1
Without error checking, this row makes the script fail. The code handles it with try-except-else:

# Parse the date before the try block so the except branch can always print it
current_date = datetime.strptime(row[0], '%Y-%m-%d')
try:
    high = int(row[1])
    low = int(row[3])
except ValueError:
    print(current_date, 'bad data')
else:
    dates.append(current_date)
    highs.append(high)
    lows.append(low)

The console shows only one message, reporting 2014-02-16 00:00:00 as bad data, and the chart is still generated.
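The same check can also be packaged as a function, which makes it easy to count how many rows were skipped. This is only a sketch: parse_row is a hypothetical helper, the rows are shortened to four columns, and the readings in the complete rows are made up. Only the blank readings for 2014-2-16 mirror the row quoted above.

```python
from datetime import datetime


def parse_row(row):
    """Return (date, high, low) for a row, or None if a reading is unusable."""
    current_date = datetime.strptime(row[0], '%Y-%m-%d')
    try:
        high = int(row[1])
        low = int(row[3])
    except ValueError:
        return None
    return current_date, high, low


rows = [
    ['2014-2-15', '73', '60', '47'],  # made-up complete row
    ['2014-2-16', '', '', ''],        # blank readings, like the row quoted above
    ['2014-2-17', '75', '62', '49'],  # made-up complete row
]

parsed = [parse_row(row) for row in rows]
good = [item for item in parsed if item is not None]
skipped = len(parsed) - len(good)
print(len(good), 'rows kept,', skipped, 'skipped')
```

Reporting the number of skipped rows gives a quick sense of how much of the year is missing from the chart.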

References

Code:
highs_lows.py
highs_lows2.py
highs_lows3.py

Files:
sitka_weather_07-2014.csv
sitka_weather_2014.csv
death_valley_2014.csv
