Analyzing Hazard Factors Based on Air Pollution Data

weixin_61813371

Article link: https://blog.csdn.net/weixin_61813371/article/details/134022348

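I. Project Background

In recent years, as the economy has developed and living standards have risen, environmental pollution has drawn growing attention. Air pollution is one of the greatest environmental threats to human health, alongside climate change. Exposure to air pollution is estimated to cause about 7 million premature deaths each year and the loss of millions of healthy life-years. Identifying the hazard factors associated with air pollution, and in particular which air-quality indicators harm human health, has therefore become a pressing problem for the relevant authorities.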

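II. Project Requirements

1. Analyze deaths attributable to air pollution and determine which type of air pollution is more dangerous.

2. Taking into account the political, economic, and other conditions of each country and region, try to identify the causes behind deaths from air pollution.

3. Predict the future total number of deaths from air pollution and propose suggestions for improving air quality.

III. Implementation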

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
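# Use the SimHei font so matplotlib can render Chinese characters
plt.rcParams['font.sans-serif'] = ['SimHei']
# Keep the minus sign rendering correctly with this font
plt.rcParams['axes.unicode_minus'] = False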

1. Data Exploration

# Load the data
air = pd.read_excel('death-rates-from-air-pollution.xlsx')
air.head()

# Basic information about the data
air.info()

# Rename columns: drop the "(deaths per 100,000)" unit suffix
# (the values themselves remain rates per 100,000 people)
air = air.rename(
    columns={
        'Air pollution (total) (deaths per 100,000)':
        'Air pollution (total)',
        'Indoor air pollution (deaths per 100,000)':
        'Indoor air pollution',
        'Outdoor particulate matter (deaths per 100,000)':
        'Outdoor particulate matter',
        'Outdoor ozone pollution (deaths per 100,000)':
        'Outdoor ozone pollution'
    })
air.head()

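# Check for missing values
air.isna().sum()

# Check whether every country/region has all 28 years of data
counts = air['Entity'].value_counts()
counts[counts != 28]

# Entities whose Code is missing
ce = air[air['Code'].isna()]['Entity'].unique()
print('Number of countries/regions with a missing Code:', len(ce))
print(ce)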
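The completeness check above hard-codes 28 rows per entity. As a minimal sketch, the same idea can be written so that the expected number of years is derived from the data and distinct years are counted per entity. The toy frame below is made up for illustration (the entity names, codes, and years are hypothetical); only the column names `Entity`, `Code`, and `Year` come from the dataset.

```python
import pandas as pd

# Toy data (hypothetical values) mimicking the Entity / Code / Year layout
toy = pd.DataFrame({
    'Entity': ['A', 'A', 'A', 'B', 'B', 'Region X', 'Region X', 'Region X'],
    'Code': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', None, None, None],
    'Year': [1990, 1991, 1992, 1990, 1992, 1990, 1991, 1992],
})

# Number of distinct years present anywhere in the data
expected_years = toy['Year'].nunique()

# Distinct years recorded for each entity
years_per_entity = toy.groupby('Entity')['Year'].nunique()

# Entities that are missing at least one year
incomplete = years_per_entity[years_per_entity != expected_years]

# Entities that have no Code
no_code = sorted(toy.loc[toy['Code'].isna(), 'Entity'].unique())

print(incomplete.to_dict())
print(no_code)
```

Counting distinct years rather than rows also guards against duplicated rows being mistaken for full coverage.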

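2. Analysis

2.1 Total deaths by type of air pollution

# Sum each pollution column (the columns after Entity, Code, Year) over all rows
air_pol = air.iloc[:, 3:].sum()

x = air_pol.index
y = air_pol

bar = plt.bar(x, y)
plt.bar_label(bar, label_type='edge')  # print the value above each bar
plt.title('Total deaths by type of air pollution')
plt.xticks(rotation=-20)
plt.show()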

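The chart shows that Air pollution (total) has the highest death toll, as expected for the overall figure covering the other three categories.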

2.2 Total deaths per year by type of air pollution

air_pollution_categories = [
    'Air pollution (total)', 'Indoor air pollution',
    'Outdoor particulate matter', 'Outdoor ozone pollution'
]
# Sum each category over all entities for every year
air_year = air.groupby('Year')[air_pollution_categories].sum()

x = np.arange(len(air_year))
colors = ['-or', '-Dc', '-^g', '-dy']  # line style, marker, and color per panel

fig, axes = plt.subplots(2, 2, figsize=(18, 10))

for i, category in enumerate(air_pollution_categories):
    row = i // 2
    col = i % 2

    y = air_year[category]
    axes[row, col].plot(x, y, colors[i])
    axes[row, col].set_title(f'{category}: deaths per year')
    axes[row, col].set_xticks(x, air_year.index, rotation=-50)

plt.tight_layout()
plt.show()

The four plots show that deaths in every category trend downward overall as time goes on.
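A downward trend read off a plot can also be quantified. The following is a minimal sketch, assuming a least-squares line fit with `np.polyfit` is an acceptable summary; the yearly totals are made-up numbers for illustration, not values from the dataset.

```python
import numpy as np

# Hypothetical yearly totals (made-up numbers, not taken from the dataset)
years = np.arange(1990, 1996)
totals = np.array([100.0, 96.0, 93.0, 88.0, 85.0, 80.0])

# Fit a straight line: totals is approximated by slope * year + intercept
slope, intercept = np.polyfit(years, totals, 1)

# Relative change between the first and the last year, in percent
pct_change = (totals[-1] - totals[0]) / totals[0] * 100

print(f'slope per year: {slope:.2f}')
print(f'change from first to last year: {pct_change:.1f}%')
```

A negative slope indicates a decline on average; applied to the real data, `years` and `totals` would be replaced by `air_year.index` and one column of `air_year`.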

2.3 Deaths from air pollution by region

ls_air = [
    'Air pollution (total)', 'Indoor air pollution',
    'Outdoor particulate matter', 'Outdoor ozone pollution'
]
# Sum each category over all years for every entity
air_Entity = air.groupby('Entity')[ls_air].sum()

color = ['c', 'g', 'b', 'r']
x = np.arange(len(air_Entity))

fig, axes = plt.subplots(4, 1, figsize=(18, 38))

for i in range(4):
    y = air_Entity[ls_air[i]]
    axes[i].bar(x, y, color=color[i])
    axes[i].set_title(f'{ls_air[i]}: deaths by region')
    axes[i].set_xticks(x, air_Entity.index, rotation=90)

plt.tight_layout()
plt.show()

2.4 Top 10 regions for deaths from ozone, particulate matter, and indoor pollution

ls = [
    'Outdoor ozone pollution', 'Outdoor particulate matter',
    'Indoor air pollution'
]
color = ['r', 'b', 'g']
pol = ['ozone', 'particulate matter', 'indoor']
xs = np.arange(10)

fig, axes = plt.subplots(3, 1, figsize=(12, 15))

for i in range(3):
    # Ten entities with the largest totals for this category
    y = air_Entity[ls[i]].sort_values(ascending=False).head(10)
    bar = axes[i].bar(xs, y, color=color[i])
    axes[i].bar_label(bar, label_type='edge')
    axes[i].set_title(f'Top 10 regions by {pol[i]} pollution deaths')
    axes[i].set_xticks(xs, y.index, rotation=-80)

plt.tight_layout()
plt.show()

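For all three categories, the top 10 regions by deaths are developing countries and less developed regions, which suggests that air pollution is linked to a region's environmental, economic, and political conditions.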

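3. Predicting Future Total Deaths

3.1 Feature construction

A sliding window is used to build the features for predicting future total deaths from air pollution: the values of three consecutive years (x1, x2, x3) are the features, and the value of the following year (y) is the target.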

ls_x1 = []
ls_x2 = []
ls_x3 = []
ls_y = []
# Column 0 of air_year is 'Air pollution (total)'
for i in range(len(air_year) - 3):
    ls_x1.append(air_year.iloc[i, 0])      # year t
    ls_x2.append(air_year.iloc[i + 1, 0])  # year t + 1
    ls_x3.append(air_year.iloc[i + 2, 0])  # year t + 2
    ls_y.append(air_year.iloc[i + 3, 0])   # year t + 3 (target)

data = {'x1': ls_x1, 'x2': ls_x2, 'x3': ls_x3, 'y': ls_y}
df = pd.DataFrame(data)
df
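The window construction can also be written for an arbitrary window width. The sketch below is one possible way to do it with NumPy's `sliding_window_view`; the helper name `make_windows` and the sample values are hypothetical and not part of the original project.

```python
import numpy as np

def make_windows(values, width=3):
    """Return (X, y): each row of X holds `width` consecutive values,
    and y holds the value that immediately follows each window."""
    values = np.asarray(values, dtype=float)
    if len(values) <= width:
        raise ValueError('need more values than the window width')
    # Every run of width + 1 consecutive values: features plus target
    windows = np.lib.stride_tricks.sliding_window_view(values, width + 1)
    return windows[:, :width], windows[:, width]

# Made-up series for illustration
X, y = make_windows([10, 20, 30, 40, 50, 60], width=3)
print(X)
print(y)
```

A series of n values yields n - width samples, which matches the `len(air_year) - 3` loop bound used for a width of 3.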

3.2 Model training and testing

# Split the data: the first 20 rows for training, the rest for testing
train = df.iloc[:20, :]
test_x, test_y = df.iloc[20:, 0:3], df.iloc[20:, 3]
train = train.sample(frac=1).reset_index(drop=True)  # shuffle the training rows
train_x, train_y = train.iloc[:, 0:3], train.iloc[:, 3]

from sklearn.linear_model import LinearRegression  # linear regression model

line = LinearRegression()
line.fit(train_x, train_y)
pred = line.predict(test_x)

from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error  # evaluation metrics

print('R2: ', r2_score(test_y, pred))
print('MAE:', mean_absolute_error(test_y, pred))
print('MSE:', mean_squared_error(test_y, pred))
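To make the three evaluation metrics concrete, here is a minimal sketch that computes them by hand with NumPy, following the standard definitions of R2, MAE, and MSE. The helper name `regression_metrics` and the sample numbers are made up for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (r2, mae, mse) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))   # mean absolute error
    mse = np.mean(residuals ** 2)      # mean squared error
    ss_res = np.sum(residuals ** 2)    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot         # coefficient of determination
    return r2, mae, mse

# Made-up true and predicted values
r2, mae, mse = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
print(r2, mae, mse)
```

An R2 close to 1 means the predictions explain most of the variance in the test targets, while MAE and MSE are in the units (and squared units) of the target.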

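3.3 Prediction

The same sliding-window approach is used to predict values for future years: each predicted value is appended to the list of known values and then serves as a feature for predicting the next value.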

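# Predict deaths from air pollution for 2018-2027
feature_x = list(air_year['Air pollution (total)'][-3:].values)  # last three known years
feature_y = []
for i in range(10):
    # One-row frame with the training column names, so predict() sees matching features
    feature = pd.DataFrame([feature_x], columns=train_x.columns)
    pred_y = line.predict(feature)[0]
    feature_y.append(pred_y)
    feature_x.append(pred_y)    # the prediction becomes the newest feature
    feature_x = feature_x[-3:]  # keep only the latest three values

pred_fea = {'year': np.arange(2018, 2028), 'pred_value': feature_y}
pred_fea = pd.DataFrame(pred_fea)
pred_fea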
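The feed-back loop above can be isolated into a small reusable function. The sketch below is one possible shape for it, not the author's implementation: `recursive_forecast`, its `predict_one` callable interface, and the toy predictor are all assumptions introduced for illustration.

```python
import numpy as np

def recursive_forecast(predict_one, history, steps, width=3):
    """Forecast `steps` values; each prediction is fed back as an input.

    predict_one: callable that takes a 1-D array of `width` values and
                 returns the next value (hypothetical interface).
    """
    window = [float(v) for v in history[-width:]]
    if len(window) < width:
        raise ValueError('history is shorter than the window width')
    forecasts = []
    for _ in range(steps):
        next_value = float(predict_one(np.array(window)))
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the prediction
        window = window[1:] + [next_value]
    return forecasts

# Toy predictor (an assumption for illustration): repeat the last step size
def continue_last_step(window):
    return window[-1] + (window[-1] - window[-2])

forecasts = recursive_forecast(continue_last_step, [100, 95, 90], steps=3)
print(forecasts)
```

With a fitted scikit-learn regressor such as `line`, `predict_one` could be something like `lambda w: line.predict(w.reshape(1, -1))[0]`. Because every step consumes earlier predictions, errors can accumulate as the forecast horizon grows.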

The predicted values follow the same trend as the observed values: deaths decline as the years go by.
