House Damage Rate Prediction: Regression Analysis

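 This post predicts the house damage rate during typhoon disasters. The dataset comes from the 510 global database provided by the Netherlands Red Cross. It covers 12 typical typhoons that struck the Philippines over the past two decades, and the data are in the file all.csv. The typhoons are "Bopha", "Goni", "Hagupit", "Haima", "Haiyan", "Kalmaegi", "Koppu", "Melor", "Nock-Ten", "Rammasun", "Sarika" and "Utor". The typhoon disaster data contain 1,638 observations.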
For the preliminary, subjective screening of features during feature selection, refer to this website.

# Import libraries
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Use the SimHei font so Chinese labels render, and keep the minus sign displayable
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
inputfile = r'../data/All.csv'
outputfile = r"../data/process.csv"
# Load the raw data; every later step operates on `data`
data = pd.read_csv(inputfile)

 Define plotting functions

# Batch plotting helpers for inspecting distributions and spotting outliers
# Histograms: one subplot per column
def outliers_hist_all(num, df, cols=5, save=False, save_path=None):
    fig = plt.figure(figsize=(16, 16))
    rows = int(np.ceil(num / cols))
    for i in range(num):
        ax = fig.add_subplot(rows, cols, i + 1)
        # Drop NaNs so the histogram can compute a finite bin range
        ax.hist(df[df.columns[i]].dropna(), bins=30)
        ax.set_title(df.columns[i])
    if save:
        plt.savefig(save_path)
    plt.show()

# Scatter plots: each column against the damage rate
def outliers_scatter(num, df, cols=3, save=False, save_path=None,
                     target='Total damaged houses (rel.)'):
    import seaborn as sns
    fig = plt.figure(figsize=(16, 16))
    rows = int(np.ceil(num / cols))
    for i in range(num):
        ax = fig.add_subplot(rows, cols, i + 1)
        # Take y from the DataFrame passed in, not from the global `data`
        sns.scatterplot(x=df[df.columns[i]], y=df[target], ax=ax)
    if save:
        plt.savefig(save_path)
    plt.show()
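The helpers above rely on visual inspection. As a numeric complement (not part of the original workflow), the sketch below counts, per column, the values outside Tukey's interquartile-range fences, Q1 - k*IQR and Q3 + k*IQR. The function name `count_iqr_outliers` and the default k = 1.5 are illustrative assumptions. A flagged value is only a candidate outlier; whether to cap or drop it still depends on domain knowledge, as with the damage rates above 100% handled in the preprocessing step.

```python
import pandas as pd


def count_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)  # quantile() skips NaN by default
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparisons against NaN are False, so missing values are never flagged
    mask = numeric.lt(lower, axis=1) | numeric.gt(upper, axis=1)
    return mask.sum().sort_values(ascending=False)


# Toy data: column "b" contains one extreme value
demo = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 4, 100]})
print(count_iqr_outliers(demo))
```

On the notebook data this would be called as `count_iqr_outliers(data)` after the column selection step.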

1. Data preprocessing

# Missing values
# print("Missing values before handling:\n", data.isna().sum())
data.dropna(inplace=True, subset=['% skilled Agriculture/Forestry/Fishermen', 'Experience'])
# print("Missing values after handling:\n", data.isna().sum())
# Select the target and the candidate features
# (.copy() avoids SettingWithCopyWarning on the assignments below)
data = data[['Total damaged houses (rel.)', 'Wind speed', 'Distance to typhoon', 'rainfall', 'distance_first_impact',
             'Experience', 'Elevation', 'Slope', 'slope_stdev', 'Population density', 'Poverty incidence',
             '% skilled Agriculture/Forestry/Fishermen', '% strong roof type', '% strong wall type']].copy()
# Histograms
# outliers_hist_all(num=14, df=data)
# Scatter plots
# outliers_scatter(14, data)
# Cap damage rates above 100% at 99.99% (use .loc instead of chained indexing)
target = 'Total damaged houses (rel.)'
data.loc[data[target] > 100, target] = 99.99
print(data[target].max())  # check that the cap was applied
# Rescale the damage rate from a percentage to the 0-1 range
data[target] = data[target] / 100
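The capping and rescaling step can also be written as a small function that returns a new DataFrame instead of assigning into a slice of the original. This is a minimal sketch, assuming the target column name used above. The function name `cap_and_rescale` and its `limit`/`cap` parameters are hypothetical. It keeps the original rule exactly: only values strictly above 100 are replaced with 99.99, so a value of exactly 100 is left unchanged.

```python
import pandas as pd

TARGET = "Total damaged houses (rel.)"


def cap_and_rescale(df: pd.DataFrame, target: str = TARGET,
                    limit: float = 100.0, cap: float = 99.99) -> pd.DataFrame:
    """Return a copy of df with target values above `limit` set to `cap`,
    then divided by 100 so the damage rate lies in the 0-1 range."""
    out = df.copy()
    # mask() replaces entries where the condition is True; NaN > limit is False
    out[target] = out[target].mask(out[target] > limit, cap) / 100
    return out


# Toy values (percent): one normal, one exactly 100, one above 100, one missing
demo = pd.DataFrame({TARGET: [12.5, 100.0, 150.0, None]})
print(cap_and_rescale(demo))
```

`Series.mask` is used rather than `clip(upper=99.99)` because `clip` would also lower values between 99.99 and 100, which the original rule leaves alone.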

2. Correlation analysis

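import seaborn as sns
# Pearson correlation matrix of all selected variables
num_corr = data.corr()
# Correlation of each feature with the damage rate, in ascending order
# (print() is needed: only the last expression of a notebook cell is displayed)
print(num_corr['Total damaged houses (rel.)'].sort_values())
# Heatmap of the full correlation matrix
fig, ax = plt.subplots(figsize=(8, 8))
sns.heatmap(num_corr, annot=True, fmt='.2f', ax=ax)
plt.show()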


Distance to typhoon                        -0.421213
% strong wall type                         -0.273373
distance_first_impact                      -0.236768
% strong roof type                         -0.234977
Experience                                 -0.227065
Elevation                                  -0.185689
Slope                                      -0.082292
Population density                         -0.075493
slope_stdev                                -0.011632
% skilled Agriculture/Forestry/Fishermen    0.027598
rainfall                                    0.105437
Poverty incidence                           0.254854
Wind speed                                  0.693841
Total damaged houses (rel.)                 1.000000
Name: Total damaged houses (rel.), dtype: float64
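To turn this ranking into a first-pass feature filter, the features can be ordered by the absolute value of their Pearson correlation with the damage rate and kept only above a cutoff. This is a hypothetical sketch: the function name `rank_by_abs_corr` and any threshold are assumptions, and a linear-correlation screen can miss non-linear effects. Applied to the output above with a cutoff of 0.2, for example, it would keep Wind speed, Distance to typhoon, % strong wall type, Poverty incidence, distance_first_impact, % strong roof type and Experience.

```python
import pandas as pd


def rank_by_abs_corr(df: pd.DataFrame, target: str, threshold: float = 0.0) -> pd.Series:
    """Pearson correlation of every other numeric column with `target`,
    sorted by |r| (strongest first) and filtered to |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    # Sort by absolute value but keep the sign of each coefficient
    ranked = corr.sort_values(key=lambda s: s.abs(), ascending=False)
    return ranked[ranked.abs() >= threshold]


# Toy data: "a" rises with y, "b" falls with y, "c" is uncorrelated with y
demo = pd.DataFrame({
    "y": [1, 2, 3, 4, 5],
    "a": [2, 4, 6, 8, 10],
    "b": [5, 4, 3, 2, 1],
    "c": [1, -1, 1, -1, 1],
})
print(rank_by_abs_corr(demo, "y", threshold=0.5))
```

In the notebook this would be called as `rank_by_abs_corr(data, 'Total damaged houses (rel.)', threshold=0.2)`. Keeping the sign in the result preserves the direction of each effect, which the absolute-value sort alone would hide.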

3. Data transformation comparison
