Predicting Stroke

Mu-Xing

Last modified: 2022-09-08 08:52:59


First published: 2022-08-24 19:17:24

Article link: https://blog.csdn.net/Mu_X_ing/article/details/126511062


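This article examines stroke as the leading health threat in China: it analyzes data on more than 5,000 cases to identify the key factors affecting stroke, builds a mathematical model to predict the likelihood of stroke, and characterizes small high-risk groups.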


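Stroke is one of the major diseases threatening public health. According to data from an article published in the medical journal The Lancet, stroke ranks first among causes of death in China, and the stroke rate among Chinese residents is also among the highest in the world. Understanding how residents' physical condition relates to the occurrence of stroke, analyzing its causes, and targeting preventive measures at different groups is therefore worthwhile.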

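Attachment 1 (附件1.csv) contains data collected at random from more than 5,000 people. Based on this data, carry out the following analyses: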

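Which of the factors listed in the data have the largest effect on stroke? Rank them, and analyze how the different values of each factor relate to stroke incidence.
Use the data to find several small groups most prone to stroke (each generally no more than 5% of the total population), and analyze the individual characteristics of these groups.
Build a mathematical model that estimates the likelihood of stroke for different people, and use it to rank the 10 people in Attachment 2 (附件2.xlsx) by likelihood of stroke.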
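
The first two tasks both come down to comparing stroke rates across subgroups and keeping track of how large each subgroup is. The sketch below illustrates that idea on a small invented table. The `toy` frame and its values are assumptions for illustration only; just the column names `hypertension` and `stroke` come from the data set.

```python
import pandas as pd

# Toy data standing in for 附件1.csv; the values are invented for illustration.
toy = pd.DataFrame({
    "hypertension": [0, 0, 0, 0, 1, 1, 1, 1],
    "stroke":       [0, 0, 0, 1, 0, 1, 1, 0],
})

# For each value of the factor, compute the stroke rate (mean of the 0/1 label)
# and the group size (needed to judge whether a group is "small").
summary = toy.groupby("hypertension")["stroke"].agg(rate="mean", n="count")

# Share of the whole sample that each group represents.
summary["share"] = summary["n"] / len(toy)
print(summary)
```

On the real data, the same pattern could be repeated for each factor, and a group might be flagged as a candidate for the second task when its `share` is at most 0.05 and its `rate` is high.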

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

%cd /content/drive/My Drive
!pwd

/content/drive/My Drive
/content/drive/My Drive

import matplotlib.font_manager as mfm
font_path="/content/drive/My Drive/Colab Notebooks/simhei.ttf"
prop=mfm.FontProperties(fname=font_path)

%cd ./Colab Notebooks/XXXXX
!pwd

/content/drive/My Drive/Colab Notebooks/XXXXX
/content/drive/My Drive/Colab Notebooks/XXXXX

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

data=pd.read_csv('附件1.csv')  # load Attachment 1
excel2=pd.read_excel('附件2.xlsx')  # load Attachment 2

backup=data.copy()  # keep a backup copy
excel1=data
print('Shape of the case data:',excel1.shape)
length_excel1=excel1.shape[0]

Shape of the case data: (5110, 12)

print('Variable information for the case data:')  # check how many values are missing
print(excel1.info())

Variable information for the case data:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5110 entries, 0 to 5109
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 5110 non-null   int64  
 1   gender             5110 non-null   object 
 2   age                5110 non-null   float64
 3   hypertension       5110 non-null   int64  
 4   heart_disease      5110 non-null   int64  
 5   ever_married       5110 non-null   object 
 6   work_type          5110 non-null   object 
 7   Residence_type     5110 non-null   object 
 8   avg_glucose_level  5110 non-null   float64
 9   bmi                4909 non-null   float64
 10  smoking_status     5110 non-null   object 
 11  stroke             5110 non-null   int64  
dtypes: float64(3), int64(4), object(5)
memory usage: 479.2+ KB
None
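
The `info()` output above shows 4909 non-null `bmi` values out of 5110 rows, so 201 entries are missing; every other column is complete. A more direct way to get those counts is sketched below on an invented table. The `toy` frame and its values are assumptions for illustration, not taken from 附件1.csv.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing bmi value; invented for illustration.
toy = pd.DataFrame({
    "age": [67.0, 61.0, 80.0, 49.0],
    "bmi": [36.6, np.nan, 32.5, 34.4],
})

missing_per_column = toy.isna().sum()   # number of NaN entries in each column
missing_share = toy.isna().mean()       # fraction of rows missing in each column
print(missing_per_column)
print(missing_share)
```

Applied to `excel1`, the same two calls would presumably report the missing `bmi` entries without having to subtract the non-null count from the row count by hand.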

print("Value distribution of each column in the case data")  # count how often each value occurs
print(excel1['gender'].value_counts(),'\n\n')
print(excel1['age'].value_counts(),'\n\n')
print(excel1['hypertension'].value_counts(),'\n\n')
print(excel1['heart_disease'].value_counts(),'\n\n')
print(excel1['ever_married'].value_counts(),'\n\n')
print(excel1['work_type'].value_counts(),'\n\n')
print(excel1['Residence_type'].value_counts(),'\n\n')
print(excel1['avg_glucose_level'].value_counts(),'\n\n')
print(excel1['bmi'].value_counts(),'\n\n')
print(excel1[&#