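1. Data acquisition and processing
1) The data comes from Kaggle. I have uploaded it to my personal homepage, where it can be downloaded for free.
2) Data preview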
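Loading the downloaded file might look like the minimal sketch below. It assumes the data is a single CSV file. The helper name load_heart_data is hypothetical, and the filename heart_2020_cleaned.csv in the usage line is also an assumption, so substitute whatever name your copy has. The two printed checks roughly mirror what df.info() reports later on.

```python
import pandas as pd

def load_heart_data(source) -> pd.DataFrame:
    """Read the heart-disease CSV and print its shape and missing-value count.

    Hypothetical helper. `source` can be a path or any file-like object
    that pd.read_csv accepts.
    """
    df = pd.read_csv(source)
    # Quick sanity checks, comparable to the df.info() summary.
    print(f"rows={df.shape[0]}, columns={df.shape[1]}")
    print(f"missing values: {int(df.isna().sum().sum())}")
    return df

# Assumed filename; change it to match your downloaded copy.
# df = load_heart_data("heart_2020_cleaned.csv")
```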
df.head(5)
Out[119]:
   HeartDisease    BMI  Smoking  ...  KidneyDisease  SkinCancer  target
0            No  16.60      Yes  ...             No         Yes       0
1            No  20.34       No  ...             No          No       0
2            No  26.58      Yes  ...             No          No       0
3            No  24.21       No  ...             No         Yes       0
4            No  23.71       No  ...             No          No       0
[5 rows x 19 columns]
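In the preview above, pandas hides the middle columns behind "...". A minimal sketch for printing every column, assuming you only want the wider view temporarily: pd.option_context scopes the display.max_columns setting to the with block and restores the previous value afterwards. The helper name print_head_all_columns is hypothetical.

```python
import pandas as pd

def print_head_all_columns(df: pd.DataFrame, n: int = 5) -> None:
    """Print the first n rows without collapsing any columns into '...'."""
    # max_columns=None means "no limit"; option_context undoes it on exit.
    with pd.option_context("display.max_columns", None):
        print(df.head(n))

# print_head_all_columns(df)
```

Wide frames may still wrap across several lines (marked with a trailing backslash), but no column is dropped from the output.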
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 319795 entries, 0 to 319794
Data columns (total 19 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   HeartDisease      319795 non-null  object
 1   BMI               319795 non-null  float64
 2   Smoking           319795 non-null  object
 3   AlcoholDrinking   319795 non-null  object
 4   Stroke            319795 non-null  object
 5   PhysicalHealth    319795 non-null  float64
 6   MentalHealth      319795 non-null  float64
 7   DiffWalking       319795 non-null  object
 8   Sex               319795 non-null  object
 9   AgeCategory       319795 non-null  object
10   Race              319795 non-null  object
11   Diabetic          319795 non-null  object
12   PhysicalActivity  319795 non-null  object
13   GenHealth         319795 non-null  object
14   SleepTime         319795 non-null  float64
15   Asthma            319795 non-null  object
16   KidneyDisease     319795 non-null  object
17   SkinCancer        319795 non-null  object
18   target            319795 non-null  int64
dtypes: float64(4), int64(1), object(14)
memory usage: 46.4+ MB
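The df.info() summary shows no missing values and 14 object columns alongside 5 numeric ones (4 float64 plus the int64 target). A hedged sketch of separating the two groups and checking class balance before preprocessing: summarize_columns is a hypothetical helper, and the default target column name is taken from the output above. On the full data it should return the 14 object columns and the 4 float64 feature columns.

```python
import pandas as pd

def summarize_columns(df: pd.DataFrame, target: str = "target"):
    """Split columns by dtype and report the class balance of `target`.

    Hypothetical helper: returns (categorical, numeric, balance).
    """
    # Non-numeric columns (object/string) hold Yes/No or category labels
    # that will need encoding before modelling.
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    # Numeric feature columns, with the target removed.
    numeric = [c for c in df.select_dtypes(include="number").columns if c != target]
    # Share of each class; a strongly skewed split may call for stratified sampling.
    balance = df[target].value_counts(normalize=True)
    return categorical, numeric, balance

# cat_cols, num_cols, balance = summarize_columns(df)
```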
3) Data preprocessing
df['target']=df['HeartDis