Machine Learning: Cardiovascular Disease Data Analysis

Latest recommended article published 2022-12-01 11:36:41

文章全靠抄

Tags: machine learning, Pandas, seaborn

Article link: https://blog.csdn.net/weixin_42406148/article/details/90444796

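This article shows how to analyze a cardiovascular disease dataset of roughly 80,000 records with the Python libraries Pandas and seaborn. The commented code walks through data filtering, frequency counts, percentage calculations, mean and median analysis, data cleaning, Pearson correlation analysis, and plotting statistical charts such as violin plots and histograms.

Summary generated automatically by CSDN.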

2019-5-22
Python 3.6
All packages are the latest versions available as of May 15.

Some charting operations with Pandas and seaborn

Dataset features
[Figure: description of the dataset features; image not included]

The dataset contains roughly 80,000 records.

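The operations on the table and the questions they answer are explained in the code comments.

The main operations are: filtering data, frequency counts, percentages, means, medians, data cleaning, the Pearson correlation coefficient matrix, violin plots, and histogram distribution plots.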
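Before the full script, the operations listed above can be sketched on a small synthetic table. This is only an illustrative sketch, not the author's code: the `toy` DataFrame, its made-up values, and the 2.5th/97.5th percentile cutoffs are assumptions chosen for the example; only the column names `gender`, `height`, `weight`, and `smoke` are meant to mirror the kind of columns the real dataset has.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 1000
gender = rng.integers(1, 3, size=n)  # two anonymous codes: 1 and 2
height = np.where(gender == 2, 170, 161) + rng.normal(0, 7, size=n)
toy = pd.DataFrame({
    "gender": gender,
    "height": height.round(),
    "weight": (height - 100 + rng.normal(0, 10, size=n)).round(),
    "smoke": (rng.random(n) < np.where(gender == 2, 0.22, 0.02)).astype(int),
})

# Frequency: number of rows per gender code
counts = toy.groupby("gender").size()

# Mean and median of a numeric column per group
height_stats = toy.groupby("gender")["height"].agg(["mean", "median"])

# Percentage: the mean of a 0/1 column is the share of ones
smoke_pct = toy.groupby("gender")["smoke"].mean() * 100

# Filtering / cleaning: keep rows whose height lies between
# the 2.5th and 97.5th percentiles (assumed cutoffs)
low, high = toy["height"].quantile([0.025, 0.975])
cleaned = toy[(toy["height"] >= low) & (toy["height"] <= high)]

# Pearson correlation coefficient matrix of the numeric columns
corr = cleaned.corr(method="pearson")

print(counts, height_stats, smoke_pct.round(1), corr.round(2), sep="\n\n")
```

The same `groupby(...)` pattern carries over to the real data: grouping by `gender` and comparing mean heights is what lets the anonymous gender codes be interpreted.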

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker
from matplotlib import rcParams
import warnings
warnings.filterwarnings('ignore')

pd.set_option('expand_frame_repr', True)   # True lets wide DataFrames wrap across lines
# Show all columns
pd.set_option('display.max_columns', None)
# Show all rows
pd.set_option('display.max_rows', None)
pd.set_option('max_colwidth', 100)

sns.set()
sns.set_context(
    "notebook",
    font_scale=1.5,
    rc={
        "figure.figsize": (11, 8),
        "axes.titlesize": 18
    }
)
rcParams['figure.figsize'] = 11, 8

# https://labfile.oss.aliyuncs.com/courses/1283/telecom_churn.cs
df = pd.read_csv(
    'D:/pycharm_pro/imageinfo/CVD_analysis/mlbootcamp5_train.csv', sep=';')
print(df.head())

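# How many men and women are in the dataset? The gender feature does not say
# which code is male and which is female, so it has to be inferred from height:
# the code with the larger mean height is taken to be male.
dup = df.groupby('gender').size()                 # number of rows per gender code
bodyhigh = df.groupby('gender')['height'].mean()  # mean height per gender code
# print(bodyhigh)
# print(dup)

# Which group drinks alcohol more often, men or women?
# alco is a 0/1 flag, so its mean is the share of drinkers in each group.
rate = df.groupby('gender')['alco'].mean()
# print(rate)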


# Difference between the percentages of smokers among men and among women