Prediction with Regression Analysis
According to the Centers for Disease Control and Prevention (CDC), heart disease is the leading cause of death for both men and women in the United States and around the globe. Researchers and statisticians can apply several data mining techniques to help health care professionals detect heart disease and identify its potential causes. Significant risk factors associated with heart disease include age, blood pressure, total cholesterol, diabetes, hypertension, family history of heart disease, obesity, and lack of physical exercise.
In this project from DataCamp, my objective is to build a regression model and run statistical tests to assess how strongly the clinical factors are associated with heart disease and how they relate to a higher probability of developing it. I will use multiple regression and logistic regression, together with data exploration in ggplot and dplyr. The project uses the Cleveland heart disease dataset.
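To make the idea of "probability of heart disease" concrete, here is a minimal sketch of how a fitted logistic regression model turns clinical factors into a probability, and how a coefficient can be read as an odds ratio. It is written in plain Python for illustration only; the project itself works in R with ggplot and dplyr. The intercept, the coefficients, and the feature names `age` and `sex` are hypothetical values chosen for the example, not results fitted to the Cleveland data.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(intercept: float, coefficients: dict, features: dict) -> float:
    """Logistic regression: probability = sigmoid(intercept + sum(coef * value))."""
    log_odds = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return sigmoid(log_odds)


def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(coefficient)


# Hypothetical model, for illustration only (NOT fitted to any real data).
intercept = -5.0
coefficients = {"age": 0.05, "sex": 1.2}

# Hypothetical patient: 60 years old, sex coded as 1.
patient = {"age": 60, "sex": 1}

probability = predict_probability(intercept, coefficients, patient)
print(f"Predicted probability: {probability:.3f}")
print(f"Odds ratio for sex: {odds_ratio(coefficients['sex']):.2f}")
```

An odds ratio above 1 means the predictor is associated with higher odds of the outcome, and a value below 1 with lower odds. This is one common way to express how strongly a clinical factor is associated with heart disease.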
Here’s a glimpse of the dataset:
Data Dictionary
The dataset has 14 columns, described below:
a. Age: A continuous variable giving the person's age in years.
b. Sex: A discrete variable giving the person's sex, where 0 = Female and 1 = Male.
c. CP (Chest Pain type): A discrete variable describing the chest pain type, coded as 1 = Typical angina; 2 = Atypical angina; 3 = Non-anginal pain; 4 = Asymptomatic.
d. Trestbps: A continuous variable giving the resting blood pressure in mm Hg.
e. Cholesterol: A continuous variable giving the serum cholesterol in mg/dl.
f.
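The coded columns above are easier to read once the integer codes are mapped back to their labels. The sketch below shows one way to do that in plain Python, for illustration only (the project itself uses R). The lowercase keys `age`, `sex`, `cp`, `trestbps`, and `chol`, and the example record, are assumptions made for the example. Only the code-to-label mappings come from the data dictionary.

```python
# Code-to-label mappings taken from the data dictionary above.
SEX_LABELS = {0: "Female", 1: "Male"}
CP_LABELS = {
    1: "Typical angina",
    2: "Atypical angina",
    3: "Non-anginal pain",
    4: "Asymptomatic",
}


def decode_record(record: dict) -> dict:
    """Return a copy of the record with sex and cp codes replaced by labels."""
    decoded = dict(record)  # shallow copy so the original record is unchanged
    decoded["sex"] = SEX_LABELS[record["sex"]]
    decoded["cp"] = CP_LABELS[record["cp"]]
    return decoded


# Hypothetical record (made-up values; key names are assumed, not from the source).
example = {"age": 54, "sex": 1, "cp": 4, "trestbps": 130, "chol": 250}

decoded = decode_record(example)
print(decoded)
```

Keeping the mappings in dictionaries means an unexpected code raises a `KeyError` rather than being silently mislabeled, which helps surface data-quality problems early.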



