ABSTRACT:
I am creating a data analysis project on heart disease prediction. The project takes raw data in the form of a .csv file and analyzes it using data science and data analytics techniques in Python. Heart disease is one of the biggest causes of morbidity and mortality in the world's population. Prediction of cardiovascular disease is regarded as one of the most important subjects in clinical data analysis. The amount of data in the healthcare industry is huge. Data mining turns this large collection of raw healthcare data into information that can help in making informed decisions and predictions.
Coronary Heart Disease (CHD) is the most common type of heart disease, killing over 370,000 people annually in the United States. Every year about 735,000 Americans have a heart attack. Of these, 525,000 are a first heart attack and 210,000 happen in people who have already had a heart attack. This makes heart disease a major concern. However, heart disease is difficult to identify because many risk factors contribute to it, such as diabetes, high blood pressure, high cholesterol, abnormal pulse rate, and many others. Because of this, scientists have turned towards modern approaches like data mining and machine learning for predicting the disease.
In this article, I will apply data analytics as well as one machine learning approach to classify whether a person is suffering from heart disease, using one of the most widely used datasets: the Cleveland Heart Disease dataset from the UCI Repository.
Importing Libraries:
import numpy as np
NumPy is a Python library for working with arrays. It also has functions for linear algebra, Fourier transforms, and matrices. It was created in 2005 by Travis Oliphant. It is an open-source project and you can use it freely. NumPy stands for Numerical Python. Python lists can serve the purpose of arrays, but they are slow to process. NumPy aims to provide an array object that is up to 50x faster than traditional Python lists. Arrays are used very frequently in data science, where speed and resources are very important.
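As a rough illustration of the difference between lists and arrays, the sketch below centers a few values around their mean, first with a plain Python list and then with a NumPy array. The cholesterol readings are made up for the example and are not taken from the dataset.

```python
import numpy as np

# Hypothetical cholesterol readings, invented for this illustration.
chol_list = [233, 250, 204, 236, 354]

# Plain list: the arithmetic needs an explicit Python-level loop.
list_mean = sum(chol_list) / len(chol_list)
centered_list = [value - list_mean for value in chol_list]

# NumPy array: the same subtraction is applied to every element at once.
chol_array = np.array(chol_list)
centered_array = chol_array - chol_array.mean()

print(centered_list)
print(centered_array)
```

Both versions compute the same numbers; the array version simply expresses the operation on the whole collection, which is where NumPy's speed advantage comes from on large inputs.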
import pandas as pd
Pandas is an open-source, BSD-licensed Python library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Python with Pandas is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics. Before Pandas, Python was mostly used for data munging and preparation and contributed very little to data analysis itself. Pandas solved this problem. Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of its origin: load, prepare, manipulate, model, and analyze.
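A minimal sketch of the load, prepare, manipulate, and analyze steps might look like the following. The CSV text, the column names (age, sex, chol, target), and the 240 threshold are assumptions made for illustration; in the project the data would be read from a .csv file on disk, and the modeling step is left out here.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """age,sex,chol,target
63,1,233,1
37,1,250,1
41,0,204,1
56,1,,0
57,0,354,0
"""

# Load: read the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Prepare: fill the one missing cholesterol value with the column median.
df["chol"] = df["chol"].fillna(df["chol"].median())

# Manipulate: derive a boolean column from an existing one.
# The 240 cutoff is an arbitrary value chosen for this example.
df["high_chol"] = df["chol"] > 240

# Analyze: average cholesterol for each value of the target column.
summary = df.groupby("target")["chol"].mean()
print(summary)
```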
import matplotlib.pyplot as plt
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. Pyplot is a Matplotlib module which provides a MATLAB-like interface. Matplotlib is designed to be as usable as MATLAB, with the ability to use Python and the advantage of being free and open-source. It also provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
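The two interfaces mentioned above can be compared with a small sketch. The ages are invented values, and the Agg backend is selected only so the example runs without a display; neither is taken from the project.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical ages, invented for this illustration.
ages = [63, 37, 41, 56, 57, 44, 52, 57]

# MATLAB-like pyplot interface: calls act on the "current" figure.
plt.figure()
plt.hist(ages, bins=4)
plt.title("Age distribution (pyplot interface)")
plt.close()

# Object-oriented interface: the figure and axes are explicit objects.
fig, ax = plt.subplots()
ax.hist(ages, bins=4)
ax.set_xlabel("age")
ax.set_ylabel("count")
ax.set_title("Age distribution (object-oriented interface)")
```

In a notebook the default backend can be kept and the figure is shown inline; in a script, fig.savefig can be used to write the plot to a file.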
import seaborn as sns
Seaborn is a library for making statistical graphics in Python. It is built on top of matplotlib and closely integrated with pandas data structures. Seaborn aims to make visualization a central part of exploring and understanding data. Its dataset-oriented plotting functions operate on data frames and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots.
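As a small sketch of a dataset-oriented plotting function, countplot can be given a whole data frame and a column name and will do the counting itself. The data frame below is made up for the example, and the column names are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd
import seaborn as sns

# Hypothetical records, invented for this illustration.
df = pd.DataFrame({
    "sex": [1, 1, 0, 1, 0, 0],
    "target": [1, 0, 1, 0, 1, 1],
})

# Seaborn maps the "target" column to the x axis and aggregates the
# row counts per category internally; no manual counting is required.
ax = sns.countplot(data=df, x="target")
ax.set_title("Records per target value")
```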
import os
Many operating system tasks can be performed automatically. The os module in Python provides functions for creating and removing a directory (folder), fetching its contents, changing and identifying the current directory, and so on.
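The operations listed above can be sketched as follows. The folder and file names are hypothetical, and a temporary directory from the standard tempfile module is used so that nothing is left behind on disk.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    start_dir = os.getcwd()                           # identify the current directory

    data_dir = os.path.join(base_dir, "heart_data")   # hypothetical folder name
    os.mkdir(data_dir)                                # create a directory
    os.chdir(data_dir)                                # change the current directory

    open("notes.txt", "w").close()                    # create an empty placeholder file
    contents = os.listdir(".")                        # fetch the directory contents
    print(contents)

    os.remove("notes.txt")                            # delete the file
    os.chdir(start_dir)                               # go back before removing the folder
    os.rmdir(data_dir)                                # remove the (now empty) directory
    exists_after = os.path.exists(data_dir)
```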
import warnings
Warning messages are typically issued in situations where it is useful to alert the user of some condition in a program, where that condition (normally) doesn’t warrant raising an exception and terminating the program. For example, one might want to issue a warning when a program uses an obsolete module. Warning messages are normally written to sys.stderr, but their disposition can be changed flexibly, from ignoring all warnings to turning them into exceptions. The disposition of warnings can vary based on the warning category, the text of the warning message, and the source location where it is issued. Repetitions of a particular warning for the same source location are typically suppressed.
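The sketch below shows the dispositions described above: recording a warning, ignoring it, and turning it into an exception. The function load_legacy_data is a hypothetical stand-in for an obsolete API and is not part of the project.

```python
import warnings


def load_legacy_data():
    """Hypothetical function standing in for an obsolete API."""
    warnings.warn("load_legacy_data is deprecated", DeprecationWarning)
    return [1, 2, 3]


# Record warnings instead of printing them to sys.stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    data = load_legacy_data()

# Ignore all warnings inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    load_legacy_data()

# Turn warnings into exceptions inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        load_legacy_data()
        raised = False
    except DeprecationWarning:
        raised = True

print(len(caught), raised)
```

Data analysis notebooks often call warnings.filterwarnings("ignore") once at the top to keep the output clean; whether this article does so is not shown in the text above.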
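This is a data analysis project on predicting heart disease using Python and data analysis techniques. The article describes how to apply data science and machine learning methods, such as data preprocessing, EDA, and logistic regression, to predict heart disease. In the analysis, the logistic regression model reached a prediction accuracy of 85.25%.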