Model Comparison for Predicting Diabetes Outcomes


weixin_26752765


Tags: machine learning, python, artificial intelligence, deep learning, tensorflow

Original article: https://medium.com/swlh/model-comparison-for-predicting-diabetes-outcomes-ddcd06384743


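This article explores building a diabetes prediction model by comparing different machine learning approaches, such as deep learning and traditional algorithms, to evaluate how well each predicts diabetes outcomes. Using tools such as Python and TensorFlow, the study aims to identify the most effective predictive model. (Summary generated automatically by CSDN.)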


Subject of the Project

The dataset is primarily used to predict the onset of diabetes within five years in females of Pima Indian heritage aged 21 or older, given medical details about their bodies. The task is a binary (2-class) classification machine learning problem.


The dataset has a dependent variable, Outcome, that indicates whether a patient has diabetes. Our goal is to model the relationship between the other variables and this outcome.

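Separating the dependent variable from the other columns is the first concrete step. The sketch below assumes the target column is named `Outcome` (as in the column list later in this post); `split_features_target` and the tiny `toy` frame with made-up values are hypothetical illustrations, not the author's code. It also shows a stratified train/test split with scikit-learn's `train_test_split`, which keeps the diabetic/non-diabetic ratio the same in both parts.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_target(df: pd.DataFrame, target: str = "Outcome"):
    """Hypothetical helper: return predictors (X) and the dependent variable (y)."""
    X = df.drop(columns=[target])
    y = df[target]
    return X, y


# Made-up toy data using a subset of the real column names (not Pima records).
toy = pd.DataFrame({
    "Glucose": [150, 90, 180, 95, 140, 110, 160, 100],
    "BMI": [34.0, 25.0, 30.5, 27.0, 41.0, 24.5, 36.0, 29.0],
    "Age": [50, 31, 45, 22, 38, 27, 55, 29],
    "Outcome": [1, 0, 1, 0, 1, 0, 1, 0],
})

X, y = split_features_target(toy)

# stratify=y preserves the class ratio; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (6, 3) (2, 3)
```

On the real data you would call `split_features_target(df)` on the frame loaded from `diabetes.csv`.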

Given a person's features as input, we want to build a machine learning model that predicts whether that person has diabetes. This is a classification problem.

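Since the goal is to compare classifiers, one common pattern is to score each candidate with the same K-fold cross-validation splits. The article imports `KFold` and `cross_val_score`, so this minimal sketch uses them; it assumes a synthetic dataset from `make_classification` (768 rows, 8 features, to mirror the Pima shape) instead of the real CSV, and a hypothetical `compare_models` helper. The scores it prints describe the synthetic data only and are not the author's results. LightGBM, SVC and MLP are left out only to keep dependencies and runtime small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(models, X, y, cv):
    """Hypothetical helper: mean and std of accuracy for each model over the same folds."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results


# Synthetic stand-in for the Pima data: 768 rows, 8 features, binary target.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5, random_state=0)

models = {
    # Distance- and gradient-based models get feature scaling inside the pipeline,
    # so the scaler is fitted on each training fold only.
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Shuffled 10-fold CV; every model sees exactly the same splits.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

results = compare_models(models, X, y, cv)
for name, (mean, std) in sorted(results.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{name:20s} accuracy {mean:.3f} (+/- {std:.3f})")
```

Because the classes in the real dataset are imbalanced, `StratifiedKFold` is a reasonable drop-in alternative to `KFold` here.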

Dataset Information

This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. Its objective is to predict diagnostically whether a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database; in particular, all patients here are females at least 21 years old of Pima Indian heritage.


The dataset has 9 columns and 768 instances (rows). The columns are:


- Pregnancies: Number of times pregnant
- Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test
- BloodPressure: Diastolic blood pressure (mm Hg)
- SkinThickness: Triceps skinfold thickness (mm)
- Insulin: 2-hour serum insulin measurement (μU/ml)
- BMI: Body mass index (weight in kg / (height in m)²)
- DiabetesPedigreeFunction: Diabetes pedigree function
- Age: Age (years)
- Outcome: Class variable (0 or 1, 0 = non-diabetic, 1 = diabetic)
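The column list, the BMI formula and the cohort constraint (all patients at least 21) can be written down as a small schema check before any modelling. This is a sketch under the assumption that the CSV columns appear in exactly the order listed above; `FEATURE_COLUMNS`, `bmi` and `validate_schema` are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Column names and order as given in the dataset description above.
FEATURE_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET_COLUMN = "Outcome"


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kg divided by the square of height in m."""
    return weight_kg / height_m ** 2


def validate_schema(df: pd.DataFrame) -> None:
    """Raise ValueError if df does not match the described layout and constraints."""
    expected = FEATURE_COLUMNS + [TARGET_COLUMN]
    if list(df.columns) != expected:
        raise ValueError(f"unexpected columns: {list(df.columns)}")
    if not df[TARGET_COLUMN].isin([0, 1]).all():
        raise ValueError("Outcome must be 0 (non-diabetic) or 1 (diabetic)")
    if (df["Age"] < 21).any():
        raise ValueError("all patients are described as at least 21 years old")


# Worked example: 70 kg at 1.75 m gives 70 / 1.75**2 = 70 / 3.0625.
print(round(bmi(70.0, 1.75), 1))  # 22.9
```

Calling `validate_schema(df)` right after `pd.read_csv("diabetes.csv")` fails fast if the file has a different layout than the one described.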

Data Understanding

# import libraries
import numpy as np
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.metrics import confusion_matrix, accuracy_score, mean_squared_error, r2_score, roc_auc_score, roc_curve, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from lightgbm import LGBMClassifier
from sklearn.model_selection import KFold

# ignore warnings that do not significantly affect the project
import warnings
warnings.simplefilter(action="ignore")

# read the dataset
df = pd.read_csv("diabetes.csv")
# first 5 observations
df.head()

# a random sample of 3 rows
df.sample(3)

# a random sample containing the given fraction of the rows (here 1%)
df.sample(frac=0.01)

# size information
df.shape
# (768, 9)

# index dtype, column dtypes, non-null counts and memory usage
df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 768 entries, 0 to 767
# Data columns (total 9 columns):
#  #   Column                    Non-Null Count  Dtype
# ---  ------                    --------------  -----
#  0   Pregnancies               768 non-null    int64
#  1   Glucose                   768 non-null    int64
#  2   BloodPressure             768 non-null    int64
#  3   SkinThickness             768 non-null    int64
#  4   Insulin                   768 non-null    int64
#  5   BMI                       768 non-null    float64
#  6   DiabetesPedigreeFunction  768 non-null    float64
#  7   Age                       768 non-null    int64
#  8   Outcome                   768 non-null    int64
# dtypes: float64(2), int64(7)
# memory usage: 54.1 KB

# descriptive statistics, including the specified percentiles;
# .T transposes the table so each variable becomes a row, which is easier to read
df.describe(percentiles=[0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99]).T

# correlation between variables
df
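To make the `describe` and `sample` calls above concrete, here is a small self-contained sketch on a hypothetical two-column frame of random numbers (not the Pima data). It shows that passing `percentiles=` adds one column per requested percentile after transposing with `.T`, and that `sample(frac=0.01)` on 768 rows returns round(0.01 × 768) = 8 rows; the `random_state` argument is an addition here for reproducibility.

```python
import numpy as np
import pandas as pd

# Hypothetical random data; with the real dataset you would use df instead.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Glucose": rng.integers(50, 200, size=100),
    "BMI": rng.uniform(18.0, 45.0, size=100),
})

# After .T each variable is a row; columns are count, mean, std, min,
# the requested percentiles (labelled "10%", "25%", ...) and max.
summary = demo.describe(percentiles=[0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99]).T
print(summary.round(2))

# frac is a fraction of rows: pandas rounds frac * len(frame) to get the count.
subset = pd.DataFrame({"x": range(768)}).sample(frac=0.01, random_state=42)
print(len(subset))  # 8
```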
