Paper on Building a Diabetes Model: Comparison of Models for Predicting Diabetes Outcomes



Subject of the Project

The dataset is primarily used for predicting the onset of diabetes within five years in females of Pima Indian heritage over the age of 21 given medical details about their bodies. The dataset is meant to correspond with a binary (2-class) classification machine learning problem.


We have a dependent variable that indicates whether a person has diabetes. Our goal is to model the relationship between the other variables and that diabetes status.


Given a person's features as input, we want to build a machine learning model that predicts whether that person will have diabetes. This is a classification problem.

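As a minimal sketch of this framing, and not the author's final model, the snippet below fits a logistic regression to a small synthetic table. The feature values, the rule that generates the labels, and the names `X`, `y` and `model` are assumptions made up for illustration; only the 0/1 target mirrors the dataset's `Outcome` column.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real dataset): two made-up features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Hypothetical labelling rule: class 1 when the feature sum is positive.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out a test set so the model is evaluated on rows it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict() returns one 0/1 label per row;
# predict_proba() returns the estimated probability of each class.
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

With the real data, `X` would hold the eight diagnostic measurements and `y` the `Outcome` column.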

Dataset Information

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.


We have 9 columns and 768 instances (rows). The column names are provided as follows:


- Pregnancies: Number of times pregnant
- Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- BloodPressure: Diastolic blood pressure (mm Hg)
- SkinThickness: Triceps skinfold thickness (mm)
- Insulin: 2-Hour serum insulin measurement (mu U/ml)
- BMI: Body mass index (weight in kg / (height in m)^2)
- DiabetesPedigreeFunction: Diabetes pedigree function
- Age: Age (years)
- Outcome: Class variable (0 or 1, 0 = non-diabetic, 1 = diabetic)
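
The column list above can be turned into a small schema check that also separates the eight features from the target. This is only a sketch: the helper `split_features_target` and the two rows in `toy` are hypothetical and made up for illustration, not taken from the dataset.

```python
import pandas as pd

# Column names as listed above; the last one is the target.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def split_features_target(df: pd.DataFrame, target: str = "Outcome"):
    """Validate the schema, then return (X, y). Hypothetical helper."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not df[target].isin([0, 1]).all():
        raise ValueError(f"{target} must contain only 0 and 1")
    X = df.drop(columns=[target])  # the eight diagnostic measurements
    y = df[target]                 # 0 = non-diabetic, 1 = diabetic
    return X, y


# Two made-up rows for illustration only (not real patients).
toy = pd.DataFrame(
    [[1, 100, 70, 20, 80, 25.0, 0.3, 30, 0],
     [3, 150, 80, 30, 120, 33.5, 0.6, 45, 1]],
    columns=EXPECTED_COLUMNS,
)
X, y = split_features_target(toy)
print(X.shape, y.tolist())
```

Failing early on a missing column or an unexpected class label tends to be cheaper than debugging a model trained on a malformed table.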

Data Understanding

# import libraries
import numpy as np
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.metrics import confusion_matrix, accuracy_score, mean_squared_error, r2_score, roc_auc_score, roc_curve, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from lightgbm import LGBMClassifier
from sklearn.model_selection import KFold
# ignore warnings that do not significantly affect the project
import warnings
warnings.simplefilter(action = "ignore")
# read the dataset
df = pd.read_csv("diabetes.csv")
# show the first 5 observations
df.head()
# return a random sample of 3 rows
df.sample(3)
# return a random sample containing 1% of the rows
df.sample(frac = 0.01)
# size information (rows, columns)
df.shape
(768, 9)
# index dtype, column dtypes, non-null counts and memory usage
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
# descriptive statistics at the specified percentiles; .T transposes the table, which makes it easier to evaluate
df.describe([0.10,0.25,0.50,0.75,0.90,0.95,0.99]).T
# correlation between variables
df
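
The source listing breaks off at this point. As a hedged sketch of how correlations between variables are commonly inspected with pandas (an assumption about the intent, not a reconstruction of the author's code), `DataFrame.corr()` returns the pairwise Pearson correlation matrix. The table `toy` below is synthetic, and the rule that ties its `Outcome` to `Glucose` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in table (NOT the real dataset).
rng = np.random.default_rng(42)
n = 100
glucose = rng.normal(120, 30, size=n)
toy = pd.DataFrame({
    "Glucose": glucose,
    "Age": rng.integers(21, 80, size=n),
    # Hypothetical rule: the outcome depends on noisy glucose only.
    "Outcome": (glucose + rng.normal(0, 20, size=n) > 130).astype(int),
})

# Pairwise Pearson correlation between all numeric columns.
corr = toy.corr()

# Correlation of each feature with the target, strongest first.
target_corr = corr["Outcome"].drop("Outcome").sort_values(ascending=False)
print(target_corr)
```

The matrix is symmetric with ones on the diagonal, so in practice only the column for the target, or one triangle of the matrix, needs to be read.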