Missing Data in UCI Datasets: Creating a Classifier from the UCI Early Stage Diabetes Risk Prediction Dataset

To begin, we download the dataset from the UCI dataset repository. The link to the dataset is below.

https://archive.ics.uci.edu/ml/datasets/Early+stage+diabetes+risk+prediction+dataset

After downloading the dataset, as long as it is not too big, I like to look at it in a spreadsheet to get a sense of what I am working with.

As you can see, the dataset has 17 variables in total, and every field except 'Age' appears to hold binary values. From here we'll open the dataset in a notebook environment to explore it further. For this project, I used Google Colab, a hosted Jupyter notebook environment that requires no configuration before use.

There are a few ways to pull your own data into Google Colab. For this project, I used Colab's file upload prompt, which lets you browse your local computer for a file to upload.
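The original screenshot of the upload command is not reproduced here. As a minimal sketch, assuming the usual `google.colab.files.upload()` helper was the one used, the upload step might look like the following. The wrapper name `upload_from_local` is hypothetical and is only there so the snippet fails cleanly outside Colab.

```python
# Sketch only: google.colab is available inside a Colab runtime, not in a
# plain local Python installation.
try:
    from google.colab import files
except ImportError:
    files = None


def upload_from_local():
    """Open Colab's file picker and return the uploaded files.

    The result is a dict mapping each uploaded file name to its raw bytes.
    (upload_from_local is a hypothetical helper name, not part of Colab.)
    """
    if files is None:
        raise RuntimeError("This helper only works inside Google Colab.")
    return files.upload()


# Inside a Colab cell you would call:
# uploaded = upload_from_local()
# print(list(uploaded))  # names of the uploaded files
```

Uploaded files are also saved to the runtime's working directory, so they can be read by name afterwards.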

From there we’ll load in some necessary libraries.

The next step is to read the data into a DataFrame and explore the variables to see whether any data imputation is needed.
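A rough sketch of that check with pandas is shown below. The rows in `SAMPLE_CSV` are made up for illustration (including one deliberately blank cell) and only use a handful of column names. With the real download you would pass the uploaded file's name to `pd.read_csv` instead. The helper name `summarize` is also an assumption, not something from the original post.

```python
import io

import pandas as pd

# Hypothetical sample rows, NOT the real data. One Polydipsia cell is left
# blank on purpose so the missing-value count has something to report.
SAMPLE_CSV = """Age,Gender,Polyuria,Polydipsia,class
40,Male,No,Yes,Positive
58,Male,No,No,Positive
41,Male,Yes,No,Positive
45,Female,No,,Negative
"""


def summarize(df):
    """Return one row per column: dtype, missing count, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isnull().sum(),
        "n_unique": df.nunique(),  # NaN is not counted as a value
    })


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize(df)
print(summary)
```

Columns with a nonzero `n_missing` are candidates for imputation, and a column with exactly two distinct values is effectively binary.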
