ml-100k recommendation data: ML - Understanding Data with Statistics

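In machine learning projects, understanding the data and its statistics is essential. This article explains how to analyze data with Python and Pandas: looking at the raw data, checking the dimensions of the data, getting the data type of each attribute, producing a statistical summary, reviewing the class distribution, reviewing the correlation between attributes, and reviewing the skew of attribute distributions. These steps help improve the quality of pre-processing and model training.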

ML - Understanding Data with Statistics

Introduction

When working on machine learning projects, we often overlook two of the most important parts: mathematics and data. Yet ML is a data-driven approach, and an ML model will produce results only as good or as bad as the data we provide to it.

In the previous chapter, we discussed how to load CSV data into an ML project, but it is good to understand the data before working with it. We can understand data in two ways: with statistics and with visualization.

In this chapter, with the help of the following Python recipes, we are going to understand ML data with statistics.

Looking at Raw Data

The very first recipe is to look at your raw data. This matters because the insight gained from looking at raw data improves our chances of pre-processing and handling the data well in ML projects.

The following Python script uses the head() function of a Pandas DataFrame on the Pima Indians diabetes dataset to look at the first 50 rows and get a better understanding of it:

Example

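from pandas import read_csv

# Location of the CSV file; adjust to where the dataset is stored.
path = r"C:\pima-indians-diabetes.csv"
# The file has no header row, so column names are supplied explicitly.
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
# Print the first 50 rows of the DataFrame.
print(data.head(50))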

Output

    preg  plas  pres  skin  test  mass   pedi  age  class
0      6   148    72    35     0  33.6  0.627   50      1
1      1    85    66    29     0  26.6  0.351   31      0
2      8   183    64     0     0  23.3  0.672   32      1
3      1    89    66    23    94  28.1  0.167   21      0
4      0   137    40    35   168  43.1  2.288   33      1
5      5   116    74     0     0  25.6  0.201   30      0
6      3    78    50    32    88  31.0  0.248   26      1
7     10   115     0     0     0  35.3  0.134   29      0
8      2   197    70    45   543  30.5  0.158   53      1
9      8   125    96     0     0   0.0  0.232   54      1
10     4   110    92     0     0  37.6  0.191   30      0
11    10   168    74     0     0  38.0  0.537   34      1
12    10   139    80     0     0  27.1  1.441   57      0
13     1   189    60    23   846  30.1  0.398   59      1
14     5   166    72    19   175  25.8  0.587   51      1
15     7   100     0     0     0  30.0  0.484   32      1
16     0 
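
For readers who do not have the CSV file on disk, the same recipe can be tried on a few rows typed inline. The following is only a minimal sketch, not the original script: the variable names SAMPLE_CSV and sample are made up for illustration, the five rows are copied from the output above, and the extra checks (shape, dtypes, class counts) are assumptions about how the steps named in the summary might look in Pandas.

```python
from io import StringIO

import pandas as pd

# First five rows of the Pima Indians diabetes data, copied from the output above.
# SAMPLE_CSV is a hypothetical stand-in for the real CSV file.
SAMPLE_CSV = """6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
"""

headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# The data has no header row, so column names are passed through `names`.
sample = pd.read_csv(StringIO(SAMPLE_CSV), names=headernames)

first_rows = sample.head(3)                     # first 3 rows only
shape = sample.shape                            # (number of rows, number of columns)
dtypes = sample.dtypes                          # data type of each attribute
class_counts = sample.groupby('class').size()   # number of rows per class label

print(first_rows)
print(shape)
print(dtypes)
print(class_counts)
```

Calling head() without an argument returns the first 5 rows; passing a number, as in head(50) in the script above, changes how many rows are shown.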