Exploring the Red Wine Dataset (with the third-party library NumPy)

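1. Read the data and print the first 5 rows of the dataset (the first line of the file holds the column names)

import numpy as np

# Raw string so the backslash in the Windows path is not read as an escape.
path = r'D:xxxxx\wine quality red.csv'
# skip_header=1 drops the column-name row, so row 0 is the first sample.
wines1 = np.genfromtxt(path, delimiter=",", skip_header=1)
wines1[:5]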

array([[7.400e+00, 7.000e-01, 0.000e+00, 1.900e+00, 7.600e-02, 1.100e+01,
3.400e+01, 9.978e-01, 3.510e+00, 5.600e-01, 9.400e+00, 5.000e+00],
[7.800e+00, 8.800e-01, 0.000e+00, 2.600e+00, 9.800e-02, 2.500e+01,
6.700e+01, 9.968e-01, 3.200e+00, 6.800e-01, 9.800e+00, 5.000e+00],
[7.800e+00, 7.600e-01, 4.000e-02, 2.300e+00, 9.200e-02, 1.500e+01,
5.400e+01, 9.970e-01, 3.260e+00, 6.500e-01, 9.800e+00, 5.000e+00],
[1.120e+01, 2.800e-01, 5.600e-01, 1.900e+00, 7.500e-02, 1.700e+01,
6.000e+01, 9.980e-01, 3.160e+00, 5.800e-01, 9.800e+00, 6.000e+00],
[7.400e+00, 7.000e-01, 0.000e+00, 1.900e+00, 7.600e-02, 1.100e+01,
3.400e+01, 9.978e-01, 3.510e+00, 5.600e-01, 9.400e+00, 5.000e+00]])
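
To check that `skip_header=1` removes only the column-name row, the same call can be tried on a small in-memory CSV. The sketch below is an illustration under assumptions: the three-column `sample_csv` text is made up (the real file has 12 columns), and `io.StringIO` stands in for the file path.

```python
import io
import numpy as np

# Hypothetical three-column sample; not taken from the real dataset file.
sample_csv = io.StringIO(
    "fixed acidity,volatile acidity,quality\n"
    "7.4,0.70,5\n"
    "7.8,0.88,5\n"
    "11.2,0.28,6\n"
)

# skip_header=1 discards the column-name row, so every remaining row is data.
sample = np.genfromtxt(sample_csv, delimiter=",", skip_header=1)
print(sample.shape)  # (3, 3)
print(sample[0])     # first data row, not the header
```

Since the header is already gone after loading, row 0 of the array is the first wine, and any later slice over the samples should start at index 0.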

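2. Print every distinct quality level found in the "quality" variable

# The header row was already skipped on load, so slice from row 0, not row 1.
qualities = np.unique(wines1[:, 11])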
print(qualities)

[3. 4. 5. 6. 7. 8.]

3. Print all samples in the subset where "quality" equals 3

wines1[np.where(wines1[:,11]==3)]

array([[1.1600e+01, 5.8000e-01, 6.6000e-01, 2.2000e+00, 7.4000e-02,
1.0000e+01, 4.7000e+01, 1.0008e+00, 3.2500e+00, 5.7000e-01,
9.0000e+00, 3.0000e+00],
[1.0400e+01, 6.1000e-01, 4.9000e-01, 2.1000e+00, 2.0000e-01,
5.0000e+00, 1.6000e+01, 9.9940e-01, 3.1600e+00, 6.3000e-01,
8.4000e+00, 3.0000e+00],
[7.4000e+00, 1.1850e+00, 0.0000e+00, 4.2500e+00, 9.7000e-02,
5.0000e+00, 1.4000e+01, 9.9660e-01, 3.6300e+00, 5.4000e-01,
1.0700e+01, 3.0000e+00],
[1.0400e+01, 4.4000e-01, 4.2000e-01, 1.5000e+00, 1.4500e-01,
3.4000e+01, 4.8000e+01, 9.9832e-01, 3.3800e+00, 8.6000e-01,
9.9000e+00, 3.0000e+00],
[8.3000e+00, 1.0200e+00, 2.0000e-02, 3.4000e+00, 8.4000e-02,
6.0000e+00, 1.1000e+01, 9.9892e-01, 3.4800e+00, 4.9000e-01,
1.1000e+01, 3.0000e+00],
[7.6000e+00, 1.5800e+00, 0.0000e+00, 2.1000e+00, 1.3700e-01,
5.0000e+00, 9.0000e+00, 9.9476e-01, 3.5000e+00, 4.0000e-01,
1.0900e+01, 3.0000e+00],
[6.8000e+00, 8.1500e-01, 0.0000e+00, 1.2000e+00, 2.6700e-01,
1.6000e+01, 2.9000e+01, 9.9471e-01, 3.3200e+00, 5.1000e-01,
9.8000e+00, 3.0000e+00],
[7.3000e+00, 9.8000e-01, 5.0000e-02, 2.1000e+00, 6.1000e-02,
2.0000e+01, 4.9000e+01, 9.9705e-01, 3.3100e+00, 5.5000e-01,
9.7000e+00, 3.0000e+00],
[7.1000e+00, 8.7500e-01, 5.0000e-02, 5.7000e+00, 8.2000e-02,
3.0000e+00, 1.4000e+01, 9.9808e-01, 3.4000e+00, 5.2000e-01,
1.0200e+01, 3.0000e+00],
[6.7000e+00, 7.6000e-01, 2.0000e-02, 1.8000e+00, 7.8000e-02,
6.0000e+00, 1.2000e+01, 9.9600e-01, 3.5500e+00, 6.3000e-01,
9.9500e+00, 3.0000e+00]])
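
The row selection in this step can also be written with a plain boolean mask, which NumPy accepts directly as an index. The sketch below uses a made-up two-column array `demo` (an assumption for illustration, not the real data) to show that both forms select the same rows.

```python
import numpy as np

# Made-up rows: columns are (fixed acidity, quality).
demo = np.array([[7.4, 5.0], [11.6, 3.0], [7.8, 5.0], [10.4, 3.0]])

mask = demo[:, 1] == 3            # boolean array, one entry per row
by_mask = demo[mask]              # rows where the mask is True
by_where = demo[np.where(mask)]   # same rows, selected via integer indices

print(np.array_equal(by_mask, by_where))  # True
print(by_mask)
```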

4. Count and print the number of samples at each quality level

# Use every row: the header was skipped on load, so [1:] would drop the first sample.
a = wines1[:, 11]
unique, counts = np.unique(a, return_counts=True)
dict(zip(unique, counts))

{3.0: 10, 4.0: 53, 5.0: 681, 6.0: 638, 7.0: 199, 8.0: 18}
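
The keys above are floats because `np.genfromtxt` loads every column as floating point. If integer keys read better, one option is to cast while building the dictionary. This is a minimal sketch on a made-up `quality` array, not the real column.

```python
import numpy as np

# Made-up quality column, stored as floats like the genfromtxt output.
quality = np.array([5.0, 5.0, 6.0, 3.0, 6.0, 5.0])

levels, counts = np.unique(quality, return_counts=True)
# Cast to built-in int so keys and values print without float or NumPy types.
count_by_level = {int(q): int(c) for q, c in zip(levels, counts)}
print(count_by_level)  # {3: 1, 5: 3, 6: 2}
```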

5. Compute and print the mean of the variable fixed acidity at each quality level

wines2 = []
for i in qualities:
    data2 = i,wines1[np.where(wines1[:,11]==i)][:,0].mean()
    wines2.append(data2)
print(wines2)

[(3.0, 8.36), (4.0, 7.779245283018868), (5.0, 8.167254038179149), (6.0, 8.34717868338558), (7.0, 8.872361809045225), (8.0, 8.566666666666666)]
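
The per-level loop can also be replaced by a vectorized group mean. The sketch below shows one possible approach, swapping in `np.unique(..., return_inverse=True)` together with `np.bincount`. The `acidity` and `quality` arrays are made-up stand-ins for columns 0 and 11 of the dataset.

```python
import numpy as np

# Made-up columns: fixed acidity values and their quality labels.
acidity = np.array([7.4, 7.8, 11.2, 8.0, 10.0])
quality = np.array([5.0, 5.0, 6.0, 6.0, 3.0])

# inverse[i] is the position in `levels` of quality[i].
levels, inverse = np.unique(quality, return_inverse=True)
sums = np.bincount(inverse, weights=acidity)  # per-level sum of acidity
counts = np.bincount(inverse)                 # per-level sample count
means = sums / counts

for level, mean in zip(levels, means):
    print(level, mean)
```

On a dataset this small the loop is just as readable. The vectorized form mainly helps when there are many groups.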
