White Wine Project: Exploratory Study
The code is as follows:
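import csv

# Read the local dataset into memory (raw string, so the backslashes in the Windows path are not treated as escape sequences)
with open(r"C:\用户\Shanks\white_wine.csv", 'r', encoding="utf-8", newline='') as f:
    reader = csv.reader(f)
    data = []
    for row in reader:
        data.append(row)

# Print the first 5 rows to inspect the data (the first row is the header)
for i in range(5):
    print(data[i])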
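Every snippet in this project re-opens the file and rebuilds `data` from scratch. A minimal sketch of a reusable loader, assuming the file is comma-separated with a single header row; the function name `read_rows` is hypothetical and not part of the original project. The `csv` documentation recommends opening files with `newline=''`.

```python
import csv


def read_rows(lines):
    """Split CSV input into (header, rows).

    `lines` may be an open file object or any iterable of text lines,
    because csv.reader accepts any iterable that yields strings.
    """
    reader = csv.reader(lines)
    header = next(reader)  # first line holds the column names
    rows = list(reader)    # remaining lines are the samples
    return header, rows


# Usage with the file from the notes (hypothetical call site):
# with open(r"C:\用户\Shanks\white_wine.csv", encoding="utf-8", newline="") as f:
#     header, rows = read_rows(f)
```

Returning the header separately means later steps can iterate over `rows` directly instead of slicing `data[1:]` each time.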
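import csv

# The remaining snippets assume white_wine.csv is in the working directory
with open("white_wine.csv", 'r', newline='') as f:
    reader = csv.reader(f)
    data = []
    for row in reader:
        data.append(row)

# Quality is the last column; skip the header row and collect every level
quality_list = []
for row in data[1:]:
    quality_list.append(int(row[-1]))
quality_count = set(quality_list)  # a set keeps each distinct level only once
print("White wine has %s quality levels: %r"
      % (len(quality_count), quality_count))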
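Because quality sits in the last column, the distinct levels can also be collected with a set comprehension and then sorted for display. A small sketch; `quality_levels` is a hypothetical helper, `rows` is assumed to exclude the header, and the toy rows are made up purely for illustration.

```python
def quality_levels(rows):
    """Return the distinct quality levels (last column) in ascending order."""
    # The set removes duplicates; sorted() turns it into an ordered list.
    return sorted({int(row[-1]) for row in rows})


rows = [["7.0", "6"], ["6.3", "5"], ["8.1", "6"]]  # hypothetical toy rows
levels = quality_levels(rows)
print("White wine has %d quality levels: %s" % (len(levels), levels))
```

Sorting matters only for display: a set has no guaranteed order, while a sorted list always prints the levels from lowest to highest.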
import csv

with open("white_wine.csv", 'r', newline='') as f:
    reader = csv.reader(f)
    data = []
    for row in reader:
        data.append(row)

# Split the samples into subsets keyed by quality level
content_dict = {}
for row in data[1:]:
    quality = int(row[-1])
    if quality not in content_dict:
        content_dict[quality] = [row]
    else:
        content_dict[quality].append(row)
for key in content_dict:
    print('Quality %d, count %d' % (key, len(content_dict[key])))
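The `if ... else` branch used for grouping can be replaced by `collections.defaultdict`, and `collections.Counter` can count the levels directly. A sketch under the same assumptions (header already removed, quality in the last column); `group_by_quality` and `count_by_quality` are hypothetical names, and the toy rows are illustrative only.

```python
from collections import Counter, defaultdict


def group_by_quality(rows):
    """Map each quality level to the list of rows with that level."""
    groups = defaultdict(list)  # a missing key starts as an empty list
    for row in rows:
        groups[int(row[-1])].append(row)
    return dict(groups)


def count_by_quality(rows):
    """Count samples per quality level, most frequent level first."""
    return Counter(int(row[-1]) for row in rows).most_common()


rows = [["7.0", "6"], ["6.3", "5"], ["8.1", "6"]]  # hypothetical toy rows
for level, n in count_by_quality(rows):
    print("Quality %d, count %d" % (level, n))
```

`most_common()` orders levels by count, whereas iterating over a plain dict follows the order in which each level first appears in the file.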
import csv
import matplotlib.pyplot as plt

with open("white_wine.csv", 'r', newline='') as f:
    reader = csv.reader(f)
    data = []
    for row in reader:
        data.append(row)
content_dict = {}
for row in data[1:]:
    quality = int(row[-1])
    if quality not in content_dict:
        content_dict[quality] = [row]
    else:
        content_dict[quality].append(row)

# Bar chart: quality level on the x-axis, number of samples on the y-axis
x = []
y = []
for key in content_dict:
    x.append(key)
    y.append(len(content_dict[key]))
plt.bar(x, y)
plt.xlabel('quality')
plt.ylabel('count')
plt.show()
import csv

with open("white_wine.csv", 'r', newline='') as f:
    reader = csv.reader(f)
    data = []
    for row in reader:
        data.append(row)
content_dict = {}
for row in data[1:]:
    quality = int(row[-1])
    if quality not in content_dict:
        content_dict[quality] = [row]
    else:
        content_dict[quality].append(row)

# Mean of the first column (row[0]) within each quality level
mean_list = []
for key, value in content_dict.items():
    total = 0  # renamed from `sum` to avoid shadowing the built-in function
    for row in value:
        total += float(row[0])
    mean_list.append((key, total / len(value)))
for item in mean_list:
    print(item[0], ":", item[1])
Results:
White wine has 7 quality levels: {3, 4, 5, 6, 7, 8, 9}
Quality 6, count 1539
Quality 5, count 1020
Quality 7, count 616
Quality 8, count 123
Quality 4, count 115
Quality 3, count 14
Quality 9, count 4
6 : 6.812085769980511
5 : 6.907843137254891
7 : 6.755844155844158
8 : 6.708130081300811
4 : 7.052173913043476
3 : 7.535714285714286
9 : 7.5
Study notes: the data in this dataset has the following form:
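First, we need to read the locally stored dataset white_wine.csv into memory.
Note: import the csv module, open the file, save the rows in a list (called content in these notes; the code above names it data), and print its first 5 rows.
How many quality levels does white wine have?
The quality variable is discrete rather than continuous, so it can take only a fixed set of levels. We can therefore use Python's built-in set type to find all the quality levels in the white wine data.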
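Note: use a set to find how many quality levels there are, and store all the levels in a set (called unity_quality in these notes; the code above names it quality_count). The quality level is stored in the last column.
Split the dataset into 7 subsets by quality
Split the data by quality level into 7 subsets and store them in a dictionary: each key is a quality value, and each value is the list of samples with that quality.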
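Note: the dataset is split into 7 subsets by quality, and each subset is kept in a dictionary named content_dict whose keys are the quality levels and whose values are the lists of rows at each level.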
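Splitting by quality is a partition: every sample should land in exactly one subset. Adding up the counts printed in the results gives 1539 + 1020 + 616 + 123 + 115 + 14 + 4 = 3431 samples across the seven subsets. A quick consistency check, sketched under the assumption that both `content_dict` and the header-free rows (`data[1:]`) are available; `check_partition` is a hypothetical name, and the toy data is illustrative only.

```python
def check_partition(content_dict, rows):
    """Verify that grouping by quality neither lost nor duplicated samples.

    Returns the number of subsets (the number of distinct quality levels).
    """
    total = sum(len(subset) for subset in content_dict.values())
    if total != len(rows):
        raise ValueError("subset sizes (%d) do not add up to %d samples" % (total, len(rows)))
    # Each row must be stored under the key equal to its own quality value.
    for quality, subset in content_dict.items():
        if any(int(row[-1]) != quality for row in subset):
            raise ValueError("a row is filed under the wrong quality %d" % quality)
    return len(content_dict)


rows = [["7.0", "6"], ["6.3", "5"], ["8.1", "6"]]  # hypothetical toy rows
content_dict = {6: [rows[0], rows[2]], 5: [rows[1]]}
print(check_partition(content_dict, rows))
```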