Red Wine Dataset Analysis
Goal: understand the main physicochemical factors that affect red wine quality
Import the required packages
# Import the required libraries
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="darkgrid")  # seaborn's default style
pd.set_option('display.precision', 3)  # show floats with 3 decimal places (display only)
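The precision option changes only how floats are printed, not the values stored in the frame. A minimal sketch of that distinction follows; the one-row `demo` frame and its value are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical one-value frame, used only to illustrate the display option.
demo = pd.DataFrame({"density": [0.9978]})

# Temporarily set the display precision and capture the rendered text.
with pd.option_context("display.precision", 3):
    shown = repr(demo)

# The stored value keeps its full precision regardless of the option.
stored = demo.loc[0, "density"]

print(shown)
print(stored)
```

Any rounding that should affect later calculations has to be applied explicitly, for example with `DataFrame.round`.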
Load the data and take a first look
df = pd.read_csv("datasets/wine/winequality-red.csv",sep = ';')
df.head()
   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0    0.998  3.51       0.56      9.4        5
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0    0.997  3.20       0.68      9.8        5
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0    0.997  3.26       0.65      9.8        5
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0    0.998  3.16       0.58      9.8        6
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0    0.998  3.51       0.56      9.4        5
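If the separator is wrong, `read_csv` quietly returns one wide column instead of twelve, so a quick column check right after loading can catch the mistake early. The following is a minimal sketch and not part of the original notebook: `load_wine` and `EXPECTED_COLUMNS` are hypothetical names, and the in-memory sample stands in for the real file, using the first two preview rows as displayed.

```python
import io
import pandas as pd

# Column names as shown in the df.head() preview.
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]

def load_wine(source, sep=";"):
    """Read the wine CSV and fail early if the columns are not as expected.

    `source` may be a file path or a file-like object (hypothetical helper).
    """
    frame = pd.read_csv(source, sep=sep)
    if list(frame.columns) != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {list(frame.columns)}")
    return frame

# Stand-in for the real file: header plus the first two preview rows.
sample_csv = io.StringIO(
    ";".join(EXPECTED_COLUMNS) + "\n"
    "7.4;0.70;0.00;1.9;0.076;11.0;34.0;0.998;3.51;0.56;9.4;5\n"
    "7.8;0.88;0.00;2.6;0.098;25.0;67.0;0.997;3.20;0.68;9.8;5\n"
)
sample = load_wine(sample_csv)
print(sample.shape)
```

With the real file, the call would be `load_wine("datasets/wine/winequality-red.csv")`.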
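Description of each variable
- fixed acidity: fixed (non-volatile) acid content
- volatile acidity: volatile acid content
- citric acid: citric acid content
- residual sugar: sugar remaining after fermentation
- chlorides: chloride content
- free sulfur dioxide: free sulfur dioxide content
- total sulfur dioxide: total sulfur dioxide content
- density: density
- pH: pH value (acidity/alkalinity)
- sulphates: sulphate content
- alcohol: alcohol content
- quality: quality score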
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 fixed acidity 1599 non-null float64
1 volatile acidity 1599 non-null float64
2 citric acid 1599 non-null float64
3 residual sugar 1599 non-null float64
4 chlorides 1599 non-null float64
5 free sulfur dioxide 1599 non-null float64
6 total sulfur dioxide 1599 non-null float64
7 density 1599 non-null float64
8 pH 1599 non-null float64
9 sulphates 1599 non-null float64
10 alcohol 1599 non-null float64
11 quality 1599 non-null int64
dtypes: float64(11), int64(1)
memory usage: 150.0 KB
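The `info()` output reports 1599 non-null values in every column, so there are no missing values to handle. Duplicate rows are a separate question that `info()` does not answer. Below is a minimal sketch of both checks; `basic_checks` is a hypothetical helper, and the `preview` frame holds only the five rows from the `df.head()` output as displayed (rounded), not the full dataset.

```python
import pandas as pd

def basic_checks(frame):
    """Count missing cells and fully duplicated rows (hypothetical helper)."""
    return {
        "missing_cells": int(frame.isna().sum().sum()),
        # duplicated() flags every repeat after the first occurrence.
        "duplicate_rows": int(frame.duplicated().sum()),
    }

columns = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]
# The five preview rows, copied as displayed.
rows = [
    [7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.998, 3.51, 0.56, 9.4, 5],
    [7.8, 0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.997, 3.20, 0.68, 9.8, 5],
    [7.8, 0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.997, 3.26, 0.65, 9.8, 5],
    [11.2, 0.28, 0.56, 1.9, 0.075, 17.0, 60.0, 0.998, 3.16, 0.58, 9.8, 6],
    [7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.998, 3.51, 0.56, 9.4, 5],
]
preview = pd.DataFrame(rows, columns=columns)

report = basic_checks(preview)
print(report)
```

In the preview, rows 0 and 4 are identical, which is why the sketch counts one duplicate. Calling `basic_checks(df)` would run the same checks on the full frame.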
Check the distribution of wine quality
df["quality"].value_counts()
5 681
6 638
7