Red Wine Dataset Analysis (a purely numeric dataset)

Latest recommended article published on 2024-07-09 20:20:53

guaixi

Category: Programming basics. Tags: python, machine learning, data analysis

Article link: https://blog.csdn.net/guaixi/article/details/110632137

Data analysis of the red wine dataset

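Import the required packages
Load the data and take a first look
Univariate analysis
- Processing the wine's acidity features
- Processing the sweetness feature
Bivariate analysis
Multivariate analysis
- pH, fixed acidity, and citric acid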

Goal: identify the main physicochemical factors that affect red wine quality

Import the required packages

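# Import the required libraries
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="darkgrid")  # darkgrid is seaborn's default style
pd.set_option('display.precision', 3)  # show 3 decimal places; the bare 'precision' key fails on recent pandas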

Load the data and take a first look


df = pd.read_csv("datasets/wine/winequality-red.csv",sep = ';')
df.head()
fixed acidity	volatile acidity	citric acid	residual sugar	chlorides	free sulfur dioxide	total sulfur dioxide	density	pH	sulphates	alcohol	quality
0	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.998	3.51	0.56	9.4	5
1	7.8	0.88	0.00	2.6	0.098	25.0	67.0	0.997	3.20	0.68	9.8	5
2	7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.997	3.26	0.65	9.8	5
3	11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.998	3.16	0.58	9.8	6
4	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.998	3.51	0.56	9.4	5
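
The `sep = ';'` argument matters because the file is semicolon-separated, not comma-separated. The sketch below illustrates this on a small in-memory sample. The `sample` text and the choice of five columns are assumptions made for brevity; the values are copied from the rows shown above.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the CSV file: a few columns from the
# rows printed above, written in semicolon-separated form.
sample = io.StringIO(
    "fixed acidity;volatile acidity;pH;alcohol;quality\n"
    "7.4;0.7;3.51;9.4;5\n"
    "7.8;0.88;3.2;9.8;5\n"
    "11.2;0.28;3.16;9.8;6\n"
)

# With the correct separator, each field becomes its own column.
with_sep = pd.read_csv(sample, sep=";")

# With the default comma separator, every line is read as one long string.
sample.seek(0)
without_sep = pd.read_csv(sample)

print(with_sep.shape)
print(without_sep.shape)
```

If `df.head()` ever shows a single wide column whose name contains semicolons, a missing or wrong `sep` is the likely cause.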

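Column descriptions
-fixed acidity	         fixed (non-volatile) acidity
-volatile acidity	     volatile acidity
-citric acid	         citric acid content
-residual sugar	         sugar remaining after fermentation
-chlorides	             chloride content
-free sulfur dioxide	 free sulfur dioxide
-total sulfur dioxide	 total sulfur dioxide
-density	             density
-pH	                     acidity/alkalinity (pH value)
-sulphates	             sulphate content
-alcohol	             alcohol content
-quality                 quality score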

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB
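
The `df.info()` output above shows that all 12 columns are numeric and none contain missing values. The same check can be done programmatically, which is handy when there are too many columns to scan by eye. The following is a minimal sketch: `summarize` and the `toy` frame are hypothetical names introduced only for illustration, and the `None` in `toy` is planted deliberately to show how a missing value would appear.

```python
import pandas as pd

def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and missing-value count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
    })

# Hypothetical toy frame, not the wine data.
toy = pd.DataFrame({"alcohol": [9.4, 9.8, None], "quality": [5, 5, 6]})

report = summarize(toy)
print(report)
```

Applied to the wine data as `summarize(df)`, the `n_missing` column should be all zeros if it agrees with the `info()` output above.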
Check the distribution of wine quality scores
df["quality"].value_counts()

5    681
6    638
7