Fundamentals of Data Science DATA2001

jia V iuww520

Published 2024-08-28 10:10:36

Tags: data analysis

Article link: https://blog.csdn.net/w976j_/article/details/141633716

Dataset:

The dataset for this assignment is provided on Blackboard. It contains the results of a chemical analysis of wines grown in the same region of Italy by 3 different cultivators. The analysis determined the quantities of 13 components found in each wine sample. The dataset has 178 samples and 14 attributes.

Wine (the 3 cultivators are represented by the integers 1 to 3)
Alcohol
Malic acid
Ash
Alcalinity of ash
Magnesium
Total phenols
Flavanoids
Nonflavanoid phenols
Proanthocyanins
Color intensity
Hue
OD280/OD315 of diluted wines
Proline

More information on the dataset is available here: Wine - UCI Machine Learning Repository. Note: Other versions of this dataset that can be found online should not be used for this assignment.

The submitted notebook should address 6 tasks (see marking grid for mark allocation):

Data Preparation: Read the dataset using the “pandas” library. Can you identify the missing data both row-wise and column-wise in the dataset? Handle any data quality issues you find in an appropriate way. Explain how you did it, along with the reasons for your choice.
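As a rough illustration of the data preparation step, the sketch below counts missing values per column and per row and then applies one possible cleaning policy. The tiny inline CSV is made up for the example and is not the assignment file. The policy (drop rows with no measurements, fill the rest with the per-cultivator median) is only an assumption; the right choice depends on what you actually find in the data.

```python
import io

import pandas as pd

# Made-up stand-in for the assignment file (NOT the real data).
raw = io.StringIO(
    "Wine,Alcohol,Malic acid,Ash\n"
    "1,14.2,1.7,2.4\n"
    "1,,1.8,2.1\n"
    "2,12.4,,2.0\n"
    "2,12.3,1.1,\n"
    "3,13.1,3.9,2.5\n"
    "3,,,\n"
)
df = pd.read_csv(raw)

# Column-wise: number of missing values in each column.
missing_per_column = df.isna().sum()

# Row-wise: number of missing values in each row, and the rows affected.
missing_per_row = df.isna().sum(axis=1)
rows_with_missing = df[missing_per_row > 0]

# Assumed policy: drop rows that have no measurements at all, then fill
# the remaining gaps with the median of the same cultivator.
features = [col for col in df.columns if col != "Wine"]
cleaned = df.dropna(subset=features, how="all").copy()
cleaned[features] = cleaned.groupby("Wine")[features].transform(
    lambda s: s.fillna(s.median())
)

print(missing_per_column)
print(rows_with_missing)
print(cleaned)
```

Whichever policy you pick, the notebook should state why, for example why imputing within a cultivator is preferable to a global median if the cultivators differ.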

Exploratory Data Analysis (EDA): Perform a detailed univariate and bivariate EDA on the columns in the dataset. Produce plots and report your observations for each plot clearly. If the dataset has many attributes, you can focus on performing EDA and reporting on just the most important ones.
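A minimal sketch of one univariate and one bivariate plot is shown below, assuming matplotlib is available. The data frame is randomly generated with invented means, so the shapes you see say nothing about the real wines; only the column names are borrowed from the attribute list.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data with invented means (NOT the real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Wine": np.repeat([1, 2, 3], 20),
    "Alcohol": np.concatenate(
        [rng.normal(m, 0.4, 20) for m in (13.7, 12.3, 13.2)]
    ),
    "Flavanoids": np.concatenate(
        [rng.normal(m, 0.4, 20) for m in (3.0, 2.1, 0.8)]
    ),
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Univariate: distribution of a single attribute.
axes[0].hist(df["Alcohol"], bins=15)
axes[0].set(xlabel="Alcohol", ylabel="Count", title="Distribution of Alcohol")

# Bivariate: two attributes against each other, one colour per cultivator.
for wine, group in df.groupby("Wine"):
    axes[1].scatter(group["Alcohol"], group["Flavanoids"], label=f"Wine {wine}")
axes[1].set(xlabel="Alcohol", ylabel="Flavanoids", title="Alcohol vs Flavanoids")
axes[1].legend()

fig.tight_layout()

# Numeric summary to quote alongside the plots.
summary = df[["Alcohol", "Flavanoids"]].describe()
print(summary)
```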

Find the mean and standard deviation for each type of component for each cultivator of wine and report your findings in a table. Comment on apparent differences between the cultivators of wine (i.e., vignerons).
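One way to build such a table with pandas is `groupby` followed by `agg`. The sketch below uses a handful of made-up values for two of the attributes; on the real data you would group the full cleaned data frame by the `Wine` column in the same way.

```python
import pandas as pd

# Made-up values for illustration only (NOT the real dataset).
df = pd.DataFrame({
    "Wine": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Alcohol": [14.0, 13.0, 15.0, 12.0, 12.5, 13.0, 13.0, 13.5, 14.0],
    "Hue": [1.0, 1.2, 1.1, 1.0, 0.9, 1.1, 0.6, 0.7, 0.8],
})

# One row per cultivator; columns are (component, statistic) pairs.
# "std" is the sample standard deviation (ddof=1), the pandas default.
stats = df.groupby("Wine").agg(["mean", "std"]).round(3)
print(stats)
```

The result has a two-level column index, so a single cell can be read with `stats.loc[1, ("Alcohol", "mean")]`.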

Find the correlation among the numerical columns for each cultivator. Produce visualisations for the correlations and explain the observed results.
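The per-cultivator correlations can be sketched as one Pearson correlation matrix per group, drawn as side-by-side heatmaps. The data here is synthetic, and `Proline` is deliberately constructed to correlate with `Alcohol` so that the heatmap has something to show; the real relationships may look quite different.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the real dataset).
rng = np.random.default_rng(1)
n = 30
frames = []
for wine in (1, 2, 3):
    alcohol = rng.normal(13.0, 0.5, n)
    frames.append(pd.DataFrame({
        "Wine": wine,
        "Alcohol": alcohol,
        # Built to correlate with Alcohol, purely for the demo.
        "Proline": 60.0 * alcohol + rng.normal(0.0, 10.0, n),
        "Hue": rng.normal(1.0, 0.2, n),
    }))
df = pd.concat(frames, ignore_index=True)

# One Pearson correlation matrix per cultivator.
corr_by_wine = {
    wine: group.drop(columns="Wine").corr()
    for wine, group in df.groupby("Wine")
}

# Side-by-side heatmaps on a shared colour scale from -1 to 1.
fig, axes = plt.subplots(1, len(corr_by_wine), figsize=(12, 4))
for ax, (wine, corr) in zip(axes, corr_by_wine.items()):
    image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    ax.set_title(f"Wine {wine}")
fig.colorbar(image, ax=list(axes), label="Pearson r")
```

Fixing `vmin` and `vmax` keeps the colours comparable across the three panels, which matters when the point is to contrast cultivators.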

Perform k-means clustering on the data. Comment on the number of clusters chosen, on possible limitations, and on any form of uncertainty about the results. Are the results in agreement with what you observed in the EDA?
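A minimal k-means sketch with scikit-learn follows. It standardises the features, compares several values of k by silhouette score, and then checks the chosen clustering against known group labels. The three well-separated blobs are synthetic, so the clean result here is an artefact of the demo; real data is usually much less clear-cut, which is exactly what the task asks you to comment on.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: three well-separated 2-D blobs (NOT the real dataset).
rng = np.random.default_rng(2)
centres = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in centres])
true_group = np.repeat([0, 1, 2], 40)

# k-means is distance-based, so put all features on the same scale first.
X_scaled = StandardScaler().fit_transform(X)

# Compare candidate values of k by mean silhouette score (higher is better).
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    scores[k] = silhouette_score(X_scaled, model.labels_)
best_k = max(scores, key=scores.get)

# Refit with the chosen k and compare against the known groups.
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_scaled)
agreement = adjusted_rand_score(true_group, labels)

print(scores)
print(best_k, agreement)
```

The silhouette score is only one heuristic for choosing k; an elbow plot of `inertia_` is a common alternative, and the two do not always agree.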

Perform principal component analysis on the data. Comment on the results and plot the percentage of variance explained by each principal component. Also plot the principal components that you think are of interest, and report your observations and limitations.
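The PCA step could look roughly like the sketch below, which uses scikit-learn on standardised features and computes the percentage of variance explained by each component. The four synthetic columns are an assumption made for the demo: three are driven by one hidden factor and the fourth is independent noise, so the first component is expected to dominate.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: three columns share one hidden factor, the fourth
# is independent noise (NOT the real dataset).
rng = np.random.default_rng(3)
n = 100
latent = rng.normal(size=n)
X = np.column_stack([
    latent + rng.normal(0, 0.1, n),
    2.0 * latent + rng.normal(0, 0.1, n),
    -latent + rng.normal(0, 0.1, n),
    rng.normal(size=n),
])

# Standardise so that no column dominates just because of its units.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA().fit(X_scaled)
explained_pct = 100 * pca.explained_variance_ratio_
cumulative_pct = np.cumsum(explained_pct)

# Coordinates of each sample on the principal components.
scores = pca.transform(X_scaled)

print(np.round(explained_pct, 1))
print(np.round(cumulative_pct, 1))
```

`explained_pct` can be passed straight to a bar chart for the variance plot, and the first two columns of `scores` give the usual PC1 vs PC2 scatter.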

Note: The submitted Jupyter notebook should be commented properly and written in a way that makes it easy for the reader to understand. For marking purposes, your code may be rerun to verify the results.
