Datacamp Notes & Code: Unsupervised Learning in Python, Chapter 3: Decorrelating your data and dimension reduction

Latest recommended article published on 2020-09-18 23:13:59

JinnyR

Category: datacamp. Tags: datacamp, sklearn

Article link: https://blog.csdn.net/u011292816/article/details/97866711

More raw data files and Jupyter notebooks:
Github: https://github.com/JinnyR/Datacamp_DataScienceTrack_Python

Datacamp track: Data Scientist with Python - Course 23 (3)

Exercise

Correlated data in nature

You are given an array grains giving the width and length of samples of grain. You suspect that width and length will be correlated. To confirm this, make a scatter plot of width vs length and measure their Pearson correlation.

Instruction

Import:
- matplotlib.pyplot as plt.
- pearsonr from scipy.stats.
Assign column 0 of grains to width and column 1 of grains to length.
Make a scatter plot with width on the x-axis and length on the y-axis.
Use the pearsonr() function to calculate the Pearson correlation of width and length.

import pandas as pd

grains = pd.read_csv('https://s3.amazonaws.com/assets.datacamp.com/production/course_2141/datasets/seeds-width-vs-length.csv', header=None).values

# Perform the necessary imports
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# Assign the 0th column of grains: width
width = grains[:,0]

# Assign the 1st column of grains: length
length = grains[:,1]

# Scatter plot width vs length
plt.scatter(width, length)
plt.axis('equal')
plt.show()

# Calculate the Pearson correlation
correlation, pvalue = pearsonr(width, length)

# Display the correlation
print(correlation)

[Figure: scatter plot of grain width vs. length (output_2_0.png)]

0.8604149377143467
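
To make clearer what the number above measures, here is a minimal sketch that computes the Pearson correlation by hand and compares it with pearsonr. It uses synthetic data as a stand-in, not the grains array; the variable names, the random seed, and the slope and noise values are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic stand-in for two correlated measurements (NOT the grains data)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)

# Pearson r is the covariance divided by the product of the standard deviations.
# After centering, this reduces to a ratio of dot products.
x_centered = x - x.mean()
y_centered = y - y.mean()
r_manual = (x_centered @ y_centered) / np.sqrt(
    (x_centered @ x_centered) * (y_centered @ y_centered)
)

# Reference value from SciPy (the second return value is the p-value)
r_scipy, _ = pearsonr(x, y)

print(r_manual, r_scipy)
```

The two printed values should agree up to floating-point error. A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none.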

Exercise

Decorrelating the grain measurements with PCA

You observed in the previous exercise that the width and length measurements of the grain are correlated. Now, you’ll use PCA to decorrelate these measurements, then plot the decorrelated points and measure their Pearson correlation.
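
One way to see why PCA decorrelates the measurements is to rotate the centered samples onto the eigenvectors of their covariance matrix, which makes the covariance matrix diagonal. The sketch below does this with plain NumPy on synthetic data. The data and variable names are assumptions for illustration, and this is a simplified view of the idea rather than the way sklearn's PCA is implemented internally.

```python
import numpy as np

# Synthetic 2-D samples with correlated columns (NOT the grains data)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([x, 0.8 * x + rng.normal(scale=0.5, size=500)])

# Center the samples and compute the covariance matrix of the columns
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh handles symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Rotate the centered samples onto the eigenvector axes
rotated = centered @ eigenvectors

# The covariance of the rotated samples is diagonal up to rounding error,
# which means the new coordinates are linearly uncorrelated
rotated_cov = np.cov(rotated, rowvar=False)
print(np.round(rotated_cov, 6))
```

The diagonal entries of the rotated covariance matrix equal the eigenvalues, that is, the variance along each new axis.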

Instruction

Import PCA from sklearn.decomposition.
Create an instance of PCA called model.
Use the .fit_transform() method of model to apply the PCA transformation to grains. Assign the result to pca_features.
The subsequent code to extract, plot, and compute the Pearson correlation of the first two columns of pca_features has been written for you, so hit ‘Submit Answer’ to see the result!

# Import PCA
from sklearn.decomposition import PCA

# Create PCA instance: model
model = PCA()

# Apply the fit_transform method of model to grains: pca_features
pca_features = model.fit_transform(grains)

# Assign 0th column of pca_features: xs
xs = pca_features[:,0]

# Assign 1st column of pca_features: ys
ys = pca_features[:,1]

# Scatter plot xs vs ys
plt.scatter(xs, ys)
plt.axis('equal')
plt.show()

# Calculate the Pearson correlation of xs and ys
correlation, pvalue = pearsonr(xs, ys)

# Display the correlation
print(correlation)
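
As a sanity check on what to expect from this exercise, the following sketch runs the same steps on synthetic correlated data and compares the Pearson correlation before and after the PCA transformation. The data, seed, and variable names are assumptions for illustration, so the numbers will differ from those for grains.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA

# Synthetic 2-D samples with correlated columns (NOT the grains data)
rng = np.random.default_rng(42)
x = rng.normal(size=300)
samples = np.column_stack([x, 0.8 * x + rng.normal(scale=0.5, size=300)])

# Correlation of the raw columns
corr_before, _ = pearsonr(samples[:, 0], samples[:, 1])

# PCA centers the samples and rotates them onto the principal axes
features = PCA().fit_transform(samples)

# Correlation of the transformed columns
corr_after, _ = pearsonr(features[:, 0], features[:, 1])

print(f"before: {corr_before:.3f}")
print(f"after:  {corr_after:.3e}")
```

The correlation after the transformation should be zero up to floating-point error, since the PCA features are linearly uncorrelated by construction.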
