Unsupervised Learning in R

Table of Contents

1. Unsupervised Learning in R

1.1 Welcome to the course (video)

1.2 Identify clustering problems

1.3 Introduction to k-means clustering (video)

1.4 k-means clustering

Instruction:

# Create the k-means model: km.out
km.out <- kmeans(x, centers = 3, nstart = 20)

# Inspect the result
summary(km.out)

1.5 Results of kmeans()

Instruction:

# Print the cluster membership component of the model
km.out$cluster

# Print the km.out object
print(km.out)

1.6 Visualizing and interpreting results of kmeans()

Instruction:

# Scatter plot of x
plot(x, col = km.out$cluster,
     main = "k-means with 3 clusters",
     xlab = "",
     ylab = "")

1.7 How kmeans() works and practical matters (video)

1.8 Handling random algorithms

Instruction:

# Set up 2 x 3 plotting grid
par(mfrow = c(2, 3))

# Set seed
set.seed(1)

for (i in 1:6) {
  # Run kmeans() on x with three clusters and one start
  km.out <- kmeans(x, centers = 3, nstart = 1)
  
  # Plot clusters
  plot(x, col = km.out$cluster, 
       main = km.out$tot.withinss, 
       xlab = "", ylab = "")
}
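
After the loop, it is good housekeeping to restore the default single-panel layout (a small addition, not part of the original exercise):

# Reset the plotting grid
par(mfrow = c(1, 1))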

1.9 Selecting number of clusters

Instruction:

# Initialize total within sum of squares error: wss
wss <- 0

# For 1 to 15 cluster centers
for (i in 1:15) {
  km.out <- kmeans(x, centers = i, nstart = 20)
  # Save total within sum of squares to wss variable
  wss[i] <- km.out$tot.withinss
}

# Plot total within sum of squares vs. number of clusters
plot(1:15, wss, type = "b", 
     xlab = "Number of Clusters", 
     ylab = "Within groups sum of squares")

# Set k equal to the number of clusters corresponding to the elbow location
k <- 2  # 3 is probably OK, too

1.10 Introduction to the Pokemon data (video)

1.11 Practical matters: working with real data

Instruction:

# Initialize total within sum of squares error: wss
wss <- 0

# Look over 1 to 15 possible clusters
for (i in 1:15) {
  # Fit the model: km.out
  km.out <- kmeans(pokemon, centers = i, nstart = 20, iter.max = 50)
  # Save the within cluster sum of squares
  wss[i] <- km.out$tot.withinss
}

# Produce a scree plot
plot(1:15, wss, type = "b", 
     xlab = "Number of Clusters", 
     ylab = "Within groups sum of squares")

# Select number of clusters
k <- 2

# Build model with k clusters: km.out
km.out <- kmeans(pokemon, centers = k, nstart = 20, iter.max = 50)

# View the resulting model
km.out

# Plot of Defense vs. Speed by cluster membership
plot(pokemon[, c("Defense", "Speed")],
     col = km.out$cluster,
     main = paste("k-means clustering of Pokemon with", k, "clusters"),
     xlab = "Defense", ylab = "Speed")

1.12 Review of k-means clustering (video)

2. Hierarchical Clustering

2.1 Introduction to hierarchical clustering (video)

2.2 Hierarchical clustering with results

Instruction:

# Create hierarchical clustering model: hclust.out
hclust.out <- hclust(dist(x))

# Inspect the result
summary(hclust.out)
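
A summary() of an hclust object only lists its components; the more informative view is the dendrogram itself (a one-line addition, reusing hclust.out from above):

# Plot the dendrogram
plot(hclust.out)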

2.3 Selecting number of clusters (video)

2.4 Interpreting dendrogram
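
This exercise is about reading cluster counts off the dendrogram at different cut heights. A minimal sketch (reusing hclust.out from 2.2; the height 7 matches the cut used in 2.5):

# Draw the dendrogram with a candidate cut height
plot(hclust.out)
abline(h = 7, col = "red")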

2.5 Cutting the tree

Instruction:

# Cut by height
cutree(hclust.out, h = 7)


# Cut by number of clusters
cutree(hclust.out, k = 3)

2.6 Clustering linkage and practical matters (video)

2.7 Linkage method

Instruction:

# Cluster using complete linkage: hclust.complete
hclust.complete <- hclust(dist(x), method = "complete")

# Cluster using average linkage: hclust.average
hclust.average <- hclust(dist(x), method = "average")

# Cluster using single linkage: hclust.single
hclust.single <- hclust(dist(x), method = "single")

# Plot dendrogram of hclust.complete
plot(hclust.complete, main = "Complete")

# Plot dendrogram of hclust.average
plot(hclust.average, main = "Average")

# Plot dendrogram of hclust.single
plot(hclust.single, main = "Single")

2.8 Comparing linkage methods
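
One hedged way to compare the three linkages from 2.7 is to cut each tree into the same number of clusters and look at how balanced the cluster sizes are (complete and average linkage tend to produce more balanced clusters than single linkage):

# Cluster sizes under each linkage at k = 3
table(cutree(hclust.complete, k = 3))
table(cutree(hclust.average, k = 3))
table(cutree(hclust.single, k = 3))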

2.9 Practical matters: scaling

Instruction:

# View column means
colMeans(pokemon)

# View column standard deviations
apply(pokemon, 2, sd)

# Scale the data
pokemon.scaled <- scale(pokemon)

# Create hierarchical clustering model: hclust.pokemon
hclust.pokemon <- hclust(dist(pokemon.scaled), method = "complete")

2.10 Comparing kmeans() and hclust()

Instruction:

# Apply cutree() to hclust.pokemon: cut.pokemon
cut.pokemon <- cutree(hclust.pokemon, k = 3)

# k-means model for comparison (assumed here, since the original notes
# never define km.pokemon; the course fits it with 3 centers)
km.pokemon <- kmeans(pokemon, centers = 3, nstart = 20, iter.max = 50)

# Compare methods
table(km.pokemon$cluster, cut.pokemon)

2.11 Review of hierarchical clustering (video)

3. Dimensionality Reduction with PCA

3.1 Introduction to PCA

3.2 PCA using prcomp()

Instruction:

# Perform scaled PCA: pr.out
pr.out <- prcomp(x = pokemon, scale. = TRUE, center = TRUE)

# Inspect model output
summary(pr.out)

3.3 Results of PCA

3.4 Additional results of PCA
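
These two exercises explore what prcomp() returns. A quick look at the main components of the fitted object (reusing pr.out from 3.2):

# Loadings: one column of weights per principal component
pr.out$rotation

# Standard deviation of each principal component
pr.out$sdev

# Means and standard deviations used for centering and scaling
pr.out$center
pr.out$scale

# The original data projected onto the components (scores)
head(pr.out$x)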

3.5 Visualizing and interpreting PCA results (video)

3.6 Interpreting biplots (1)

3.7 Interpreting biplots (2)
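
Both biplot exercises read from the same kind of plot, generated with one call (reusing pr.out from 3.2). Observations appear as points in the space of the first two components, and each original variable as an arrow given by its loadings:

# Biplot of the first two principal components
biplot(pr.out)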

3.8 Variance explained

Instruction:

# Variability of each principal component: pr.var
pr.var <- pr.out$sdev^2

# Variance explained by each principal component: pve
pve <- pr.var / sum(pr.var)

3.9 Visualize variance explained

Instruction:

# Plot variance explained for each principal component
plot(pve, xlab = "Principal Component",
     ylab = "Proportion of Variance Explained",
     ylim = c(0, 1), type = "b")

# Plot cumulative proportion of variance explained
plot(cumsum(pve), xlab = "Principal Component",
     ylab = "Cumulative Proportion of Variance Explained",
     ylim = c(0, 1), type = "b")

3.10 Practical issues with PCA (video)

3.11 Practical issues: scaling

Instruction:

# Mean of each variable
colMeans(pokemon)

# Standard deviation of each variable
apply(pokemon, 2, sd)

# PCA model with scaling: pr.with.scaling
pr.with.scaling <- prcomp(pokemon, scale. = TRUE)

# PCA model without scaling: pr.without.scaling
pr.without.scaling <- prcomp(pokemon, scale. = FALSE)

# Create biplots of both for comparison
biplot(pr.with.scaling)
biplot(pr.without.scaling)

3.12 Additional uses of PCA and wrap-up (video)

4. Putting It All Together with a Case Study

4.1 Introduction to the case study (video)

4.2 Preparing the data

Instruction:

url <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1903/datasets/WisconsinCancer.csv"

# Download the data: wisc.df
wisc.df <- read.csv(url)

# Convert the features of the data: wisc.data
wisc.data <- as.matrix(wisc.df[, 3:32])

# Set the row names of wisc.data
row.names(wisc.data) <- wisc.df$id

# Create diagnosis vector
diagnosis <- as.numeric(wisc.df$diagnosis == "M")

4.3 Exploratory data analysis
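
The original notes give no code for this step; a minimal sketch of the usual first checks (column names such as radius_mean follow the WisconsinCancer.csv file loaded in 4.2):

# Number of observations and features
dim(wisc.data)

# How many observations are malignant?
table(diagnosis)

# How many features are suffixed with _mean?
sum(grepl("_mean$", colnames(wisc.data)))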

4.4 Performing PCA

Instruction:

# Check column means and standard deviations
colMeans(wisc.data)
apply(wisc.data, 2, sd)

# Execute PCA, scaling if appropriate: wisc.pr
wisc.pr <- prcomp(wisc.data, scale. = TRUE, center = TRUE)

# Look at summary of results
summary(wisc.pr)

4.5 Interpreting PCA results

Instruction:

# Create a biplot of wisc.pr
biplot(wisc.pr)

# Scatter plot observations by components 1 and 2
plot(wisc.pr$x[, c(1, 2)], col = (diagnosis + 1), 
     xlab = "PC1", ylab = "PC2")

# Repeat for components 1 and 3
plot(wisc.pr$x[, c(1, 3)], col = (diagnosis + 1), 
     xlab = "PC1", ylab = "PC3")

# Do additional data exploration of your choosing below (optional)
plot(wisc.pr$x[, c(2, 3)], col = (diagnosis + 1), 
     xlab = "PC2", ylab = "PC3")

4.6 Variance explained

Instruction:

# Set up 1 x 2 plotting grid
par(mfrow = c(1, 2))

# Calculate variability of each component
pr.var <- wisc.pr$sdev^2

# Variance explained by each principal component: pve
pve <- pr.var / sum(pr.var)


# Plot variance explained for each principal component
plot(pve, xlab = "Principal Component", 
     ylab = "Proportion of Variance Explained", 
     ylim = c(0, 1), type = "b")

# Plot cumulative proportion of variance explained
plot(cumsum(pve), xlab = "Principal Component", 
     ylab = "Cumulative Proportion of Variance Explained", 
     ylim = c(0, 1), type = "b")

4.7 Communicating PCA results
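
This exercise asks about the component loadings and about how many components are needed to explain a given share of variance. A hedged sketch of where to look (wisc.pr and pve are defined in the sections above):

# Loadings of each feature on the first principal component
wisc.pr$rotation[, 1]

# Number of components needed for at least 80% of the variance
which(cumsum(pve) >= 0.80)[1]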

4.8 PCA review and next steps (video)

4.9 Hierarchical clustering of case data

Instruction:

# Scale the wisc.data data: data.scaled
data.scaled <- scale(wisc.data)

# Calculate the (Euclidean) distances: data.dist
data.dist <- dist(data.scaled)

# Create a hierarchical clustering model: wisc.hclust
wisc.hclust <- hclust(data.dist, method = "complete")

4.10 Results of hierarchical clustering
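
This exercise asks at what height the dendrogram must be cut to obtain 4 clusters; the answer is read off the plot (the height drawn below is an illustration, not the required value):

# Inspect the dendrogram and a candidate cut height
plot(wisc.hclust)
abline(h = 20, col = "red")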

4.11 Selecting number of clusters

Instruction:

# Cut tree so that it has 4 clusters: wisc.hclust.clusters
wisc.hclust.clusters <- cutree(wisc.hclust, k = 4)

# Compare cluster membership to actual diagnoses
table(wisc.hclust.clusters, diagnosis)

4.12 k-means clustering and comparing results

Instruction:

# Create a k-means model on wisc.data: wisc.km
wisc.km <- kmeans(scale(wisc.data), centers = 2, nstart = 20)

# Compare k-means to actual diagnoses
table(wisc.km$cluster, diagnosis)
# Count the minority diagnosis in each cluster, i.e. the "disagreements"
sum(apply(table(wisc.km$cluster, diagnosis), 1, min))

# Compare k-means to hierarchical clustering
table(wisc.hclust.clusters, wisc.km$cluster)
# Same minority count, now between the two clusterings
sum(apply(table(wisc.hclust.clusters, wisc.km$cluster), 1, min))

4.13 Clustering on PCA results

Instruction:

# Create a hierarchical clustering model: wisc.pr.hclust
wisc.pr.hclust <- hclust(dist(wisc.pr$x[, 1:7]), method = "complete")

# Cut model into 4 clusters: wisc.pr.hclust.clusters
wisc.pr.hclust.clusters <- cutree(wisc.pr.hclust, k = 4)

# Compare to actual diagnoses
table(diagnosis, wisc.pr.hclust.clusters)

# Compare to k-means and hierarchical
table(diagnosis, wisc.hclust.clusters)
table(diagnosis, wisc.km$cluster)

4.14 Wrap-up and review
