Week8_1Unsupervised Learning
第 1 题
For which of the following tasks might K-means clustering be a suitable algorithm? Select all that apply.
- Given a database of information about your users, automatically group them into different market segments.
- Given sales data from a large number of products in a supermarket, figure out which products tend to form coherent groups (say are frequently purchased together) and thus should be put on the same shelf.
- Given historical weather records, predict the amount of rainfall tomorrow (this would be a real-valued output)
- Given sales data from a large number of products in a supermarket, estimate future sales for each of these products.
* 答案: 1 2 K-Means用来分类的,不能做预测 *
* 选项1: 把客户按市场分类,可以作到. 正确 *
* 选项2: 给出超市商品的销售数据,把卖的最快的商品与卖的不快的商品进行分类. 正确 *
* 选项3: 给出天气的历史数据,预测明天的天气,不能进行预测,做不到. 不正确 *
* 选项4: 给出超市商品的销售数据, 预测未来的销量,做不到. 不正确 *
- Given a set of news articles from many different news websites, find out what are the main topics covered.
- Given many emails, you want to determine if they are Spam or Non-Spam emails.
- From the user usage patterns on a website, figure out what different groups of users exist.
- Given historical weather records, predict if tomorrow’s weather will be sunny or rainy.
* 答案: 1 3 K-Means用来分类的,不能做预测 *
第 2 题
Suppose we have three cluster centroids μ1=[12] μ 1 = [ 1 2 ] μ2=[−30] μ 2 = [ − 3 0 ] and μ3=[42] μ 3 = [ 4 2 ] . Furthermore, we have a training example x(i)=[−21] x ( i ) = [ − 2 1 ] . After a cluster assignment step, what will c(i) c ( i ) be?
- c(i)=2 c ( i ) = 2
- c(i) c ( i ) is not assigned
- c(i)=1 c ( i ) = 1
- c(i)=3 c ( i ) = 3
* 答案: 1 计算
xi
x
i
与每个
ci
c
i
的距离=
||x(i)−μc(i)||2
|
|
x
(
i
)
−
μ
c
(
i
)
|
|
2
,取最小的 *
* 与
μ1
μ
1
的矩离=
[−2−(1)]2+[1−2]2=9+1=10
[
−
2
−
(
1
)
]
2
+
[
1
−
2
]
2
=
9
+
1
=
10
*
* 与
μ2
μ
2
的矩离=
[−2−(−3)]2+[1−0]2=1+1=2
[
−
2
−
(
−
3
)
]
2
+
[
1
−
0
]
2
=
1
+
1
=
2
*
* 与
μ3
μ
3
的矩离=
[−2−(4)]2+[1−2]2=36+1=37
[
−
2
−
(
4
)
]
2
+
[
1
−
2
]
2
=
36
+
1
=
37
*
* 所以与
μ2
μ
2
的矩离最小 *
第 3 题
K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner-loop. Which two?
- Move the cluster centroids, where the centroids μk μ k are updated.
- The cluster assignment step, where the parameters c(i) c ( i ) are updated.
- Move each cluster centroid μk μ k , by setting it to be equal to the closest training example x(i) x ( i )
- The cluster centroid assignment step, where each cluster centroid μi μ i is assigned (by setting c(i) c ( i ) ) to the closest training example x(i) x ( i ) .
* 答案: 1 2 4 –> 这个答案不正确 *
- Move the cluster centroids, where the centroids μk μ k are updated.
- The cluster centroid assignment step, where each cluster centroid μi μ i is assigned (by setting c(i) c ( i ) ) to the closest training example x(i) x ( i ) .
- The cluster assignment step, where the parameters c(i) c ( i ) are updated.
- Move each cluster centroid
μk
μ
k
, by setting it to be equal to the closest training example
x(i)
x
(
i
)
* 答案: 1 3 正确 *
第 4 题
Suppose you have an unlabeled dataset
{x(1),…,x(m)}
{
x
(
1
)
,
…
,
x
(
m
)
}
. You run K-means with 50 different random
initializations, and obtain 50 different clusterings of the data. What is the recommended way for choosing which one of
these 50 clusterings to use?
- The only way to do so is if we also have labels y(i) y ( i ) for our data.
- For each of the clusterings, compute 1m∑mi=1||x(i)−μc(i)||2 1 m ∑ i = 1 m | | x ( i ) − μ c ( i ) | | 2 , and pick the one that minimizes this.
- The answer is ambiguous, and there is no good way of choosing.
- Always pick the final (50th) clustering found, since by that time it is more likely to have converged to a good solution.
* 答案: 2 *
* 代价函数最小的才行 *
第 5 题
Which of the following statements are true? Select all that apply.
- If we are worried about K-means getting stuck in bad local optima, one way to ameliorate (reduce) this problem is if we try using multiple random initializations.
- The standard way of initializing K-means is setting μ1=⋯=μk to be equal to a vector of zeros.
- Since K-Means is an unsupervised learning algorithm, it cannot overfit the data, and thus it is always better to have as large a number of clusters as is computationally feasible.
- For some datasets, the “right” or “correct” value of K (the number of clusters) can be ambiguous, and hard even for a human expert looking carefully at the data to decide.
* 答案: 1 4 *
* 选项1: *
- K-Means will always give the same results regardless of the initialization of the centroids.
- A good way to initialize K-means is to select K (distinct) examples from the training set and set the cluster centroids equal to these selected examples.
- On every iteration of K-means, the cost function J(c(1),…,c(m),μ1,…,μk) J ( c ( 1 ) , … , c ( m ) , μ 1 , … , μ k ) (the distortion function) should either stay the same or decrease; in particular, it should not increase.
- Once an example has been assigned to a particular centroid, it will never be reassigned to another different centroid.
* 答案: 2 3 *
* 选项1: 跟初始值有很大关系. 不正确 *
* 选项2: 正确 *
* 选项3: 正确 *