Algorithm-part3-week2

MST Review
Input: Undirected graph G = (V, E),edges costs ce c e .
Output: Min-cost spanning tree(no cycles, connected).
Assumptions : G is connected, distinct edge costs.
Cut Property: If e is the cheapest edge crossing some cut(A, B), then e belongs to the MST.
Kruskal’s MST Algorithm
Sort edges in order of increasing cost.
[Rename edges 1,2,…,m so that c1<c2<...<cm c 1 < c 2 < . . . < c m .]
T= − T = ∅
For i=1 to m m O(m) time
If T{i} T ∪ { i } has no cycles. O(n) time to check for cycle.
[ Use BFS or DFS in the graph(V,T)which contains n-1 edges.

− − − Add i i to T.

Return T.
Running time of straightforward implementation:
(m = # of edges, n= # of vertices] O(mlogn)+O(mn) = O(mn).
Plan: Data structure for O(1)-time cycle checks
O(mlogn) O ( m l o g n ) time.

The Union-Find Data structure.
Maintain partition of a set of objects.
FIND(X): Return name of group that X belongs to.
UNION( Ci C i , Cj C j ): Fuse groups Ci,CJ C i , C J into a single one.
Objects = vertices
Groups = Connected components chosen edges T.
Adding new edge(u,v) to T.
Motivation: O(1)-time cycle checks in Kruskal’s algorithm.
Idea#1: -Maintain one linked structure per connected component of(V,T).
Each component has an arbitrary leader vertex.
Invariant: Each vertex points to the leaders of its component.
Key point: Given edge(u,v), can check if u&v already in same component in O(1) time.
[if and only if leader pointers of u, v match,i,e.,FIND(u) = FIND(v) ] O(1)-time cycle checks!

Runing Time of Fast Implementation
Scored:
O(mlogn) time for sorting .
O(m) times for cycle checks[O(1) per iteration]
O(nlog(n)) time overall for leader pointer updates.

total (Matching Prim’s algorithm)

Clustering [unsupervised learning]
Informal goal: Given n “points”,[Web pages, images, genome fragments,etc] classfify into “coherent groups”;
Assumptions:
(1) As input, given a (dis) similarity measure
a distance d(p,q) d ( p , q ) between each point pair.
(2)Symmetric [d(p,q) = d(q,p)]
Examples: Euclidean distance, genome similarity, etc.
Goal: Same cluster “nearby”
Max-Spacing k-Clusterings
Assume: We know k:= k := # of clusters desired.
[In practice, can experiment with a range of values]
Call points p & q separated is they’re assigned to different clusters.
Definition: The spacing of a k-clustering is minseparatedp,q m i n s e p a r a t e d p , q d(p,q) d ( p , q ) .
Problem statement: Given a distance measure d d and k, compute the kclustering k − c l u s t e r i n g with maximum spacing.
A Greedy Algorithm
Initially, each point in a sepatate cluster
Repeat until only k clusters.
− − Let p,q= p , q = closest pair of separated points.
(determines the current spacing)
Merge the cluster containg p & q into a single cluster.
Note: Just like Krusskal’s MST algorithm, but stopped early.
Points vertices, distances edges costs, point pairs. edges.
Called single-link clustering.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值