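Clustering behaves as follows, depending on how each column is flagged in the model:
1, PREDICT: the column is used as an input and can also be selected for prediction;
2, INPUT: the column is used only as an input and cannot be selected for prediction;
3, PREDICT_ONLY: the column is ignored during training and is only predicted.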
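As a minimal sketch of how these flags are written in DMX, the model below marks one column PREDICT and one PREDICT_ONLY. The remaining non-key columns carry no prediction flag, which makes them input-only. The model name [Clustering_Usage_Demo] and its column list are hypothetical, chosen only for illustration.

```dmx
-- Hypothetical model: name and columns are illustrative assumptions.
CREATE MINING MODEL [Clustering_Usage_Demo]
(
    [Customer Key]  LONG KEY,                       -- case key
    [Age]           LONG CONTINUOUS,                -- no flag: input only
    [Gender]        TEXT DISCRETE,                  -- no flag: input only
    [Bike Buyer]    LONG DISCRETE PREDICT,          -- input and predictable
    [Occupation]    TEXT DISCRETE PREDICT_ONLY      -- predicted only, ignored when building clusters
)
USING Microsoft_Clustering
```

Moving [Bike Buyer] from PREDICT to PREDICT_ONLY would keep it predictable but stop it from influencing how the clusters are formed.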
There are four algorithms in total:
1, SCALABLE EM
2, NON-SCALABLE EM
3, SCALABLE KM (K-Means)
4, NON-SCALABLE KM (K-Means)
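In Microsoft_Clustering the algorithm is chosen through the CLUSTERING_METHOD parameter, whose values 1 to 4 correspond, in order, to the four algorithms listed above (1, Scalable EM, is the default). A minimal sketch, again with a hypothetical model name and columns:

```dmx
-- CLUSTERING_METHOD values:
--   1 = Scalable EM (default)
--   2 = Non-scalable EM
--   3 = Scalable K-Means
--   4 = Non-scalable K-Means
-- Hypothetical model: name and columns are illustrative assumptions.
CREATE MINING MODEL [Clustering_KM_Demo]
(
    [Customer Key]  LONG KEY,
    [Age]           LONG CONTINUOUS,
    [Gender]        TEXT DISCRETE,
    [Bike Buyer]    LONG DISCRETE PREDICT
)
USING Microsoft_Clustering (CLUSTERING_METHOD = 3)  -- Scalable K-Means
```

Only the parameter value changes between the four variants, so models that share the same columns can be compared directly.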
The following example compares the four algorithms:
create mining structure [Clustering Method]
(
[Age] long discretized(automatic,10),
[Bike Buyer] long discrete,
[Commute Distance] text discrete,
[Customer Key] long key,
[Education] text discrete,
[Gender] text discrete,
[House Owner Flag] text discrete,
[Marital Status] text discrete,
[Number Cars Owned] long discrete,
[Number Children At Home] long discrete,
[Occupation] text discrete,
[Region] text discrete,
[Total Children] long discrete,
[Yearly Income] double continuous
)
alter mining structure [Clustering Method]
add mining model [Clutering_SEM]
using microsoft_clustering
(CLUSTERING