In the paper above, Ng's group uses K-means for dictionary learning; that is, to learn a vocabulary of small image patches such that a sparse combination of these patches can reconstruct any image patch from the data. An extremely successful approach to this problem is sparse coding, which constructs vocabulary patches such that each data patch requires the activation of only a small number of component words.
K-means normally assigns each data point to its closest word (centroid), while in soft k-means each data point is at least a little related to every word but heavily related to only a few. Tri k-means (the "triangle" activation) is a successful heuristic that assigns each data point to only a few words, much like sparse coding. This requires that each word/centroid be an important component in reconstructing a good handful of data points, but doesn't mind if it's completely useless for reconstructing far-away data. This is essentially the same as finding a good regularizer (meeting the Goldilocks criterion: not too strong, not too weak) on the vocabulary.
In tri-k-means, roughly half of the activations are zero, which resembles sparse coding; the remaining half carry the important weight, somewhat like dimensionality reduction in PCA, where only certain important components are extracted.
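A minimal NumPy sketch of this encoding, assuming the triangle activation f_k(x) = max(0, μ(z) − z_k) with z_k = ||x − c_k||₂ as described in Coates & Ng's k-means feature-learning work; the function name and toy data here are my own, not from the paper:

```python
import numpy as np

def triangle_kmeans_encode(X, centroids):
    """'Triangle' k-means encoding: f_k(x) = max(0, mean_j(z_j) - z_k),
    where z_k = ||x - c_k||_2. Centroids farther than the average
    distance get zero activation, so each code is roughly half zeros."""
    # Pairwise Euclidean distances, shape (n_points, n_centroids)
    z = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    mu = z.mean(axis=1, keepdims=True)   # per-point mean distance mu(z)
    return np.maximum(0.0, mu - z)       # zero out the far-away words

# Hypothetical toy demo: random "patches" and a random dictionary
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))          # 1000 data patches, 16 dims
centroids = rng.normal(size=(50, 16))    # 50 vocabulary words
codes = triangle_kmeans_encode(X, centroids)
print("fraction of zero activations:", (codes == 0).mean())  # ~0.5
```

Because each point's activation is thresholded at its own mean distance, about half the entries in every code vector come out exactly zero, which is the half-zeros observation above.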