1. Drawback of LSI: it lacks a satisfactory statistical foundation. The rationale behind LSI is that documents which share frequently co-occurring terms will have a similar representation in the latent space, even if they have no terms in common.
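2. Advantages of PLSI: it is based on the likelihood principle and defines a proper generative model of the data. Standard statistical theory can therefore be applied to model fitting, model combination, and complexity control. In addition, the factor representation can handle polysemy (a single word with several meanings).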
3. The core of PLSA (the same model as PLSI) is the aspect model.
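The LSI rationale in note 1 can be checked on a toy example. The sketch below is only an illustration under assumptions: the vocabulary, the five documents, and the choice of k = 2 latent dimensions are all made up for the demonstration, and LSI is implemented as a plain truncated SVD of the raw term-document count matrix (no tf-idf weighting). Documents d2 ("car") and d3 ("automobile") share no terms, but both words co-occur with "engine" in other documents.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy corpus: rows are terms, columns are documents d0..d4.
terms = ["car", "automobile", "engine", "flower", "petal", "garden"]
A = np.array([
    # d0 d1 d2 d3 d4
    [1, 0, 1, 0, 0],   # car
    [0, 1, 0, 1, 0],   # automobile
    [1, 1, 0, 0, 0],   # engine
    [0, 0, 0, 0, 1],   # flower
    [0, 0, 0, 0, 1],   # petal
    [0, 0, 0, 0, 1],   # garden
], dtype=float)

# LSI as truncated SVD: keep the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed number of latent dimensions
docs_latent = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document

# d2 and d3 have no terms in common, so their raw cosine similarity is zero.
raw_sim = cosine(A[:, 2], A[:, 3])
# In the latent space they end up close, because "car" and "automobile"
# both co-occur with "engine" in d0 and d1.
latent_sim = cosine(docs_latent[2], docs_latent[3])

print("raw similarity d2/d3:   ", raw_sim)
print("latent similarity d2/d3:", latent_sim)
```

The unrelated document d4 stays far from d2 in the latent space, so the truncation groups documents by co-occurrence pattern rather than pulling everything together.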
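The aspect model in note 3 associates a latent class z with every (document, word) observation, so that P(d, w) = P(d) * sum_z P(w|z) P(z|d). A minimal sketch of fitting it by maximum likelihood with plain EM follows. It rests on several assumptions: the function name `plsa`, its parameters, and the toy count matrix are hypothetical; it uses the asymmetric parameterization with P(z|d) and P(w|z); and it omits refinements such as tempered EM.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit an aspect model to a (docs x words) count matrix with plain EM.

    Returns P(z|d) with shape (docs, topics) and P(w|z) with shape (topics, words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialisation, each row normalised to a probability distribution.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior[d, w, z] = P(z | d, w), proportional to P(z|d) P(w|z).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        posterior = joint / np.maximum(joint.sum(axis=2, keepdims=True), eps)

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, :, None] * posterior          # n(d, w) P(z | d, w)
        p_w_z = weighted.sum(axis=0).T                      # (topics, words)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), eps)
        p_z_d = weighted.sum(axis=1)                        # (docs, topics)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), eps)

    return p_z_d, p_w_z

# Hypothetical toy data: 4 documents over 6 words, two obvious word groups.
counts = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)

p_z_d, p_w_z = plsa(counts, n_topics=2)
p_w_d = p_z_d @ p_w_z  # model's P(w | d) = sum_z P(w|z) P(z|d)
print(np.round(p_z_d, 3))
```

Because every quantity here is a properly normalised probability, the fitted model can be scored by likelihood, which is what makes the model fitting and complexity control mentioned in note 2 possible.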