by gk_
Text classification and prediction using the Bag of Words approach
There are a number of approaches to text classification. In other articles I’ve covered Multinomial Naive Bayes and Neural Networks.
One of the simplest and most common approaches is called “Bag of Words.” It has been used by commercial analytics products including Clarabridge, Radian6, and others.
The approach is relatively simple: given a set of topics and a set of terms associated with each topic, determine which topic(s) exist within a document (for example, a sentence).
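A minimal sketch of the idea in Python, assuming hand-made topic term sets and simple lowercase word tokenization. The names `TOPICS`, `tokenize`, and `classify`, and the topics and terms themselves, are hypothetical illustrations, not taken from the article. A topic is reported as present whenever its bag shares at least one word with the document.

```python
import re

# Hypothetical topic -> term "bags"; in practice these would be curated by hand.
TOPICS: dict[str, set[str]] = {
    "weather": {"rain", "sunny", "forecast", "storm"},
    "food": {"pizza", "lunch", "dinner", "hungry"},
    "travel": {"flight", "airport", "hotel", "trip"},
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def classify(document: str, topics: dict[str, set[str]]) -> set[str]:
    """Return every topic whose term bag shares at least one word with the document."""
    words = set(tokenize(document))
    return {topic for topic, terms in topics.items() if words & terms}

print(sorted(classify("Is the flight delayed by the storm?", TOPICS)))
```

Here the sample sentence matches both "travel" (via "flight") and "weather" (via "storm"), while a sentence containing none of the terms yields an empty set.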
While other, more exotic algorithms also organize words into “bags,” this technique does not build a model or apply any mathematics to the way a “bag” intersects with the document being classified. A document’s classification is polymorphic (multi-label), since the document can be associated with multiple topics at once.
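To make the "no model" and multi-label points concrete, the following Python sketch (an illustration under assumptions, not the author's code) returns, for each topic, the raw set intersection between the document's words and that topic's bag. No weights or probabilities are computed. The function `matched_terms` and the topics below are hypothetical; the term "order" is placed in two bags on purpose to show one word supporting several topics.

```python
import re

# Hypothetical term bags; "order" deliberately appears in two topics.
TOPICS: dict[str, set[str]] = {
    "billing": {"invoice", "refund", "charge", "order"},
    "shipping": {"delivery", "package", "tracking", "order"},
    "support": {"help", "agent", "complaint"},
}

def matched_terms(document: str, topics: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each topic to the terms from its bag that appear in the document.

    No weights, probabilities, or trained model are involved: the result is
    just the set intersection between the document's words and each bag.
    """
    words = set(re.findall(r"[a-z']+", document.lower()))
    hits = {topic: words & terms for topic, terms in topics.items()}
    # Keep only topics with at least one hit; a document may keep several.
    return {topic: terms for topic, terms in hits.items() if terms}

result = matched_terms("I need help: my order arrived without a tracking number", TOPICS)
for topic in sorted(result):
    print(topic, sorted(result[topic]))
```

The sample sentence is tagged with all three topics: "billing" and "shipping" through "order" (plus "tracking" for shipping), and "support" through "help".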
Does this seem