Bag of words model (词袋模型)

   The bag-of-words model is a simplifying assumption used in natural language processing and information retrieval. In this model, a text (such as a sentence or a document) is represented as an unordered collection of words, disregarding grammar and even word order.

   词袋模型是在自然语言处理和信息检索中的一种简单假设。在这种模型中,文本(段落或者文档)被看作是无序的词汇集合,忽略语法甚至是单词的顺序。


   The bag-of-words model is used in some methods of document classification. When a Naive Bayes classifier is applied to text, for example, the conditional independence assumption leads to the bag-of-words model. [1] Other methods of document classification that use this model are latent Dirichlet allocation and latent semantic analysis.[2]

   词袋模型被用在文本分类的一些方法当中。当传统的贝叶斯分类被应用到文本当中时,贝叶斯中的条件独立性假设导致词袋模型。另外一些文本分类方法如LDA和LSA也使用了这个模型。

 
   Example: Spam filtering
   In Bayesian spam filtering, an e-mail message is modeled as an unordered collection of words selected from

  • 0
    点赞
  • 7
    收藏
    觉得还不错? 一键收藏
  • 3
    评论
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值