SpamBayes

Latest recommended article published 2024-03-31 09:38:56

Winnycatty


Categories: Spam Filtering, Machine Learning. Tags: SpamBayes, spam filtering, Bayes, email filtering

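SpamBayes is a Bayesian spam filter written in Python. It uses the techniques Paul Graham laid out in his essay "A Plan for Spam", and was later improved by Gary Robinson, Tim Peters, and others.
The most notable difference between a conventional Bayesian filter and the one used by SpamBayes is that there are three classifications instead of two: spam, non-spam (called ham in SpamBayes), and unsure.
The user trains messages as either ham or spam. When filtering a message, the filter generates one score for ham and another for spam. If the spam score is high and the ham score is low, the message is classified as spam.
If the spam score is low and the ham score is high, the message is classified as ham. If the scores are both high or both low, the message is classified as unsure.
Having the unsure category leads to fewer false positives and false negatives, but it can produce many unsure messages that need a human decision.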
Source: Wikipedia. The original text reads:
 SpamBayes is a Bayesian spam filter written in Python which uses techniques laid out by Paul Graham in his essay “A Plan for Spam”. It has subsequently been improved by Gary Robinson and Tim Peters, among others.
 The most notable difference between a conventional Bayesian filter and the filter used by SpamBayes is that there are three classifications rather than two: spam, non-spam (called ham in SpamBayes), and unsure. The user trains a message as being either ham or spam; when filtering a message, the spam filters generate one score for ham and another for spam.
 If the spam score is high and the ham score is low, the message will be classified as spam.
 If the spam score is low and the ham score is high, the message will be classified as ham.
 If the scores are both high or both low, the message will be classified as unsure.
 This approach leads to a low number of false positives and false negatives, but it may result in a number of unsures which need a human decision.
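
The three-way rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not SpamBayes's own code: the function name `classify`, the assumption that both scores lie in [0, 1], and the single `cutoff` of 0.5 that separates "high" from "low" are all hypothetical choices made for the example.

```python
def classify(spam_score: float, ham_score: float, cutoff: float = 0.5) -> str:
    """Return "spam", "ham", or "unsure" from two scores in [0, 1].

    `cutoff` is a hypothetical threshold: a score at or above it counts
    as "high", anything below it counts as "low".
    """
    spam_high = spam_score >= cutoff
    ham_high = ham_score >= cutoff

    if spam_high and not ham_high:
        return "spam"      # spam score high, ham score low
    if ham_high and not spam_high:
        return "ham"       # ham score high, spam score low
    return "unsure"        # both high or both low: leave it to a human


if __name__ == "__main__":
    # Made-up score pairs covering the four cases in the text.
    for spam, ham in [(0.95, 0.02), (0.03, 0.97), (0.90, 0.90), (0.10, 0.10)]:
        print(spam, ham, classify(spam, ham))
```

The last branch is where the trade-off in the text comes from: messages with conflicting or weak evidence are never forced into spam or ham, which lowers both kinds of error at the cost of more messages to review by hand.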

                
