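Prospects for Computer Applications in Hybrid Mode: Research on a Web Page Filtering System Based on a Hybrid Model - Thesis, Computer Application Technology.docx...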

Abstract

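As an important application of the Internet, web browsing has long been popular with Internet users. However, while the Internet brings convenience, the problems of useless information and harmful web pages are becoming increasingly serious. Such pages not only consume network bandwidth and computing time and space, but their harmful content can also seriously disturb users' physical and mental health.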

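A web page filtering system aims to help users block useless and harmful web content. Commonly used filtering techniques currently include address filtering, rule filtering, and sensitive-word filtering. These traditional methods are simple and fast, but their false-positive rate on healthy pages is still relatively high. Another approach starts from the text content of a page: text classification and information filtering algorithms learn a page classifier from a training set of web pages, and that classifier performs the filtering. Because a web page filtering system usually runs in an online setting, it is often difficult to balance filtering accuracy against real-time processing when text classification algorithms are introduced.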

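This thesis builds a web page filtering system based on a hybrid model, combining the traditional methods of URL filtering and sensitive-word filtering with filtering based on text classification. It focuses on improvements in feature selection, use of the structural information of web pages, and the combination of text classification algorithms. Experiments show that the model remains easy to implement while improving both speed and accuracy to varying degrees.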

Keywords: information filtering, text classification, feature selection, naive Bayes, artificial neural network
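The hybrid idea summarized above can be illustrated with a minimal sketch. It is not the system built in the thesis: the host lists, the sensitive words, the order of the checks, and every function and class name are hypothetical, and a small hand-written multinomial naive Bayes classifier stands in for whatever classifiers the author actually combined. The sketch assumes that the cheap address and sensitive-word checks run first, and that only pages containing a sensitive word are passed to the slower text classifier.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical example data; none of these values come from the thesis.
BLOCKED_HOSTS = {"bad.example.com"}
ALLOWED_HOSTS = {"news.example.org"}
SENSITIVE_WORDS = {"casino", "jackpot"}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            words = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def filter_page(url, text, classifier):
    """Return 'block' or 'allow', trying the cheap checks first."""
    host = urlparse(url).hostname or ""
    if host in BLOCKED_HOSTS:
        return "block"
    if host in ALLOWED_HOSTS:
        return "allow"
    # Pages without any sensitive word skip the classifier entirely.
    if not SENSITIVE_WORDS & set(tokenize(text)):
        return "allow"
    # A sensitive word alone does not block; the classifier decides.
    return "block" if classifier.predict(text) == "bad" else "allow"
```

Under these assumptions, a page that merely mentions a sensitive word in a harmless context is judged by the classifier rather than blocked outright, which is one way a hybrid design could reduce false positives on healthy pages while keeping most requests on the fast path.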

ABSTRACT

As an important application of the Internet, Web browsing has been favored by the majority of Internet users. The Internet brings convenience to everyone, but unwanted information and harmful websites are also increasingly serious problems. Such websites not only consume network bandwidth and computing time; harmful websites can also seriously affect users' physical and mental health.

A Web filtering system helps users keep away from unwanted information and harmful websites. Blacklist and whitelist technology, rules, and keyword-based content filtering are often used in website filtering. These traditional methods are simple and fast, but their accuracy still needs to be enhanced.

Another approach is to use automated text categorization and information filtering to filter out unwanted information and harmful websites. Since a Web filtering system usually works in online mode, it is difficult to balance speed and accuracy when using a text classification algorithm.

This paper constructs a Web filtering system based on a hybrid model, which makes traditional filtering techniques and newer text classification algorithms work together. The main focus of the paper is how to use the structured information of websites, how to improve feature extraction techniques, and how to combine text classification algorithms. Experiments show that the new model gains better speed
