“Text categorization is the automated assignment of natural language texts to predefined categories based on their content. It is a supporting technology in several information processing tasks, including controlled vocabulary indexing, routing and packaging of news and other text streams, content filtering (spam, pornography, etc.), information security, help desk automation, and others. Closely related technology is applicable to other classification tasks on text, including classification with respect to personalized or emerging classes (alerting systems, topic detection and tracking), noncontent based classes (author identification, language identification), and to mixtures of text with other data (multimedia and cross-media indexing, text mining).”
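Text categorization automatically assigns natural language texts to predefined categories according to their content. It serves as a supporting technology in many information processing tasks, including controlled vocabulary indexing, routing and packaging of news and other text streams, content filtering (spam, pornography, and so on), information security, and help desk automation. Closely related techniques apply to other text classification tasks, including classification into personalized or emerging classes (alerting systems, topic detection and tracking), classes not based on content (author identification, language identification), and mixtures of text with other data (multimedia and cross-media indexing, text mining).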
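To make the definition concrete, here is a minimal sketch of assigning texts to predefined categories based on their content. It is only a toy keyword matcher, not the method used in the RCV1 paper; the category names, keyword lists, and the `categorize` function are all hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical predefined categories with hand-picked keywords (illustration only).
CATEGORY_KEYWORDS = {
    "spam": {"free", "winner", "prize", "click"},
    "sports": {"match", "goal", "team", "league"},
    "finance": {"shares", "market", "profit", "bank"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text, min_hits=1):
    """Return the sorted list of categories whose keywords occur
    at least `min_hits` times in the text (possibly empty)."""
    counts = Counter(tokenize(text))
    scores = {
        category: sum(counts[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    return sorted(c for c, score in scores.items() if score >= min_hits)


if __name__ == "__main__":
    print(categorize("The team scored a late goal in the league match"))
    print(categorize("Hello world"))
```

In this sketch a text may receive zero, one, or several categories, because each category is scored independently. A practical system would typically learn the scoring from labeled examples rather than rely on fixed keyword lists.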
RCV1: A New Benchmark Collection for Text Categorization Research (1. Introduction)