Data Mining: Concepts and Techniques: Translation and Notes

12 Outliers

Definition

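An outlier is a data object that deviates significantly from the rest of the data set and may have been generated by a different mechanism.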

Types

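Global outliers: objects that deviate significantly from the rest of the data set. This is the most common type of outlier. Example: in computer intrusion detection, if a computer's communication behavior differs from the normal pattern (for instance, it broadcasts a large number of packets in a short time), it may have been compromised by an attacker.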
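
A minimal sketch of flagging a global outlier in the packet example. The notes do not prescribe a scoring rule, so this assumes the modified z-score (median and median absolute deviation) with the conventional 3.5 cutoff as one possible choice. The function name and the sample packet counts are hypothetical.

```python
import statistics

def global_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Assumption: the modified z-score (median / MAD) is used as the
    deviation measure; the notes themselves do not name a specific rule.
    """
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that one extreme
    # value cannot inflate the way it inflates the standard deviation.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to compare against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical packets-per-second readings for one host.
packets_per_second = [52, 48, 50, 47, 53, 49, 51, 5000, 50, 46]
print(global_outliers(packets_per_second))  # [7]
```

The burst of 5000 packets is flagged because it is far from the bulk of the readings regardless of any context.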

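Contextual outliers: objects that deviate significantly from the data set within a specific context. Example: whether a given temperature value counts as a contextual outlier depends on the location and the season. In credit card fraud detection, consider a customer who uses more than 90% of the credit limit. If the customer belongs to the low-credit-limit group, this is normal behavior. If the customer belongs to the high-credit-limit group, it is treated as a contextual outlier. Such an outlier can signal a new business opportunity (raising the limit may bring in more revenue).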
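
The credit limit example can be sketched by scoring each value only against records that share its context. This is an illustrative assumption, not a method given in the notes: the context is a hypothetical credit-limit tier label, the deviation measure is a plain z-score within each tier, and the cutoff of 2.0 is arbitrary.

```python
from collections import defaultdict
import statistics

def contextual_outliers(records, threshold=2.0):
    """Return indices of records that are unusual within their own context.

    records: list of (context, value) pairs. Both the tier labels and
    the z-score cutoff are assumptions made for this sketch.
    """
    groups = defaultdict(list)
    for idx, (context, value) in enumerate(records):
        groups[context].append((idx, value))

    flagged = []
    for members in groups.values():
        values = [v for _, v in members]
        if len(values) < 2:
            continue  # cannot estimate spread from a single record
        mean = statistics.mean(values)
        std = statistics.stdev(values)
        if std == 0:
            continue
        flagged.extend(idx for idx, v in members
                       if abs(v - mean) / std > threshold)
    return sorted(flagged)

# Hypothetical (credit-limit tier, fraction of limit used) records.
records = [
    ("low", 0.92), ("low", 0.88), ("low", 0.95), ("low", 0.90),
    ("low", 0.93), ("low", 0.91),
    ("high", 0.20), ("high", 0.25), ("high", 0.18), ("high", 0.22),
    ("high", 0.30), ("high", 0.95), ("high", 0.21), ("high", 0.24),
]
print(contextual_outliers(records))  # [11]
```

The same utilization of 0.95 appears in both tiers, but only the high-tier record (index 11) is flagged, which is the point of a contextual outlier.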

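Collective outliers: no single data point is an outlier on its own, but a group of data points taken together deviates from the overall data set. Example: in stock trading, a large number of identical stock transactions between two companies within a short period may suggest that someone is manipulating the market.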
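
One simple way to look for the collective pattern in the trading example is to count trades per (parties, stock) key inside a sliding time window. This is only a sketch under stated assumptions: the record layout, the 60-second window, and the minimum count of 5 are all hypothetical and do not come from the notes.

```python
from collections import defaultdict

def collective_outliers(trades, window=60, min_count=5):
    """Return (parties, symbol) keys with a burst of trades.

    trades: list of (timestamp_seconds, parties, symbol) tuples.
    A key is flagged when at least min_count of its trades fall within
    any span of `window` seconds. All parameters are assumptions.
    """
    by_key = defaultdict(list)
    for ts, parties, symbol in trades:
        by_key[(parties, symbol)].append(ts)

    flagged = set()
    for key, stamps in by_key.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window from the left until it spans <= window seconds.
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 >= min_count:
                flagged.add(key)
                break
    return flagged

# Hypothetical trades: five A/B trades of XYZ within 40 seconds,
# plus a few isolated trades that are unremarkable on their own.
trades = [
    (0, ("A", "B"), "XYZ"), (10, ("A", "B"), "XYZ"),
    (20, ("A", "B"), "XYZ"), (30, ("A", "B"), "XYZ"),
    (40, ("A", "B"), "XYZ"),
    (5, ("C", "D"), "XYZ"), (500, ("C", "D"), "XYZ"),
    (1000, ("A", "B"), "QRS"),
]
print(collective_outliers(trades))
```

Each individual trade looks ordinary; only the group of five trades between A and B in XYZ is reported.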

Detection Methods

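Statistical (model-based) methods: effectiveness depends on whether the data were actually generated by the assumed statistical model.
Proximity-based methods: effectiveness depends on how the proximity (distance) measure is defined.
Clustering-based methods: clustering is computationally expensive, so these methods are not suitable for large-scale data.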
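
As an illustration of the proximity-based family above, the sketch below scores each point by the distance to its k-th nearest neighbor. It assumes Euclidean distance and k=2, which are choices made here rather than anything stated in the notes. Swapping in a different distance function changes the scores, which is exactly the dependence on the measure mentioned above.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Assumptions: Euclidean distance (math.dist) and a brute-force
    O(n^2) scan, which is fine for a sketch but not for large data.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D data: four nearby points and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)
most_outlying = max(range(len(points)), key=scores.__getitem__)
print(most_outlying)  # 4
```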

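Detection in high-dimensional data: as the number of dimensions grows, the degradation caused by noise becomes more severe.
1. Extending conventional outlier detection
HilOut algorithm; PCA dimensionality reduction (using the low-variance feature subspace as the detection space)
2. Searching for outliers in subspaces (the outliers found are easy to interpret)
Heuristic subspace search; sparsity coefficient
3. Modeling high-dimensional data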
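
The PCA idea in item 1 can be sketched as follows: project the centered data onto the principal components with the smallest variance and score each point by its variance-normalized squared projection there. This is a minimal sketch assuming NumPy is available. The function name, the synthetic data, and the choice of a single minor component are assumptions for illustration.

```python
import numpy as np

def minor_component_scores(X, n_minor=1):
    """Outlier score from projections onto low-variance principal components.

    Assumption: points that break the correlation structure of the data
    show up as large deviations along the smallest-variance directions.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the first
    # columns of eigvecs are the low-variance directions.
    eigvals, eigvecs = np.linalg.eigh(cov)
    minor_vecs = eigvecs[:, :n_minor]
    minor_vals = np.maximum(eigvals[:n_minor], 1e-12)  # guard against zero
    proj = centered @ minor_vecs
    return np.sum(proj ** 2 / minor_vals, axis=1)

# Synthetic data: the third feature is roughly the sum of the first two.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + rng.normal(scale=0.05, size=200)
X = np.column_stack([a, b, c])
# Hypothetical outlier: each coordinate is in the normal range,
# but the point violates the relation c ~ a + b.
X = np.vstack([X, [1.0, 1.0, -1.0]])

scores = minor_component_scores(X, n_minor=1)
print(int(np.argmax(scores)))  # 200
```

The appended point is unremarkable in every single dimension, yet it dominates the score because it lies far from the data along the low-variance direction.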

The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving, and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining data streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data.