Translation and Notes on "Data Mining: Concepts and Techniques"

12 Outliers

Definition

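An outlier is a data object that deviates significantly from the rest of the data set, as if it were generated by a different mechanism.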

Types

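Global outliers: objects that deviate significantly from the rest of the data set; this is the most common type of outlier. Example: in computer intrusion detection, if a computer's communication behavior departs from its normal pattern (e.g., it broadcasts a large number of packets within a short time), the machine may have been compromised by a hacker.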
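As a minimal sketch of flagging global outliers, assuming a single numeric attribute such as a hypothetical per-minute packet count, one option is the median-based "modified z-score". This is a technique swapped in for illustration, not necessarily the book's method; the constant 0.6745 and the cutoff 3.5 are a common rule of thumb. Using the median and the median absolute deviation (MAD) keeps the outlier from inflating the very statistics it is measured against.

```python
from statistics import median

def modified_z_scores(values):
    """Median/MAD-based z-scores, robust to the outliers being scored."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]

def global_outliers(values, threshold=3.5):
    """Indices of values whose modified z-score exceeds `threshold`."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]

# Hypothetical packets sent per minute by one host.
packets_per_minute = [120, 115, 130, 125, 118, 122, 127, 119, 5000, 121]
print(global_outliers(packets_per_minute))  # → [8]
```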

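Contextual outliers: objects that deviate significantly from the data set only within a specific context. Examples: whether a given temperature reading counts as an outlier depends on the location and the season in which it was recorded. In credit card fraud detection, consider customers who use more than 90% of their credit limit: for a customer with a low credit limit this is normal, but for a customer with a high credit limit it may be a contextual outlier. Such outliers can signal new business opportunities (raising that customer's credit limit can bring in more revenue).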
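For contextual outliers, the attributes split into contextual attributes (here, a hypothetical location and season) and a behavioral attribute (temperature). The sketch below assumes that a mean and standard deviation per context, estimated from hypothetical historical readings, are an adequate profile of "normal"; a reading is judged only against readings from its own context, so the same 25 °C can be normal in summer and an outlier in winter. The function names and the 3-sigma threshold are illustrative choices.

```python
from statistics import mean, stdev

def build_profiles(history):
    """Estimate (mean, sample stdev) of the behavioral attribute per context.
    `history` is an iterable of (context, value) pairs; each context needs
    at least two values."""
    by_context = {}
    for context, value in history:
        by_context.setdefault(context, []).append(value)
    return {ctx: (mean(vals), stdev(vals)) for ctx, vals in by_context.items()}

def is_contextual_outlier(value, context, profiles, threshold=3.0):
    """True if `value` lies more than `threshold` standard deviations from
    the mean of its own context."""
    mu, sigma = profiles[context]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical historical temperatures (°C), keyed by (location, season).
history = (
    [(("site_a", "winter"), t) for t in (-8, -5, -3, -6, -4, -2)]
    + [(("site_a", "summer"), t) for t in (24, 27, 22, 26, 25, 23)]
)
profiles = build_profiles(history)
print(is_contextual_outlier(25, ("site_a", "winter"), profiles))  # → True
print(is_contextual_outlier(25, ("site_a", "summer"), profiles))  # → False
```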

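Collective outliers: individual data objects may not be outliers on their own, but a group of them taken together deviates significantly from the rest of the data set. Example: in stock trading, if two companies carry out a large number of trades in the same stock within a short period, this may indicate that someone is manipulating the market.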
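A collective outlier cannot be found by scoring points one at a time. The sketch below assumes hypothetical trade records of the form `(minute, party_a, party_b, stock)` and illustrative values for the window length and minimum count. It groups trades by the unordered pair of parties and the stock, then flags any group that has a dense burst of trades inside a sliding time window. No single trade is suspicious; only the group is flagged. Trade quantities are omitted for brevity.

```python
from collections import defaultdict

def collective_trade_bursts(trades, window, min_count):
    """Flag (pair, stock) groups with at least `min_count` trades inside
    some time window of length `window` (same unit as the timestamps)."""
    groups = defaultdict(list)
    for minute, party_a, party_b, stock in trades:
        # Unordered pair: A->B and B->A count as the same relationship.
        groups[(frozenset((party_a, party_b)), stock)].append(minute)

    flagged = set()
    for key, times in groups.items():
        times.sort()
        left = 0
        # Two-pointer sliding window over the sorted timestamps.
        for right in range(len(times)):
            while times[right] - times[left] > window:
                left += 1
            if right - left + 1 >= min_count:
                flagged.add(key)
                break
    return flagged

# Hypothetical trade log: (minute, party_a, party_b, stock).
trades = [(m, "A", "B", "XYZ") if m % 2 == 0 else (m, "B", "A", "XYZ") for m in range(6)]
trades += [(0, "C", "D", "XYZ"), (30, "D", "C", "XYZ"), (2, "A", "E", "XYZ"), (100, "B", "A", "QQQ")]

flagged = collective_trade_bursts(trades, window=10, min_count=5)
print(sorted((sorted(pair), stock) for pair, stock in flagged))  # → [(['A', 'B'], 'XYZ')]
```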

Detection Methods

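Statistical (model-based) methods: effectiveness depends on whether the data were actually generated by the assumed statistical model
Proximity-based methods: effectiveness depends heavily on the choice of proximity (distance) measure
Clustering-based methods: clustering is time-consuming, so these methods do not scale well to large data sets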
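To make the proximity-based idea concrete, here is a minimal sketch of a distance-based outlier score: each point is scored by its distance to its k-th nearest neighbor, so points in sparse regions score high. Euclidean distance (via `math.dist`, Python 3.8+) and the toy points are assumptions; as noted above, the outcome depends on that choice of measure. The brute-force search is O(n²), acceptable for illustration but not for large data.

```python
import math

def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbour
    (brute force, O(n^2)); larger scores mean sparser neighbourhoods."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Toy data: a tight group of four points and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(points, k=2)
print(max(range(len(points)), key=scores.__getitem__))  # → 4
```

Ranking by the score (or thresholding it) then separates the sparse point from the dense group.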

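Outlier detection in high-dimensional data: as the number of dimensions grows, noise has an increasingly severe effect
1. Extending conventional outlier detection
HilOut algorithm; PCA-based dimensionality reduction (using the low-variance subspace as the detection space)
2. Finding outliers in subspaces (such outliers are easier to interpret)
Heuristic subspace search; sparsity coefficient
3. Modeling high-dimensional data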
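The PCA idea in item 1 can be sketched as follows, assuming NumPy is available. The sketch fits principal components, projects each centered point onto the components with the smallest variance, and scores the point by its squared projection divided by the corresponding eigenvalue (a common variant, assumed here). Normal points follow the dominant correlation and project weakly onto the minor components; a point that breaks the correlation stands out even when each coordinate is individually in range. The function name, `n_minor`, and the toy data are illustrative, and this is not the HilOut algorithm.

```python
import numpy as np

def pca_minor_component_scores(X, n_minor=1):
    """Outlier score from the lowest-variance principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so the first columns are the minor (low-variance) components.
    eigvals, eigvecs = np.linalg.eigh(cov)
    proj = centered @ eigvecs[:, :n_minor]
    # Normalize by each component's variance; guard against zero variance.
    variances = np.maximum(eigvals[:n_minor], 1e-12)
    return np.sum(proj ** 2 / variances, axis=1)

# Toy data: ten points close to the line y = 2x, plus one point (5, 0)
# whose coordinates are each in range but which breaks the correlation.
xs = np.arange(10, dtype=float)
ys = 2 * xs + 0.1 * (-1.0) ** np.arange(10)
X = np.vstack([np.column_stack([xs, ys]), [[5.0, 0.0]]])

scores = pca_minor_component_scores(X)
print(int(np.argmax(scores)))  # → 10
```

Here the outlier is included when fitting the components, which slightly tilts them; fitting on data believed to be normal and then scoring new points avoids that.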