Machine Learning and Data Mining (CS 5751): Final Review Notes (1)

I put these notes together for my own use, so everything is in outline form...

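(1) Unsupervised Learning

Cluster analysis

A clustering is a set of clusters.
(1) Partitional clustering: the data is divided into non-overlapping clusters, so each point belongs to exactly one cluster
(2) Hierarchical clustering: a set of nested clusters organized as a tree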

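Types of clusters:
● Well-separated clusters
(1) Minimize intra-cluster distances
(2) Maximize inter-cluster distances
● Center-based clusters
● Contiguous clusters
● Density-based clusters
● Clusters defined by a shared property or concept (conceptual clusters)
● Clusters described by an objective function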
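
To make "minimize intra-cluster distances, maximize inter-cluster distances" concrete, here is a small sketch for my own reference. The helper names (`mean_intra_cluster_distance`, `mean_inter_cluster_distance`) and the toy points are made up for illustration and are not from the course material; Euclidean distance is assumed.

```python
import math
from itertools import combinations

def mean_intra_cluster_distance(clusters):
    """Average Euclidean distance between pairs of points in the SAME cluster.
    Assumes at least one cluster holds two or more points."""
    dists = [math.dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(dists) / len(dists)

def mean_inter_cluster_distance(clusters):
    """Average Euclidean distance between pairs of points in DIFFERENT clusters.
    Assumes at least two non-empty clusters."""
    dists = [math.dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(dists) / len(dists)

# Two ways of grouping the same four (hypothetical) points.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]

for name, clustering in (("good", good), ("bad", bad)):
    print(name,
          "intra:", round(mean_intra_cluster_distance(clustering), 3),
          "inter:", round(mean_inter_cluster_distance(clustering), 3))
```

The "good" grouping should show a smaller intra-cluster distance and a larger inter-cluster distance than the "bad" one, which is the behavior a well-separated clustering is expected to have.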

Specific algorithms:
1. K-means and its variants
2. Hierarchical clustering
3. Density-based clustering

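K-means

A partitional clustering approach
● The number of clusters, K, must be specified.
● Each cluster is associated with a centroid (center point)
● Each point is assigned to the cluster with the closest centroid
● The basic algorithm is very simple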
[Figure: the basic K-means algorithm]
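● The centroid is (typically) the mean of the points in the cluster.
● "Closeness" is measured by Euclidean distance, cosine similarity, correlation, etc.
● K-means will converge for the common similarity measures mentioned above.
● Most of the convergence happens in the first few iterations, so the stopping condition is often changed to "until relatively few points change clusters".
● Complexity is O(n * K * I * d), where n = number of points, K = number of clusters, I = number of iterations, d = number of attributes.
● Sum of Squared Error (SSE)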
$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)$$
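(1) x is a data point in cluster Ci and mi is the representative point of cluster Ci
(2) Given two clusterings, we prefer the one with the smaller SSE
(3) One easy way to reduce SSE is to increase K, the number of clusters
Note: a good clustering achieves a small SSE with a small K, rather than by using a very large K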
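
The points above can be sketched in plain Python. This is only a minimal sketch under my own assumptions: Euclidean distance, initial centroids chosen by random sampling, and "no point changed cluster" as the stopping rule. The function names (`kmeans`, `sse`, `closest`, `mean`) and the toy data are hypothetical and not from the course slides.

```python
import math
import random

def closest(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # assumption: K random points as initial centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [closest(p, centroids) for p in points]   # O(n * K * d) per iteration
        if new_labels == labels:           # no point changed cluster: stop
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:                    # keep the old centroid if a cluster is empty
                centroids[i] = mean(members)
    return centroids, labels

def sse(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its cluster's centroid."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

# Toy data: two small, well-separated groups (made up for illustration).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

results = {}
for k in (1, 2, len(points)):
    centroids, labels = kmeans(points, k)
    results[k] = sse(points, centroids, labels)
    print("K =", k, "SSE =", round(results[k], 3))
```

On this data, SSE drops as K grows and reaches 0 when K equals the number of points, since every point becomes its own centroid. That is why a low SSE alone does not indicate a good clustering: the comparison is only meaningful at a comparable, small K.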

Problems with K-means

(1&
