Real-Time Anomaly Detection Based on Machine Learning

Objective

Monitor network traffic in real time.

The ultimate goal of this research is to build models with different algorithms and compare how accurately they detect anomalous behavior in NetFlow data.

Method (Model)

LADS uses the One-Class SVM algorithm, which constructs a hyper-plane or set of hyper-planes in a high- or infinite-dimensional space. As with SVMs in general, the best separation comes from the hyper-plane with the largest distance to the nearest training data points of any class (the functional margin). The larger the margin, the lower the classifier's generalization error tends to be. In the one-class variant, the hyper-plane separates the training data from the origin of the feature space, so only normal data is needed for training.

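SVM (support vector machine): a supervised binary classification algorithm. One-Class SVM adapts it to novelty detection. It learns a boundary around a single class of normal data and treats points outside that boundary as anomalies.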
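
Here is a minimal sketch of the one-class idea using scikit-learn's `OneClassSVM`. The library is our choice; the notes do not say what LADS is built on. The model trains on synthetic 2-D "normal" points only, then scores one point near the data and one far away. The RBF kernel and `nu=0.05` are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic "normal" observations: one 2-D Gaussian cluster (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu upper-bounds the fraction of training points left outside the learned
# region; gamma="scale" derives the RBF width from the data variance.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(X_train)  # no labels: the model only ever sees the normal class

# predict() returns +1 inside the learned region and -1 outside it.
X_new = np.array([[0.1, -0.2],   # close to the training cluster
                  [8.0, 8.0]])   # far from anything seen in training
labels = model.predict(X_new)
print(labels)
```

Raising `nu` tightens the region and flags more training points as outliers. Lowering it does the opposite.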

Dataset

The experiments use a valid dataset of over 1.4 million packets, exported as NetFlow v5 and v9 flows.

The network has two capture points, one in Room 1 and another in Room 2, both running Wireshark. Wireshark was used to create two packet capture files. These were later converted into NetFlow v5 and NetFlow v9, with a maximum time interval of 3 minutes per flow, and stored using the nfdump tools. Table 1 summarizes the dataset used.

Before the training process, we cleaned the dataset, discarding broadcast, multicast and non-internal IP addresses. We then split it into two datasets:

  • Dataset 1: IP addresses from Room 1, excluding broadcast and multicast (6,000 flows).
  • Dataset 2: only internal traffic inside Room 1 (5,800 flows).
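
The cleaning and split steps above could look roughly like this with Python's standard `ipaddress` module. Several details are hypothetical because the notes give none of them:

  • the Room 1 and Room 2 subnets,
  • the flow tuples,
  • reading "internal" as private (RFC 1918-style) address space,
  • reading Dataset 1 as "flows that involve a Room 1 address".

```python
import ipaddress

# Hypothetical subnets for the two capture points; the notes do not give
# the real address plan.
ROOM1 = ipaddress.ip_network("192.168.1.0/24")
ROOM2 = ipaddress.ip_network("192.168.2.0/24")
BROADCASTS = {
    ipaddress.ip_address("255.255.255.255"),  # limited broadcast
    ROOM1.broadcast_address,                  # directed broadcasts
    ROOM2.broadcast_address,
}


def keep(addr: str) -> bool:
    """True for internal (private) unicast addresses that are not broadcast."""
    ip = ipaddress.ip_address(addr)
    return ip.is_private and not ip.is_multicast and ip not in BROADCASTS


def in_room1(addr: str) -> bool:
    return ipaddress.ip_address(addr) in ROOM1


# Hypothetical (src, dst) pairs standing in for parsed NetFlow records.
flows = [
    ("192.168.1.10", "192.168.1.20"),   # Room 1 -> Room 1
    ("192.168.1.10", "8.8.8.8"),        # external destination
    ("192.168.1.15", "224.0.0.251"),    # multicast
    ("192.168.1.30", "192.168.1.255"),  # Room 1 broadcast
    ("192.168.2.40", "192.168.1.10"),   # Room 2 -> Room 1
]

cleaned = [(s, d) for s, d in flows if keep(s) and keep(d)]
# Dataset 1 (assumed reading): flows that involve a Room 1 address.
dataset1 = [(s, d) for s, d in cleaned if in_room1(s) or in_room1(d)]
# Dataset 2: only traffic internal to Room 1.
dataset2 = [(s, d) for s, d in cleaned if in_room1(s) and in_room1(d)]
print(len(cleaned), len(dataset1), len(dataset2))
```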

Feature selection

  • IP address distance: For this experiment we transformed all IP addresses into integer values (e.g., 172.18.21.4 becomes 2886866180). We then computed the difference between the integer of the modelled IP and that of the newly observed IP. The LADS was trained with datasets 1 and 2, using the One-Class SVM algorithm to compute the distance from the closest to the farthest IP address. A model is built from these data and a region is defined accordingly. For testing, we added seven IP addresses from outside the range used during training. All IPs inside the block are considered legitimate, and all that fall outside the boundaries are considered anomalous. Results are shown in Figure 2.
  • IP address and protocol distance
  • IP octets: we split the IP address into its four octets and treated each octet as a separate feature (e.g., 172.18.21.4 becomes 172, 18, 21, 4).
  • IP binarization: we transformed each IP into its corresponding binary encoding using the Label Binarizer method.
  • IP Location, IP Distance and IP Knowledge, used together:
    • IP Location: we check whether the IP address of the analyzed instance is the source or the destination, and assign 0 or 1 accordingly (0 for a source IP, 1 for a destination IP).
    • IP Distance: we compute the distance between the modelled IP and the new one, as in the previous experiments.
    • IP Knowledge: we check whether the IP address of the analyzed instance is known or unknown, and assign 0 or 1 accordingly (0 for a known IP, 1 for an unknown IP).
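
The IP encodings listed above can be sketched as small Python helpers. `ip_to_int` and `ip_octets` follow the description directly. The choice of "modelled IP" and the `known_ips` set are hypothetical placeholders, since the notes do not say how they were chosen. `LabelBinarizer` is the scikit-learn class of that name, which emits one indicator column per distinct IP. The IP-and-protocol feature is not sketched because the notes do not describe it.

```python
import ipaddress

from sklearn.preprocessing import LabelBinarizer


def ip_to_int(addr: str) -> int:
    """IPv4 dotted quad -> 32-bit integer, e.g. 172.18.21.4 -> 2886866180."""
    return int(ipaddress.IPv4Address(addr))


def ip_distance(modelled: str, observed: str) -> int:
    """Integer difference between the modelled IP and a newly observed IP."""
    return ip_to_int(modelled) - ip_to_int(observed)


def ip_octets(addr: str) -> list[int]:
    """Four per-octet features, e.g. 172.18.21.4 -> [172, 18, 21, 4]."""
    return list(ipaddress.IPv4Address(addr).packed)


def combined_features(ip: str, is_destination: bool,
                      modelled: str, known_ips: set[str]) -> list[int]:
    """[IP location, IP distance, IP knowledge] for one analyzed instance."""
    location = 1 if is_destination else 0    # 0 = source, 1 = destination
    knowledge = 0 if ip in known_ips else 1  # 0 = known, 1 = unknown
    return [location, ip_distance(modelled, ip), knowledge]


# LabelBinarizer learns the distinct IPs and emits one indicator column each.
ips = ["172.18.21.4", "172.18.21.5", "172.18.21.9"]
binarizer = LabelBinarizer()
onehot = binarizer.fit_transform(ips)

print(ip_to_int("172.18.21.4"), ip_octets("172.18.21.4"))
print(combined_features("172.18.21.10", True, "172.18.21.4", {"172.18.21.4"}))
print(onehot.shape)
```

With only two distinct IPs, `LabelBinarizer` returns a single column instead of two. The width of this encoding therefore depends on how many addresses were seen during fitting.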

Training

Training builds a model from the distances among IPs in the space where they are embedded, so the IP addresses must first be transformed into numeric form. Principal Component Analysis (PCA) is then used to reduce the dimensionality of the dataset used by LADS while keeping as much of the samples' information as possible. It finds alternative features that retain around 99% of the data variance, meaning the reduced features carry around 99% of the information in the original dataset.
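
In scikit-learn, the 99% variance criterion maps to passing a float to `PCA(n_components=...)`. The pipeline below (standardization, then PCA, then One-Class SVM) is an illustrative assumption, not the documented LADS configuration. The six-column feature matrix is synthetic: three independent columns plus near-duplicates, so PCA should keep three components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-in for encoded flow features: 500 flows x 6 columns, where
# columns 3-5 are near-copies of columns 0-2 (so they add little information).
base = rng.normal(size=(500, 3))
X = np.hstack([base, base + 0.01 * rng.normal(size=(500, 3))])

detector = make_pipeline(
    StandardScaler(),                           # give every feature unit variance
    PCA(n_components=0.99, svd_solver="full"),  # fewest components reaching 99% variance
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
detector.fit(X)

pca = detector.named_steps["pca"]
print(pca.n_components_, round(pca.explained_variance_ratio_.sum(), 4))
```

Setting `svd_solver="full"` explicitly keeps the fractional `n_components` behavior independent of which solver the `"auto"` policy would otherwise pick.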

Testing

Four features were tested.
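
The out-of-range test from the IP-distance experiment can be sketched as follows. It fits a One-Class SVM on the distance feature for a block of addresses, then scores seven addresses outside that block. The modelled IP, the training block, the seven test addresses and the `nu` value are all made up for illustration.

```python
import ipaddress

import numpy as np
from sklearn.svm import OneClassSVM


def ip_to_int(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))


# Hypothetical modelled (reference) IP and training block.
MODELLED = ip_to_int("192.168.1.1")
train_ips = [f"192.168.1.{i}" for i in range(1, 201)]


def distance_features(addrs):
    """One feature per address: modelled IP minus observed IP."""
    return np.array([[MODELLED - ip_to_int(a)] for a in addrs], dtype=float)


X_train = distance_features(train_ips)
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# Seven hypothetical addresses outside the training block.
outside = ["10.0.0.5", "172.16.0.9", "192.168.5.20", "192.168.9.1",
           "8.8.8.8", "1.1.1.1", "192.168.200.7"]
pred = model.predict(distance_features(outside))  # +1 legitimate, -1 anomalous
print(pred)

inside_rate = (model.predict(X_train) == 1).mean()
print(f"training block accepted: {inside_rate:.2%}")
```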

Results

Results show that combining multiple features (i.e., IP source, IP destination, distance between IPs, IP known, IP unknown) gives more accurate results and considerably reduces the false detection rates in the analysis performed.
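
The notes do not state how the false rates were computed. One common reading is the false-positive rate FP / (FP + TN) and the false-negative rate FN / (FN + TP), with "anomalous" as the positive class. In `OneClassSVM.predict` output, anomalous is `-1`. The helper below computes both rates; its name and sample labels are hypothetical.

```python
import numpy as np


def false_rates(y_true, y_pred):
    """Return (false-positive rate, false-negative rate).

    Assumed convention: -1 = anomalous (positive class), +1 = legitimate,
    matching the output of OneClassSVM.predict().
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fp = np.sum((y_pred == -1) & (y_true == 1))   # legitimate flagged as anomalous
    tn = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 1) & (y_true == -1))   # anomaly missed
    tp = np.sum((y_pred == -1) & (y_true == -1))
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return float(fpr), float(fnr)


# Hypothetical ground truth and predictions for six flows.
y_true = [1, 1, 1, 1, -1, -1]
y_pred = [1, 1, -1, 1, -1, 1]
print(false_rates(y_true, y_pred))  # (0.25, 0.5)
```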

Comparison

Terms: manual inspection; for instance; Live Anomaly Detection System (denoted LADS).
