On the interpretability of shapelets for network traffic

zlq_zhongbeidaxue

Article link: https://blog.csdn.net/qq_35964884/article/details/116172895

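Background: after the shapelet method has processed the raw data, each shapelet is described by [idx, start, end], where idx is the position, within the test data, of the sample the shapelet was taken from. Because the test data are selected at random, a one-to-one mapping between the test samples and the names of their source pcap files is needed.

Step 1: code that guarantees the one-to-one mapping between the test data and the source pcap names: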

import numpy as np

# pathidx stores the position of each input sample in the source pcap folder.
# Note: the sampling script at the end of this post saves '*-random-path.txt',
# so the file names must be made consistent.
mal_idx = np.loadtxt('D:/desktop/malware-random-pathidx.txt')
nor_idx = np.loadtxt('D:/desktop/normal-random-pathidx.txt')
# The test data hold the normal samples first, then the malware samples
test_data_idx = np.append(nor_idx, mal_idx, axis=0)

with open('D:/desktop/path-normal.txt', 'r') as f_n:
    path_n = f_n.readlines()
with open('D:/desktop/path-malware.txt', 'r') as f_m:
    path_m = f_m.readlines()

# Rows 0..499 are normal samples, rows 500..999 are malware samples
path = []
for i in range(1000):
    if i < 500:
        path.append(path_n[int(test_data_idx[i])])
    else:
        path.append(path_m[int(test_data_idx[i])])

with open('D:/desktop/testData-path.txt', 'w') as f:
    for line in path:
        f.write(line)

Step 2: read back the shapelets, indices, and so on that were saved to a .npy file at the end of the experiment

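import numpy as np

# The arrays are stored one after another in the same file, so load them in order
with open('D:/desktop/sha-ind-scores-xnew.npy', 'rb') as f:
    # allow_pickle is needed if shapelets is an object array (e.g. shapelets of different lengths)
    shapelets = np.load(f, allow_pickle=True)
    indices = np.load(f)  # one [idx, start, end] row per shapelet
    scores = np.load(f)
    xnew = np.load(f)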
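
Reading several arrays with consecutive np.load calls only works if they were written to the same file handle with consecutive np.save calls, in the same order. The saving code of the experiment is not shown in this post, so the following is only a minimal sketch of such a round trip; the array contents and the temporary file are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Made-up stand-ins: shapelets of different lengths need an object array.
shapelets = np.empty(2, dtype=object)
shapelets[0] = np.array([1.0, 2.0, 3.0])
shapelets[1] = np.array([4.0, 5.0])
indices = np.array([[113, 10, 13], [7, 0, 2]])  # rows of (idx, start, end)
scores = np.array([0.9, 0.8])

path = os.path.join(tempfile.mkdtemp(), 'demo.npy')

# Write the arrays one after another into the same file.
with open(path, 'wb') as f:
    np.save(f, shapelets, allow_pickle=True)
    np.save(f, indices)
    np.save(f, scores)

# Read them back in exactly the same order.
with open(path, 'rb') as f:
    shapelets_loaded = np.load(f, allow_pickle=True)
    indices_loaded = np.load(f)
    scores_loaded = np.load(f)

print(indices_loaded.shape, len(shapelets_loaded))
```

Loading in a different order than saving would silently assign the wrong array to each name, so it is worth checking the shapes after loading.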

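Step 3: map the shapelets back to concrete positions in the original traffic to see what they actually mean

Take one concrete shapelet from the USTC dataset as an example

In indices[0], idx is 113 (counting from zero), which means the 114th sample (counting from one). Looking it up in testData-path shows that

the full name of the trimmed pcap file is: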

MySQL.pcap.TCP_1-1-185-73_6587_1-2-160-203_3306.pcap
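
The file name itself already carries part of the interpretation. Assuming the naming pattern is capture name, protocol, source IP, source port, destination IP, destination port, with dashes in place of the dots in the IP addresses (an assumption inferred from this single example, not a documented format), the endpoints of the flow can be extracted as sketched below. `parse_flow_name` is a hypothetical helper.

```python
import re

# Pattern inferred from one example file name; treat it as an assumption.
FLOW_NAME = re.compile(
    r'^(?P<capture>.+)\.pcap\.(?P<proto>[A-Za-z]+)_'
    r'(?P<src_ip>[\d-]+)_(?P<src_port>\d+)_'
    r'(?P<dst_ip>[\d-]+)_(?P<dst_port>\d+)\.pcap$'
)


def parse_flow_name(name):
    """Split a trimmed-pcap file name into its flow fields (hypothetical helper)."""
    m = FLOW_NAME.match(name.strip())
    if m is None:
        raise ValueError('unexpected file name: ' + name)
    info = m.groupdict()
    # IP addresses are written with dashes in the file name.
    info['src_ip'] = info['src_ip'].replace('-', '.')
    info['dst_ip'] = info['dst_ip'].replace('-', '.')
    info['src_port'] = int(info['src_port'])
    info['dst_port'] = int(info['dst_port'])
    return info


flow = parse_flow_name('MySQL.pcap.TCP_1-1-185-73_6587_1-2-160-203_3306.pcap')
print(flow)
```

For the example, the destination port 3306 is the default MySQL port, which is consistent with the capture name.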

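At this point, two checks help avoid unnecessary errors:

1. Check that the entry in indices and the shapelet correspond to each other

2. Check that idx in indices and the pcap file correspond to each other

If both are consistent, you can start locating the shapelet within the pcap file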
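
The two checks could be automated along the following lines. This is only a sketch under assumptions: `test_data` stands for the array saved as testData.txt, each row of `indices` is (idx, start, end), and each shapelet is assumed to be the raw, unnormalized slice `test_data[idx, start:end]`. If the shapelet library normalizes subsequences, the comparison has to be adapted. `check_shapelet` and `lookup_path` are hypothetical helpers, and the data here are synthetic.

```python
import numpy as np


def check_shapelet(test_data, shapelet, index_row):
    """Check 1: does the shapelet equal the slice that its index row points to?"""
    idx, start, end = (int(v) for v in index_row)
    return bool(np.allclose(test_data[idx, start:end], shapelet))


def lookup_path(paths, index_row):
    """Check 2: return the pcap name stored for the sample idx (zero-based)."""
    idx = int(index_row[0])
    return paths[idx].strip()


# Synthetic stand-ins for testData.txt and testData-path.txt.
rng = np.random.default_rng(0)
test_data = rng.integers(0, 256, size=(5, 500)).astype(float)
paths = ['flow_%d.pcap\n' % i for i in range(5)]

index_row = np.array([3, 40, 60])
shapelet = test_data[3, 40:60].copy()

print(check_shapelet(test_data, shapelet, index_row))
print(lookup_path(paths, index_row))
```

With the real data, `paths` would be the lines of testData-path.txt, and the name returned by `lookup_path` should match the file that is actually opened for inspection.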

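1. There are two kinds of pcap files: filter and trimmed

2. The files in filter can be dissected by Wireshark, while those in trimmed cannot; the trimmed files, however, are what the input data come from

3. Because the 784-byte trimmed data were processed further (the first 24 values were cut off and 500 values kept), the start in indices is no longer the position in the pcap file when it is opened as text in decimal form. Specifically, new position = 24 + original start

4. This is still not the position of the shapelet in Wireshark's dissection of the pcap, because the raw pcap file contains extra bytes used as markers that Wireshark's packet view does not show. This is easy to understand, but the markers have to be identified in practice

## Ideally, the four points above would be automated with a program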
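
As a starting point for that automation, the sketch below maps a position in the 500-value sample back to a byte offset in the pcap file (adding the 24 values that were cut) and then walks the pcap records to find which packet the byte belongs to. It relies on the classic pcap layout: a 24-byte global header followed by records that each start with a 16-byte header whose third 32-bit field is the captured length. It assumes one feature value per file byte and a classic pcap (not pcapng) file. `locate_offset` is a hypothetical helper and the demo bytes are synthetic.

```python
import struct

GLOBAL_HEADER_LEN = 24   # pcap global header
RECORD_HEADER_LEN = 16   # per-packet header: ts_sec, ts_usec, incl_len, orig_len
CUT = 24                 # values dropped from the front of each sample


def locate_offset(pcap_bytes, feature_pos):
    """Map a feature position to (packet number, offset inside that packet).

    Returns (0, None) for the global header and (n, None) for the record
    header of packet n. Packet numbers are one-based, as in Wireshark.
    """
    file_off = feature_pos + CUT
    if file_off < GLOBAL_HEADER_LEN:
        return 0, None
    # The magic number tells the byte order of the header fields.
    magic = pcap_bytes[:4]
    if magic in (b'\xd4\xc3\xb2\xa1', b'\x4d\x3c\xb2\xa1'):
        endian = '<'
    elif magic in (b'\xa1\xb2\xc3\xd4', b'\xa1\xb2\x3c\x4d'):
        endian = '>'
    else:
        raise ValueError('not a classic pcap file')
    pos, number = GLOBAL_HEADER_LEN, 1
    while pos + RECORD_HEADER_LEN <= len(pcap_bytes):
        incl_len = struct.unpack_from(endian + 'I', pcap_bytes, pos + 8)[0]
        data_start = pos + RECORD_HEADER_LEN
        if file_off < data_start:
            return number, None              # inside the record header
        if file_off < data_start + incl_len:
            return number, file_off - data_start
        pos, number = data_start + incl_len, number + 1
    raise ValueError('offset lies beyond the parsed records')


# Synthetic capture with two packets of 10 and 20 bytes.
header = struct.pack('<IHHiIII', 0xa1b2c3d4, 2, 4, 0, 0, 65535, 1)
pkt1 = struct.pack('<IIII', 0, 0, 10, 10) + bytes(10)
pkt2 = struct.pack('<IIII', 0, 0, 20, 20) + bytes(20)
demo = header + pkt1 + pkt2

print(locate_offset(demo, 5))    # record header of packet 1
print(locate_offset(demo, 16))   # first data byte of packet 1
print(locate_offset(demo, 45))   # fourth data byte of packet 2
```

The second value of the returned pair is the byte offset within the packet data, which is the offset Wireshark shows in its packet bytes pane for that frame. Positions that fall into a record header have no counterpart in the dissected view, which is one source of the extra marker bytes mentioned in point 4.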

The code that randomly selects and cuts the data is as follows:

import numpy as np
from random import randint

normal = np.loadtxt('D:/desktop/normal.txt')
malware = np.loadtxt('D:/desktop/malware.txt')
# Print the shapes to find the valid range for the random indices
print(malware.shape)
print(normal.shape)

normal_sample, _ = normal.shape
malware_sample, _ = malware.shape


def randomIntArray(a, b, size):
    """Return a list of `size` distinct random integers from [a, b], both ends included."""
    randomidx = []
    for _ in range(size):
        while True:
            r = randint(a, b)
            if r not in randomidx:
                randomidx.append(r)
                break
    return randomidx


normal_random = randomIntArray(0, normal_sample - 1, 500)
malware_random = randomIntArray(0, malware_sample - 1, 500)

# Convert to arrays and save the chosen row indices.
# Note: step 1 loads '*-random-pathidx.txt', so the file names must be made consistent.
normal_random = np.array(normal_random)
np.savetxt('D:/desktop/normal-random-path.txt', normal_random, fmt='%d')
malware_random = np.array(malware_random)
np.savetxt('D:/desktop/malware-random-path.txt', malware_random, fmt='%d')

# Select the sampled rows from normal and malware
normal_test = normal[normal_random, :].copy()
malware_test = malware[malware_random, :].copy()

# Stack normal first, then malware; cut the first 24 values and keep a length of 500
test = np.append(normal_test, malware_test, axis=0)
final_test = test[:, 24:524].copy()
print(final_test.shape)

# Write the test data to a text file
np.savetxt('D:/desktop/testData.txt', final_test)
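
The rejection loop in `randomIntArray` could be replaced by a library call that samples without replacement. A minimal sketch, assuming NumPy's `Generator.choice` is acceptable in place of `randint`; the seed and the row count are illustrative only, and the resulting indices will differ from those produced by the original function.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed only to make the demo repeatable

n_rows = 2000                        # illustrative; use normal.shape[0] in practice
random_idx = rng.choice(n_rows, size=500, replace=False)

# All indices are distinct and lie within [0, n_rows - 1].
print(len(set(random_idx.tolist())), random_idx.min() >= 0, random_idx.max() < n_rows)
```

Fixing the seed also makes the selection reproducible, which helps when the saved index files and the test data have to stay in sync.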
