Reproducing the Unicorn Paper

Reproducing the paper's code

Paper: UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats
Paper URL: https://arxiv.org/pdf/2001.01525.pdf

1. Preparation

https://github.com/crimson-unicorn/core/wiki/Preparation

The official instructions call for gcc 4.9, which is now deprecated; installing a newer version works fine.
git clone https://github.com/crimson-unicorn/core.git
apt-get install -y sudo python-is-python3 g++- libz-dev python3-pip python-dev build-essential 
sudo apt-get update
sudo apt-get install wget

E: Unable to correct problems, you have held broken packages.
-bash: pip3: command not found
Solution:
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
E:Collecting pysqlite
Using cached pysqlite-2.8.3.tar.gz (80 kB)
Preparing metadata (setup.py) … error
error: subprocess-exited-with-error
Solution:
There is no need to install pysqlite; Python 3 ships with the sqlite3 module.
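A quick way to confirm that the built-in module is usable is to open an in-memory database. This is only a sanity check written for this post, not part of the Unicorn code; the table name t is arbitrary.

```python
import sqlite3

# Version of the SQLite library bundled with this interpreter
print("SQLite library version:", sqlite3.sqlite_version)

# Round-trip a value through an in-memory database to confirm the module works
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (42)")
value = conn.execute("SELECT x FROM t").fetchone()[0]
conn.close()
print("Round-trip value:", value)
```

If this runs without an ImportError, the pysqlite installation step can be skipped.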

2. Build

make prepare
make download_streamspot
make run_toy

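If a download times out, git clone the three components (parsers, analyzer, modeler) separately into the build directory.
Notes:
1. If the process is Killed, it has most likely run out of memory. Modify separate_graph in parser_fast.py so that it reads the file in chunks: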

def separate_graph(file_name, graph_id):
    """If file @file_name has many graphs, filter out
    other graphs except the one with ID @graph_id, and
    write to a temporary file."""
    print('\x1b[6;30;42m[INFO]\x1b[0m Separating graph {} from the file {}'.format(graph_id, file_name))
    # Original implementation, which loads the whole file into memory at once:
    # df = pd.read_csv(file_name, sep='\t', dtype=str, header=None)
    # filtered_df = df[df[5] == graph_id]
    # filtered_df.to_csv("tmp.csv", sep='\t', header=False, index=False)
    chunksize = 100000  # number of rows read per chunk
    chunks = pd.read_csv(file_name, sep='\t', dtype=str, header=None, chunksize=chunksize)
    filtered_chunks = [chunk[chunk[5] == graph_id] for chunk in chunks]
    filtered_df = pd.concat(filtered_chunks)
    filtered_df.to_csv("tmp.csv", sep='\t', header=False, index=False)
    print("\x1b[6;30;42m[INFO]\x1b[0m Graph {} is separated.".format(graph_id))
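If memory is still tight, the filtered rows do not need to be collected in memory at all. The sketch below swaps pandas for the standard-library csv module and copies matching rows one at a time. The name separate_graph_streaming and the out_name and id_column parameters are my own and do not exist in parser_fast.py; the column index 5 is taken from the code above.

```python
import csv


def separate_graph_streaming(file_name, graph_id, out_name="tmp.csv", id_column=5):
    """Copy to @out_name only the rows of the tab-separated file @file_name
    whose column @id_column equals @graph_id. Returns the number of rows kept."""
    kept = 0
    with open(file_name, newline="") as src, open(out_name, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in reader:
            # Skip short rows; compare as strings, like dtype=str in the pandas version
            if len(row) > id_column and row[id_column] == str(graph_id):
                writer.writerow(row)
                kept += 1
    return kept
```

Calling separate_graph_streaming(file_name, graph_id) writes tmp.csv just like the pandas version, but only one row is held in memory at a time.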

2. Error: RuntimeError: failed to find interpreter for Builtin discover of python_spec='venv'
Set up a Python 3 virtual environment:

pip install virtualenv
virtualenv -p python3 venv
source venv/bin/activate
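To check that the activation took effect, the interpreter can report whether it is running inside a virtual environment. This is a small helper of my own (in_virtualenv is not part of Unicorn), based on the standard sys.prefix and sys.base_prefix attributes.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv releases set sys.real_prefix; venv and current
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python:", sys.version.split()[0])
print("Virtual environment active:", in_virtualenv())
```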

Also change python to python3 in the Makefiles of parsers and modeler:

test -f venv/bin/activate || virtualenv -p $(shell which python3) venv
. venv/bin/activate ; \

3. If the analyzer stalls partway through, change the number value in its Makefile and rerun make analyzer.
4. The modeler may fail with the following error:
File "/mnt/c/Users/90554/Documents/Code/C++/core/build/modeler/venv/lib/python3.10/site-packages/scipy/spatial/distance.py", line 2161, in pdist
raise ValueError('A 2-dimensional array must be passed.')
ValueError: A 2-dimensional array must be passed
Fix: modify the two functions in model.py and the Hamming distance calculation in profile.py:

def load_sketches(fh):
    """Load sketches in a file from the handle @fh to memory as numpy arrays. """
    # Original code, which relies on map() returning a list (Python 2 behavior):
    # sketches = list()
    # for line in fh:
    #     sketch = map(int, line.strip().split())
    #     sketches.append(sketch)
    # return np.array(sketches)
    sketches = list()
    max_length = 0
    for line in fh:
        sketch = list(map(int, line.strip().split()))  # convert the map object to a list
        sketches.append(sketch)
        max_length = max(max_length, len(sketch))  # track the longest sketch

    # Pad shorter sketches with zeros so that every row has the same length
    for sketch in sketches:
        if len(sketch) < max_length:
            sketch.extend([0] * (max_length - len(sketch)))

    return np.array(sketches)


def pairwise_distance(arr, method='hamming'):
    """Wrapper function that calculates the pairwise distance between every
    two elements within the @arr. The metric (@method) is default as hamming.
    squareform function makes it a matrix for easy indexing and accessing. """
    # pdist requires a 2-dimensional array, so coerce the input to a float
    # array and turn a 1-dimensional input into a single column.
    arr = np.asarray(arr, dtype=np.float64)
    if arr.ndim == 1:
        arr = arr[:, np.newaxis]
    return squareform(pdist(arr, metric=method))
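The ValueError presumably comes from sketch lines of different lengths, which cannot form a rectangular 2-dimensional array. The following sketch shows the same zero-padding idea as load_sketches in plain Python, without NumPy; pad_rows and is_rectangular are illustrative names of my own, not functions in model.py.

```python
def is_rectangular(rows):
    """Return True when all rows have the same length (or there are no rows)."""
    return len({len(row) for row in rows}) <= 1


def pad_rows(rows, fill=0):
    """Return copies of @rows, right-padded with @fill to the longest row's length."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


ragged = [[1, 2, 3], [4], [5, 6]]
padded = pad_rows(ragged)
print(is_rectangular(ragged), is_rectangular(padded))
print(padded)
```

Note that padding with zeros changes the distances between sketches, so it is worth checking why the sketch lengths differ in the first place.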

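def calculate_hamming_distance(array1, array2):
    # Truncate both inputs to the same length and make sure they are
    # NumPy arrays, so that != compares element by element
    min_length = min(len(array1), len(array2))
    array1 = np.asarray(array1[:min_length])
    array2 = np.asarray(array2[:min_length])

    # Count the positions where the two arrays differ.
    # This replaces the previous call: distance = hamming(array1, array2)
    # Note: scipy's hamming returns the fraction of differing positions,
    # whereas this returns the count.
    distance = np.sum(array1 != array2)
    return distance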
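Because scipy.spatial.distance.hamming returns the proportion of differing positions while the replacement above returns a raw count, the two values differ by a factor equal to the compared length. A plain-Python sketch of both variants makes the difference visible; hamming_count and hamming_fraction are names I made up for illustration and are not part of profile.py.

```python
def hamming_count(a, b):
    """Number of positions at which @a and @b differ, over their common length."""
    n = min(len(a), len(b))
    return sum(1 for x, y in zip(a[:n], b[:n]) if x != y)


def hamming_fraction(a, b):
    """Fraction of differing positions over the common length (0.0 if empty)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return hamming_count(a, b) / n


s1 = [1, 2, 3, 4]
s2 = [1, 0, 3, 0, 9]  # the extra trailing element is ignored
print(hamming_count(s1, s2), hamming_fraction(s1, s2))
```

If any threshold in the modeler was tuned for the normalized value, the count would have to be divided by the compared length to stay comparable.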

5. Recommendation: run make one step at a time rather than all at once.

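3. Screenshots and results

parsers generates the base graphs and stream graphs for the train and test sets according to group ID.
[Screenshot]

analyzer generates the sketches.
[Screenshot]

modeler produces the final result, with cross-validation=2.
[Screenshot]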
