Reproducing the UNICORN Paper

东南风五级

Published 2024-07-13 18:53:54

Article link: https://blog.csdn.net/weixin_62681253/article/details/140404507

Reproducing the paper's code

Paper: UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats
Paper URL: https://arxiv.org/pdf/2001.01525.pdf

1. Preparation

https://github.com/crimson-unicorn/core/wiki/Preparation

The official instructions call for gcc 4.9, which is now deprecated; installing a newer version works fine.

git clone https://github.com/crimson-unicorn/core.git
apt-get install -y sudo python-is-python3 g++- libz-dev python3-pip python-dev build-essential 
sudo apt-get update
sudo apt-get install wget

E: Unable to correct problems, you have held broken packages.
-bash: pip3: command not found
Solution:
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
Error: Collecting pysqlite
Using cached pysqlite-2.8.3.tar.gz (80 kB)
Preparing metadata (setup.py) … error
error: subprocess-exited-with-error
Solution:
No need to install pysqlite; Python 3 ships with sqlite3 in its standard library.
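
To confirm that the bundled sqlite3 module is usable before skipping pysqlite, a quick check along these lines may help. This is only a sketch: the table name sketch, its columns, and the helper sqlite_roundtrip are made up for illustration and are not part of the UNICORN code.

```python
import sqlite3


def sqlite_roundtrip():
    """Create an in-memory database, insert two rows, and read them back."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table used only to exercise the module.
        conn.execute("CREATE TABLE sketch (id INTEGER PRIMARY KEY, label TEXT)")
        conn.executemany(
            "INSERT INTO sketch (label) VALUES (?)",
            [("benign",), ("attack",)],
        )
        rows = conn.execute("SELECT id, label FROM sketch ORDER BY id").fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    # Version of the SQLite library that this Python build links against.
    print("SQLite library version:", sqlite3.sqlite_version)
    print(sqlite_roundtrip())
```

If the import and the round trip succeed, code that expects sqlite3 should run without the pysqlite package.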

2. Build

make prepare
make download_streamspot
make run_toy

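If the download times out, clone the three components (parsers, analyzer, modeler) separately into the build directory.
Notes:
1. If the process is Killed, it has most likely run out of memory. Change separate_graph in parser_fast.py to read the input in chunks: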

def separate_graph(file_name, graph_id):
    """If file @file_name has many graphs, filter out
    other graphs except the one with ID @graph_id, and
    write to a temporary file."""
    print('\x1b[6;30;42m[INFO]\x1b[0m Separating graph {} from the file {}'.format(graph_id, file_name))
    # df = pd.read_csv(file_name, sep='\t', dtype=str, header=None)
    # filtered_df = df[df[5] == graph_id]
    # filtered_df.to_csv("tmp.csv", sep='\t', header=False, index=False)
    chunksize = 100000  # number of rows to read per chunk
    chunks = pd.read_csv(file_name, sep='\t', dtype=str, header=None, chunksize=chunksize)
    filtered_chunks = [chunk[chunk[5] == graph_id] for chunk in chunks]
    filtered_df = pd.concat(filtered_chunks)
    filtered_df.to_csv("tmp.csv", sep='\t', header=False, index=False)
    print("\x1b[6;30;42m[INFO]\x1b[0m Graph {} is separated.".format(graph_id))
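
If memory is still tight with the chunked pandas read, the same filtering can be done one line at a time with only the standard library. The following is a sketch of that alternative, not the author's fix: the function name separate_graph_streaming and its out_name and id_column parameters are hypothetical, and it assumes the input is plain tab-separated text with the graph ID in column 5, as in the pandas version.

```python
def separate_graph_streaming(file_name, graph_id, out_name="tmp.csv", id_column=5):
    """Copy the lines of a tab-separated file whose id_column field
    equals graph_id into out_name, reading one line at a time.

    Returns the number of lines written.
    """
    wanted = str(graph_id)  # fields are compared as strings
    kept = 0
    with open(file_name) as src, open(out_name, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > id_column and fields[id_column] == wanted:
                # Preserve the original line, ensuring it ends with a newline.
                dst.write(line if line.endswith("\n") else line + "\n")
                kept += 1
    return kept
```

Because only one line is held in memory at a time, peak memory use does not grow with the size of the input file.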

2. Error: RuntimeError: failed to find interpreter for Builtin discover of python_spec='venv'
Set up a Python 3 virtual environment:

pip install virtualenv
virtualenv -p python3 venv
source venv/bin/activate

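In the Makefiles of parsers and modeler, change python to python3. The python_spec='venv' in the error suggests that the interpreter lookup returned nothing, so virtualenv took venv as the interpreter name: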

test -f venv/bin/activate || virtualenv -p $(shell which python3) venv
. venv/bin/activate ; \
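
Before editing the Makefiles, it can help to check which interpreter names actually resolve on PATH, since an empty result from which would leave virtualenv -p without a valid argument. A small sketch, assuming nothing beyond the standard library; find_interpreter and its parameters are hypothetical helpers, not part of the UNICORN repositories:

```python
import shutil


def find_interpreter(candidates=("python3", "python"), which=shutil.which):
    """Return (name, path) for the first candidate found on PATH, or None.

    The which parameter exists so the lookup can be replaced in tests.
    """
    for name in candidates:
        path = which(name)
        if path:
            return name, path
    return None


if __name__ == "__main__":
    # Show how each name resolves; None means it is not on PATH.
    for name in ("python", "python3"):
        print(name, "->", shutil.which(name))
    print("first usable:", find_interpreter())
```

If python prints None while python3 resolves, pointing the Makefile at python3 as shown above is the likely fix.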

3. If the analyzer stalls partway through, change the number value in the Makefile and rerun make analyzer.
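4. The modeler may fail with:
"/mnt/c/Users/90554/Documents/Code/C++/core/build/modeler/venv/lib/python3.10/site-packages/scipy/spatial/distance.py", line 2161, in pdist
raise ValueError('A 2-dimensional array must be passed.')
ValueError: A 2-dimensional array must be passed
In Python 3, map() returns an iterator rather than a list, so the original load_sketches no longer produces a 2-D numeric array. Modify two functions in model.py and the hamming computation in profile.py: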

def load_sketches(fh):
    """Load sketches in a file from the handle @fh to memory as numpy arrays. """
    # Original version (breaks on Python 3, where map() returns an iterator):
    # sketches = list()
    # for line in fh:
    #     sketch = map(int, line.strip().split())
    #     sketches.append(sketch)
    # return np.array(sketches)
    sketches = list()
    max_length = 0
    for line in fh:
        sketch = list(map(int, line.strip().split()))  # materialize the map object as a list
        sketches.append(sketch)
        max_length = max(max_length, len(sketch))  # track the longest sketch

    # Pad shorter sketches with zeros so that every row has the same length
    for sketch in sketches:
        if len(sketch) < max_length:
            sketch.extend([0] * (max_length - len(sketch)))

    return np.array(sketches)


def pairwise_distance(arr, method='hamming'):
    """Wrapper function that calculates the pairwise distance between every
    two elements within the @arr. The metric (@method) is default as hamming.
    squareform function makes it a matrix for easy indexing and accessing. """
    # pdist requires a 2-D array: coerce the input to a float array and
    # reshape a 1-D input into a single column.
    arr = np.asarray(arr, dtype=np.float64)
    if arr.ndim == 1:
        arr = arr[:, np.newaxis]
    return squareform(pdist(arr, metric=method))

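def calculate_hamming_distance(array1, array2):
    # Truncate both inputs to the same length and make sure they are numpy
    # arrays, so that != compares element by element
    min_length = min(len(array1), len(array2))
    array1 = np.asarray(array1[:min_length])
    array2 = np.asarray(array2[:min_length])

    # Count the positions that differ. Unlike scipy's hamming(), which returns
    # the fraction of differing positions, this returns the raw count.
    # distance = hamming(array1, array2)
    distance = np.sum(array1 != array2)
    return distance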
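
To see what the padding and the pairwise Hamming computation do on a small input, the following pure-Python sketch mirrors the logic without numpy or scipy. The helper names pad_sketches, hamming_fraction, and pairwise_hamming are illustrative only and do not exist in the UNICORN modeler; the distance here is the fraction of differing positions, which is how scipy defines its hamming metric.

```python
def pad_sketches(sketches, fill=0):
    """Return copies of the sketches, right-padded with fill to equal length."""
    width = max((len(s) for s in sketches), default=0)
    return [list(s) + [fill] * (width - len(s)) for s in sketches]


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_hamming(sketches):
    """Square matrix (list of lists) of Hamming fractions between sketches."""
    rows = pad_sketches(sketches)
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming_fraction(rows[i], rows[j])
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist


if __name__ == "__main__":
    # The third sketch is shorter and gets padded to [1, 2, 0, 0].
    for row in pairwise_hamming([[1, 2, 3, 4], [1, 2, 0, 4], [1, 2]]):
        print(row)
```

Note that zero-padding treats the missing positions as real zeros, so a short sketch will count as different from a longer one wherever the longer one is nonzero.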