Reproducing the paper's code
Paper: UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats
Paper URL: https://arxiv.org/pdf/2001.01525.pdf
1. Preparation
https://github.com/crimson-unicorn/core/wiki/Preparation
The official guide uses gcc 4.9 (now deprecated); installing a newer version works fine.
git clone https://github.com/crimson-unicorn/core.git
apt-get install -y sudo python-is-python3 g++- libz-dev python3-pip python-dev build-essential
sudo apt-get update
sudo apt-get install wget
E: Unable to correct problems, you have held broken packages.
-bash: pip3: command not found
Solution:
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
E:Collecting pysqlite
Using cached pysqlite-2.8.3.tar.gz (80 kB)
Preparing metadata (setup.py) … error
error: subprocess-exited-with-error
Solution:
No need to install pysqlite; Python 3 ships with sqlite3.
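A quick way to confirm that the built-in module is usable, as a minimal sketch (the table name and values are made up for illustration):

```python
import sqlite3

# Open an in-memory database; no pysqlite package is required.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, name TEXT)")  # hypothetical table
conn.execute("INSERT INTO demo VALUES (?, ?)", (1, "unicorn"))
rows = conn.execute("SELECT id, name FROM demo").fetchall()
conn.close()
print(rows)
```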
2. Build
make prepare
make download_streamspot
make run_toy
If the download times out with an error, git clone the three components separately into the build directory.
Attention:
1. If the process is Killed, it has most likely run out of memory. Change separate_graph in parser_fast.py to read the file in chunks:
def separate_graph(file_name, graph_id):
    """If file @file_name has many graphs, filter out
    other graphs except the one with ID @graph_id, and
    write to a temporary file."""
    print('\x1b[6;30;42m[INFO]\x1b[0m Separating graph {} from the file {}'.format(graph_id, file_name))
    # Original version, which loads the whole file at once:
    # df = pd.read_csv(file_name, sep='\t', dtype=str, header=None)
    # filtered_df = df[df[5] == graph_id]
    # filtered_df.to_csv("tmp.csv", sep='\t', header=False, index=False)
    chunksize = 100000  # number of rows read per chunk
    chunks = pd.read_csv(file_name, sep='\t', dtype=str, header=None, chunksize=chunksize)
    filtered_chunks = [chunk[chunk[5] == graph_id] for chunk in chunks]
    filtered_df = pd.concat(filtered_chunks)
    filtered_df.to_csv("tmp.csv", sep='\t', header=False, index=False)
    print("\x1b[6;30;42m[INFO]\x1b[0m Graph {} is separated.".format(graph_id))
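If memory is still tight, the matching rows can be written out as they are read instead of being collected and concatenated at the end. The following is only a sketch of that idea: it swaps pandas for the standard csv module, the function name separate_graph_streaming and the out_name parameter are hypothetical and not part of parser_fast.py, and it assumes the graph ID sits in column 5 as in the code above.

```python
import csv

def separate_graph_streaming(file_name, graph_id, out_name="tmp.csv"):
    """Copy rows whose column 5 equals graph_id to out_name, one row at a time.

    Hypothetical helper (not from parser_fast.py). Returns the number of rows written.
    """
    kept = 0
    with open(file_name, newline="") as src, open(out_name, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in reader:
            # Skip rows that are too short to have a graph ID column.
            if len(row) > 5 and row[5] == str(graph_id):
                writer.writerow(row)
                kept += 1
    return kept
```

Only one row is held in memory at a time, so peak usage does not grow with the size of the input file.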
2. Error: RuntimeError: failed to find interpreter for Builtin discover of python_spec='venv'
Set up a Python 3 virtual environment:
pip install virtualenv
virtualenv -p python3 venv
source venv/bin/activate
In the Makefiles of parsers and modeler, change python to python3:
test -f venv/bin/activate || virtualenv -p $(shell which python3) venv
. venv/bin/activate ; \
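One plausible cause of this error, stated here as an assumption rather than something checked against the Makefiles: if the interpreter lookup prints nothing, the -p option is left without a value and virtualenv takes venv as the interpreter spec. A small check of which interpreter names are actually on PATH (the function name is hypothetical):

```python
import shutil

def find_interpreters(names=("python", "python3")):
    """Map each interpreter name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}

found = find_interpreters()
for name, path in found.items():
    print(f"{name}: {path or 'not found'}")
```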
3. If the analyzer stalls partway through, change the number value in the Makefile and rerun make analyzer.
4. The modeler may fail with: "/mnt/c/Users/90554/Documents/Code/C++/core/build/modeler/venv/lib/python3.10/site-packages/scipy/spatial/distance.py", line 2161, in pdist
raise ValueError('A 2-dimensional array must be passed.')
ValueError: A 2-dimensional array must be passed
Modify the two functions in model.py and the Hamming distance calculation in profile.py:
def load_sketches(fh):
    """Load sketches in a file from the handle @fh to memory as numpy arrays. """
    # Original version:
    # sketches = list()
    # for line in fh:
    #     sketch = map(int, line.strip().split())
    #     sketches.append(sketch)
    # return np.array(sketches)
    sketches = list()
    max_length = 0
    for line in fh:
        sketch = list(map(int, line.strip().split()))  # convert the map object to a list
        sketches.append(sketch)
        max_length = max(max_length, len(sketch))  # track the maximum length
    # Make every sub-list the same length
    for sketch in sketches:
        if len(sketch) < max_length:
            sketch.extend([0] * (max_length - len(sketch)))  # pad shorter sub-lists with 0
    return np.array(sketches)
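The padding step can be tried in isolation. This is a sketch with a hypothetical helper pad_sketches and made-up input lines; it uses plain lists so that it runs without numpy, and unlike load_sketches it skips blank lines. The padded result is rectangular, which is what np.array needs in order to build a 2-dimensional array.

```python
import io

def pad_sketches(fh, fill=0):
    """Parse whitespace-separated integers per line and right-pad short rows with fill."""
    rows = [list(map(int, line.split())) for line in fh if line.strip()]
    width = max((len(r) for r in rows), default=0)
    return [r + [fill] * (width - len(r)) for r in rows]

demo = io.StringIO("1 2 3\n4 5\n6\n")  # made-up sketch lines of unequal length
padded = pad_sketches(demo)
print(padded)
```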
def pairwise_distance(arr, method='hamming'):
    """Wrapper function that calculates the pairwise distance between every
    two elements within the @arr. The metric (@method) is default as hamming.
    squareform function makes it a matrix for easy indexing and accessing. """
    # if isinstance(arr, map):
    #     arr = list(arr)
    # arr = np.asarray(arr, dtype=np.float64)
    # if arr.ndim == 1:
    #     arr = arr[:, np.newaxis]
    # print("Shape of input array:", arr.shape)
    # print("Dtype of input array:", arr.dtype)
    # return squareform(pdist(arr, metric=method))
    arr = np.asarray(arr, dtype=np.float64)
    if arr.ndim == 1:
        arr = arr[:, np.newaxis]
    return squareform(pdist(arr, metric=method))
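For reference, with metric='hamming' scipy's pdist gives the fraction of positions at which two rows differ, and squareform expands the result into a symmetric matrix with a zero diagonal. Below is a pure-Python sketch of the same computation; the function name and the toy data are illustrative only.

```python
def pairwise_hamming(rows):
    """Return an n x n matrix of normalized Hamming distances between equal-length rows."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Count the positions where the two rows disagree.
            diff = sum(a != b for a, b in zip(rows[i], rows[j]))
            dist[i][j] = dist[j][i] = diff / len(rows[i])
    return dist

toy = [[1, 0, 1, 1], [1, 1, 1, 0], [1, 0, 1, 1]]  # toy sketches
matrix = pairwise_hamming(toy)
print(matrix)
```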
def calculate_hamming_distance(array1, array2):
    # Make sure both arrays have the same length
    min_length = min(len(array1), len(array2))
    array1 = array1[:min_length]
    array2 = array2[:min_length]
    # Compute the Hamming distance
    # distance = hamming(array1, array2)
    distance = np.sum(array1 != array2)
    return distance
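Note that this replacement returns the number of differing positions, whereas scipy's hamming returns that number divided by the length. A sketch of both quantities on toy lists, truncating to the shorter input as above (the function names are hypothetical):

```python
def hamming_count(a, b):
    """Number of differing positions over the common prefix of a and b."""
    n = min(len(a), len(b))
    return sum(x != y for x, y in zip(a[:n], b[:n]))

def hamming_fraction(a, b):
    """Differing positions divided by the common length (0.0 for empty input)."""
    n = min(len(a), len(b))
    return hamming_count(a, b) / n if n else 0.0

a = [1, 2, 3, 4, 5]
b = [1, 0, 3, 0]
print(hamming_count(a, b), hamming_fraction(a, b))
```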
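5. Suggestion: run make one step at a time.
3. Process screenshots and analysis results
The parsers generate the base graphs and stream graphs for train and test according to the group ID.
The analyzer generates the sketches.
The modeler produces the results, with cross-validation=2.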