Bingo

桃子小迷妹

Category: Papers

Article link: https://blog.csdn.net/weixin_43846270/article/details/105966642

Scalable Function Matching

In BINGO, similarity matching is done at the granularity of function and/or sub-function levels.

To this end, we propose function models consisting of length-variant partial traces, which are more flexible in terms of comparison granularity than basic-block-centric function models.

Then, from these partial traces we extract symbolic expressions and, based on the I/O samples generated from these symbolic expressions, we match the partial traces inside two functions.

To reduce the noise, we remove the partial traces that are infeasible to reach (via solving the symbolic expressions) or are specific to compilers.

Lastly, we apply Jaccard containment similarity to measure the similarity score of two function models.
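The notes do not spell out the formula, so the following is a minimal sketch that assumes the common definition of Jaccard containment, |S ∩ T| / |S|, normalized by the signature function's feature set S. The feature labels (`t1`, `t2`, ...) are hypothetical stand-ins for matched partial traces.

```python
def jaccard_containment(signature_features, target_features):
    """Fraction of the signature's features that also occur in the target.

    Assumed definition: |S & T| / |S|. Unlike plain Jaccard similarity,
    it is asymmetric and is not diluted by extra features in the target.
    """
    signature = set(signature_features)
    target = set(target_features)
    if not signature:
        return 0.0
    return len(signature & target) / len(signature)


# Hypothetical feature sets for a signature function and a target function.
sig = {"t1", "t2", "t3", "t4"}
tgt = {"t2", "t3", "t4", "t5", "t6", "t7"}

print(jaccard_containment(sig, tgt))  # 0.75
print(jaccard_containment(tgt, sig))  # 0.5
```

The asymmetry is the point of using containment here: a small signature function can still score highly against a much larger target that contains it.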

Length Variant Partial Trace Extraction

[Figure not reproduced in the source]

Semantic Feature Extraction

If the machine state before executing a partial trace is given by
X = <mem, reg, flag>,
then the machine state immediately after execution is given by
X' = <mem', reg', flag'>.
Here, X and X' are referred to as the pre-state and post-state, respectively, and the symbolic expressions capture the relationship between these two machine states.

Using a constraint solver to measure semantic equivalence is very expensive and not scalable for real-world cases.

To tackle the scalability issue, a machine learning technique is applied. Specifically, Input/Output (I/O) samples are randomly generated from the symbolic expressions and fed into the machine learning module to find semantically similar functions. Here, I/O samples are generated by assigning concrete values to the pre-state variables and obtaining the corresponding concrete output values for the post-state variables in the symbolic expressions.
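As a rough illustration of random I/O sampling, the sketch below models a symbolic expression as a plain Python function from a pre-state to one post-state value. The expression `eax_post`, the register names, and the helper `random_io_samples` are hypothetical and chosen only for illustration; they are not taken from BINGO.

```python
import random

MASK32 = 0xFFFFFFFF  # model 32-bit registers


def eax_post(pre):
    """Hypothetical symbolic expression: eax' = (ebx + ecx) * 2 (mod 2^32)."""
    return ((pre["ebx"] + pre["ecx"]) * 2) & MASK32


def eax_post_alt(pre):
    """A syntactically different but equivalent form: eax' = 2*ebx + 2*ecx."""
    return (2 * pre["ebx"] + 2 * pre["ecx"]) & MASK32


def random_io_samples(expr, pre_vars, count, seed=0):
    """Assign random concrete values to the pre-state variables and
    record the concrete post-state value the expression yields."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        pre = {name: rng.getrandbits(32) for name in pre_vars}
        samples.append((pre, expr(pre)))
    return samples


samples = random_io_samples(eax_post, ["ebx", "ecx"], count=3)
alt_samples = random_io_samples(eax_post_alt, ["ebx", "ecx"], count=3)

# Same seed means same inputs, so equal outputs suggest equivalent semantics.
print(samples == alt_samples)  # True
```

Agreement on a handful of samples is evidence of similarity rather than a proof of equivalence, which is the trade-off accepted in exchange for scalability.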

One drawback of randomly generating I/O samples is that the dependencies among symbolic expressions are ignored.

[Eq. 1 and Eq. 2: image not reproduced in the source]

These two symbolic expressions are mutually dependent through a common pre-state variable, edx. However, in randomly generated I/O samples, edx can be assigned a different value in each symbolic expression, which ignores the dependency and may lead to inaccurate semantics.

To satisfy the symbolic expressions in Eq. 1 and Eq. 2 together, the pre-state variable edx needs to be set to 1, which makes the post-state variable zf' true and ecx' equal to 5.
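The dependency problem can be made concrete with a small sketch. Since Eq. 1 and Eq. 2 are not reproduced in these notes, the two expressions below are hypothetical stand-ins, chosen only so that edx = 1 gives zf' = true and ecx' = 5 as stated above.

```python
MASK32 = 0xFFFFFFFF


def zf_post(pre):
    """Hypothetical stand-in for Eq. 1: zf' is true exactly when edx == 1."""
    return pre["edx"] == 1


def ecx_post(pre):
    """Hypothetical stand-in for Eq. 2: ecx' = edx + 4 (mod 2^32)."""
    return (pre["edx"] + 4) & MASK32


def joint_sample(pre):
    """Evaluate both expressions on one shared pre-state, so edx agrees."""
    return {"zf_post": zf_post(pre), "ecx_post": ecx_post(pre)}


def independent_sample(pre_for_zf, pre_for_ecx):
    """Mimic independent random sampling: edx may differ per expression."""
    return {"zf_post": zf_post(pre_for_zf), "ecx_post": ecx_post(pre_for_ecx)}


consistent = joint_sample({"edx": 1})
inconsistent = independent_sample({"edx": 1}, {"edx": 7})

print(consistent)    # {'zf_post': True, 'ecx_post': 5}
print(inconsistent)  # {'zf_post': True, 'ecx_post': 11}
```

The second sample pairs zf' = true with ecx' = 11, a post-state that no single execution of the trace could produce under these stand-in expressions.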

To this end, we use the Z3 constraint solver to generate the I/O samples. Generating I/O samples via Z3 is much more scalable than using it to prove the equivalence of two symbolic expressions: I/O sample generation is a one-time job for each partial trace, whereas equivalence checking invokes the constraint solver every time a partial trace is compared with another one.

In many cases, more than one concrete value for a pre- or post-state variable satisfies all the symbolic expressions. Thus, we set an upper limit N on the number of values generated. According to the study in [citation omitted in the source], N is set to 3 in our design.
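The cap on solutions can be sketched as follows. To stay self-contained, this sketch replaces Z3 with a brute-force search over a small domain; with a real solver, the same effect is usually obtained by asking for a model, excluding it, and asking again until N models are found. The constraints and helper names are hypothetical.

```python
from itertools import islice, product


def satisfying_assignments(variables, domain, constraints):
    """Lazily enumerate assignments that satisfy every constraint.

    Brute-force stand-in for a constraint solver (an assumption of this
    sketch, not how BINGO is implemented).
    """
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(check(env) for check in constraints):
            yield env


def capped_samples(variables, domain, constraints, limit=3):
    """Keep at most `limit` satisfying assignments (N = 3 in the text)."""
    return list(islice(satisfying_assignments(variables, domain, constraints), limit))


# Hypothetical constraints: eax + ebx == 10 and eax < ebx.
constraints = [
    lambda env: env["eax"] + env["ebx"] == 10,
    lambda env: env["eax"] < env["ebx"],
]

samples = capped_samples(["eax", "ebx"], range(16), constraints, limit=3)
print(samples)
# [{'eax': 0, 'ebx': 10}, {'eax': 1, 'ebx': 9}, {'eax': 2, 'ebx': 8}]
```

Five assignments satisfy these constraints in the chosen domain, but only the first three are kept, which bounds the number of I/O samples per partial trace.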

Trace Pruning

Infeasible Partial Trace Pruning

In BINGO, we prune obviously infeasible partial traces with the help of the constraint solver. That is, given the symbolic expressions extracted from a partial trace, we rely on the constraint solver to determine whether all the constraints can be satisfied. Identifying infeasible partial traces needs no additional effort: if the constraint solver is unable to find suitable concrete values for the pre- and post-state variables, the partial trace is considered infeasible.
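A minimal sketch of this pruning step is shown below. It again uses exhaustive search over an 8-bit domain as a stand-in for the constraint solver, and the trace names and constraints are hypothetical.

```python
from itertools import product


def is_feasible(variables, constraints, domain=range(256)):
    """Return True if some assignment satisfies all constraints.

    Exhaustive search stands in for a solver's satisfiability check;
    this is an assumption of the sketch.
    """
    return any(
        all(check(dict(zip(variables, values))) for check in constraints)
        for values in product(domain, repeat=len(variables))
    )


# Hypothetical path constraints collected along two partial traces.
traces = {
    "trace_a": [lambda s: s["edx"] > 5, lambda s: s["edx"] < 3],  # contradictory
    "trace_b": [lambda s: s["edx"] > 5, lambda s: s["edx"] < 9],  # satisfiable
}

kept = [name for name, cs in traces.items() if is_feasible(["edx"], cs)]
print(kept)  # ['trace_b']
```

Because the same satisfiability query is already needed to generate I/O samples, a trace with no satisfying assignment is discovered as a side effect rather than through a separate analysis.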

Compiler Specific Code Pruning

Based on the compilation option selected, the compiler might include additional code (i.e., code that is not originally written by the programmer) into the compiled binary.

[Fig. 6: image not reproduced in the source]

The code segments in the function prologue and epilogue in Fig. 6 ensure that the stack integrity is not violated.

In similarity matching between the signature and target functions, the additional compiler-specific code can introduce noise by changing the code structure and diluting the similar parts, especially when the functions are small. Thus, it is necessary to identify and remove the compiler-specific code from the partial traces. However, directly removing code from a partial trace might lead to incorrect semantic features, as these features are very sensitive to the underlying code semantics. To this end, we propose a conservative approach: we generalize the compiler-specific code into patterns and systematically prune the partial traces that contain these patterns. That is, instead of removing the compiler-specific code from a partial trace, which is error-prone, we remove the partial trace itself if the compiler-specific code pattern accounts for the majority (50% or more) of its code.
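The 50% rule can be sketched as a simple filter. The patterns below are hypothetical examples modeled on stack-protector code (a canary access through `gs:0x14` and a call to `__stack_chk_fail`); the actual pattern set used by BINGO is not given in these notes, and counting by instruction is an assumption.

```python
import re

# Hypothetical patterns for compiler-inserted stack-protector instructions.
COMPILER_PATTERNS = [
    re.compile(r"gs:0x14"),
    re.compile(r"__stack_chk_fail"),
]


def compiler_specific_ratio(trace):
    """Fraction of instructions in the trace matching any pattern."""
    if not trace:
        return 0.0
    hits = sum(
        1 for ins in trace if any(p.search(ins) for p in COMPILER_PATTERNS)
    )
    return hits / len(trace)


def prune_traces(traces, threshold=0.5):
    """Drop whole traces whose compiler-specific share is >= threshold."""
    return [t for t in traces if compiler_specific_ratio(t) < threshold]


mostly_canary = [
    "mov eax, gs:0x14",
    "mov [ebp-0xc], eax",
    "xor eax, gs:0x14",
    "call __stack_chk_fail",
]
user_code = [
    "mov eax, [ebp+8]",
    "add eax, 4",
    "mov [ebp-4], eax",
    "call __stack_chk_fail",
]

kept = prune_traces([mostly_canary, user_code])
print(len(kept))  # 1
```

Dropping the whole trace, rather than editing instructions out of it, keeps every surviving trace semantically intact, at the cost of losing some user-written code that happens to share a trace with compiler-inserted code.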
