Bingo

Scalable Function Matching

In BINGO, similarity matching is performed at the function and/or sub-function level.

To this end, we propose function models consisting of length-variant partial traces, which offer more flexible comparison granularity than basic-block-centric function models.

Then, we extract symbolic expressions from these partial traces and match the partial traces of two functions based on the I/O samples generated from these symbolic expressions.

To reduce noise, we remove partial traces that are infeasible (determined by solving their symbolic expressions) or that are specific to the compiler.

Lastly, we apply Jaccard containment similarity to measure the similarity score of two function models.
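Jaccard containment of a set A in a set B is |A ∩ B| / |A|. The sketch below is a minimal illustration, assuming each function model has already been reduced to a set of hashable partial-trace signatures (a hypothetical representation; BINGO's actual feature encoding may differ) and measuring how much of the signature function is contained in the target.

```python
def jaccard_containment(signature: set, target: set) -> float:
    """Fraction of the signature function's features that also occur in the target.

    Unlike the symmetric Jaccard index |A & B| / |A | B|, containment divides
    by |A| only, so a small signature function is not penalized for being
    embedded in a much larger target function.
    """
    if not signature:
        return 0.0
    return len(signature & target) / len(signature)


# Hypothetical partial-trace signatures for a signature and a target function.
sig = {"t1", "t2", "t3", "t4"}
tgt = {"t1", "t2", "t3", "t9", "t10", "t11"}
print(jaccard_containment(sig, tgt))  # 0.75
```

On the same sets the symmetric Jaccard index would be only 3/7, because the target's unrelated traces inflate the union; containment avoids that dilution.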

Length Variant Partial Trace Extraction

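As a rough illustration of what "length-variant" means, the sketch below enumerates every acyclic path of 1 to k consecutive basic blocks in a control-flow graph. It assumes the CFG is given as a dictionary from block IDs to successor lists and that loop back-edges are simply skipped; both the representation and the choice of k are hypothetical, not taken from BINGO.

```python
from typing import Dict, List, Tuple

CFG = Dict[str, List[str]]  # basic-block id -> successor block ids


def partial_traces(cfg: CFG, max_len: int) -> List[Tuple[str, ...]]:
    """Enumerate acyclic paths of 1..max_len consecutive basic blocks."""
    traces: List[Tuple[str, ...]] = []

    def extend(path: Tuple[str, ...]) -> None:
        traces.append(path)
        if len(path) == max_len:
            return
        for succ in cfg.get(path[-1], []):
            if succ not in path:  # skip loop back-edges to keep traces finite
                extend(path + (succ,))

    for block in cfg:
        extend((block,))
    return traces


# Hypothetical diamond-shaped CFG: A branches to B or C, both join at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
for t in partial_traces(cfg, max_len=3):
    print(" -> ".join(t))
```

Because traces of every length up to k are kept, two functions can match on short fragments even when their longer paths diverge.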

Semantic Feature Extraction

If the machine state before executing a partial trace is
X = <mem, reg, flag>,
then the machine state immediately after execution is
X' = <mem', reg', flag'>.
Here, X and X' are referred to as the pre-state and post-state, respectively, and the symbolic expressions capture the relationship between them.

Using a constraint solver to check the semantic equivalence of symbolic expressions is very expensive and does not scale to real-world cases.

To tackle this scalability issue, a machine learning technique is applied. Specifically, input/output (I/O) samples are randomly generated from the symbolic expressions and fed into the machine learning module to find semantically similar functions. An I/O sample is generated by assigning concrete values to the pre-state variables and computing the corresponding concrete values of the post-state variables in the symbolic expressions.
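A minimal sketch of purely random sampling for a single expression, assuming a symbolic expression can be modelled as a Python function of its pre-state variables (a stand-in for a real symbolic representation). The expression `ecx' = edx + 4` is hypothetical and the output is masked to 32 bits to mimic register wrap-around.

```python
import random
from typing import Callable, Dict, List, Tuple

# A symbolic expression is modelled here as (post-state variable,
# pre-state variables it reads, function computing its value).
Expr = Tuple[str, List[str], Callable[..., int]]


def random_io_samples(
    expr: Expr, count: int, rng: random.Random
) -> List[Tuple[Dict[str, int], int]]:
    """Sample one expression independently: random inputs -> concrete output."""
    _post_var, pre_vars, fn = expr
    samples = []
    for _ in range(count):
        inputs = {v: rng.randint(0, 2**32 - 1) for v in pre_vars}
        samples.append((inputs, fn(**inputs) & 0xFFFFFFFF))
    return samples


# Hypothetical expression: ecx' = edx + 4 (32-bit wrap-around).
ecx_expr: Expr = ("ecx'", ["edx"], lambda edx: edx + 4)
for inputs, out in random_io_samples(ecx_expr, 3, random.Random(0)):
    print(inputs, "->", out)
```

Each expression is sampled in isolation here, which is exactly what makes this approach blind to variables shared between expressions.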

One drawback of generating I/O samples randomly is that the dependencies among symbolic expressions are ignored.
[Figure: symbolic expressions Eq. 1 and Eq. 2]
These two symbolic expressions are mutually dependent through the common pre-state variable edx. However, randomly generated I/O samples may assign edx a different value in each symbolic expression, which ignores the dependency and may lead to inaccurate semantics.

To satisfy both Eq. 1 and Eq. 2, the pre-state variable edx must be set to 1, which makes the post-state variable zf' true and ecx' equal to 5.
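The sketch below contrasts this with sampling each expression separately: it assigns edx once and evaluates every expression with that same value. The two functions are hypothetical stand-ins chosen only to reproduce the outcome stated above (edx = 1, zf' true, ecx' = 5); they are not the paper's Eq. 1 and Eq. 2, and a brute-force scan over a small range replaces the constraint solver.

```python
from typing import Dict, Iterator


# Hypothetical stand-ins for Eq. 1 and Eq. 2; both read the shared pre-state
# variable edx.
def eq1_zf(edx: int) -> bool:  # zf' = (edx - 1 == 0)
    return edx - 1 == 0


def eq2_ecx(edx: int) -> int:  # ecx' = edx + 4
    return edx + 4


def path_constraint(edx: int) -> bool:
    # Assumed: the partial trace follows the branch taken when zf' is true.
    return eq1_zf(edx)


def joint_samples(domain: range) -> Iterator[Dict[str, object]]:
    """Assign edx once and evaluate every expression with that same value."""
    for edx in domain:
        if path_constraint(edx):
            yield {"edx": edx, "zf'": eq1_zf(edx), "ecx'": eq2_ecx(edx)}


print(list(joint_samples(range(-8, 9))))
# [{'edx': 1, "zf'": True, "ecx'": 5}]
```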

To preserve these dependencies, we use the Z3 constraint solver to generate the I/O samples. Generating I/O samples with Z3 scales much better than using it to prove the equivalence of two symbolic expressions: I/O sample generation is a one-time job for each partial trace, whereas equivalence checking invokes the constraint solver every single time a partial trace is compared with another one.

In many cases, more than one concrete value of a pre- or post-state variable satisfies all the symbolic expressions. We therefore set an upper limit N on the number of values considered; following a previous study, N is set to 3 in our design.
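A sketch of capping the number of satisfying assignments at N = 3. It assumes constraints are Python predicates over a variable assignment and uses brute-force enumeration over a small domain in place of Z3, which is only practical for toy examples; with Z3 one would typically obtain further models by adding a constraint that excludes each previous model. The constraints in the usage example are hypothetical.

```python
from itertools import islice, product
from typing import Callable, Dict, List, Sequence

Assignment = Dict[str, int]


def capped_io_samples(
    variables: Sequence[str],
    constraints: List[Callable[[Assignment], bool]],
    domain: Sequence[int],
    n: int = 3,
) -> List[Assignment]:
    """Return at most n assignments that satisfy every constraint."""

    def satisfying():
        # Enumerate candidate assignments in lexicographic order.
        for values in product(domain, repeat=len(variables)):
            assignment = dict(zip(variables, values))
            if all(c(assignment) for c in constraints):
                yield assignment

    return list(islice(satisfying(), n))


# Hypothetical constraints with many solutions: eax' = eax + ebx and eax' > 10.
constraints = [
    lambda s: s["eax_post"] == s["eax"] + s["ebx"],
    lambda s: s["eax_post"] > 10,
]
print(capped_io_samples(["eax", "ebx", "eax_post"], constraints, range(0, 16)))
# [{'eax': 0, 'ebx': 11, 'eax_post': 11}, {'eax': 0, 'ebx': 12, 'eax_post': 12},
#  {'eax': 0, 'ebx': 13, 'eax_post': 13}]
```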

Trace Pruning

Infeasible Partial Trace Pruning

In BINGO, we prune obviously infeasible partial traces with the help of the constraint solver. That is, given the symbolic expressions extracted from a partial trace, we rely on the constraint solver to determine whether all the constraints can be satisfied simultaneously. Because the solver is already invoked to generate I/O samples, identifying infeasible partial traces requires no additional effort: if the constraint solver is unable to find appropriate concrete values for the pre- and post-state variables, the partial trace is considered infeasible.
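A minimal sketch of this pruning rule, assuming each partial trace carries its constraints as Python predicates and substituting an exhaustive search over a small domain for the constraint solver. Note the caveat: a finite-domain search can only show infeasibility within that domain, whereas a real solver reasons over the full bit-vector space. The trace names and constraints are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

Assignment = Dict[str, int]
Constraint = Callable[[Assignment], bool]


def is_feasible(
    variables: Sequence[str], constraints: List[Constraint], domain: Sequence[int]
) -> bool:
    """A trace is feasible if some assignment satisfies all its constraints."""
    return any(
        all(c(dict(zip(variables, values))) for c in constraints)
        for values in product(domain, repeat=len(variables))
    )


def prune_infeasible(
    traces: Dict[str, List[Constraint]], variables: Sequence[str], domain: Sequence[int]
) -> List[str]:
    """Keep only the names of traces whose constraints are satisfiable."""
    return [name for name, cs in traces.items() if is_feasible(variables, cs, domain)]


# Hypothetical traces over one pre-state variable edx.
traces = {
    "t_ok": [lambda s: s["edx"] == 1],
    # Contradictory branch conditions on one path: edx == 1 and edx > 5.
    "t_dead": [lambda s: s["edx"] == 1, lambda s: s["edx"] > 5],
}
print(prune_infeasible(traces, ["edx"], range(-16, 17)))  # ['t_ok']
```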

Compiler Specific Code Pruning

Depending on the selected compilation options, the compiler may insert additional code (i.e., code not originally written by the programmer) into the compiled binary.

[Fig. 6: compiler-inserted code in a function prologue and epilogue]
The code segments in the function prologue and epilogue in Fig. 6 ensure that stack integrity is not violated.

When matching signature functions against target functions, the additional compiler-specific code can introduce noise by changing the code structure and diluting the similar parts, especially when the functions are small. It is therefore necessary to identify and remove compiler-specific code from the partial traces. However, directly removing code from a partial trace might produce incorrect semantic features, as these features are very sensitive to the underlying code semantics. We therefore take a conservative approach: we generalize compiler-specific code into patterns and systematically prune the partial traces that contain these patterns. That is, instead of removing the compiler-specific code from a partial trace, which is error-prone, we remove the partial trace itself if the compiler-specific code pattern accounts for the majority (50% or more) of its code.
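The sketch below illustrates the 50% rule. It assumes a partial trace is a list of disassembled instruction strings and that compiler-specific code can be recognized by regular expressions; the patterns shown (stack-cookie symbols of the kind used by common stack-protection schemes) are hypothetical examples, not BINGO's actual pattern set.

```python
import re
from typing import List

# Hypothetical patterns for compiler-inserted stack-protection code; a real
# pattern set would be derived from the compilers and options being targeted.
COMPILER_PATTERNS = [
    re.compile(r"__security_cookie"),
    re.compile(r"__security_check_cookie"),
    re.compile(r"__stack_chk_(guard|fail)"),
]


def compiler_code_ratio(instructions: List[str]) -> float:
    """Fraction of instructions matching any compiler-specific pattern."""
    if not instructions:
        return 0.0
    hits = sum(
        1 for ins in instructions if any(p.search(ins) for p in COMPILER_PATTERNS)
    )
    return hits / len(instructions)


def keep_trace(instructions: List[str], threshold: float = 0.5) -> bool:
    """Drop the whole trace (rather than editing it) when the pattern share >= threshold."""
    return compiler_code_ratio(instructions) < threshold


epilogue = [
    "mov ecx, dword ptr [__security_cookie]",
    "xor ecx, ebp",
    "call __security_check_cookie",
    "ret",
]
body = ["mov eax, [ebp+8]", "add eax, 4", "mov esp, ebp", "pop ebp"]
print(keep_trace(epilogue), keep_trace(body))  # False True
```

Dropping the trace wholesale keeps the remaining features exact, at the cost of occasionally discarding a trace that also contained some programmer-written code.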
