[ICSE 2021] ATVHUNTER: Reliable Version Detection of Third-Party Libraries for Vulnerability (Paper Notes)


【ICSE 2021】ATVHUNTER: Reliable Version Detection of Third-Party Libraries for Vulnerability Identification in Android Applications


Affiliations: 1 The Hong Kong Polytechnic University, 2 Nankai University, 3 Tianjin University, 4 Nanyang Technological University, 5 Monash University

Venue: ICSE 2021

Paper: ATVHUNTER: Reliable Version Detection of Third-Party Libraries for Vulnerability Identification in Android Applications
Source code: the paper's code is not open-sourced, but an online detection tool is available: https://scantist.io
Reference: https://github.com/Anonymous-Phunter/PHunter

ABSTRACT

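The paper proposes ATVHUNTER (Android in-app Third-party library Vulnerability Hunter). By detecting the precise versions of the third-party libraries (TPLs) in an Android app and collecting information on TPL vulnerabilities, it reports the TPLs used by an input app together with the related vulnerabilities. At its core, it is a similarity-based library detection approach that relies on prior knowledge.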

For app analysis, ATVHUNTER uses a two-phase detection approach to identify specific TPL versions: Control Flow Graphs (CFGs) serve as the coarse-grained feature, and the opcodes in each basic block of the CFG serve as the fine-grained feature.

For the reference databases, ATVHUNTER builds a TPL database containing 189,545 unique TPLs with 3,006,676 versions, and a TPL vulnerability database containing 1,180 CVEs and 224 security bugs found in TPLs.

The authors evaluate ATVHUNTER in terms of effectiveness, efficiency, and obfuscation-resilient capability. They also run a large-scale analysis of 104,446 top apps and find 9,050 vulnerable apps, involving 53,337 known vulnerabilities and 7,480 security bugs in 10,616 vulnerable TPLs.

1. INTRODUCTION

1.1 Why TPL detection matters

  • Attackers can exploit the vulnerabilities in TPLs
  • Attackers can inject backdoors in TPLs
  • TPLs are scattered in different apps
  • The information about TPL components in apps may not be transparent to app developers (due to many direct or transitive dependencies)

1.2 Existing TPL detection approaches

  • Without prior knowledge:

    • clustering-based methods:

      LibRadar (ICSE 2016), LibD (ICSE 2017), LibExtractor (WiSec 2020)

  • With prior knowledge:

    • similarity-based methods:

      LibScout (CCS 2016), LibID (ISSTA 2019)

[Figure: image missing from the source page]

Data source: Research on Third-Party Libraries in Android Apps: A Taxonomy and Systematic Literature Review (TSE 2021)

1.3 Weaknesses of existing TPL detection approaches

  • Clustering-based methods:

    • require a considerable number of apps as input
    • Low recall: can only identify commonly used TPLs
    • Labor-intensive: verifying the clustering results takes substantial manual effort
    • Imprecise: cannot identify precise versions
  • Similarity-based methods:

    • require a predefined TPL database as the reference database
    • Low recall: the published TPL databases are far smaller than the set of TPLs on the actual market
    • Imprecise: cannot identify precise versions

2. ARCHITECTURE

[Figure: ATVHUNTER architecture; image missing from the source page]

2.1 TPL Detection

Goal: identify which TPLs an app contains, based on the data in the TPL database

2.1.1 Preprocessing
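  • Task 1: decompile the APK into bytecode and convert it to an IR (using APKTOOL)

  • Task 2: remove the primary module from the APK

    • primary module: code implemented by the app developer

    • non-primary module: TPLs

    • Implementation:

      • Use AndroidManifest.xml to find the package that contains MainActivity

        • For example: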

          < manifest …… package="com.cmic.sso.myapplication" …… >
          
      • Delete the files under that package's namespace

    • Side Effects:

      • Side Effect 1: package flattening and package renaming obfuscation leave some host code that cannot be removed

        • Before obfuscation:

          mycompany.myapplication.MyMainActivity
          mycompany.myapplication.Foo
          mycompany.myapplication.Bar
          mycompany.myapplication.extra.FirstExtra
          mycompany.myapplication.extra.SecondExtra
          mycompany.util.FirstUtil
          mycompany.util.SecondUtil
          
        • After ProGuard's default obfuscation:

          mycompany.myapplication.MyMainActivity
          mycompany.myapplication.a
          mycompany.myapplication.b
          mycompany.myapplication.a.a
          mycompany.myapplication.a.b
          mycompany.a.a
          mycompany.a.b
          
        • After obfuscation with -flattenpackagehierarchy 'myobfuscated':

          mycompany.myapplication.MyMainActivity
          mycompany.myapplication.a
          mycompany.myapplication.b
          myobfuscated.a.a
          myobfuscated.a.b
          myobfuscated.b.a
          myobfuscated.b.b
          

          myobfuscated.a replaces mycompany.myapplication.extra,

          so mycompany.myapplication.extra.FirstExtra and mycompany.myapplication.extra.SecondExtra no longer sit under the host namespace and cannot be removed

      • Side Effect 2: a special package name prevents host code from being removed

      • Side Effect 3: the host app and TPLs share the same package namespace, so TPLs are removed by mistake

    • Side Effects 1 and 2: do not affect the accuracy of TPL identification

    • Side Effect 3: causes false negatives (FN)
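
The namespace-based removal and Side Effect 1 can be illustrated with a minimal sketch. It assumes the app is already reduced to a list of fully qualified class names; the function name `split_primary_module` is hypothetical and not part of ATVHUNTER.

```python
def split_primary_module(class_names, host_package):
    """Separate classes under the host package namespace from everything else."""
    prefix = host_package + "."
    primary = [c for c in class_names if c.startswith(prefix)]
    non_primary = [c for c in class_names if not c.startswith(prefix)]
    return primary, non_primary


# Class list from the -flattenpackagehierarchy example above.
classes_after_flattening = [
    "mycompany.myapplication.MyMainActivity",
    "mycompany.myapplication.a",
    "mycompany.myapplication.b",
    "myobfuscated.a.a",
    "myobfuscated.a.b",
    "myobfuscated.b.a",
    "myobfuscated.b.b",
]

primary, non_primary = split_primary_module(
    classes_after_flattening, "mycompany.myapplication"
)
print(primary)      # classes that would be deleted as host code
print(non_primary)  # classes kept as TPL candidates
```

Here `myobfuscated.a.a` and `myobfuscated.a.b` are host code (the former `extra` package), yet they end up in `non_primary`, which is exactly Side Effect 1.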

2.1.2 Module Decoupling

Goal: split the non-primary module into individual TPLs

Method: each Class Dependency Graph (CDG) is treated as one TPL candidate (using Androguard)

class dependency relationship includes:

① class inheritance
② method call relationship
③ field reference relationship
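
A rough sketch of the decoupling step, under the assumption that "one CDG" means one connected component of the class dependency graph. The three relationship types are collapsed into plain `(source, target)` pairs, and all class names are hypothetical.

```python
from collections import defaultdict


def decouple_modules(classes, dependencies):
    """Group classes into connected components of the dependency graph."""
    neighbours = defaultdict(set)
    for src, dst in dependencies:
        # Direction is ignored: any dependency ties two classes together.
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen, candidates = set(), []
    for start in sorted(classes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            cls = stack.pop()
            if cls in component:
                continue
            component.add(cls)
            stack.extend(neighbours[cls] - component)
        seen |= component
        candidates.append(sorted(component))
    return candidates


classes = ["liba.Client", "liba.Request", "libb.Parser", "libb.Node"]
dependencies = [
    ("liba.Client", "liba.Request"),  # method call
    ("libb.Parser", "libb.Node"),     # field reference
]
tpl_candidates = decouple_modules(classes, dependencies)
print(tpl_candidates)
```

Each inner list is one TPL candidate. Because grouping follows dependencies rather than package names, it still works when package names are obfuscated.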

2.1.3 Feature Generation

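Goal: extract a fingerprint for each TPL

Method:

(1) Coarse-grained feature:

① Extract the CFG of every method in the candidate TPLs (using Soot) and number each node (basic block, BB) of the CFG in execution order, with numbers increasing along the way

When numbering the children of a branch node n:

  • the child with more outgoing edges is numbered n+1
  • if the children have the same number of outgoing edges, the one with more statements is numbered n+1

② Represent each node in the form nodeCount -> (child1,child2,…)

③ Represent each CFG (one per method) as an adjacency list
of the form [parent1 -> (child1,child2,…), parent2 -> …]

④ Hash the adjacency list (each adjacency list corresponds to one method)

⑤ Sort the hash values of all methods in the TPL, hash the sorted sequence, and use the result as the TPL's coarse-grained feature (T1)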
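
Steps ① to ⑤ can be sketched as follows. Several things here are assumptions: the CFG is given as a dict from block label to child labels, "execution order" is approximated by a depth-first walk from the entry block, and SHA-256 stands in for the hash function, which these notes do not name.

```python
import hashlib


def number_blocks(cfg, stmt_counts, entry):
    """Number basic blocks by a depth-first walk with a deterministic child order."""
    order = {}
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in order:
            continue
        order[node] = len(order)
        # More outgoing edges first; ties broken by more statements.
        children = sorted(
            cfg.get(node, []),
            key=lambda c: (len(cfg.get(c, [])), stmt_counts.get(c, 0)),
            reverse=True,
        )
        # Push in reverse so the highest-priority child is visited next.
        stack.extend(reversed(children))
    return order


def method_signature(cfg, stmt_counts, entry):
    """Hash the adjacency list of a CFG after renumbering its blocks."""
    order = number_blocks(cfg, stmt_counts, entry)
    adjacency = [
        (order[n], tuple(sorted(order[c] for c in cfg.get(n, []))))
        for n in sorted(order, key=order.get)
    ]
    text = ", ".join(f"{parent} -> {kids}" for parent, kids in adjacency)
    return hashlib.sha256(text.encode()).hexdigest()


def coarse_feature(method_hashes):
    """Hash the sorted method hashes to get one value per TPL (T1)."""
    return hashlib.sha256("".join(sorted(method_hashes)).encode()).hexdigest()


# Two CFGs with the same shape but different block labels.
cfg_a = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
stmts_a = {"entry": 2, "then": 3, "else": 1, "exit": 1}
cfg_b = {"b0": ["b2", "b1"], "b1": ["b3"], "b2": ["b3"], "b3": []}
stmts_b = {"b0": 2, "b1": 3, "b2": 1, "b3": 1}

sig_a = method_signature(cfg_a, stmts_a, "entry")
sig_b = method_signature(cfg_b, stmts_b, "b0")
print(sig_a == sig_b)
```

Because the numbering depends only on graph shape and statement counts, renaming identifiers does not change the signature, and sorting the method hashes makes T1 independent of method order.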

(2) Fine-grained feature:

① For each CFG, follow its adjacency list and extract the opcodes of its BBs (using Soot)

② Compute a fuzzy hash of the opcode sequence (using ssdeep)

The advantage of fuzzy hashing: if one part of the feature changes due to code obfuscation, it does not cause a big difference in the final fingerprint.
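
The locality property can be shown with a toy fingerprint. This is not ssdeep's algorithm (ssdeep uses context-triggered piecewise hashing); it is a simplified stand-in that hashes each basic block separately, and the opcode sequences are made up for illustration.

```python
import hashlib


def piecewise_fingerprint(blocks, piece_len=4):
    """Concatenate a short digest per basic block (toy stand-in for a fuzzy hash)."""
    pieces = []
    for opcodes in blocks:
        digest = hashlib.sha256(" ".join(opcodes).encode()).hexdigest()
        pieces.append(digest[:piece_len])
    return "".join(pieces)


original = [["const", "invoke-virtual", "move-result"], ["if-eqz"], ["return"]]
# Same method with one junk instruction inserted into the first block.
obfuscated = [["const", "nop", "invoke-virtual", "move-result"], ["if-eqz"], ["return"]]

fp_original = piecewise_fingerprint(original)
fp_obfuscated = piecewise_fingerprint(obfuscated)
print(fp_original)
print(fp_obfuscated)
```

Only the piece for the modified block can differ; the pieces for the untouched blocks are identical, so the two fingerprints stay close in edit distance. A cryptographic hash over the whole sequence would change completely.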

2.1.4 TPL Database Construction
  • We crawled all Java TPLs from Maven Repository (189,545 unique TPLs with their 3,006,676 versions) to build our TPL database.

  • We store both coarse-grained and fine-grained features in a MongoDB database.

  • We spent more than one month to collect all the TPLs and another two months to generate the TPL feature database.

2.1.5 Library Identification

Goal: for each TPL candidate in the app, find the corresponding TPL and TPL version

(1) Potential TPL Identification
  • a) Search by package names

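    Use package names to filter out unrelated TPLs

    • If the TPL candidate's package name is not obfuscated: filter out unrelated TPLs
    • If the TPL candidate's package name is obfuscated: apply no filtering
  • b) Search by the number of classes

    In essence, this uses the number of classes to filter out unrelated TPLs

    If the class count of one side is less than 40% of the other side's, no further comparison is made

  • c) Search by coarse-grained features

    • If the coarse-grained features (T1) are identical, the pair is considered a match

    • If more than 70% of the coarse-grained features (T1) are the same, the TPL is considered a potential TPL

      Only potential TPLs go on to the subsequent Version Identification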
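
Filters b) and c) can be sketched as below. The notes do not say how "more than 70% the same" is measured; this sketch assumes it is the share of a reference TPL's method hashes that also occur in the candidate. The data layout and library names are hypothetical, and the package-name filter a) is omitted.

```python
def class_count_compatible(n_candidate, n_reference, ratio=0.4):
    """Reject pairs where one side has fewer than `ratio` of the other's classes."""
    smaller, larger = sorted((n_candidate, n_reference))
    return larger == 0 or smaller >= ratio * larger


def shared_method_ratio(candidate_hashes, reference_hashes):
    """Share of the reference's method hashes that also occur in the candidate."""
    reference = set(reference_hashes)
    if not reference:
        return 0.0
    return len(reference & set(candidate_hashes)) / len(reference)


def potential_tpls(candidate, database, threshold=0.7):
    """Return (name, ratio) pairs that pass both filters, best first."""
    hits = []
    for name, ref in database.items():
        if not class_count_compatible(candidate["classes"], ref["classes"]):
            continue
        ratio = shared_method_ratio(candidate["methods"], ref["methods"])
        if ratio > threshold:
            hits.append((name, ratio))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


candidate = {"classes": 10, "methods": {"h1", "h2", "h3", "h4"}}
database = {
    "lib-x:1.0": {"classes": 12, "methods": {"h1", "h2", "h3", "h4", "h5"}},
    "lib-y:2.0": {"classes": 40, "methods": {"h1", "h2"}},        # fails the 40% rule
    "lib-z:1.1": {"classes": 9, "methods": {"h1", "h7", "h8", "h9"}},  # too little overlap
}
hits = potential_tpls(candidate, database)
print(hits)
```

Cheap checks run first so that the more expensive fine-grained comparison is only needed for the few entries that survive.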

(2) Version Identification
  • Similarity between two methods

    Method Similarity Score (MSS)

    [Formula image missing from the source page]

    • where $d[m_a, m_b]$ is the edit distance between the fingerprints (hashes of the adjacency lists) of $m_a$ and $m_b$ (computed with ssdeep)

    • Edit Distance: the minimum number of edit operations (i.e., insertion, deletion, and substitution) required to turn one fingerprint into the other.

    • If MSS $\ge \theta\ (= 0.85)$, the two methods are considered matched

  • Similarity between two TPLs

    TPL Similarity Score (TSS)

    [Formula image missing from the source page]

    • $t_1$: a TPL from the app

    • $t_2$: a TPL from the TPL DB

    • $M|t_2|$: the number of methods in $t_2$

    • $M|t_1 \cap t_2|$: the number of methods $m_j$ that satisfy the following conditions

      • $m_j$ is a method in $t_2$
      • there exists $m_i \in t_1$ with $MSS(m_i, m_j) \ge \theta\ (= 0.85)$
      • $t_1$ and $t_2$ contain at least one pair of methods whose MSS is 1 (completely matched methods)
    • When TSS $\ge \delta = 0.95$, the two TPLs are considered matched (when several TPLs match, the one with the highest TSS is taken as the final result)
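
A minimal sketch of version identification follows. Both formula images are missing from these notes, so two assumptions are made and should be checked against the paper: MSS is taken as 1 minus the edit distance divided by the longer fingerprint's length, and TSS is taken as $M|t_1 \cap t_2|$ divided by $M|t_2|$, as the definitions above suggest. A plain Levenshtein distance replaces ssdeep, and the fingerprints are made-up strings.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def mss(fp_a, fp_b):
    """Method similarity, assumed to be normalised by the longer fingerprint."""
    longest = max(len(fp_a), len(fp_b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(fp_a, fp_b) / longest


def tss(app_methods, ref_methods, theta=0.85):
    """Share of reference methods matched by some app method (assumed TSS form)."""
    if not ref_methods:
        return 0.0
    scores = [[mss(mi, mj) for mi in app_methods] for mj in ref_methods]
    if not any(s == 1.0 for row in scores for s in row):
        return 0.0  # the definition requires at least one perfectly matched pair
    matched = sum(1 for row in scores if row and max(row) >= theta)
    return matched / len(ref_methods)


def best_match(app_methods, database, delta=0.95):
    """Pick the reference version with the highest TSS, if it reaches delta."""
    scored = [(tss(app_methods, methods), name) for name, methods in database.items()]
    score, name = max(scored, default=(0.0, None))
    return (name, score) if score >= delta else (None, score)


app_tpl = ["abcdefghij", "klmnopqrst"]
versions = {
    "lib:1.0": ["abcdefghij", "klmnopqrst"],
    "lib:1.1": ["abcdefghij", "klmnopqXYZ"],
}
print(best_match(app_tpl, versions))
```

The high threshold $\delta$ is what separates neighbouring versions: `lib:1.1` shares one method exactly but scores only 0.5 here, so it is rejected.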

2.2 Vulnerable TPL-V Identification

2.2.1 Database Construction
(1) Known TPL Vulnerability Collection
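  • Extract the CPE names of the TPLs from the TPL database
    • CPE (URI binding, as defined in the CPE 2.3 naming specification): cpe:/<part>:<vendor>:<product>:<version>:<update>:<edition>:<language>
  • Use the cve-search tool to search for vulnerabilities related to each TPL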
  • Finally, we collected 1,180 CVEs from 957 unique TPLs with 38,243 affected versions.
(2) Security Bug Collection
  • We also obtain 224 security bugs from Github and Bitbucket.
  • These bugs come from 152 open-source TPLs with their corresponding 4,533 versions.
2.2.2 Vulnerable TPL-V Identification

Check whether each matched TPL version is vulnerable
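
A minimal sketch of this lookup, assuming the vulnerability database is keyed by CPE URI. The vendor, product, and identifier values are placeholders and not real entries, and the trailing CPE components (update, edition, language) are left out.

```python
def cpe_uri(vendor, product, version, part="a"):
    """Build a CPE URI of the form cpe:/<part>:<vendor>:<product>:<version>."""
    return f"cpe:/{part}:{vendor}:{product}:{version}"


def vulnerable_entries(detected, vuln_db):
    """Return {cpe: [ids]} for detected TPL versions present in vuln_db."""
    report = {}
    for vendor, product, version in detected:
        key = cpe_uri(vendor, product, version)
        if key in vuln_db:
            report[key] = vuln_db[key]
    return report


# Placeholder data: not real CVE identifiers or libraries.
vuln_db = {"cpe:/a:examplevendor:examplelib:1.2.0": ["EXAMPLE-0001"]}
detected = [
    ("examplevendor", "examplelib", "1.2.0"),
    ("examplevendor", "examplelib", "1.3.0"),
]
report = vulnerable_entries(detected, vuln_db)
print(report)
```

The lookup is only as good as the version detection before it: reporting 1.3.0 as 1.2.0 would produce a false alarm here, which is why precise version identification matters.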

3. EVALUATION

Measures the effectiveness and performance of ATVHUNTER

3.1 Preparation

3.1.1 Ground-truth Dataset Construction
  • We first collect the latest versions of 500 open-source apps from F-Droid.
  • For each app, we manually analyze it and get the in-app TPLs with their specific versions.
  • We then download these TPLs with their versions from the
    Maven repository.
  • We filter 144 apps out due to the incomplete versions of TPLs maintained in the Maven repository.
  • We choose 356 apps and 189 unique TPLs with the complete 6,819
    version files as the ground truth.
3.1.2 Threshold Selection
  • We randomly select three groups (3 * 200) of apps outside the aforementioned dataset to decide appropriate thresholds for MSS and TSS. [Figure: threshold selection results; image missing from the source page]

3.2 Effectiveness Evaluation

[Figure: effectiveness evaluation results; image missing from the source page]

3.3 Efficiency Evaluation

[Figure: efficiency evaluation results; image missing from the source page]

3.4 Obfuscation-resilient Capability

[Figure: obfuscation-resilience results; image missing from the source page]

4. LARGE-SCALE ANALYSIS

ATVHUNTER is used to reveal the real-world impact of TPL vulnerabilities

  • We collected commercial Android apps from Google Play based on the number of installations.
  • We finally collected 104,446 apps and found 72% of them (73,110/104,446) use TPLs.
  • 9,050/73,110 of apps include vulnerable TPLs, involving 53,337 vulnerabilities and 7,480 security bugs.
    • vulnerabilities are from 166 TPLs with 10,362 versions
    • security bugs are from 27 TPLs with 284 versions

5. DISCUSSION

Limitations:

(1) About native libraries: the hash-based approach may not work

(2) About app packing
