Malicious Code Detection


大青呐



Column: Machine Learning; Tag: malicious code


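Definition of Malicious Code

Malicious code, also called malware, is an umbrella term for hostile and intrusive software of every kind, including computer viruses, worms, Trojan horses, ransomware, spyware, adware, and other malicious programs.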

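Types of Malicious Code

Computer virus: a program that lodges in a computer system and, when executed under certain conditions, damages the functions and data of the system and its programs, infects other programs, and replicates itself.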

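Worm: often regarded as a kind of virus. A worm replicates itself and spreads over the network, loading down computers and networks and consuming their limited resources.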

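Trojan horse: often shortened to Trojan; the name comes from the ancient Greek legend. A computer Trojan is a program that lurks on a computer to achieve some specific goal, such as stealing the user's private information or taking control of the user's system. Its main difference from a virus is replication: a virus copies itself, whereas a Trojan does not replicate and does not infect other programs.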

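Rootkit: originally a set of tools that helped a user obtain system privileges. Here the term refers to a malicious program that, after an attacker has gained privileges on the target host, hides the traces of the attacker's access so that the attacker goes undetected and can keep administrator privileges over a long period. Rootkits are highly stealthy and persistent, which makes them hard to detect.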

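Malicious Code Features (information that distinguishes a program's malicious characteristics)

System call features
Normalized code features
N-gram features
Control flow graph (CFG) features
Instruction sequence features
File format features, etc.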

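Malicious Code Feature Extraction

Byte n-gram features: byte n-grams are extracted from the file's binary code, and the L most frequent n-grams of each class in the training set are selected to form that class's profile.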
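As a minimal sketch of the byte n-gram class profile described above, the following Python assumes file contents are already loaded as `bytes`. The helper names (`byte_ngrams`, `class_profile`, `profile_features`) and the binary presence encoding of the final vector are illustrative assumptions, not the method of any cited paper; `top_l` plays the role of L.

```python
from collections import Counter

def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping n-byte window in a file's raw bytes."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def class_profile(samples: list[bytes], n: int, top_l: int) -> set[bytes]:
    """Sum n-gram counts over all training files of one class and keep the top_l most frequent."""
    total = Counter()
    for data in samples:
        total.update(byte_ngrams(data, n))
    return {gram for gram, _ in total.most_common(top_l)}

def profile_features(data: bytes, profile: set[bytes], n: int) -> list[int]:
    """Binary vector in sorted profile order: 1 if the file contains that n-gram, else 0."""
    present = set(byte_ngrams(data, n))
    return [int(gram in present) for gram in sorted(profile)]

# Toy "malware class" with two tiny samples; real inputs would be whole files.
malware_samples = [b"\x90\x90\xcc\xcc", b"\x90\x90\x90\xcc"]
profile = class_profile(malware_samples, n=2, top_l=2)
print(sorted(profile))  # [b'\x90\x90', b'\x90\xcc']
print(profile_features(b"\x90\x90\x00", profile, n=2))  # [1, 0]
```

In a real setting one profile is built per class (malicious, benign), and a new file is compared against each profile.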

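Opcode n-gram features: all executables in the dataset are first disassembled and their opcodes extracted. An opcode (short for operation code) is the part of an assembly-language instruction that specifies the operation to perform. An instruction consists of an opcode and its operands; the opcode selects the operation, and depending on the CPU architecture the operands may be registers, values stored in memory, the stack, and so on. Opcodes cover arithmetic, logic, and data-handling operations. Opcode statistics can capture the variability between malicious and legitimate software.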
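The opcode n-gram step can be sketched as follows, assuming disassembly has already produced a list of opcode mnemonics (in practice from a disassembler such as objdump; here the list is hard-coded). The function names and the relative-frequency weighting are hypothetical choices for illustration.

```python
from collections import Counter

def opcode_ngram_tf(opcodes: list[str], n: int) -> dict[tuple[str, ...], float]:
    """Relative frequency of each opcode n-gram in one disassembled program."""
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}

def to_vector(tf: dict[tuple[str, ...], float], vocabulary: list[tuple[str, ...]]) -> list[float]:
    """Project onto a fixed n-gram vocabulary so every program gets a same-length vector."""
    return [tf.get(gram, 0.0) for gram in vocabulary]

# Mnemonics as a disassembler might emit them (hard-coded for illustration).
ops = ["push", "mov", "call", "push", "mov", "ret"]
tf = opcode_ngram_tf(ops, n=2)
print(tf[("push", "mov")])  # 0.4
print(to_vector(tf, [("push", "mov"), ("xor", "xor")]))  # [0.4, 0.0]
```

Operands are deliberately dropped so that register renaming or changed addresses in a variant do not alter the features.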

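Portable Executable (PE) features: these features are extracted from specific parts of EXE files. Static analysis uses the structural information of the executable to extract them. Such features can indicate that a file has been tampered with or infected in order to perform malicious activity.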
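To illustrate what structural PE features look like, here is a minimal sketch that reads a few COFF file-header fields with Python's standard `struct` module. It follows the documented PE layout (the `e_lfanew` value at offset 0x3C points to the `PE\0\0` signature, which is followed by a 20-byte file header). Which fields a detector actually uses is an assumption here, and a production tool would more likely rely on a dedicated parser such as the `pefile` library.

```python
import struct

def pe_header_features(data: bytes) -> dict[str, int]:
    """Read a few COFF file-header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # e_lfanew: 32-bit little-endian offset of the PE signature, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing 'PE\\0\\0' signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, n_sections, timestamp, _sym_ptr, _n_syms,
     opt_header_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": opt_header_size,
        "characteristics": characteristics,
    }

# Synthetic header built in memory so the example runs without a real EXE.
buf = bytearray(0x100)
buf[0:2] = b"MZ"
struct.pack_into("<I", buf, 0x3C, 0x80)
buf[0x80:0x84] = b"PE\0\0"
# 0x14C = i386; 0x0102 = executable image | 32-bit machine.
struct.pack_into("<HHIIIHH", buf, 0x84, 0x14C, 3, 0, 0, 0, 0xE0, 0x0102)
print(pe_header_features(bytes(buf)))
```

Anomalies in such fields (for example an unusual section count or a timestamp in the future) are the kind of signal structural feature sets build on.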

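String features: these features are based on plain text encoded in the executable, such as windows, getversion, getstartupinfo, getmodulefilename, messagebox, library names, and so on. The strings are runs of consecutive printable characters encoded in both PE and non-PE executables.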
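A minimal sketch of string feature extraction in the spirit of the Unix `strings` utility: collect runs of at least four printable ASCII bytes, then check them against a keyword list. The minimum length, the keyword list, and the function names are illustrative assumptions; this sketch also ignores UTF-16 (wide) strings, which Windows binaries commonly contain.

```python
import re

def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes (0x20-0x7E)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

def string_features(data: bytes, keywords: list[str]) -> list[int]:
    """1 if any extracted string contains the keyword (case-insensitive), else 0."""
    found = [s.lower() for s in printable_strings(data)]
    return [int(any(k.lower() in s for s in found)) for k in keywords]

sample = b"\x00\x01GetVersion\x00\xffMessageBoxA\x00ab\x00"
print(printable_strings(sample))  # ['GetVersion', 'MessageBoxA']
print(string_features(sample, ["getversion", "messagebox", "getmodulefilename"]))  # [1, 1, 0]
```

Substring matching lets `messagebox` also match the ANSI/Unicode variants `MessageBoxA` and `MessageBoxW`.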

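Function-based features: these features are extracted from the runtime behavior of the program file. The functions residing in the file to be executed are used to generate attributes that represent the file.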
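One way to turn runtime function behavior into a feature vector is to count how often each function from a fixed vocabulary appears in an execution trace. The sketch below assumes the trace is already available as a list of API names (for example, as logged by a sandbox or API hook); the trace, the vocabulary, and `call_frequency_vector` are hypothetical.

```python
from collections import Counter

def call_frequency_vector(trace: list[str], vocabulary: list[str]) -> list[float]:
    """Fraction of trace entries that are each vocabulary function (0.0 if absent)."""
    if not trace:
        return [0.0] * len(vocabulary)
    counts = Counter(trace)
    return [counts[name] / len(trace) for name in vocabulary]

# Hypothetical trace of Windows API calls recorded while a sample ran.
trace = ["CreateFileW", "WriteFile", "WriteFile", "RegSetValueExW"]
vocabulary = ["CreateFileW", "WriteFile", "RegSetValueExW", "InternetOpenA"]
print(call_frequency_vector(trace, vocabulary))  # [0.25, 0.5, 0.25, 0.0]
```

Fixing the vocabulary in advance gives every sample a vector of the same length, which most classifiers require.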

Hybrid analysis features: features obtained by combining static and dynamic analysis.
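Hybrid features are often combined by early fusion: static and dynamic vectors are concatenated, and a single classifier is trained on the result. The sketch below shows that idea with a toy nearest-centroid classifier built from the standard library; the classifier choice, the feature values, and the function names are assumptions for illustration only.

```python
import math
from typing import Sequence

def fuse(static_vec: Sequence[float], dynamic_vec: Sequence[float]) -> list[float]:
    """Early fusion: place static and dynamic features side by side in one vector."""
    return [*static_vec, *dynamic_vec]

def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def nearest_centroid(x: Sequence[float], centroids: dict[str, list[float]]) -> str:
    """Return the label whose centroid is closest to x in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

# Toy training data: two static features plus one dynamic score per sample (hypothetical values).
train = {
    "malware": [fuse([1, 1], [0.9]), fuse([1, 0], [0.8])],
    "benign": [fuse([0, 0], [0.1]), fuse([0, 1], [0.0])],
}
centroids = {label: centroid(vecs) for label, vecs in train.items()}
print(nearest_centroid(fuse([1, 1], [0.7]), centroids))  # malware
```

In practice the two views usually have very different scales, so each part is normally normalized before concatenation.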

Malicious Code Detection

Detection Based on Static Features

Feature	References
Byte-code n-grams	Kolter J Z, Maloof M A. Learning to detect malicious executables in the wild[C]. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2004: 470-478. Santos I, Penya Y K, Devesa J, et al. N-grams-based file signatures for malware detection[C]. Proceedings of the 2009 International Conference on Enterprise Information Systems (ICEIS), 2009, 9: 317-320.
File format	Shafiq M Z, Tabish S M, Mirza F, et al. PE-Miner: mining structural information to detect malicious executables in realtime[C]. Recent Advances in Intrusion Detection. Springer Berlin Heidelberg, 2009: 121-141. Bai J, Wang J, Zou G. A malware detection scheme based on mining format information[J]. The Scientific World Journal, 2014.
Gray image	Nataraj L, Karthikeyan S, Jacob G, et al. Malware images: visualization and automatic classification[C]. Proceedings of the 8th International Symposium on Visualization for Cyber Security. ACM, 2011: 4. Han X G, Qu W, Yao X X, et al. Research on malicious code variants detection based on texture fingerprint[J]. Journal on Communications, 2014, 35(8): 125-135.
Function call graph	Kong D, Yan G. Discriminant malware distance learning on structural information for automated malware classification[C]. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2013: 1357-1365.
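The gray image row above refers to treating a binary as a grayscale picture in which each byte becomes one 8-bit pixel (0 = black, 255 = white). A minimal sketch, assuming a fixed row width and zero-padding of the last row; Nataraj et al. choose the width according to file size, so the default of 256 here is an arbitrary assumption.

```python
def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` pixels with values 0-255."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # zero-pad the final partial row (assumption)
        rows.append(row)
    return rows

image = bytes_to_grayscale(bytes(range(10)), width=4)
print(image)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
```

The resulting 2-D array can then be saved as an image or passed to a texture descriptor or image classifier; variants of one family tend to produce visually similar textures.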

Detection Based on Dynamic Features

Feature	References
Variable-length API subsequences	Nair V P, Jain H, Golecha Y K, et al. MEDUSA: MEtamorphic malware dynamic analysis using signature from API[C]. Proceedings of the 3rd International Conference on Security of Information and Networks. ACM, 2010: 263-269. Chen F, Fu Y. Dynamic detection of unknown malicious executables base on API interception[C]. 2009 First International Workshop on Database Technology and Applications. IEEE, 2009: 329-332. Firdausi I, Lim C, Erwin A, et al. Analysis of machine learning techniques used in behavior-based malware detection[C]. 2010 Second International Conference on Advances in Computing, Control and Telecommunication Technologies (ACT). IEEE, 2010: 201-203.
Opcode n-grams	Shabtai A, Moskovitch R, Feher C, et al. Detecting unknown malicious code by applying classification techniques on opcode patterns[J]. Security Informatics, 2012, 1(1): 1-22. Pai S, Di Troia F, Visaggio C A, et al. Clustering for malware classification[J]. Journal of Computer Virology and Hacking Techniques, 2016: 1-13.
Graph	Bonfante G, Kaczmarek M, Marion J Y. Architecture of a morphological malware detector[J]. Journal in Computer Virology, 2009, 5(3): 263-270. Cesare S, Xiang Y, Zhou W. Control flow-based malware variant detection[J]. IEEE Transactions on Dependable and Secure Computing, 2014, 11(4): 307-317.
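For the graph-based approaches, a control flow graph can be summarized by simple numeric features. The sketch below assumes a CFG given as a hypothetical adjacency dictionary of basic-block labels and computes block and edge counts plus McCabe's cyclomatic complexity, M = E - N + 2P, with P = 1 for a single connected function. This is far simpler than the structural matching in the cited works and is shown only to make the idea concrete.

```python
def cfg_features(cfg: dict[str, list[str]]) -> dict[str, int]:
    """Numeric summary of one function's control flow graph.

    cfg maps each basic-block label to the labels of its successor blocks.
    """
    nodes = set(cfg) | {dst for succs in cfg.values() for dst in succs}
    n_edges = sum(len(succs) for succs in cfg.values())
    return {
        "basic_blocks": len(nodes),
        "edges": n_edges,
        # McCabe: M = E - N + 2P, assuming one connected component (P = 1).
        "cyclomatic_complexity": n_edges - len(nodes) + 2,
        # Blocks with more than one successor typically end in a conditional branch.
        "branch_blocks": sum(1 for succs in cfg.values() if len(succs) > 1),
    }

# Hypothetical CFG of a simple loop: entry -> loop -> (body -> loop | exit).
cfg = {"entry": ["loop"], "loop": ["body", "exit"], "body": ["loop"], "exit": []}
print(cfg_features(cfg))
# {'basic_blocks': 4, 'edges': 4, 'cyclomatic_complexity': 2, 'branch_blocks': 1}
```

Because such counts survive instruction substitution and register renaming, graph features are often more robust to simple metamorphic changes than byte-level features.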

Detection Based on Fused Features (methods that integrate multiple feature types)

Feature (dynamic / static)	References
Dynamic API + opcode	Santos I, Devesa J, Brezo F, et al. OPEM: a static-dynamic approach for machine learning-based malware detection[C]. International Joint Conference CISIS'12-ICEUTE'12-SOCO'12 Special Sessions. Springer Berlin Heidelberg, 2013: 271-280.
Program behavior + static DLL and API	Lu Y B, Din S C, Zheng C F, et al. Using multi-feature and classifier ensembles to improve malware detection[J]. Journal of CCIT, 2010, 39(2): 57-72.
API call sequence + PE format	Guo S, Yuan Q, Lin F, et al. A malware detection algorithm based on multi-view fusion[C]. Neural Information Processing. Models and Applications. Springer Berlin Heidelberg, 2010: 259-266. Krawczyk B, Woźniak M. Evolutionary cost-sensitive ensemble for malware detection[C]. International Joint Conference SOCO'14-CISIS'14-ICEUTE'14. Springer International Publishing, 2014: 433-442.
Dynamic API + static API	Ozdemir M, Sogukpinar I. An Android malware detection architecture based on ensemble learning[J]. Transactions on Machine Learning and Artificial Intelligence, 2014, 2(3): 90-106.
Opcode + byte code	Bai J, Wang J. Improving malware detection using multiview ensemble learning[J]. Security and Communication Networks, 2016, 9(17): 4227-4241.
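Multi-view (late fusion) approaches train one classifier per feature view and then combine their decisions. Majority voting is one common combination rule; whether the works listed above use it is not stated here. In this sketch the per-view classifiers are hypothetical threshold functions, and breaking ties toward `"malware"` is an assumed conservative policy.

```python
from collections import Counter
from typing import Callable, Sequence

# A per-view classifier maps one feature vector to a label (hypothetical interface).
Classifier = Callable[[Sequence[float]], str]

def multiview_vote(views: dict[str, Sequence[float]],
                   classifiers: dict[str, Classifier],
                   tie_label: str = "malware") -> str:
    """Late fusion: each view's classifier votes; the majority label wins."""
    votes = Counter(classifiers[name](vec) for name, vec in views.items())
    ranked = votes.most_common()
    # If the two leading labels have the same count, fall back to tie_label.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]

# Hypothetical threshold classifiers, one per feature view.
classifiers: dict[str, Classifier] = {
    "opcode": lambda v: "malware" if sum(v) > 1.0 else "benign",
    "bytes": lambda v: "malware" if v[0] > 0.5 else "benign",
    "pe": lambda v: "malware" if max(v) > 0.9 else "benign",
}
views = {"opcode": [0.8, 0.6], "bytes": [0.2], "pe": [0.95, 0.1]}
print(multiview_vote(views, classifiers))  # malware (2 votes to 1)
```

Compared with the early-fusion approach, late fusion lets each view use the model that suits it best and keeps working when one view is unavailable for a sample.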