Learning and Using YARA Rules

Latest recommended article published 2024-12-28 09:27:10

努力学习的大康

Article link: https://blog.csdn.net/abel_big_xu/article/details/125381650

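I recently needed to provide some detection capability for the malware samples I had analyzed. After looking around, YARA rules turned out to fit my needs best.
This article covers:

A brief introduction to YARA rules
How to write YARA rules (string definitions and conditions), mostly a translation of the official documentation
How to use YARA from Python (basic usage)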

1. Introduction & Installation

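Introduction: YARA is a tool developed by VirusTotal for writing rules that identify and classify malware.

Official GitHub repository (releases): https://github.com/VirusTotal/yara/releases

Official documentation: https://yara.readthedocs.io

A simple example: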

rule silent_banker : banker
{
    meta:
        description = "This is just an example"
        threat_level = 3
        in_the_wild = true
    strings:
        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
        $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
        $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
    condition:
        $a or $b or $c
}

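Installation: download a release and it is ready to use.
Usage: yara.exe rule.yara <file or directory to scan> (add -r to recurse into subdirectories)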

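2. Writing YARA rules

A rule generally has two parts: strings and a condition.
The strings section defines text or byte sequences that may appear in the sample.
The condition combines those strings (which of them occur, where, how often) to select files more precisely.

// The two simplest kinds of strings: text and hex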
rule ExampleRule
{
    strings:
        $my_text_string = "text here"
        $my_hex_string = { E2 34 A1 C8 23 FB }

    condition:
        $my_text_string or $my_hex_string
}

2.1 Keywords

As in C, keywords are reserved words and cannot be used as rule or string identifiers:

all	and	any	ascii	at	base64	base64wide	condition
contains	endswith	entrypoint	false	filesize	for	fullword	global
import	icontains	iendswith	iequals	in	include	int16	int16be
int32	int32be	int8	int8be	istartswith	matches	meta	nocase
none	not	of	or	private	rule	startswith	strings
them	true	uint16	uint16be	uint32	uint32be	uint8	uint8be
wide	xor	defined

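2.2 Strings (strings)

String identifiers start with $ followed by letters, digits and underscores. Text strings are enclosed in double quotes (""), hex strings in curly braces ({}).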

$my_hex_string = { E2 34 A1 C8 23 FB }
$hex_string1 = { E2 34 ?? C8 A? FB }        // ?? matches any byte; A? fixes only the high nibble
$hex_string2 = { F4 23 [4-6] 62 B4 }        // jump: any 4 to 6 bytes in between
$hex_string3 = { F4 23 ( 62 B4 | 56 ) 45 }  // alternatives: either 62 B4 or 56
$my_text_string = "text here\" \\ \r \t \n \xdd"  // escape sequences work as in C strings
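Hex-string syntax is easiest to check by translating it into an ordinary regular expression. The Python sketch below is a simplified model, not YARA's own parser: the function names hex_to_regex and _byte_to_regex are hypothetical, and only the constructs shown above (??, nibble wildcards, [n-m] jumps and ( | ) alternatives) are handled.

```python
import re


def _byte_to_regex(pair):
    """Regex fragment for one hex-string byte: 'E2', '??', 'A?' or '?A'."""
    hi, lo = pair[0], pair[1]
    if hi == "?" and lo == "?":
        return "."  # any byte (DOTALL is set when compiling)
    if lo == "?":
        base = int(hi, 16) << 4
        return f"[\\x{base:02x}-\\x{base | 0xF:02x}]"  # high nibble fixed
    if hi == "?":
        low = int(lo, 16)
        # Low nibble fixed: list all 16 possible bytes explicitly.
        return "[" + "".join(f"\\x{(h << 4) | low:02x}" for h in range(16)) + "]"
    return f"\\x{int(pair, 16):02x}"


def hex_to_regex(hex_string):
    """Translate the body of a YARA hex string into a compiled bytes regex."""
    # Surround structural characters with spaces so they become separate tokens.
    for ch in "()|[]":
        hex_string = hex_string.replace(ch, f" {ch} ")
    tokens = hex_string.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            out.append("(?:")
        elif tok in (")", "|"):
            out.append(tok)
        elif tok == "[":
            lo, sep, hi = tokens[i + 1].partition("-")
            if sep:
                out.append(f".{{{lo or 0},{hi}}}")  # [n-m] or open-ended [n-]
            else:
                out.append(f".{{{lo}}}")  # [n]: exactly n bytes
            i += 2  # skip the range token and the closing "]"
        else:
            # Bytes may also be written without spaces, e.g. "E234".
            out.extend(_byte_to_regex(tok[j:j + 2]) for j in range(0, len(tok), 2))
        i += 1
    return re.compile("".join(out).encode("ascii"), re.DOTALL)
```

For example, hex_to_regex("F4 23 ( 62 B4 | 56 ) 45") matches both F4 23 62 B4 45 and F4 23 56 45, which is exactly what the alternative in $hex_string3 expresses.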

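String modifiers: after a string definition you can add one or more modifiers that change how it is matched. For example, nocase makes the match case-insensitive.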

$text_string = "foobar" nocase                 // case-insensitive: matches Foobar, FOOBAR and fOoBaR
$wide_string = "Borland" wide                  // two-byte characters: B\x00o\x00r\x00...
$wide_and_ascii_string = "Borland" wide ascii  // matches either the wide or the ASCII form
$xor_string = "This program cannot" xor        // finds the string XORed with a single-byte key
$b64 = "This program cannot" base64            // finds base64-encoded forms of the string
$b64_custom = "This program cannot" base64("!@#$%^&*(){}[].,|ABCDEFGHIJ\x09LMNOPQRSTUVWXYZabcdefghijklmnopqrstu")
// base64 also accepts a custom 64-character alphabet
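One way to understand wide, xor and base64 is to ask which byte patterns end up being searched. The Python sketch below models that in a few lines; the helper names are hypothetical and this is not how YARA is implemented. Two assumptions are worth stating: bare xor is modeled as all 256 single-byte keys (0x00 included, so the plaintext matches too), and base64 is modeled as the three encodings for the three possible alignments inside a 3-byte group, with the characters that also depend on neighbouring unknown bytes trimmed off.

```python
import base64

STD_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def wide(s):
    """`wide`: interleave each byte with 0x00 (UTF-16LE for plain ASCII text)."""
    return b"".join(bytes([c, 0]) for c in s)


def xor_variants(s):
    """`xor`: the string XORed with every single-byte key.

    Assumption: bare `xor` covers keys 0x00..0xFF, so index 0 is the plaintext.
    """
    return [bytes(c ^ key for c in s) for key in range(256)]


def base64_variants(s, alphabet=STD_ALPHABET):
    """`base64`: one pattern per alignment of s inside a 3-byte group.

    Characters that would also depend on the unknown bytes before or after
    s are trimmed, so each pattern can be found inside a larger encoded blob.
    A custom 64-character alphabet is applied by translating the output.
    """
    table = bytes.maketrans(STD_ALPHABET, alphabet)
    patterns = []
    for offset in range(3):
        encoded = base64.b64encode(b"\x00" * offset + s)
        start = (0, 2, 3)[offset]          # leading chars that touch the padding
        end = (offset + len(s)) * 8 // 6   # chars that lie entirely inside s
        patterns.append(encoded[start:end].translate(table))
    return patterns
```

For "This program cannot" the first pattern is VGhpcyBwcm9ncmFtIGNhbm5vd: the full encoding ends in "dA==", and the trailing "A==" depends on whatever follows the string, so it is dropped.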
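fullword: the string must match as a whole word, i.e. it must be delimited by non-alphanumeric characters. For example, "domain" with fullword does not match www.mydomain.com, but does match www.my-domain.com and www.domain.com.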
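The fullword rule can be mimicked with regex lookarounds. This is a minimal sketch with a hypothetical function name; it assumes only ASCII letters and digits count as word characters, which may differ from YARA in edge cases.

```python
import re


def fullword_search(needle, haystack):
    """True if needle occurs with no ASCII letter/digit right before or after it.

    Assumption of this sketch: only [A-Za-z0-9] count as word characters.
    """
    pattern = rb"(?<![A-Za-z0-9])" + re.escape(needle) + rb"(?![A-Za-z0-9])"
    return re.search(pattern, haystack) is not None
```

With this model, "domain" is rejected in www.mydomain.com because the preceding "y" is alphanumeric, while "-" and "." count as delimiters.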

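Restrictions on combining modifiers

Modifier	Effect	Cannot be combined with
nocase	Case-insensitive match	xor, base64, base64wide
wide	Two-byte characters (each byte interleaved with 0x00, emulating UTF-16)	none
ascii	ASCII characters (the default when wide is not given)	none
xor	Single-byte XOR	nocase, base64, base64wide
base64	Base64-encoded string	nocase, xor, fullword
base64wide	Base64-encoded string, then interleaved with 0x00	nocase, xor, fullword
fullword	Match only as a whole word	base64, base64wide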
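The compatibility table can be encoded as data and used to sanity-check a modifier list before compiling a rule. This is a hypothetical helper transcribed from the table, not part of YARA or yara-python; the YARA compiler performs its own check.

```python
# Incompatible modifier pairs, transcribed from the table above.
CONFLICTS = {
    "nocase": {"xor", "base64", "base64wide"},
    "xor": {"nocase", "base64", "base64wide"},
    "base64": {"nocase", "xor", "fullword"},
    "base64wide": {"nocase", "xor", "fullword"},
    "fullword": {"base64", "base64wide"},
}


def conflicting_pairs(modifiers):
    """Return the sorted (a, b) pairs of modifiers that cannot be used together."""
    mods = sorted(set(modifiers))
    return [
        (a, b)
        for i, a in enumerate(mods)
        for b in mods[i + 1:]
        if b in CONFLICTS.get(a, set())  # the table is symmetric, one lookup suffices
    ]
```

For instance, conflicting_pairs(["wide", "ascii", "nocase"]) returns an empty list, whereas ["nocase", "xor"] is reported as a conflict.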

Regular expressions: enclose the pattern in forward slashes, /.../ (regex tutorial: https://www.runoob.com/regexp/regexp-tutorial.html)

$re1 = /md5: [0-9a-fA-F]{32}/
$re2 = /state: (on|off)/
$re3 = /foo/i    // case-insensitive
$re4 = /bar./s   // the dot matches everything, including newline
$re5 = /baz./is  // both modifiers can be used together
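YARA's regex syntax is close to Perl-style regexes, so the patterns above can be tried quickly with Python's re module on bytes, where /i corresponds to re.IGNORECASE and /s to re.DOTALL. Python's engine is not YARA's and the two can differ in edge cases; the sample data is made up for illustration.

```python
import re

# The regexes above rewritten for Python (bytes patterns, as YARA scans bytes).
re1 = re.compile(rb"md5: [0-9a-fA-F]{32}")
re2 = re.compile(rb"state: (on|off)")
re3 = re.compile(rb"foo", re.IGNORECASE)             # like /foo/i
re4 = re.compile(rb"bar.", re.DOTALL)                # like /bar./s
re5 = re.compile(rb"baz.", re.IGNORECASE | re.DOTALL)  # like /baz./is

# Hypothetical sample data to run the patterns against.
sample = b"md5: d41d8cd98f00b204e9800998ecf8427e\nstate: on\nFOO bar\nBAZ\n"
```

Here re4 matches "bar" followed by a newline only because DOTALL is set; without it the dot stops at line breaks.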

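Regex metacharacters

Symbol	Meaning
\	Escape the next metacharacter, e.g. \\, \|, \*
^	Match at the beginning
$	Match at the end
.	Match any single character (except newline, unless /s is used)
|	Alternation
()	Grouping
[]	Character class: any one of the characters inside the brackets
*	Match 0 or more times
+	Match 1 or more times
?	Match 0 or 1 times
{n}	Match exactly n times
{n,}	Match at least n times
{,m}	Match at most m times
{n,m}	Match n to m times
\t	Tab
\n	Newline
\r	Carriage return
\xNN	The byte with hex value NN
\w	A word character (letter, digit or underscore)
\W	A non-word character
\s	A whitespace character
\S	A non-whitespace character
\d	A digit
\D	A non-digit
\b	Word boundary
\B	Not a word boundary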

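2.3 Conditions (condition)

Conditions are Boolean expressions, much like in an ordinary programming language.

Boolean operators: and, or, not
Relational operators: >=, <=, <, >, ==, !=
Arithmetic operators: +, -, *, \ (division), %
Bitwise operators: &, |, <<, >>, ~, ^

The # prefix gives the number of times a string occurs: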

rule CountExample
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        #a == 6 and #b > 10 and
        #a in (filesize-500..filesize) == 2 // occurrences can also be counted within a range
}
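To make #a and "#a in (lo..hi)" concrete, here is a small Python model that counts occurrences over the whole buffer or within an inclusive offset range. The function names and sample data are hypothetical, and the sketch counts every start offset (so overlapping hits would be counted too).

```python
def occurrences(needle, data):
    """Start offsets of every occurrence of needle in data (overlaps included)."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def count_in(needle, data, lo, hi):
    """Model of `#a in (lo..hi)`: occurrences whose start offset is in [lo, hi]."""
    return sum(lo <= off <= hi for off in occurrences(needle, data))
```

With data = b"dummy1" * 4 + b"." * 600 + b"dummy1" * 2, the model gives 6 for #a and 2 for "#a in (filesize-500..filesize)", so the count-based part of CountExample would hold.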

at tests for a string at a specific file offset (or virtual address, when scanning process memory):

rule AtExample
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        $a at 100 and $b at 200 // $a at offset 100 and $b at offset 200
}

in tests whether a string occurs anywhere within a range of offsets (both bounds inclusive):

rule InExample
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        $a in (0..100) and $b in (100..filesize)
}
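The difference between at (one exact offset) and in (anywhere in an inclusive range) can be modeled in Python as follows; the function names and test buffer are hypothetical.

```python
def at(needle, data, offset):
    """Model of `$a at offset`: the string starts exactly at that offset."""
    return data.startswith(needle, offset)


def in_range(needle, data, lo, hi):
    """Model of `$a in (lo..hi)`: some occurrence starts within [lo, hi]."""
    pos = data.find(needle, lo)  # first occurrence starting at or after lo
    return pos != -1 and pos <= hi
```

For a 300-byte buffer with "dummy1" at offset 100 and "dummy2" at offset 200, both AtExample and InExample would match; note that $a in (0..100) still holds because the upper bound is inclusive.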

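The filesize keyword holds the size of the scanned file in bytes; the rule below matches files larger than 200KB (the KB and MB suffixes multiply by 1024 and 1024*1024). filesize is only meaningful when scanning a file, not process memory.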

rule FileSizeExample
{
    condition:
        filesize > 200KB
}

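The entrypoint variable holds the entry point of a PE or ELF file: a raw file offset when scanning a file, a virtual address when scanning a process. It is commonly used to detect packers or file infectors. Newer YARA versions deprecate it in favour of pe.entry_point from the pe module.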

rule EntryPointExample1
{
    strings:
        $a = { E8 00 00 00 00 }

    condition:
        $a at entrypoint
}

rule EntryPointExample2
{
    strings:
        $a = { 9C 50 66 A1 ?? ?? ?? 00 66 A9 ?? ?? 58 0F 85 }

    condition:
        $a in (entrypoint..entrypoint + 10)
}
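For a PE file on disk, "$a at entrypoint" means the string sits at the file offset where the entry point RVA lands. The Python sketch below computes that offset from the headers with struct; it is a simplified, hypothetical helper that ignores edge cases (entry points inside the headers, alignment rounding, malformed files) which YARA's pe module handles more carefully.

```python
import struct


def pe_entrypoint_offset(data):
    """File offset of a PE's entry point, or None if it cannot be mapped.

    Simplified model: maps AddressOfEntryPoint through the section table only.
    """
    if data[:2] != b"MZ":
        return None
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return None
    # COFF header follows the 4-byte signature.
    num_sections = struct.unpack_from("<H", data, e_lfanew + 6)[0]
    opt_size = struct.unpack_from("<H", data, e_lfanew + 20)[0]
    opt_start = e_lfanew + 24
    entry_rva = struct.unpack_from("<I", data, opt_start + 16)[0]  # AddressOfEntryPoint
    # Section headers (40 bytes each) follow the optional header.
    for i in range(num_sections):
        sec = opt_start + opt_size + 40 * i
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<4I", data, sec + 8)
        if vaddr <= entry_rva < vaddr + max(vsize, raw_size):
            return entry_rva - vaddr + raw_ptr
    return None
```

Given such an offset ep, EntryPointExample1 corresponds to data.startswith(b"\xE8\x00\x00\x00\x00", ep).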

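Reading integers at a file offset (or virtual address, when scanning process memory). Functions without a suffix read little-endian values; the be suffix reads big-endian: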

int8(<offset or virtual address>)
int16(<offset or virtual address>)
int32(<offset or virtual address>)

uint8(<offset or virtual address>)
uint16(<offset or virtual address>)
uint32(<offset or virtual address>)

int8be(<offset or virtual address>)
int16be(<offset or virtual address>)
int32be(<offset or virtual address>)

uint8be(<offset or virtual address>)
uint16be(<offset or virtual address>)
uint32be(<offset or virtual address>)

rule IsPE
{
    condition:
        // MZ signature at offset 0 and ...
        uint16(0) == 0x5A4D and
        // ... PE signature at offset stored in MZ header at 0x3C
        uint32(uint32(0x3C)) == 0x00004550
}
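The integer-reading functions map directly onto Python's struct format codes, which makes the IsPE rule easy to reproduce. This is a hypothetical model; note the assumption in the comments about reads past the end of the data.

```python
import struct

# struct format for each YARA reader: "<" little-endian, ">" big-endian.
_FORMATS = {
    "int8": "<b", "int16": "<h", "int32": "<i",
    "uint8": "<B", "uint16": "<H", "uint32": "<I",
    "int8be": ">b", "int16be": ">h", "int32be": ">i",
    "uint8be": ">B", "uint16be": ">H", "uint32be": ">I",
}


def read_int(kind, data, offset):
    """Model of e.g. uint32(offset): decode the integer stored at offset."""
    return struct.unpack_from(_FORMATS[kind], data, offset)[0]


def is_pe(data):
    """Same logic as the IsPE rule: 'MZ' at 0 and 'PE\\0\\0' at uint32(0x3C)."""
    try:
        return (read_int("uint16", data, 0) == 0x5A4D
                and read_int("uint32", data, read_int("uint32", data, 0x3C)) == 0x00004550)
    except struct.error:
        # YARA treats out-of-range reads as undefined; this sketch returns False.
        return False
```

For example, read_int("uint16", b"MZ", 0) is 0x5A4D while read_int("uint16be", b"MZ", 0) is 0x4D5A, which is why the rule compares against 0x5A4D.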

String sets: a set can be written as a parenthesized list or with the * wildcard, and them stands for all strings in the rule:

rule OfExample1
{
    strings:
        $a = "dummy1"
        $b = "dummy2"
        $c = "dummy3"
        $foo1 = "foo1"
        $foo2 = "foo2"
        $foo3 = "foo3"

    condition:
        2 of ($a,$b,$c) and
        2 of ($foo*) and  // equivalent to 2 of ($foo1,$foo2,$foo3)
        1 of them         // equivalent to 1 of ($*)
}

all of them       // all strings in the rule
any of them       // any string in the rule
all of ($a*)      // all strings whose identifier starts with $a
any of ($a,$b,$c) // any of $a, $b or $c
1 of ($*)         // same as "any of them"
none of ($b*)     // none of the strings whose identifier starts with $b
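The semantics of "<quantifier> of (<set>)" can be modeled in Python, using fnmatch to expand wildcards such as $foo* and treating an integer N as "at least N". The function names and the matched dictionary are hypothetical inputs for illustration.

```python
from fnmatch import fnmatchcase


def expand(string_set, identifiers):
    """Expand a set like ["$a", "$foo*"] (or "them") into concrete identifiers."""
    if string_set == "them":
        return list(identifiers)
    return [i for i in identifiers if any(fnmatchcase(i, p) for p in string_set)]


def of(quantifier, string_set, matched):
    """Model of `<quantifier> of (<set>)`; matched maps identifier -> bool."""
    members = expand(string_set, matched)
    hits = sum(matched[m] for m in members)
    if quantifier == "all":
        return hits == len(members)
    if quantifier == "any":
        return hits >= 1
    if quantifier == "none":
        return hits == 0
    return hits >= quantifier  # an integer N means "at least N of the set"
```

With $a, $b, $foo1 and $foo3 found but $c and $foo2 missing, all three parts of OfExample1's condition evaluate to true, while "all of them" does not.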

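Applying a condition to each string of a set (for ... of): inside the expression, # is the current string's count, @ the offset of its first occurrence, and ! the length of its first match.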

for all of them : ( # > 3 )
for all of ($a*) : ( @ > @b )
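A "for all of (<set>) : (<expr>)" loop evaluates the expression once per string, with # and @ bound to that string. The Python sketch below passes the count and first offset to a predicate in their place; the function name and offsets are hypothetical.

```python
def for_all_of(offsets_by_id, predicate):
    """Model of `for all of (<set>) : (<expr>)`.

    offsets_by_id maps each identifier in the set to the offsets where it
    matched; predicate(count, first_offset) stands in for the expression
    using # and @ (first_offset is None when the string did not match).
    """
    return all(
        predicate(len(offs), offs[0] if offs else None)
        for offs in offsets_by_id.values()
    )
```

For example, with $a1 found at offsets 10, 50, 90, 130 and $a2 at 5 through 9, "( # > 3 )" holds for both, and "( @ > @b )" holds when @b is 3.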

Iterating with for ... in (over a module array, an integer range, or a dictionary):

for any section in pe.sections : ( section.name == ".text" )
for any i in (0..pe.number_of_sections-1) : ( pe.sections[i].name == ".text" )
for any k,v in some_dict : ( k == "foo" and v == "bar" )
for <quantifier> <variables> in <iterable> : ( <some condition using the loop variables> )
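The generic "for <quantifier> <variables> in <iterable>" form can be modeled as counting how many items satisfy the condition. In this sketch the section list and dictionary are made-up stand-ins for pe.sections and some_dict, and an integer quantifier is assumed to mean "at least N", by analogy with "N of".

```python
def quantify(quantifier, iterable, condition):
    """Model of `for <quantifier> x in <iterable> : (<condition>)`.

    quantifier is "all", "any", "none" or an integer N (assumed: at least N items).
    """
    items = list(iterable)
    hits = sum(1 for item in items if condition(item))
    if quantifier == "all":
        return hits == len(items)
    if quantifier == "any":
        return hits >= 1
    if quantifier == "none":
        return hits == 0
    return hits >= quantifier
```

The three example loops correspond to quantify("any", sections, ...), quantify("any", range(len(sections)), ...) and quantify("any", some_dict.items(), ...), where the last one receives (key, value) pairs.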

Referencing other rules: a condition can use a previously defined rule as a Boolean value, so rules can be reused:

rule Rule1
{
    strings:
        $a = "dummy1"

    condition:
        $a
}

rule Rule2
{
    strings:
        $a = "dummy2"

    condition:
        $a and Rule1
}

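2.4 Other syntax

Global rules (global): a global rule's condition constrains every other rule; if a global rule is not satisfied, no other rule can match.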

global rule SizeLimit
{
    condition:
        filesize < 2MB
}

Private rules (private): they produce no output when they match; they serve as helpers referenced by other rules.

private rule PrivateRuleExample
{
    ...
}
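How rule references, global rules and private rules interact can be modeled with a tiny evaluator, using the SizeLimit, Rule1 and Rule2 examples above. This is a simplification with hypothetical names: it ignores namespaces, strings and modules, and represents each rule as (name, flags, condition) where the condition may read the results of rules defined before it.

```python
def evaluate(rules, data):
    """Return the names of the rules that would be reported for data.

    rules: ordered list of (name, flags, condition); flags is a set that may
    contain "global" and/or "private"; condition(data, results) may look up
    earlier rules in results. Simplified model, not YARA's engine.
    """
    results = {}
    for name, flags, cond in rules:
        # A condition can reference rules defined before it via `results`.
        results[name] = cond(data, results)
    # If any global rule is false, no rule matches at all.
    if any(not results[name] for name, flags, _ in rules if "global" in flags):
        return []
    # Private rules can match (and be referenced) but are never reported.
    return [name for name, flags, _ in rules
            if results[name] and "private" not in flags]
```

Declaring SizeLimit as "global private" keeps it out of the output while still vetoing files of 2MB or more.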

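Metadata (meta): stores information about the rule; values can be strings, integers or booleans, and they do not affect matching.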

rule MetadataExample
{
    meta:
        my_identifier_1 = "Some string data"
        my_identifier_2 = 24
        my_identifier_3 = true

    strings:
        $my_text_string = "text here"
        $my_hex_string = { E2 34 A1 C8 23 FB }

    condition:
        $my_text_string or $my_hex_string
}

Importing modules: modules such as pe and cuckoo extend conditions with extra fields and functions

import "pe"
import "cuckoo"

rule Test
{
    strings:
        $a = "some string"

    condition:
        $a and pe.entry_point == 0x1000
}

Including other YARA rule files:

include "other.yar"
include "./includes/other.yar"
include "../includes/other.yar"

3. Using YARA rules in Python

Install the yara-python package:

pip install yara-python

A simple demo:

import os
import yara

# Collect every YARA rule file in a directory and compile them together.
# Each file gets its own namespace ("rule0", "rule1", ...) so identical
# rule names in different files do not clash.
def getRules(path):
    filepaths = {}
    for index, name in enumerate(sorted(os.listdir(path))):
        rule_path = os.path.join(path, name)
        if os.path.isfile(rule_path):
            filepaths["rule" + str(index)] = rule_path
    return yara.compile(filepaths=filepaths)

# Scan every file in a directory and print the ones that match.
def scan(rules, path):
    for name in os.listdir(path):
        sample_path = os.path.join(path, name)
        if not os.path.isfile(sample_path):
            continue  # skip subdirectories
        print(sample_path)
        with open(sample_path, "rb") as fp:
            matches = rules.match(data=fp.read())
        if matches:
            print(name, matches)

if __name__ == '__main__':
    rulepath = "/home/authenticate/yara/rule_yara/"    # directory of YARA rule files
    malpath = "/home/authenticate/yara/test_simple/"   # directory of samples to scan
    # Compile the rules
    yararule = getRules(rulepath)
    # Scan the samples
    scan(yararule, malpath)

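4. Summary

Writing a rule comes down to defining strings and a condition, and neither is hard. Writing rules that are accurate, general and low in false positives is much harder, though, and takes practice and imagination.
References:
Official GitHub repository (releases): https://github.com/VirusTotal/yara/releases
Official documentation: https://yara.readthedocs.io
Demo of using YARA in Python: https://blog.csdn.net/weixin_40596016/article/details/79865670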