Static Scanning with Yara, Part 2: Writing Yara Rules (1)

G4rb3n

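Writing Simple and Efficient Yara Rules (1)

Translated from: https://www.bsk-consulting.de/2015/02/16/write-simple-sound-yara-rules/

Over the past two years I have written about 2000 Yara rules based on samples found through IOCs. Many security professionals have found that Yara offers a simple and effective way to write custom rules based on strings or byte sequences found in samples, which lets end users build their own detection tools.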

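What bothers me, however, is that the Yara rules published by researchers tend to have two major shortcomings:

They generate many false positives
They match only a single sample, in which case a hash value would do the job just as well

I therefore decided to write an article on how to build good Yara rules: rules that can be used to scan single samples uploaded to a sandbox as well as whole file systems, with a very low false positive rate.

These rules are based on characteristic strings and are easy to understand. You do not need to know how to reverse engineer PE files. I decided to avoid newer Yara modules such as "pe", because I believe they may still cause memory leaks or other errors in practice.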

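Generating Yara rules automatically

First of all, I believe that automatically generated rules can never be as good as rules created by hand. While working on the IOC scanners THOR and LOKI, I had to create hundreds of Yara rules manually, which is clearly tedious work. My approach used to be to extract the UNICODE and ASCII strings from a sample with the following commands: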

strings -el sample.exe
strings -a sample.exe

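I prefer UNICODE strings because they are often overlooked and change less frequently within a malware family. Make sure to use UNICODE strings with the "wide" keyword and ASCII strings with the "ascii" keyword in your rules, and add "fullword" if the string should only match as a whole word.
The problem with this method is that there is no guarantee that the chosen strings are unique: they may also appear in legitimate software.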
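
The two `strings` invocations above can be approximated in a few lines of Python. This is only a minimal sketch: the function name `extract_strings`, the minimum length of 4 and the sample bytes are assumptions made for illustration, and the regular expressions cover printable ASCII only rather than everything the real tool handles.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return (ascii_strings, wide_strings) found in raw bytes."""
    # Runs of printable ASCII, similar in spirit to `strings -a`.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Printable ASCII characters each followed by a NUL byte (UTF-16LE),
    # similar in spirit to `strings -el`.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    ascii_strings = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    wide_strings = [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return ascii_strings, wide_strings

# Hypothetical sample: one ASCII string and one wide string between junk bytes.
sample = (b"\x00\x01iphlpsvc.tmp\x00\xff"
          + "IM Monnitor Service".encode("utf-16-le")
          + b"\x00\x00\x90")
ascii_strings, wide_strings = extract_strings(sample)
print(ascii_strings)
print(wide_strings)
```

Strings found by the second pattern are the ones that would carry the "wide" keyword in a rule, and strings found by the first would carry "ascii".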

Look at the strings extracted in the following example:

NTLMSSP
%d.%d.%d.%d
%s\IPC$
\\%s
NT LM 0.12
%s%s%s
%s.exe %s
%s\Admin$\%s.exe
RtlUpcaseUnicodeStringToOemString
LoadLibrary( NTDLL.DLL ) Error:%d

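Can you be sure that the string "NT LM 0.12" is specific to this malware and does not appear in legitimate software?

To solve this problem I developed yarGen, a Yara rule generator that ships with a large database of benign strings taken from legitimate software. I built this goodware string database from the Windows system folders of Windows 2003, Windows 7 and Windows 2008 R2 servers, and from legitimate software such as Microsoft Office, 7zip, Firefox, Chrome, Cygwin and the program folders of various antivirus products. yarGen lets you generate your own goodware string database or add folders of further legitimate software to the existing one.

yarGen extracts all ASCII and UNICODE strings from a malicious sample and removes every string that also appears in the goodware string database. It then evaluates and scores each remaining string using fuzzy regular expressions and a "Gibberish Detector", which allows yarGen to pick the most suitable strings. The top 20 of these strings are integrated into the final rule.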
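
The filter-then-rank idea described above can be sketched as follows. The scoring heuristic here is a toy stand-in invented for illustration and is not yarGen's actual scoring; the function names `score_string` and `select_strings` and the contents of the `goodware` set are likewise hypothetical.

```python
import re

def score_string(s: str) -> int:
    """Toy heuristic (an assumption, not yarGen's real scoring)."""
    score = 0
    if len(s) >= 8:
        score += 1   # longer strings tend to be more distinctive
    if re.search(r"\.(exe|dll|tmp|bat|dat)\b", s, re.I):
        score += 2   # file names are often useful indicators
    if "%" in s:
        score += 1   # format strings hint at program logic
    if not re.search(r"[A-Za-z]{3,}", s):
        score -= 2   # no run of three letters: unlikely to be readable text
    return score

def select_strings(sample_strings, goodware, top_n=20):
    # Deduplicate (keeping order) and drop strings known from goodware.
    candidates = [s for s in dict.fromkeys(sample_strings) if s not in goodware]
    # Highest score first; sorted() is stable, so ties keep their input order.
    ranked = sorted(candidates, key=score_string, reverse=True)
    return ranked[:top_n]

# Hypothetical goodware database and extracted strings.
goodware = {"NTLMSSP", "NT LM 0.12", "RtlUpcaseUnicodeStringToOemString"}
extracted = ["NTLMSSP", "%s\\Admin$\\%s.exe", "NT LM 0.12", "tEHt;HuD", "msvcrt.bat"]
selected = select_strings(extracted, goodware, top_n=2)
print(selected)
```

Note that a crude heuristic like this one does not penalise a random-looking string such as "tEHt;HuD"; it merely ranks it lower than the file names.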

Let's look at two examples: an Enfal Trojan sample and an SMB worm sample.

yarGen generates the following rule from the Enfal Trojan samples:

rule Enfal_Generic {
    meta:
        description = "Auto-generated rule - from 3 different files"
        author = "YarGen Rule Generator"
        reference = "not set"
        date = "2015/02/15"
        super_rule = 1
        hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
        hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
        hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
    strings:
        $s0 = "urlmon" fullword
        $s1 = "Registered trademarks and service marks are the property of their respec" wide
        $s2 = "Micorsoft Corportation" fullword wide
        $s3 = "IM Monnitor Service" fullword wide
        $s4 = "imemonsvc.dll" fullword wide
        $s5 = "iphlpsvc.tmp" fullword
        $s6 = "XpsUnregisterServer" fullword
        $s7 = "XpsRegisterServer" fullword
        $s8 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
        $s9 = "tEHt;HuD" fullword
        $s10 = "6.0.4.1624" fullword wide
        $s11 = "#*8;->)" fullword
        $s12 = "%/>#?#*8" fullword
        $s13 = "\\%04x%04x\\" fullword
        $s14 = "3,8,18" fullword
        $s15 = "3,4,15" fullword
        $s16 = "3,7,12" fullword
        $s17 = "3,4,13" fullword
        $s18 = "3,8,12" fullword
        $s19 = "3,8,15" fullword
        $s20 = "3,6,12" fullword
    condition:
        all of them
}

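The generated string set contains many useful strings, but it also contains random ASCII characters ($s9, $s11, $s12). These match the current samples but will not match other, similar malicious samples (for example, samples from the same family).

yarGen generates the following rule from the SMB worm sample: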

rule sig_smb {
    meta:
        description = "Auto-generated rule - file smb.exe"
        author = "YarGen Rule Generator"
        reference = "not set"
        date = "2015/02/15"
        hash = "db6cae5734e433b195d8fc3252cbe58469e42bf3"
    strings:
        $s0 = "LoadLibrary( NTDLL.DLL ) Error:%d" fullword ascii
        $s1 = "SetServiceStatus failed, error code = %d" fullword ascii
        $s2 = "%s\\Admin$\\%s.exe" fullword ascii
        $s3 = "%s.exe %s" fullword ascii
        $s4 = "iloveyou" fullword ascii
        $s5 = "Microsoft@ Windows@ Operating System" fullword wide
        $s6 = "\\svchost.exe" fullword ascii
        $s7 = "secret" fullword ascii
        $s8 = "SVCH0ST.EXE" fullword wide
        $s9 = "msvcrt.bat" fullword ascii
        $s10 = "Hello123" fullword ascii
        $s11 = "princess" fullword ascii
        $s12 = "Password123" fullword ascii
        $s13 = "Password1" fullword ascii
        $s14 = "config.dat" fullword ascii
        $s15 = "sunshine" fullword ascii
        $s16 = "password <=14" fullword ascii
        $s17 = "del /a %1" fullword ascii
        $s18 = "del /a %0" fullword ascii
        $s19 = "result.dat" fullword ascii
        $s20 = "training" fullword ascii
    condition:
        all of them
}

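The rules above are acceptable Yara rules, and they will not match legitimate software, but they are far from optimal.

If you do not want to use or download yarGen, you can also use the online tool Yara Rule Generator provided by Joe Security, which is likewise based on yarGen.

Next, let's look at how to produce rules that are more efficient and more generic.

Writing efficient, generic Yara rules

As I said in the introduction, rules that generate false positives are quite annoying. The real tragedy, however, is that most rules are too specific to match more than one sample, which makes them no more useful than a hash value.

So I sort the strings into three categories: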


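 1. Very specific strings: unique to a particular malicious sample
 2. Rare strings: unlikely to appear in legitimate software, but it is possible
 3. Strings that look common: generic-looking strings that nevertheless were not found in legitimate software

Take a look at the rule below to understand this better. Ignore the definition named $mz for now; I will explain it later.

Strings starting with $s are the specific strings. I consider them so distinctive that they will not appear in legitimate software. Note the typos in two of the strings: "Micorsoft Corportation" instead of "Microsoft Corporation", and "Monnitor" instead of "Monitor".

Strings starting with $x are the rare strings, which may still appear in legitimate software.

Strings starting with $z are the general strings: they look common and help to match the malware generically, yet they were not found in legitimate software.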

rule Enfal_Malware_Backdoor {
    meta:
        description = "Generic Rule to detect the Enfal Malware"
        author = "Florian Roth"
        date = "2015/02/10"
        super_rule = 1
        hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
        hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
        hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
    strings:
        $mz = { 4d 5a }

        $s1 = "Micorsoft Corportation" fullword wide
        $s2 = "IM Monnitor Service" fullword wide

        $x1 = "imemonsvc.dll" fullword wide
        $x2 = "iphlpsvc.tmp" fullword
        $x3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword

        $z1 = "urlmon" fullword
        $z2 = "Registered trademarks and service marks are the property of their" wide
        $z3 = "XpsUnregisterServer" fullword
        $z4 = "XpsRegisterServer" fullword
    condition:
        ( $mz at 0 ) and
        (
            ( 1 of ($s*) ) or
            ( 2 of ($x*) and all of ($z*) )
        )
        and filesize < 40000
}

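Now look at the condition. We use $mz to restrict the rule to Windows executables (files that begin with the MZ header), which avoids false positives on things like antivirus signature files, browser caches or dictionary files. The filesize check adds a size limit on the scanned sample, which makes the rule more precise.

I decided that the rule should fire as soon as the target file contains a single specific string (1 of $s*).

It also fires when at least two rare strings and all of the general strings are present (2 of $x* and all of $z*).

Now let's look at the second example: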
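
To make the logic of that condition concrete, here is a minimal Python sketch that mimics it on raw bytes. It is an illustration under simplifying assumptions, not a replacement for Yara: the helper names `contains` and `enfal_condition` are made up, matching is a plain substring search, and the word-boundary check performed by "fullword" is not modelled.

```python
def contains(data: bytes, text: str, wide: bool = False) -> bool:
    # Plain substring search; YARA's `fullword` boundary check is not modelled.
    needle = text.encode("utf-16-le") if wide else text.encode("ascii")
    return needle in data

def enfal_condition(data: bytes) -> bool:
    s = [contains(data, "Micorsoft Corportation", wide=True),
         contains(data, "IM Monnitor Service", wide=True)]
    x = [contains(data, "imemonsvc.dll", wide=True),
         contains(data, "iphlpsvc.tmp"),
         contains(data, "{53A4988C-F91F-4054-9076-220AC5EC03F3}")]
    z = [contains(data, "urlmon"),
         contains(data, "Registered trademarks and service marks "
                        "are the property of their", wide=True),
         contains(data, "XpsUnregisterServer"),
         contains(data, "XpsRegisterServer")]
    is_pe = data[:2] == b"MZ"      # $mz at 0
    small = len(data) < 40000      # filesize < 40000
    # 1 of ($s*)  or  ( 2 of ($x*) and all of ($z*) )
    return is_pe and (sum(s) >= 1 or (sum(x) >= 2 and all(z))) and small

# Hypothetical inputs, built only to exercise each branch.
hit = b"MZ\x90\x00" + "IM Monnitor Service".encode("utf-16-le")
not_pe = b"PK\x03\x04" + "IM Monnitor Service".encode("utf-16-le")
weak = b"MZ" + b"iphlpsvc.tmp" + b"urlmon"
print(enfal_condition(hit), enfal_condition(not_pe), enfal_condition(weak))
```

The first input satisfies the header check and one specific string; the second fails the header check; the third has only one rare string and not all general strings, so neither branch is met.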

rule SMB_Worm_Tool_Generic {
    meta:
        description = "Generic SMB Worm/Malware Signature"
        author = "Florian Roth"
        reference = "http://goo.gl/N3zx1m"
        date = "2015/02/08"
        hash = "db6cae5734e433b195d8fc3252cbe58469e42bf3"
    strings:
        $mz = { 4d 5a }

        $s1 = "%s\\Admin$\\%s.exe" fullword ascii
        $s2 = "SVCH0ST.EXE" fullword wide

        $a1 = "LoadLibrary( NTDLL.DLL ) Error:%d" fullword ascii
        $a2 = "\\svchost.exe" fullword ascii
        $a3 = "msvcrt.bat" fullword ascii
        $a4 = "Microsoft@ Windows@ Operating System" fullword wide

        $x1 = "%s.exe %s" fullword ascii
        $x2 = "password <=14" fullword ascii
        $x3 = "del /a %1" fullword ascii
        $x4 = "del /a %0" fullword ascii
        $x5 = "SetServiceStatus failed, error code = %d" fullword ascii

        $z1 = "secret" fullword ascii
        $z2 = "Hello123" fullword ascii
        $z3 = "princess" fullword ascii
        $z4 = "Password123" fullword ascii
        $z5 = "Password1" fullword ascii
        $z6 = "sunshine" fullword ascii
        $z7 = "training" fullword ascii
        $z8 = "iloveyou" fullword ascii
    condition:
        // "and" binds tighter than "or", so the alternatives are grouped explicitly
        ( $mz at 0 ) and
        (
            ( 1 of ($s*) and 1 of ($x*) ) or
            ( all of ($a*) and 2 of ($x*) ) or
            ( 5 of ($z*) and 2 of ($x*) )
        ) and
        filesize < 200000
}

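$s* are the specific strings (for example SVCH0ST.EXE, where the "O" has been replaced with a "0", which may be a trait unique to this sample),
$a* are the rare strings, which may also appear in legitimate software,
$x* are the general strings, which are traits common to the malware and do not match legitimate software,
$z* are custom password strings, which usually only brute-forcing malware carries, so we treat them as a category of their own.
Finally, we tune the condition by weighing the different classes of strings against each other.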
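
When several weighted alternatives are combined like this, grouping matters: in Yara, as in Python, "and" binds more tightly than "or". The sketch below models the SMB worm condition with plain integers and booleans to show how a missing pair of outer parentheses changes the outcome. The function names and the argument values are hypothetical; each integer stands for the number of strings of that group found in a file.

```python
def ungrouped(mz, s, a, x, z, small):
    # Alternatives chained without outer parentheses: the header check only
    # guards the first alternative and the size check only the last one.
    return (mz and (s >= 1 and x >= 1)
            or (a == 4 and x >= 2)
            or (z >= 5 and x >= 2) and small)

def grouped(mz, s, a, x, z, small):
    # Header and size checks apply to every alternative.
    return mz and ((s >= 1 and x >= 1)
                   or (a == 4 and x >= 2)
                   or (z >= 5 and x >= 2)) and small

# A file without an MZ header that contains all four $a strings and two $x strings.
args = dict(mz=False, s=0, a=4, x=2, z=0, small=True)
print(ungrouped(**args), grouped(**args))
```

With the ungrouped form, the middle alternative is enough on its own, so a non-executable or oversized file can still match; the grouped form requires the header and size checks in every case.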