IRMA Source Code Analysis (Part 2)

vspiders

Category: Python. Tags: irma, python

Article link: https://blog.csdn.net/vspiders/article/details/103768542

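This installment looks at how the IRMA engine performs a scan. The antivirus engines IRMA supports generally follow the same pattern:

1. Invoke the antivirus engine from the command line

2. Scan the file

3. Parse the scan output and produce a report

IRMA provides a generic template for this type of engine.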
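
The three steps above can be sketched in a few lines of Python. This is only an illustration, not IRMA's code: the "scanner" is a stand-in Python one-liner launched through `sys.executable`, and the output format `path,malware,name`, the names `FAKE_SCANNER` and `scan_file`, and the threat name are all assumptions made for the example.

```python
import re
import subprocess
import sys

# Hypothetical stand-in for an antivirus CLI: prints "<path>,malware,<name>"
# and exits with 1, mimicking an "infected" verdict.
FAKE_SCANNER = [
    sys.executable, "-c",
    "import sys; print(sys.argv[1] + ',malware,Eicar-Test'); sys.exit(1)",
]

# Assumed output format, chosen for this sketch only.
PATTERN = re.compile(r"(?P<file>\S+),malware,(?P<name>.*)$", re.MULTILINE)


def scan_file(path):
    # Steps 1 and 2: invoke the engine on the file from the command line.
    proc = subprocess.run(FAKE_SCANNER + [path],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout = proc.stdout.decode().strip()
    # Step 3: parse the raw output into a small report.
    match = PATTERN.search(stdout)
    return {
        "path": path,
        "retcode": proc.returncode,
        "threat": match.group("name").strip() if match else None,
    }


report = scan_file("sample.bin")
print(report)
```

A real engine differs only in the command line and in the pattern needed to read its output, which is why a shared template is practical.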

Let's start with the main scan entry class, AntivirusPluginInterface.

class AntivirusPluginInterface(object):
    """ Antivirus Plugin Base Class
        Abstract class, should not be instanciated directly"""

    def __init__(self):
        self.module = self.module_cls()

    def run(self, paths):
        assert self.module
        if isinstance(paths, (tuple, list, set)):
            raise NotImplementedError(
                "Scanning of multiple paths at once is not supported for now")
        fpath = Path(paths)

        results = PluginResult(name=type(self).plugin_display_name,
                               type=type(self).plugin_category,
                               version=self.module.version)
        try:
            # add database metadata
            results.database = None
            if self.module.database:
                results.database = {str(fp): self.file_metadata(fp)
                                    for fp in self.module.database}
            # launch an antivirus scan, automatically append scan results
            started = timestamp(datetime.utcnow())
            results.status = self.module.scan(fpath)
            stopped = timestamp(datetime.utcnow())
            results.duration = stopped - started

            return_results = self.module.scan_results[fpath]
            # add scan results or append error
            if results.status < 0:
                results.error = return_results
            else:
                results.results = return_results

            # Add virus_database_version metadata
            results.virus_database_version = self.module.virus_database_version
        except Exception as e:
            results.status = -1
            results.error = type(e).__name__ + " : " + str(e)
        return results

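Continuing from Part 1: all plugins are first discovered and registered. Each plugin inherits from AntivirusPluginInterface, whose __init__ creates the module by calling self.module_cls(). Since the base class shown here does not define module_cls, it is expected to come from the concrete plugin class, so the module is resolved dynamically. Execution then enters the core scan function run(). It converts the sample path to a Path, initializes a PluginResult, calls module.scan() to perform the scan, and then reads the entry for that path from module.scan_results. If the status is negative, that entry is stored as the error; otherwise it is stored as the result, and the PluginResult is returned.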
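
To make the control flow of run() easier to follow, here is a stripped-down sketch with a stub module. Everything in it is hypothetical: `SimpleResult` stands in for IRMA's PluginResult, `FakeModule` replaces a real antivirus module, and `MiniPluginInterface` keeps only the scan-then-read-scan_results logic (database metadata and version fields are left out).

```python
import time
from pathlib import Path


class SimpleResult:
    """Hypothetical stand-in for IRMA's PluginResult."""
    def __init__(self, name):
        self.name = name
        self.status = None
        self.results = None
        self.error = None
        self.duration = None


class FakeModule:
    """Hypothetical module: flags any file whose name contains 'eicar'."""
    def __init__(self):
        self.scan_results = {}

    def scan(self, fpath):
        infected = "eicar" in fpath.name
        self.scan_results[fpath] = "Test.Threat" if infected else None
        return 1 if infected else 0


class MiniPluginInterface:
    """Mirrors the control flow of run(): scan, then read scan_results."""
    def __init__(self):
        # module_cls is supplied by the concrete plugin class.
        self.module = self.module_cls()

    def run(self, path):
        fpath = Path(path)
        results = SimpleResult(type(self).plugin_display_name)
        try:
            started = time.time()
            results.status = self.module.scan(fpath)
            results.duration = time.time() - started
            outcome = self.module.scan_results[fpath]
            # A negative status means the entry is an error message.
            if results.status < 0:
                results.error = outcome
            else:
                results.results = outcome
        except Exception as e:
            results.status = -1
            results.error = type(e).__name__ + " : " + str(e)
        return results


class FakePlugin(MiniPluginInterface):
    plugin_display_name = "Fake AV"
    module_cls = FakeModule


res = FakePlugin().run("/tmp/eicar.com")
print(res.status, res.results)
```

Note that any exception raised inside the try block, including a missing key in scan_results, ends up as status -1 with the exception text in error.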

The core of every plugin is therefore the call to scan. Next, let's look at the module code.

Every scan module inherits from AntivirusUnix or AntivirusWindows, and both of those derive from the base class Antivirus. Let's look at Antivirus first.

class Antivirus(object, metaclass=EarlyInitializer):
    """ Antivirus Base Class
        Abstract class, should not be instanciated directly"""

    # List of attributes to initialize by calling the getter on the sub-class
    # cf. __getattr__
    _attributes = {
        # attr → default value
        "name": "unavailable",
        "database": [],
        "scan_args": (),
        "scan_path": None,
        "scan_patterns": [],
        "version": "unavailable",
        "virus_database_version": "unavailable",
    }

    # ===========
    #  Constants
    # ===========

    class ScanResult(object):
        CLEAN = 0
        INFECTED = 1
        ERROR = -1

    # ==================================
    #  Constructor and destructor stuff
    # ==================================

    def __init__(self, *args, **kwargs):
        # scan tool variables
        self._scan_retcodes = {
            self.ScanResult.CLEAN: lambda x: x in [0],
            self.ScanResult.INFECTED: lambda x: x in [1],
            self.ScanResult.ERROR: lambda x:
                not self._scan_retcodes[self.ScanResult.CLEAN](x) and
                not self._scan_retcodes[self.ScanResult.INFECTED](x),
        }
        # scan pattern-matching
        self.scan_results = {}
        self._is_windows = sys.platform.startswith('win')

    # ====================
    #  Antivirus methods
    # ====================

    # TODO: enable multiple paths
    def scan(self, paths, env=None):
        if isinstance(paths, (tuple, list, set)):
            raise NotImplementedError(
                "Scanning of multiple paths at once is not supported for now")

        # Artifice for python <3.5 compatibility
        args = list(self.scan_args)
        args.append(paths)

        results = self.run_cmd(self.scan_path, *args, env=env)
        return self.check_scan_results(paths, results)

    @staticmethod
    def run_cmd(*cmd, env=None):
        """ Run a command
            :param cmd: The command to run. Either
                a string: eg. "ls -la /tmp"
                a sequence: eg. ["ls", "-la", Path("/tmp")]
                multiple arguments: "ls", "-la", Path("/tmp")
            :returns: the tuple (retcode, stdout, stderr) of the process. Both
                stdout and stderr are strings (unencoded data).
        """
        assert cmd

        if len(cmd) > 1:
            # case: multiple arguments
            cmd = list(Antivirus.sanitize(cmd))
        else:
            # Artifice for python <3.5 compatibility
            # cmd is necessarily a tuple of 1 argument
            unpckd_cmd = cmd[0]
            if isinstance(unpckd_cmd, Path):  # case: a Path
                cmd = list(Antivirus.sanitize(cmd))
            elif isinstance(unpckd_cmd, str):  # case: a string
                cmd = unpckd_cmd.split()
            else:  # last case: a sequence
                cmd = list(Antivirus.sanitize(unpckd_cmd))

        # execute command with popen, clean up outputs
        pd = Popen(cmd, stdout=PIPE, stderr=PIPE, env=env)

        stdout, stderr = (x.strip().decode() for x in pd.communicate())
        results = pd.returncode, stdout, stderr

        log.debug("Executed command line {},\n got {}".format(cmd, results))
        return results

    def identify_threat(self, filename, out):
        for pattern in self.scan_patterns:
            for match in pattern.finditer(out):
                threat_path = Path(match.group('file').strip())
                # Some threat possibilities:
                #   /absolute-dir/.../filename
                #   /absolute-dir/.../name.zip/unzip1.zip/unzip2.zip
                #   relative-dir/.../name.zip/unzip1.zip/unzip2.zip

                if filename == threat_path or filename in threat_path.parents:
                    threat = match.group('name').strip()
                    if threat:
                        return threat

    def check_scan_results(self, fpath, results):
        log.debug("scan results for {0}: {1}".format(fpath, results))
        CLEAN = self.ScanResult.CLEAN
        INFECTED = self.ScanResult.INFECTED
        ERROR = self.ScanResult.ERROR

        retcode, stdout, stderr = results
        self.scan_results = {}

        # 1/ get meaning of retcode
        if self._scan_retcodes[INFECTED](retcode):
            retcode = INFECTED
        elif self._scan_retcodes[ERROR](retcode):
            retcode = ERROR
            log.error("command line returned {}: {}".format(
                retcode, (stdout, stderr)))
        elif self._scan_retcodes[CLEAN](retcode):
            retcode = CLEAN
        else:
            raise RuntimeError(
                "unhandled return code {} in class {}: {}".format(
                    retcode, type(self).__name__, results))

        # 2/ handle the retcode
        if retcode == INFECTED:
            threat = self.identify_threat(fpath, stdout)
            if threat:
                self.scan_results[fpath] = threat
            else:
                retcode = ERROR if stderr else CLEAN
        if retcode == ERROR:
            self.scan_results[fpath] = stderr
        elif retcode == CLEAN:
            self.scan_results[fpath] = None

        return retcode

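The template's scan function builds the argument list from scan_args plus the path, executes scan_path through run_cmd, and hands the outcome to check_scan_results. run_cmd returns a tuple of the form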

pd.returncode, stdout, stderr

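check_scan_results then interprets that tuple. The interpretation relies on lambda functions: each status in _scan_retcodes maps to a lambda that takes the process return code as its argument and reports whether the code belongs to that status. INFECTED means malicious, ERROR means the scan failed, and CLEAN means nothing was found. By default, 0 is CLEAN, 1 is INFECTED, and any other code is ERROR. When the return code maps to INFECTED, identify_threat is also called: it runs every regular expression in scan_patterns over stdout, and each pattern has to capture two named groups, file and name. An example scan_patterns definition: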

        # output : filepath,malware,info
        self.scan_patterns = [
            re.compile(r'(?P<file>\S+),malware,(?P<name>.*)$',
                       re.IGNORECASE | re.MULTILINE),
        ]
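
The matching logic can be tried outside IRMA. The sketch below copies identify_threat as a free function and feeds it made-up scanner output; the sample paths and threat names are invented for the example, and the `filepath,malware,info` format is the one assumed by the pattern above.

```python
import re
from pathlib import Path

scan_patterns = [
    re.compile(r'(?P<file>\S+),malware,(?P<name>.*)$',
               re.IGNORECASE | re.MULTILINE),
]


def identify_threat(filename, out):
    """Return the first threat name reported for filename, else None."""
    for pattern in scan_patterns:
        for match in pattern.finditer(out):
            threat_path = Path(match.group('file').strip())
            # Accept the file itself or an entry nested inside it
            # (e.g. an archive member reported as archive/member).
            if filename == threat_path or filename in threat_path.parents:
                threat = match.group('name').strip()
                if threat:
                    return threat
    return None


# Made-up scanner output in the assumed "filepath,malware,info" format.
sample_output = (
    "/tmp/other.bin,malware,Trojan.Other\n"
    "/tmp/sample.zip/inner.exe,malware,Eicar-Test\n"
)

print(identify_threat(Path("/tmp/sample.zip"), sample_output))
print(identify_threat(Path("/tmp/clean.txt"), sample_output))
```

The check against threat_path.parents is what lets a detection inside an archive be attributed to the archive that was submitted for scanning.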

If a threat is found, it is stored in the scan_results dictionary with the file path as the key:

self.scan_results[fpath] = threat

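Otherwise the entry is set to None (CLEAN) or to stderr (ERROR). Note that an INFECTED return code with no matching pattern is downgraded to ERROR if stderr is non-empty and to CLEAN otherwise. The function then returns the status code retcode. At this point self.scan_results holds the outcome of the scan, which run() later reads and returns.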
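
The return-code classification described above can be isolated into a small sketch. The helper names `make_retcode_table` and `classify` are hypothetical, as is the engine that exits with 3 or 4 on detection; only the table layout and the order of checks follow the Antivirus code.

```python
CLEAN, INFECTED, ERROR = 0, 1, -1


def make_retcode_table(clean_codes=(0,), infected_codes=(1,)):
    """Build a verdict -> predicate table shaped like _scan_retcodes."""
    table = {
        CLEAN: lambda x: x in clean_codes,
        INFECTED: lambda x: x in infected_codes,
    }
    # ERROR is defined as "neither clean nor infected".
    table[ERROR] = lambda x: not table[CLEAN](x) and not table[INFECTED](x)
    return table


def classify(table, retcode):
    # Same order of checks as check_scan_results: INFECTED, ERROR, CLEAN.
    for verdict in (INFECTED, ERROR, CLEAN):
        if table[verdict](retcode):
            return verdict
    raise RuntimeError("unhandled return code {}".format(retcode))


default_table = make_retcode_table()
print([classify(default_table, rc) for rc in (0, 1, 2)])

# A hypothetical engine that exits with 3 or 4 on detection.
custom_table = make_retcode_table(infected_codes=(3, 4))
print([classify(custom_table, rc) for rc in (0, 3, 1)])
```

Keeping the mapping in a dictionary of predicates means a module for an engine with unusual exit codes only has to replace the table entries, not the parsing logic.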

In fact, all of these functions can be overridden. Once the scan flow is understood, writing a new module is not difficult. A simple example:

class TestSecurity(AntivirusUnix):
    name = "Avast Core Security (Linux)"

    # ==================================
    #  Constructor and destructor stuff
    # ==================================

    def __init__(self, *args, **kwargs):
        # class super class constructor
        super().__init__(*args, **kwargs)
        # scan tool variables

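    def scan(self, path):
        # Skip the command line entirely and report a fixed "threat"
        self.scan_results[path] = "test"
        return 1  # ScanResult.INFECTED

    def get_version(self):
        """return the version of the antivirus"""
        return 1.0

    def get_database(self):
        """return the database files; run() iterates over this value,
        so it has to be a list of paths rather than a number"""
        return []

    def get_scan_path(self):
        """return the full path of the scan tool"""
        return self.locate_one("test")

    def get_virus_database_version(self):
        """return the version of the virus database"""
        return 1.0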
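
The example defines get_version, get_database and similar getters instead of setting the attributes directly. The comment above _attributes in Antivirus says those attributes are initialized by calling the getter on the subclass, but the mechanism itself (EarlyInitializer and __getattr__) is not shown in this post. The following is only a guess at how such a convention could work, with hypothetical class names; it is not IRMA's implementation.

```python
import copy


class LazyAttributes:
    """Hypothetical base: resolve `x` by calling `get_x()` on first access."""

    _attributes = {
        # attr -> default value
        "version": "unavailable",
        "database": [],
    }

    def __getattr__(self, attr):
        # __getattr__ only runs when normal attribute lookup fails.
        if attr not in type(self)._attributes:
            raise AttributeError(attr)
        getter = getattr(type(self), "get_" + attr, None)
        try:
            value = getter(self) if getter else None
        except Exception:
            value = None
        if value is None:
            # Fall back to a copy of the default so instances do not share it.
            value = copy.copy(type(self)._attributes[attr])
        # Cache on the instance so the getter is called only once.
        setattr(self, attr, value)
        return value


class DemoModule(LazyAttributes):
    def get_version(self):
        return "1.0"


m = DemoModule()
print(m.version, m.database)
```

Under this assumption, a module that omits a getter simply ends up with the default value, which matches the "unavailable" placeholders listed in _attributes.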