用于文件监视的文件名无效:_监视文件完整性

最新推荐文章于 2021-03-18 14:54:14 发布

culi3182

最新推荐文章于 2021-03-18 14:54:14 发布

阅读量401

点赞数

文章标签：算法数据库 python java php

原文链接：https://www.sitepoint.com/monitoring-file-integrity/

版权

用于文件监视的文件名无效:

Ask yourself how you might address the following circumstances when managing a website:

问问自己在管理网站时如何解决以下情况：

A file is unintentionally added, modified or deleted
意外添加，修改或删除文件
A file is maliciously added, modified or deleted
文件被恶意添加，修改或删除
A file becomes corrupted
文件损坏

More importantly, would you even know if one of these circumstances occurred? If your answer is no, then keep reading. In this guide I will demonstrate how to create a profile of your file structure which can be used to monitor the integrity of your files.

更重要的是，您是否知道这些情况之一是否发生过？如果您的答案为否，请继续阅读。在本指南中，我将演示如何创建文件结构的配置文件，该配置文件可用于监视文件的完整性。

The best way to determine whether or not a file has been altered is to hash its contents. PHP has several hashing functions available, but for this project I’ve decided to use the hash_file() function. It provides a wide range of different hashing algorithms which will make my code easy to modify at a later time should I decide to make a change.

确定文件是否已更改的最佳方法是散列其内容。 PHP有几个可用的哈希函数，但是对于该项目，我决定使用hash_file()函数。它提供了各种不同的哈希算法，这些哈希算法使我的代码在以后决定更改时易于修改。

Hashing is used in a wide variety of applications, everything from password protection to DNA sequencing. A hashing algorithm works by transforming a data into a fixed-sized, repeatable cryptographic string. They are designed so that even a slight modification to the data should produce a very different result. When two or more different pieces of data produce the same result string, it’s referred to as a “collision.” The strength of each hashing algorithm can be measured by both its speed and the probability of collisions.

散列用于各种各样的应用程序，从密码保护到DNA测序，应有尽有。哈希算法通过将数据转换为固定大小的可重复加密字符串来工作。它们的设计使得即使对数据进行很小的修改也应产生非常不同的结果。当两个或更多不同的数据产生相同的结果字符串时，则称为“冲突”。每个散列算法的强度都可以通过其速度和冲突概率来衡量。

In my examples I will be using the SHA-1 algorithm because it’s fast, the probability for collisions is low and it has been widely used and well tested. Of course, you’re welcome to research other algorithms and use any one you like.

在我的示例中，我将使用SHA-1算法，因为它速度快，发生碰撞的可能性低，并且已被广泛使用并经过良好测试。当然，欢迎您研究其他算法并使用任何您喜欢的算法。

Once the file’s hash has been obtained, it can be stored for later comparison. If hashing the file later doesn’t return the same hash string as before then we know the file has somehow been changed.

一旦获得了文件的哈希，就可以将其存储以供以后比较。如果以后对文件进行哈希处理后没有返回与以前相同的哈希字符串，那么我们知道文件已经过某种更改。

数据库 (Database)

To begin, we first need to layout a basic table to store the hashes of our files. I will be using the following schema:

首先，我们首先需要布局一个基本表来存储文件的哈希值。我将使用以下架构：

CREATE TABLE integrity_hashes (
    file_path VARCHAR(200) NOT NULL,
    file_hash CHAR(40) NOT NULL,
    PRIMARY KEY (file_path)
);

file_path stores the location of a file on the server and, since the value will always be unique because two files cannot occupy the same location in the file system, is our primary. I have specified its maximum length as 200 characters which should allow for some lengthy file paths. file_hash stores the hash value of a file, which will be a SHA-1 40-character hexadecimal string.

file_path存储文件在服务器上的位置，并且由于该值始终是唯一的(因为两个文件不能在文件系统中占据相同的位置)，因此它是我们的主要文件。我已将其最大长度指定为200个字符，这应该允许一些冗长的文件路径。 file_hash存储文件的哈希值，它将是SHA-1 40个字符的十六进制字符串。

收集档案 (Collecting Files)

The next step is to build a profile of the file structure. We define the path of where we want to start collecting files and recursively iterate through each directory until we’ve covered the entire branch of the file system, and optionally exclude certain directories or file extensions. We collect the hashes we need as we’re traversing the file tree which are then stored in the database or used for comparison.

下一步是建立文件结构的配置文件。我们定义了要开始收集文件的路径，并递归地遍历每个目录，直到覆盖了文件系统的整个分支为止，还可以选择排除某些目录或文件扩展名。我们遍历文件树时会收集所需的哈希值，然后将这些哈希值存储在数据库中或用于比较。

PHP offers several ways to navigate the file tree; for simplicity, I’ll be using the RecursiveDirectoryIterator class.

PHP提供了几种导航文件树的方式。为了简单起见，我将使用RecursiveDirectoryIterator类。

<?php
define("PATH", "/var/www/");
$files = array();

// extensions to fetch, an empty array will return all extensions
$ext = array("php");

// directories to ignore, an empty array will check all directories
$skip = array("logs", "logs/traffic");

// build profile
$dir = new RecursiveDirectoryIterator(PATH);
$iter = new RecursiveIteratorIterator($dir);
while ($iter->valid()) {
    // skip unwanted directories
    if (!$iter->isDot() && !in_array($iter->getSubPath(), $skip)) {
        // get specific file extensions
        if (!empty($ext)) {
            // PHP 5.3.4: if (in_array($iter->getExtension(), $ext)) {
            if (in_array(pathinfo($iter->key(), PATHINFO_EXTENSION), $ext)) {
                $files[$iter->key()] = hash_file("sha1", $iter->key());
            }
        }
        else {
            // ignore file extensions
            $files[$iter->key()] = hash_file("sha1", $iter->key());
        }
    }
    $iter->next();
}

Notice how I referenced the same folder logs twice in the $skip array. Just because I choose to ignore a specific directory doesn’t mean that the iterator will also ignore all of the sub-directories, which can be useful or annoying depending on your needs.

请注意，我是如何在$skip数组中两次引用同一文件夹logs 。仅仅因为我选择忽略特定目录并不意味着迭代器也将忽略所有子目录，这取决于您的需求可能有用或令人讨厌。

The RecursiveDirectoryIterator class gives us access to several methods:

RecursiveDirectoryIterator类使我们可以访问几种方法：

valid() checks whether or not we’re working with a valid file
valid()检查我们是否正在使用有效文件
isDot() determines if the directory is “.” or “..“
isDot()确定目录是否为“” . ”或“ .. ”
getSubPath() returns the folder name in which the file pointer is currently located
getSubPath()返回文件指针当前所在的文件夹名称
key() returns the full path and file name
key()返回完整路径和文件名
next() starts the loop over again
next()开始循环

There are also several more methods available to work with, but mostly the ones listed above are really all we need for the task at hand, although the getExtension() method has been added in PHP 5.3.4 which returns the file extension. If your version of PHP supports it, you can use it to filter out unwanted entries rather than what I did using pathinfo().

还有其他几种方法可以使用，但是上面列出的大多数方法实际上是我们手头任务所需的全部方法，尽管PHP 5.3.4中已添加getExtension()方法来返回文件扩展名。如果您PHP版本支持它，则可以使用它来过滤掉不需要的条目，而不是我使用pathinfo()做的事情。

When executed, the code should populate the $files array with results similar to the following:

执行后，代码应使用类似以下结果填充$files数组：

Array
(
    [/var/www/test.php] => b6b7c28e513dac784925665b54088045cf9cbcd3
    [/var/www/sub/hello.php] => a5d5b61aa8a61b7d9d765e1daf971a9a578f1cfa
    [/var/www/sub/world.php] => da39a3ee5e6b4b0d3255bfef95601890afd80709
)

Once we have the profile built, updating the database is easy peasy lemon squeezy.

一旦我们建立了配置文件，就可以轻松地更新数据库。

<?php
$db = new PDO("mysql:host=" . DB_HOST . ";dbname=" . DB_NAME,
    DB_USER, DB_PASSWORD);

// clear old records
$db->query("TRUNCATE integrity_hashes");

// insert updated records
$sql = "INSERT INTO integrity_hashes (file_path, file_hash) VALUES (:path, :hash)";
$sth = $db->prepare($sql);
$sth->bindParam(":path", $path);
$sth->bindParam(":hash", $hash);
foreach ($files as $path => $hash) {
    $sth->execute();
}

检查差异 (Checking For Discrepancies)

You now know how to build a fresh profile of the directory structure and how to update records in the database. The next step is to put it together into some sort of real world application like a cron job with e-mail notification, administrative interface or whatever else you prefer.

现在，您知道如何构建目录结构的新配置文件以及如何更新数据库中的记录。下一步是将其组合到某种实际应用程序中，例如具有电子邮件通知，管理界面或您喜欢的其他内容的cron作业。

If you just want to gather a list of files that have changed and you don’t care how they changed, then the simplest approach is to pull the data from the database into an array similar to $files and then use PHP’s array_diff_assoc() function to weed out the riffraff.

如果您只想收集已更改文件的列表，而不关心它们如何更改，那么最简单的方法是将数据库中的数据拉入类似于$files的数组中，然后使用PHP的array_diff_assoc()函数清除子。

<?php
// non-specific check for discrepancies
if (!empty($files)) {
    $result = $db->query("SELECT * FROM integrity_hashes")->fetchAll();
    if (!empty($result)) {
        foreach ($result as $value) {
            $tmp[$value["file_path"]] = $value["file_hash"];
        }
        $diffs = array_diff_assoc($files, $tmp);
        unset($tmp);
    }
}

In this example, $diffs will be populated with any discrepancies found, or it will be an empty array if the file structure is intact. Unlike array_diff(), array_diff_assoc() will use keys in the comparison which is important to us in case of a collision, such as two empty files having the same hash value.

在此示例中， $diffs将填充发现的所有差异，如果文件结构完整，则它将为空数组。与array_diff()不同， array_diff_assoc()将在比较中使用键，这在发生冲突时对我们很重要，例如两个具有相同哈希值的空文件。

If you want to take things a step further, you can throw in some simple logic to determine exactly how a file has been affected, whether it has been deleted, altered or added.

如果您想更进一步，可以引入一些简单的逻辑来确定文件的受影响方式，无论是删除，更改还是添加文件。

<?php
// specific check for discrepancies
if (!empty($files)) {
    $result = $db->query("SELECT * FROM integrity_hashes")->fetchAll();
    if (!empty($result)) {
        $diffs = array();
        $tmp = array();
        foreach ($result as $value) {
            if (!array_key_exists($value["file_path"], $files)) {
                $diffs["del"][$value["file_path"]] = $value["file_hash"];
                $tmp[$value["file_path"]] = $value["file_hash"];
            }
            else {
                if ($files[$value["file_path"]] != $value["file_hash"]) {
                    $diffs["alt"][$value["file_path"]] = $files[$value["file_path"]];
                    $tmp[$value["file_path"]] = $files[$value["file_path"]];
                }
                else {
                    // unchanged
                    $tmp[$value["file_path"]] = $value["file_hash"];
                }
            }
        }
        if (count($tmp) < count($files)) {
            $diffs["add"] = array_diff_assoc($files, $tmp);
        }
        unset($tmp);
    }
}

As we loop through the results from the database, we make several checks. First, array_key_exists() is used to check if the file path from our database is present in $files, and if not then the file must have been deleted. Second, if the file exists but the hash values do not match, the file must have been altered or is otherwise unchanged. We store each check into a temporary array named $tmp, and finally, if there are a greater number of $files than in our database then we know that those leftover un-checked files have been added.

当我们遍历数据库的结果时，我们进行了几次检查。首先，使用array_key_exists()检查数据库中的文件路径是否在$files ，如果不存在，则必须删除该文件。其次，如果文件存在但哈希值不匹配，则该文件必须已更改或未更改。我们将每张支票存储到名为$tmp的临时数组中，最后，如果$files数量大于数据库中的数量，那么我们知道已经添加了那些剩余的未经检查的文件。

When completed, $diffs will either be an empty array or it will contain any discrepancies found in the form of a multi-dimensional array which might appear as follows:

完成后， $diffs将为空数组，或者包含以多维数组形式出现的任何差异，可能会显示如下：

Array
(
    [alt] => Array
        (
            [/var/www/test.php] => eae71874e2277a5bc77176db14ac14bf28465ec3
            [/var/www/sub/hello.php] => a5d5b61aa8a61b7d9d765e1daf971a9a578f1cfa
        )

    [add] => Array
        (
            [/var/www/sub/world.php] => da39a3ee5e6b4b0d3255bfef95601890afd80709
        )

)

To display the results in a more user-friendly format, for an administrative interface or the like, you could for example loop through the results and output them in a bulleted list.

为了以更用户友好的格式显示结果，例如对于管理界面等，您可以循环浏览结果并将其输出在项目符号列表中。

<?php
// display discrepancies
if (!empty($diffs)) {
    echo "<p>The following discrepancies were found:</p>";
    echo "<ul>";
    foreach ($diffs as $status => $affected) {
        if (is_array($affected) && !empty($affected)) {
            echo "<li>" . $status . "</li>";
            echo "<ol>";
            foreach($affected as $path => $hash) {
                echo "<li>" . $path . "</li>";
            }
            echo "</ol>";
        }
    }
    echo "</ul>";
}
else {
    echo "<p>File structure is intact.</p>";
}

At this point you can either provide a link which triggers an action to update the database with the new file structure, in which case you might opt to store $files in a session variable, or if you don’t approve of the discrepancies you can address them however you see fit.

此时，您可以提供一个链接来触发操作，以使用新的文件结构来更新数据库，在这种情况下，您可能选择将$files存储在会话变量中，或者如果您不同意该差异，则可以解决它们，但您认为合适。

摘要 (Summary)

Hopefully this guide has given you a better understanding of monitoring file integrity. Having something like this in place on your website is an invaluable security measure and you can be comfortable knowing that your files remain exactly as you intended. Of course, don’t forget to keep regular backups. You know… just in case.

希望本指南使您对监视文件完整性有更好的了解。在您的网站上放置类似的内容是非常宝贵的安全措施，并且您可以放心知道您的文件完全符合您的预期。当然，不要忘记保留常规备份。你知道...以防万一。

Image via Semisatch / Shutterstock

图片来自Semisatch / Shutterstock

翻译自: https://www.sitepoint.com/monitoring-file-integrity/

用于文件监视的文件名无效:

culi3182

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
用于文件监视的文件名无效:_监视文件完整性

用于文件监视的文件名无效:Ask yourself how you might address the following circumstances when managing a website: 问问自己在管理网站时如何解决以下情况： A file is unintentionally added, modified or deleted 意外添加，修改或删除文件 A file is ...
复制链接

扫一扫