Risks and Challenges of Password Hashing

culi4814

Original article: https://www.sitepoint.com/risks-challenges-password-hashing/

In a past article, password hashing was discussed as a way to securely store user credentials in an application. Security is always a controversial topic, much like politics and religion: many points of view exist, and a ‘perfect solution’ for one person is not perfect for another. In my opinion, breaking an application’s security measures is just a matter of time. With computing power and complexity increasing every day, today’s secure applications will not be so secure tomorrow.

For readers who are not familiar with hash algorithms: a hash algorithm is nothing more than a one-way function that maps data of variable length to data of fixed length. Analyzing that definition, we need to understand the following requirements and characteristics of such algorithms:

One-way function: the output cannot be reversed using an efficient algorithm.
Maps data of variable length to data of fixed length: the input message space can be “infinite”, but the output space is not. This implies that two or more input messages can have the same hash. The smaller the output space, the greater the probability of a ‘collision’ between two input messages.

MD5 has confirmed practical collisions, and the chances of finding a SHA-1 collision are growing every day (more on collision probability can be learned by analyzing the classic Birthday Problem). So if we need to apply a hashing algorithm, we should use one with a larger output space (and a negligible collision probability), such as sha256, sha512 or whirlpool.
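As a minimal sketch of the "variable length in, fixed length out" property, the snippet below hashes inputs of very different sizes with PHP's hash() function and prints the digest lengths. The input values are illustrative assumptions, not taken from the article.

```php
<?php
// Inputs of very different lengths (illustrative values).
$inputs = array('a', 'password', str_repeat('x', 10000));

foreach (array('md5', 'sha1', 'sha256', 'sha512') as $algorithm) {
    foreach ($inputs as $input) {
        // hash() returns lowercase hex by default: 2 characters per output byte.
        $digest = hash($algorithm, $input);
        printf("%-6s input=%5d bytes -> digest=%3d hex chars\n",
            $algorithm, strlen($input), strlen($digest));
    }
}
```

For a given algorithm the digest length never changes, which is why the larger output spaces of sha256 and sha512 leave far more room before collisions become likely.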

Hash functions are also expected to behave like ‘pseudo-random functions’, meaning that the output of a hashing function should be indistinguishable from that of a true random number generator (or TRNG).

Why simple hashing is insecure for storing passwords

The fact that the output of a hash function cannot be reverted back to the input using an efficient algorithm does not mean that it cannot be cracked. Databases containing hashes of common words and short strings are usually within our reach with a simple google search. Also, common strings can be easily and quickly brute-forced or cracked with a dictionary attack.

Demonstration

Here is a quick video on how a tool like sqlmap can crack passwords via sql injection by bruteforcing md5 hashes in a database.

Also, we could have just done the simplest of attacks… just grab the hash and google it… Chances are that the hash exists in an online database. Examples of hash databases are:

We also have to consider that, since identical passwords always have the same hash value, cracking one hash automatically gives you the password of every user who chose it. To be clear: say you have thousands of users. If password policies are not enforced, it is very likely that a fair number of them will use the infamous ‘123456’ password. The md5 hash value of that password is ‘e10adc3949ba59abbe56e057f20f883e’, so when you crack this hash (if you even have to) and search for all the users who have this value in their password field, you will know that every single one of them used the ‘123456’ password.
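The effect can be sketched in a few lines. The user table below is hypothetical; only the md5 value of ‘123456’ comes from the text above.

```php
<?php
// Hypothetical user table with unsalted md5 password hashes.
$users = array(
    'alice' => md5('123456'),
    'bob'   => md5('correct horse battery staple'),
    'carol' => md5('123456'),
);

// The attacker has cracked (or simply looked up) a single hash.
$crackedHash = 'e10adc3949ba59abbe56e057f20f883e'; // md5 of '123456'

// Every user storing that hash is compromised at once.
$compromised = array_keys($users, $crackedHash, true);
print_r($compromised);
```

One lookup exposes both ‘alice’ and ‘carol’, which is exactly the weakness that per-user salts are meant to remove.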

Why salted hashes are insecure for storing passwords

To mitigate this attack, salts became common, but they are clearly not enough against today’s computing power, especially if the salt string is short, which makes it brute-forceable.

The basic password/salt function is defined as:

f(password, salt) = hash(password + salt)

To mitigate a brute-force attack, a salt should be as long as 64 characters. However, in order to authenticate a user later on, the salt must be stored in plain text inside the database, so:

if (hash([provided password] + [stored salt]) == [stored hash]) then user is authenticated
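The two formulas above could be sketched in PHP as follows. The helper names createSaltedHash() and verifySaltedHash() are hypothetical, sha256 is an assumed choice of hash, and random_bytes() requires PHP >= 7.

```php
<?php
// Hypothetical helpers; names are illustrative, not from the original article.
function createSaltedHash($password)
{
    // 32 random bytes -> 64 hex characters, matching the salt length suggested above.
    $salt = bin2hex(random_bytes(32));
    return array('salt' => $salt, 'hash' => hash('sha256', $password . $salt));
}

function verifySaltedHash($providedPassword, $storedSalt, $storedHash)
{
    $candidate = hash('sha256', $providedPassword . $storedSalt);
    // hash_equals() (PHP >= 5.6) compares the two strings in constant time.
    return hash_equals($storedHash, $candidate);
}

$record = createSaltedHash('123456');
var_dump(verifySaltedHash('123456', $record['salt'], $record['hash']));  // bool(true)
var_dump(verifySaltedHash('654321', $record['salt'], $record['hash']));  // bool(false)
```

Both the salt and the hash would be stored in the user's row; the salt is not a secret, it only makes each stored hash unique.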

Since every user will have a completely different salt, this also avoids the problem with simple hashes, where we could easily tell if 2 or more users are using the same password; now the hashes will be different. We can also no longer take the password hash directly and try to google it. Also, with a long salt, a brute-force attack is improbable. But, if an attacker gets access to this salt either by an sql injection attack or direct access to the database, a brute-force or dictionary attack becomes probable, especially if your users use common passwords (again, like ‘123456’):

Generate some string or get entry from dictionary
Concatenate with salt
Apply hash algorithm
If generated hash == hash in database then Bingo
else continue iterating
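The steps above can be sketched as a short loop. The salt, the dictionary and the function name dictionaryAttack() are illustrative assumptions; a real attack would use word lists with millions of entries.

```php
<?php
// Hypothetical leaked record: the attacker knows both the salt and the hash.
$storedSalt = 'f3a1c9';  // illustrative value
$storedHash = hash('sha256', '123456' . $storedSalt);

// Tiny illustrative dictionary.
$dictionary = array('password', 'qwerty', 'letmein', '123456', 'iloveyou');

function dictionaryAttack(array $dictionary, $salt, $targetHash)
{
    foreach ($dictionary as $candidate) {
        // Concatenate with the salt and apply the same hash algorithm.
        if (hash('sha256', $candidate . $salt) === $targetHash) {
            return $candidate;   // Bingo
        }
    }
    return null;                 // not found in this dictionary
}

var_dump(dictionaryAttack($dictionary, $storedSalt, $storedHash));
```

Because a single sha256 call is very cheap, the salt only forces the attacker to repeat this loop per user; it does not slow down each guess.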

But even if one password gets cracked, that will not automatically give you the password for every user who might have used it, since no user should have the same stored hash.

The randomness issue

In order to generate a good salt, we need a good random number generator. If PHP’s rand() function automatically popped into your mind, forget it immediately.

There is an excellent article about randomness on random.org. Simply put, a computer cannot come up with random data by itself. Computers are said to be deterministic machines, meaning that every algorithm a computer is able to run will, given the exact same input, always produce the same output.

When a random number is requested from the computer, it typically gathers inputs from several sources, such as environment variables (date, time, number of bytes read/written, uptime…), and then applies some calculations to them to produce random data. This is why random data produced by an algorithm is called pseudo-random, and why it is important to distinguish it from a true random data source. If we are somehow able to recreate the exact conditions present at the moment a pseudo-random number generator (or PRNG) was executed, we automatically obtain the number it originally generated.
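A small sketch of that determinism: seeding mt_rand() with the same value twice reproduces the same "random" sequence. The helper name sequenceFromSeed() and the seed value are illustrative assumptions.

```php
<?php
// Returns the first $count values mt_rand() produces after seeding with $seed.
function sequenceFromSeed($seed, $count)
{
    mt_srand($seed);
    $values = array();
    for ($i = 0; $i < $count; $i++) {
        $values[] = mt_rand();
    }
    return $values;
}

$first  = sequenceFromSeed(1234, 5);
$second = sequenceFromSeed(1234, 5);
var_dump($first === $second);
```

An attacker who can guess or recover the seed can therefore regenerate every value derived from it, including salts.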

Additionally, if a PRNG is not properly implemented, it is possible to discover patterns in the generated data. If patterns exist, we can predict the outcome… Take for instance the case of PHP’s rand() function on Windows as documented here. While it is not clear which PHP or Windows version is used, you can immediately tell there is something wrong by looking at the bitmap generated by using rand():

Compare to the output image from a TRNG:

Even though the issue has been addressed on PHP >= 5, rand() and even mt_rand() are still considered highly inadequate for security related purposes.

If you need to generate random data, use openssl_random_pseudo_bytes(), available as of PHP 5 >= 5.3.0. It even has the crypto_strong flag, which tells you whether the bytes are secure enough.

Here is a quick code sample to generate random strings using openssl_random_pseudo_bytes()

<?php

function getRandomBytes ($byteLength)
{
    /*
     * Checks if openssl_random_pseudo_bytes is available 
     */
    if (function_exists('openssl_random_pseudo_bytes')) {
        $randomBytes = openssl_random_pseudo_bytes($byteLength, $cryptoStrong);
        if ($cryptoStrong)
            return $randomBytes;
    } 

    /*
     * if openssl_random_pseudo_bytes is not available or its result is not
     * strong, fallback to a less secure RNG
     */
    $hash = '';
    $randomBytes = '';

    /*
     * On linux/unix systems, /dev/urandom is an excellent entropy source, use
     * it to seed initial value of $hash
     */
    if (file_exists('/dev/urandom')) {
        $fp = fopen('/dev/urandom', 'rb');
        if ($fp) {
            if (function_exists('stream_set_read_buffer')) {
                stream_set_read_buffer($fp, 0);
            }
            $hash = fread($fp, $byteLength);
            fclose($fp);
        }
    }

    /*
     * Use the less secure mt_rand() function, but never rand()!
     */
    for ($i = 0; $i < $byteLength; $i ++) {
        $hash = hash('sha256', $hash . mt_rand());
        $char = mt_rand(0, 62);
        $randomBytes .= chr(hexdec($hash[$char] . $hash[$char + 1]));
    }
    return $randomBytes;
}
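On PHP 7 and later, the built-in random_bytes() function draws from the operating system's secure random source, so a shorter variant is possible there. The following is a minimal sketch under that assumption; the helper name randomHexString() is hypothetical.

```php
<?php
// Hypothetical helper: returns a random hex string with $length characters.
function randomHexString($length)
{
    // random_bytes() (PHP >= 7) throws an exception if no secure source is available.
    $bytes = random_bytes((int) ceil($length / 2));
    return substr(bin2hex($bytes), 0, $length);
}

$salt = randomHexString(64);
echo $salt, "\n";
```

Hex encoding keeps the result printable and safe to store in a text column, at the cost of using two characters per random byte.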

Password stretching can be effective if done right

To further mitigate brute-force attacks, we can implement the password stretching technique. This is just an iterative or recursive algorithm that repeatedly hashes its own output, usually tens of thousands of times (or more).

This algorithm should iterate enough times that the whole computation takes at least 1 second (slower hashing also means the attacker has to wait longer for every guess).

In order to crack a password secured by stretching, the attacker must:

Know the exact iteration count; any deviation will produce entirely different hashes.
Wait at least 1 second between attempts.

This makes an attack improbable… but not impossible. In order to overcome the 1 second delay, an attacker needs better hardware than the computer for which the algorithm was tuned, which might mean a high cost, so the attack becomes prohibitively expensive.
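A naive stretching loop might look like the sketch below. It is only meant to illustrate the idea; the function name stretchHash(), the salt and the iteration count are assumptions, and a vetted construction such as PBKDF2 or bcrypt is preferable in practice.

```php
<?php
// Naive stretching sketch for illustration only.
function stretchHash($password, $salt, $iterations)
{
    $hash = '';
    for ($i = 0; $i < $iterations; $i++) {
        // Feed the previous digest back in together with the password and salt.
        $hash = hash('sha256', $hash . $password . $salt);
    }
    return $hash;
}

$start = microtime(true);
$hash = stretchHash('123456', 'illustrative-salt', 20000);
$elapsed = microtime(true) - $start;
printf("20000 iterations took %.3f seconds\n", $elapsed);
```

Timing the call on the production machine and raising the iteration count until it approaches the 1 second target is one way to tune it.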

You can also use standard algorithms, like PBKDF2, which is a Password-Based Key Derivation Function:

<?php

/*
 * PHP PBKDF2 implementation. The number of rounds can be increased to keep ahead
 * of improvements in CPU/GPU performance. You should use a different salt for
 * each password (it's safe to store it alongside your generated password).
 * This function is slow; that's intentional! For more information see:
 * - http://en.wikipedia.org/wiki/PBKDF2
 * - http://www.ietf.org/rfc/rfc2898.txt
 */
function pbkdf2 ($password, $salt, $rounds = 15000, $keyLength = 32, 
        $hashAlgorithm = 'sha256', $start = 0)
{
    // Number of hash-sized blocks needed to cover $start + $keyLength bytes
    $hashLength = strlen(hash($hashAlgorithm, '', true));
    $keyBlocks = (int) ceil(($start + $keyLength) / $hashLength);

    // Derived key
    $derivedKey = '';

    // Create key
    for ($block = 1; $block <= $keyBlocks; $block ++) {
        // Initial hash for this block
        $iteratedBlock = $hash = hash_hmac($hashAlgorithm, 
                $salt . pack('N', $block), $password, true);

        // Perform block iterations
        for ($i = 1; $i < $rounds; $i ++) {
            // XOR each iteration
            $iteratedBlock ^= ($hash = hash_hmac($hashAlgorithm, $hash, 
                    $password, true));
        }

        // Append iterated block
        $derivedKey .= $iteratedBlock;
    }

    // Return derived key of correct length
    return base64_encode(substr($derivedKey, $start, $keyLength));
}
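Since PHP 5.5 the hash extension also ships a native hash_pbkdf2() function, so a hand-written loop may not be needed on newer versions. A minimal usage sketch, with an illustrative password and the same round count as the function above (random_bytes() assumes PHP >= 7):

```php
<?php
$password = '123456';
$salt = random_bytes(16);   // per-user salt, stored alongside the derived key
$rounds = 15000;            // same default as the function above

// Sixth argument true = raw binary output; the length is then counted in bytes.
$rawKey = hash_pbkdf2('sha256', $password, $salt, $rounds, 32, true);
echo base64_encode($rawKey), "\n";
```

The native version runs the same algorithm in C, which lets you afford a higher round count for the same delay.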

There are also deliberately expensive algorithms such as bcrypt (via the crypt() function), which is time-intensive, and scrypt, which is memory-intensive as well:

<?php
// bcrypt is implemented in PHP's crypt() function.
// $cost must be a two-digit string (e.g. '09'); $salt must be 22 characters from ./0-9A-Za-z.
$hash = crypt($password, '$2a$' . $cost . '$' . $salt);

Where $cost is the work factor, and $salt is a random string you can generate using the getRandomBytes() function above (its raw bytes must first be encoded into the characters bcrypt accepts).

The work factor is totally dependent on the target system. You can start with a factor of ‘09’ and increase it until the operation takes approximately 1 second.
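That tuning procedure could be automated roughly as follows. This sketch assumes PHP >= 5.5 (for password_hash()); the helper name findBcryptCost() and its default bounds are hypothetical.

```php
<?php
// Hypothetical helper: returns the lowest bcrypt cost whose hashing time
// reaches $targetSeconds on this machine, capped at $maxCost.
function findBcryptCost($targetSeconds, $minCost = 9, $maxCost = 15)
{
    for ($cost = $minCost; $cost < $maxCost; $cost++) {
        $start = microtime(true);
        password_hash('benchmark-password', PASSWORD_BCRYPT, array('cost' => $cost));
        if (microtime(true) - $start >= $targetSeconds) {
            break;
        }
    }
    return $cost;
}

echo 'Suggested cost: ', findBcryptCost(1.0), "\n";
```

Each increment of the cost doubles the work, so the loop reaches the target after only a handful of measurements. The result is specific to the machine it runs on and should be measured on the production hardware.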

As of PHP 5.5.0 you can use the new password_hash() function, which uses bcrypt as its default hashing method.
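A minimal usage sketch of that API, with an illustrative password and cost values chosen only for the example:

```php
<?php
// password_hash() generates the salt itself and embeds it, with the cost, in the result.
$hash = password_hash('123456', PASSWORD_BCRYPT, array('cost' => 10));

var_dump(password_verify('123456', $hash));   // bool(true)
var_dump(password_verify('654321', $hash));   // bool(false)

// Rehash on login if the stored hash was created with weaker settings.
if (password_needs_rehash($hash, PASSWORD_BCRYPT, array('cost' => 11))) {
    $hash = password_hash('123456', PASSWORD_BCRYPT, array('cost' => 11));
}
```

Because the algorithm, cost and salt travel inside the hash string, a single column is enough to store everything password_verify() needs.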

There is no scrypt support in PHP yet, but you can check out Domblack’s scrypt implementation.

What about applying encryption techniques?

Hashing and ciphering (or encrypting) are terms that are often confused. As I mentioned before, hashing is a pseudo-random function, while ciphering is generally a ‘pseudo-random permutation’. This means the input message is sliced and changed in such a way that the output is indistinguishable from that of a TRNG; however, the output can be transformed back into the original input. This transformation is done using an encryption key, without which it should be impossible to recover the original message.

Ciphering differs from hashing in another important way. While the output message space of hashing is finite, the output message space of ciphering is infinite, as the relationship between input and output is 1:1; thus collisions should not exist.
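The difference can be sketched with PHP's OpenSSL functions: a ciphertext can be turned back into the plaintext by whoever holds the key, whereas a digest cannot. The cipher choice (AES-256-CBC) and the sample plaintext are assumptions made for the example, and random_bytes() requires PHP >= 7.

```php
<?php
$key = random_bytes(32);   // AES-256 key; without it decryption should be infeasible
$iv  = random_bytes(openssl_cipher_iv_length('aes-256-cbc'));

$plaintext  = 'halloween';
$ciphertext = openssl_encrypt($plaintext, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
$decrypted  = openssl_decrypt($ciphertext, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);

var_dump($decrypted === $plaintext);   // encryption is reversible with the key
var_dump(hash('sha256', $plaintext));  // a hash offers no corresponding "un-hash" call
```

This reversibility is exactly why encrypted passwords are riskier than hashed ones: anyone who obtains the key recovers every password.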

One has to be very careful about how encryption techniques are applied. Thinking that simply applying an encryption algorithm to sensitive data is enough to keep it safe is wrong, as many problems exist that could lead to data leaks. As a general rule, you should never consider applying your own encryption implementation.

Recently, Adobe had a massive leak of its user database because it applied encryption techniques incorrectly, and I’ll take it as an example of what not to do. I’ll try to be as straightforward as possible, keeping things really simple.

Consider the following schema:

Let’s say the plain text contents of the table are as follows:

Now, someone at Adobe decided to cipher the passwords, but made two big mistakes:

Used the same cipher key to encrypt all the passwords
Decided to leave the password hint field in plain text

Let’s say, for example, that after applying an enciphering algorithm to the password field, our data looks like the following:

While the passwords cannot be simply decrypted, and we cannot easily learn the encryption key used, by examining the data we can notice that records 2 and 7 share the same password, as do records 3 and 6. This is where the password hint field comes into play.

Record 6’s hint is “I’m one!”, which does not give us much information; however, record 3’s hint does, and we can safely guess that the password is “queen”. The hints for records 2 and 7 do not give much information alone, but if we look at them together: how many holidays have the same name as a scary movie? Now we have access to the account of everyone who used “halloween” as a password.
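The leak of "who shares a password with whom" can be reproduced with a small sketch. The cipher (AES-128 in ECB mode), the key and the helper name encryptDeterministic() are assumptions chosen for the demonstration; the article does not state which cipher Adobe used.

```php
<?php
// Illustrative only: a single shared key and a deterministic mode (ECB).
$sharedKey = 'illustrative-key'; // exactly 16 bytes, as AES-128 expects

function encryptDeterministic($password, $key)
{
    return bin2hex(openssl_encrypt($password, 'aes-128-ecb', $key, OPENSSL_RAW_DATA));
}

// Record numbers follow the example in the text.
$users = array(
    2 => encryptDeterministic('halloween', $sharedKey),
    3 => encryptDeterministic('queen', $sharedKey),
    7 => encryptDeterministic('halloween', $sharedKey),
);
var_dump($users[2] === $users[7]);  // identical passwords give identical ciphertexts
var_dump($users[2] === $users[3]);
```

Nobody needs the key to spot the match between records 2 and 7; the plain-text hints then do the rest.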

To mitigate the risk of data leaks, it is better to switch to hashing techniques. However, if you must use encryption techniques to store passwords, you can use tweakable encryption. The term looks fancy, but the idea is very simple.

Let’s say we have thousands of users, and we want to encrypt all passwords. As we saw, we cannot use the same encryption key for every password, since the data will be at risk (and other sophisticated attacks become possible). However, we cannot use a unique key for every user either, since storing those keys would become a security risk by itself. What we have to do is generate a single key and use a ‘tweak’ that is unique to every user; the key and the tweak together form the encryption key for each record. The simplest tweak available is the primary key, which by definition is unique to every record in the table (although I do not recommend using it; this is just to demonstrate the concept):

f(key, primaryKey) = key + primaryKey

Above, I’m simply concatenating the encryption key and the primary key’s value to generate the final encryption key; however, you can (and should) apply a hashing algorithm or a key derivation function to them. Also, instead of using the primary key as the tweak, you might want to generate a nonce (similar to a salt) for each record and use that as the tweak.
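One way to sketch this idea is to derive each record's key from the master key and a random per-record nonce. Everything here is an assumption made for illustration: the helper names, the use of hash_hmac() as the derivation step, and AES-256-CBC as the cipher (random_bytes() requires PHP >= 7).

```php
<?php
// Hypothetical helpers sketching per-record keys derived from one master key.
function encryptWithTweak($password, $masterKey)
{
    $nonce = random_bytes(16);                                   // per-record tweak, stored in plain text
    $recordKey = hash_hmac('sha256', $nonce, $masterKey, true);  // 32-byte key for this record
    $iv = random_bytes(openssl_cipher_iv_length('aes-256-cbc'));
    $ciphertext = openssl_encrypt($password, 'aes-256-cbc', $recordKey, OPENSSL_RAW_DATA, $iv);
    return array('nonce' => $nonce, 'iv' => $iv, 'ciphertext' => $ciphertext);
}

function decryptWithTweak(array $record, $masterKey)
{
    $recordKey = hash_hmac('sha256', $record['nonce'], $masterKey, true);
    return openssl_decrypt($record['ciphertext'], 'aes-256-cbc', $recordKey,
        OPENSSL_RAW_DATA, $record['iv']);
}

$masterKey = random_bytes(32);
$a = encryptWithTweak('halloween', $masterKey);
$b = encryptWithTweak('halloween', $masterKey);
var_dump($a['ciphertext'] === $b['ciphertext']);
var_dump(decryptWithTweak($a, $masterKey));
```

Only the master key has to be kept secret; the nonce and IV can live in the same row as the ciphertext, much like a salt.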

After applying tweakable encryption to the users table, it now looks like the following:

Of course we still have the password hint problem, but now each record has a unique value, so it is not apparent which users are using the same password.

I want to emphasize that encryption is not the best solution, and should be avoided for storing passwords where possible, since many weaknesses can be introduced. You can and should stick to proven solutions (like bcrypt) to store passwords, but keep in mind that even proven solutions have their weaknesses.

Conclusion

There is no perfect solution, and the risk of someone breaking our security measures grows every day. However, cryptographic and data security research continues, and with developments such as the relatively recent definition of sponge functions, our toolkit keeps growing every day.