Understanding the SHA256 Hash Algorithm in One Article

简,

Last modified on 2022-05-21 20:23:05


First published on 2022-05-21 00:52:02

Article link: https://blog.csdn.net/qq_51473302/article/details/124851177


Introduction to the SHA-2 Family

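An n-bit hash function is a mapping from messages of arbitrary length to n-bit hash values. An n-bit cryptographic hash function is an n-bit hash function that is one-way and collision-resistant. Such functions are essential tools in digital signatures and password protection.

SHA-2, short for Secure Hash Algorithm 2, is a family of cryptographic hash function standards designed by the U.S. National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST) in 2001. It belongs to the SHA family and is the successor to SHA-1. It comprises six variants: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. Apart from minor differences such as the digest length and the number of rounds, the variants share the same basic structure.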

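In one sentence: for a message of any length (strictly, any length below 2^64 bits), SHA256 produces a 256-bit hash value called the message digest. The digest is equivalent to a 32-byte array and is usually written as a 64-character hexadecimal string.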
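
As a quick check of these sizes, the following sketch uses Python's standard `hashlib` module (which the original article does not use) to hash the three-byte message "abc" and inspect the digest.

```python
import hashlib

# Hash the 3-byte message "abc" with the standard-library SHA-256.
digest = hashlib.sha256(b"abc").digest()         # raw bytes
hex_digest = hashlib.sha256(b"abc").hexdigest()  # hexadecimal string

print(len(digest))      # 32 bytes = 256 bits
print(len(hex_digest))  # 64 hexadecimal characters
print(hex_digest)
```

The same digest length is produced whatever the input length, which is the property described above.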

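Core of the SHA256 Algorithm

Preprocessing:

Before the hash computation proper, the message goes through the following two steps:

Pad the message so that its final length is a multiple of 512 bits, then
split the message into blocks of 512 bits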

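SHA256 first brings the incoming message into the required structure. This consists of message padding and appending the length: (1) a 1 bit followed by 0 bits is appended so that the message reaches the required length, and (2) the message length is appended at the end.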

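(1) Pad the end of the message so that its length is congruent to 448 modulo 512.
Padding works as follows: first append a single 1 bit, then append 0 bits until the length modulo 512 equals 448. Padding is mandatory: even if the length is already congruent to 448 modulo 512, padding is still applied, and in that case 512 bits are added. (So at least 1 bit and at most 512 bits are added.)
(2) The remainder is 448 because, after padding, a 64-bit value representing the length of the original message is appended, and 448 + 64 = 512 completes a full block. Because the length of the original message in bits is stored in 64 bits, the message must be shorter than 2^64 bits. The length is encoded as a 64-bit big-endian integer, that is, the most significant byte is written first.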
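
The two padding steps can be sketched as follows. This is a minimal illustration that assumes the message is a whole number of bytes; the function name `pad_message` is hypothetical and not taken from the article.

```python
def pad_message(data: bytes) -> bytes:
    """Pad data as described above: a 1 bit, 0 bits, then the 64-bit length."""
    bit_length = len(data) * 8
    padded = data + b"\x80"  # the single 1 bit followed by seven 0 bits
    # Add zero bytes until the length is 56 bytes (448 bits) modulo 64 bytes (512 bits).
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Append the original length in bits as a 64-bit big-endian integer.
    return padded + bit_length.to_bytes(8, "big")

block = pad_message(b"abc")
print(len(block))        # 64
print(block[:4].hex())   # 61626380
print(block[-8:].hex())  # 0000000000000018
```

A 56-byte message is already 448 bits long, so it receives a full 512 bits of padding and ends up occupying two blocks (128 bytes).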

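Once the blocks are obtained, the first step is diffusion.

Diffusion:

Each 512-bit block is expanded into 64 words (a byte has 8 bits and a word has 4 bytes, so each word is 32 bits). The first 16 words are taken directly from the block and are denoted W[0], ..., W[15].

The remaining 48 words are computed with the recurrence: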

$W_{i} = W_{i-16} + W_{i-7} + S_{0} + S_{1}$

$S_{0} = ROTR_{7}(W_{i-15}) \oplus ROTR_{18}(W_{i-15}) \oplus SHR_{3}(W_{i-15})$

$S_{1} = ROTR_{17}(W_{i-2}) \oplus ROTR_{19}(W_{i-2}) \oplus SHR_{10}(W_{i-2})$

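ROTR: rotate right, a circular right shift

SHR: shift right, a logical right shift in which the vacated high bits are filled with 0

$\oplus$: XOR, a bitwise operator

Note: these operators are not explained again below. All additions are performed modulo $2^{32}$.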
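
The expansion from 16 to 64 words can be sketched as below. The helper names `rotr` and `expand_block` are illustrative assumptions, and the input is the single padded block for the message "abc".

```python
MASK32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    """Rotate the 32-bit value x right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def expand_block(block: bytes) -> list:
    """Build the 64-word message schedule W from one 64-byte block."""
    assert len(block) == 64
    # W[0..15]: the block read as sixteen big-endian 32-bit words
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    # W[16..63]: the recurrence, with additions taken modulo 2^32
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + w[i - 7] + s0 + s1) & MASK32)
    return w

# The padded single block for the message "abc" (length 24 bits)
block = b"abc" + b"\x80" + b"\x00" * 52 + (24).to_bytes(8, "big")
w = expand_block(block)
print(hex(w[0]), hex(w[16]), hex(w[17]))
```

Because most words of this block are zero, W[16] simply repeats W[0], while W[17] depends only on the length word W[15].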

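With the 64 words W in hand, the next step is the confusion operation.

Confusion:

Initial hash values:

The first 32 bits of the fractional parts of the square roots of the first 8 primes (2, 3, 5, 7, 11, 13, 17, 19).

Confusion constants:

The first 32 bits of the fractional parts of the cube roots of the first 64 primes.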
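
Both sets of constants can be recomputed from their definition. The sketch below uses exact integer arithmetic to avoid floating-point rounding; the helpers `first_primes` and `icbrt` are hypothetical names introduced for this illustration, and `math.isqrt` requires Python 3.8 or later.

```python
from math import isqrt

def first_primes(count: int) -> list:
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n: int) -> int:
    """Integer cube root: the largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo

primes = first_primes(64)
# isqrt(p << 64) equals floor(sqrt(p) * 2**32); the mask drops the integer part.
initial_hash = [isqrt(p << 64) & 0xFFFFFFFF for p in primes[:8]]
# icbrt(p << 96) equals floor(cbrt(p) * 2**32); the mask drops the integer part.
round_constants = [icbrt(p << 96) & 0xFFFFFFFF for p in primes]

print(hex(initial_hash[0]), hex(round_constants[0]))
```

The first entries come out as 0x6a09e667 and 0x428a2f98, matching the tables listed in the pseudocode later in the article.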

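We now have three sets of data: the confusion constants, the initial hash values, and the 64 words W. These three sets are combined in the confusion step.

First, four functions: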

$Ch(H_{4},H_{5},H_{6}) = (H_{4} \wedge H_{5}) \oplus ((\sim H_{4})\wedge H_{6})$

$Ma(H_{0},H_{1},H_{2}) = (H_{0} \wedge H_{1}) \oplus (H_{0} \wedge H_{2}) \oplus (H_{1} \wedge H_{2})$

$(\sum )_{0} = ROTR_{2}(H_{0}) \oplus ROTR_{13}(H_{0}) \oplus ROTR_{22}(H_{0})$

$(\sum )_{1} = ROTR_{6}(H_{4}) \oplus ROTR_{11}(H_{4}) \oplus ROTR_{25}(H_{4})$

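$\sim$: NOT, a bitwise operator

$\wedge$: AND, a bitwise operator

Note (an intuitive reading):

Ch function: for each bit position, if the bit of H4 is 1 the result takes the bit of H5; if the bit of H4 is 0 the result takes the bit of H6.

Ma function: H0, H1 and H2 are compared bit by bit, and each result bit is the majority value (0 if there are more 0s, 1 if there are more 1s).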
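
The four functions translate almost directly into code. This is a small sketch with illustrative function names (`ch`, `maj`, `big_sigma0`, `big_sigma1`), operating on 32-bit unsigned values held in Python integers.

```python
MASK32 = 0xFFFFFFFF

def rotr(x, n):
    """Rotate the 32-bit value x right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def ch(x, y, z):
    # Where a bit of x is 1 take the bit of y, otherwise take the bit of z.
    return ((x & y) ^ (~x & z)) & MASK32

def maj(x, y, z):
    # Each result bit is the majority vote of the three input bits.
    return (x & y) ^ (x & z) ^ (y & z)

def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)

def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)

print(hex(ch(0xFFFF0000, 0x12345678, 0x9ABCDEF0)))  # 0x1234def0
print(bin(maj(0b1100, 0b1010, 0b0110)))             # 0b1110
```

In the first call the upper 16 bits of the selector are 1, so the upper half of the result comes from the second argument and the lower half from the third.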

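We take the first word W and the first confusion constant for the first iteration. In the first iteration, A, B, ..., H are the 8 initial hash values (A to H correspond to H0 to H7 in the functions above). We define: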

$T_{1} = H + (\sum )_{1} + Ch + K_{i} + W_{i}$

$T_{2} = (\sum )_{0} + Ma$

The updated values are then:

H = G
G = F
F = E
E = (D + T1)
D = C
C = B
B = A
A = (T1 + T2)
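
One iteration of this update can be sketched as a function that takes the eight working values and returns the next eight. The name `sha256_round` is an assumption made for this example; the inputs are the initial hash values together with the first constant and the first word of the padded message "abc".

```python
MASK32 = 0xFFFFFFFF

def rotr(x, n):
    """Rotate the 32-bit value x right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def sha256_round(state, k, w):
    """Apply one round to state = (A, B, C, D, E, F, G, H); return the new tuple."""
    a, b, c, d, e, f, g, h = state
    sigma1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    t1 = (h + sigma1 + ch + k + w) & MASK32
    sigma0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (sigma0 + maj) & MASK32
    # Every value moves one position down; only A and E receive new values.
    return ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)

initial = (0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
           0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19)
# First round for the message "abc": K[0] = 0x428a2f98, W[0] = 0x61626380
after = sha256_round(initial, 0x428a2f98, 0x61626380)
print(" ".join(format(v, "08x") for v in after))
```

Only two of the eight values are actually recomputed in each round, which is why 64 rounds are needed before every input bit has influenced every output word.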

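There are 64 words W and 64 confusion constants, so 64 iterations are needed, as shown in the figure.

After the 64 iterations, the final values of A-H are added to the initial hash values, which gives the digest of the first 512-bit block.

That is: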

$H_{0} = H_{0} + A$

$H_{1} = H_{1} + B$

$H_{2} = H_{2} + C$

$H_{3} = H_{3} + D$

$H_{4} = H_{4} + E$

$H_{5} = H_{5} + F$

$H_{6} = H_{6} + G$

$H_{7} = H_{7} + H$

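We are not done yet: this covers only the first 512-bit block. When the next block is processed, the current values of H0-H7 serve as its initial hash values.

Overall, every block is compressed in turn, each one starting from the hash values left by the previous block.

The hash values produced by the last block are the final result, so it only remains to concatenate H0-H7 after the last block.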
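
The final concatenation can be illustrated with the well-known result for the message "abc". The eight words below are the published SHA-256 test-vector values for that message; the variable names are illustrative.

```python
# Final H0..H7 for the message "abc" (standard SHA-256 test vector)
final_words = [0xba7816bf, 0x8f01cfea, 0x414140de, 0x5dae2223,
               0xb00361a3, 0x96177a9c, 0xb410ff61, 0xf20015ad]

# Each word contributes 4 big-endian bytes, giving 32 bytes in total.
digest = b"".join(word.to_bytes(4, "big") for word in final_words)

print(len(digest))   # 32
print(digest.hex())
```

Writing each word in big-endian order is what makes the hexadecimal string read as H0 followed by H1 and so on.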

This concludes the main part of the algorithm.

The implementation follows.

SHA256 Pseudocode

Note 1: All variables are 32 bit unsigned integers and addition is calculated modulo 2^32
Note 2: For each round, there is one round constant k[i] and one entry in the message schedule array w[i], 0 ≤ i ≤ 63
Note 3: The compression function uses 8 working variables, a through h
Note 4: Big-endian convention is used when expressing the constants in this pseudocode,
    and when parsing message block data from bytes to words, for example,
    the first word of the input message "abc" after padding is 0x61626380

Initialize hash values:
(first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):
h0 := 0x6a09e667
h1 := 0xbb67ae85
h2 := 0x3c6ef372
h3 := 0xa54ff53a
h4 := 0x510e527f
h5 := 0x9b05688c
h6 := 0x1f83d9ab
h7 := 0x5be0cd19

Initialize array of round constants:
(first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):
k[0..63] :=
   0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
   0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
   0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
   0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
   0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
   0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
   0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
   0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2

Pre-processing (Padding):
begin with the original message of length L bits
append a single '1' bit
append K '0' bits, where K is the minimum number >= 0 such that L + 1 + K + 64 is a multiple of 512
append L as a 64-bit big-endian integer, making the total post-processed length a multiple of 512 bits

Process the message in successive 512-bit chunks:
break message into 512-bit chunks
for each chunk
    create a 64-entry message schedule array w[0..63] of 32-bit words
    (The initial values in w[0..63] don't matter, so many implementations zero them here)
    copy chunk into first 16 words w[0..15] of the message schedule array

    Extend the first 16 words into the remaining 48 words w[16..63] of the message schedule array:
    for i from 16 to 63
        s0 := (w[i-15] rightrotate  7) xor (w[i-15] rightrotate 18) xor (w[i-15] rightshift  3)
        s1 := (w[i- 2] rightrotate 17) xor (w[i- 2] rightrotate 19) xor (w[i- 2] rightshift 10)
        w[i] := w[i-16] + s0 + w[i-7] + s1

    Initialize working variables to current hash value:
    a := h0
    b := h1
    c := h2
    d := h3
    e := h4
    f := h5
    g := h6
    h := h7

    Compression function main loop:
    for i from 0 to 63
        S1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
        ch := (e and f) xor ((not e) and g)
        temp1 := h + S1 + ch + k[i] + w[i]
        S0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
        maj := (a and b) xor (a and c) xor (b and c)
        temp2 := S0 + maj
 
        h := g
        g := f
        f := e
        e := d + temp1
        d := c
        c := b
        b := a
        a := temp1 + temp2

    Add the compressed chunk to the current hash value:
    h0 := h0 + a
    h1 := h1 + b
    h2 := h2 + c
    h3 := h3 + d
    h4 := h4 + e
    h5 := h5 + f
    h6 := h6 + g
    h7 := h7 + h

Produce the final hash value (big-endian):
digest := hash := h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7

SHA256 Code (Python Implementation)

# Computes SHA256 and returns the digest as a hexadecimal string
class SHA256:
    def __init__(self):
        # The 64 round constants
        self.constants = (
            0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
            0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
            0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
            0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
            0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
            0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
            0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
            0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
            0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
            0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
            0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
            0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
            0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
            0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
            0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
            0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2)

        # Initial hash values h0, h1, ..., h7
        self.h = (
            0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
            0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19)

    # Rotate the 32-bit value x right by b bits
    def rightrotate(self, x, b):
        return ((x >> b) | (x << (32 - b))) & ((2**32)-1)
    
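    # Message preprocessing: append the padding bits and the 64-bit length
    def Pad(self, W):
        data = W.encode("utf-8")  # encode first so the length counts bytes, not characters
        zeros = (55 if (len(data) % 64) < 56 else 119) - (len(data) % 64)
        return data + b"\x80" + (b"\x00" * zeros) + ((len(data) << 3).to_bytes(8, "big"))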
    
    # Apply one round of the SHA256 compression function to update A-H
    def Compress(self, Wt, Kt, A, B, C, D, E, F, G, H):
        S0 = (self.rightrotate(A, 2) ^ self.rightrotate(A, 13) ^ self.rightrotate(A, 22))
        S1 = (self.rightrotate(E, 6) ^ self.rightrotate(E, 11) ^ self.rightrotate(E, 25))
        Ch = ((E & F) ^ (~E & G))
        Ma = ((A & B) ^ (A & C) ^ (B & C))
        T1 = H + S1 + Ch + Wt + Kt
        T2 = S0 + Ma

        H = G
        G = F
        F = E
        E = (D + T1) & ((2**32)-1)
        D = C
        C = B
        B = A
        A = (T1 + T2) & ((2**32)-1)

        return A,B,C,D,E,F,G,H
    
    # Hash computation
    def hash(self, message):
        message = self.Pad(message)
        digest = list(self.h)
        A, B, C, D, E, F, G, H = digest

        # Build the first 16 words W from each 64-byte block
        for i in range(0, len(message), 64):
            S = message[i: i + 64]
            W = [int.from_bytes(S[e: e + 4], "big") for e in range(0, 64, 4)] + ([0] * 48)

            # Compute the remaining 48 words W
            for j in range(16, 64):
                R0 = (self.rightrotate(W[j - 15], 7) ^ self.rightrotate(W[j - 15], 18) ^ (W[j - 15] >> 3))
                R1 = (self.rightrotate(W[j - 2], 17) ^ self.rightrotate(W[j - 2], 19) ^ (W[j - 2] >> 10))
                W[j] = (W[j - 16] + R0 + W[j - 7] + R1) & ((2**32)-1)     

            # Run the 64 rounds, updating A to H each time
            for j in range(64):
                A, B, C, D, E, F, G, H = self.Compress(W[j], self.constants[j], A, B, C, D, E, F, G, H)

            # Add the block's starting hash values to the updated A-H
            digest = [(x + y) & ((2**32)-1) for x, y in zip(digest, (A, B, C, D, E, F, G, H))]

            # For the next block, A-H start from the result of this block
            A, B, C, D, E, F, G, H = digest

        # Concatenate the eight 4-byte values (32 bytes, 256 bits in total)
        return "".join(format(h, "02x") for h in b"".join(d.to_bytes(4, "big") for d in digest))
        
# Main function
def main():
    encoder = SHA256()
    while True:
        message = input("Enter string: ")
        abstract = encoder.hash(message)
        print(abstract+'\n')

    
if __name__ == "__main__":
    main()