Automated Cryptanalysis of the Caesar Cipher (for English Text)

The Caesar Cipher

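The Caesar cipher is one of the oldest recorded encryption methods. The original Caesar cipher has no key and works in a very simple way: it rotates the alphabet three places to the right, so a is replaced by D, b by E, …, x by A, y by B, and z by C.
We now introduce a key. Treat the alphabet as the numbers {0, 1, 2, …, 25} rather than as English characters, and let the key k be an integer between 0 and 25. The encryption algorithm shifts every letter of an English plaintext forward by k positions; the decryption algorithm uses the same key k to shift every letter of the ciphertext back by k positions.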

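The mathematical description is as follows:

  • Plaintext space M = ciphertext space C = {0, 1, 2, …, 25}
  • Key space K = {0, 1, 2, …, 25}
  • Encryption algorithm Enc: c = (m + k) mod 26
  • Decryption algorithm Dec: m = (c - k) mod 26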
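
As a worked instance of the two formulas, take the key k = 14 used in the demonstration later in this article and the letter q (number 16). This is only an illustrative calculation:

$$c = (16 + 14) \bmod 26 = 30 \bmod 26 = 4 \quad (\text{the letter e})$$
$$m = (4 - 14) \bmod 26 = -10 \bmod 26 = 16 \quad (\text{the letter q})$$

Here mod means the non-negative remainder. In C, the % operator applied to a negative left operand gives a non-positive result (-10 % 26 is -10), so an implementation would typically add 26 before reducing.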

A C implementation of encryption and decryption is given below.

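#include <ctype.h>
#include <stdio.h>

/* Encryption: shift every letter of the plaintext file src_file forward by
   key (0..25) and write the ciphertext to des_file */
void enc(const char des_file[], const char src_file[], int key)
{
    FILE *fin = fopen(src_file, "r");
    FILE *fout = fopen(des_file, "w");
    int ch;  /* int rather than char, so EOF can be told apart from data bytes */
    if (fin == NULL || fout == NULL) {
        if (fin) fclose(fin);
        if (fout) fclose(fout);
        return;
    }
    while ((ch = fgetc(fin)) != EOF) {
        if (islower(ch)) ch = 'a' + (ch - 'a' + key) % 26;
        else if (isupper(ch)) ch = 'A' + (ch - 'A' + key) % 26;
        fputc(ch, fout);  /* non-letters are copied unchanged */
    }

    fclose(fin);
    fclose(fout);
}
/* Decryption: shift every letter of the ciphertext file src_file back by
   key (0..25) and write the recovered plaintext to des_file */
void dec(const char des_file[], const char src_file[], int key)
{
    FILE *fin = fopen(src_file, "r");
    FILE *fout = fopen(des_file, "w");
    int ch;
    if (fin == NULL || fout == NULL) {
        if (fin) fclose(fin);
        if (fout) fclose(fout);
        return;
    }
    while ((ch = fgetc(fin)) != EOF) {
        /* add 26 before subtracting so the left operand of % is never negative */
        if (islower(ch)) ch = 'a' + (ch - 'a' + 26 - key) % 26;
        else if (isupper(ch)) ch = 'A' + (ch - 'A' + 26 - key) % 26;
        fputc(ch, fout);
    }

    fclose(fin);
    fclose(fout);
}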

Naive Brute-Force Attack

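Because the Caesar cipher's key space is so small (only 26 possible keys), it is easy to try every key and see which one decrypts the ciphertext into plaintext that “makes sense”. The drawback of this approach is that it is hard to automate, because it is difficult for a computer to judge whether a plaintext “makes sense” (though not impossible, for example by checking against a dictionary of valid English words). In some cases the plaintext characters follow the letter distribution of English text but the plaintext itself is not valid English, which makes the problem even harder.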
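
The brute-force idea can be sketched in a few lines of C. The helper names caesar_decrypt and brute_force are made up for this illustration and work on in-memory strings rather than on files; the sketch assumes ASCII input and only prints the 26 candidates, leaving the judgment of which one “makes sense” to a human reader.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: undo a Caesar shift of `key` (0..25) on a
   NUL-terminated string. `out` needs room for strlen(in) + 1 bytes.
   Non-letters are copied unchanged. */
void caesar_decrypt(const char *in, char *out, int key)
{
    size_t i;
    for (i = 0; in[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)in[i];
        if (islower(ch))      out[i] = (char)('a' + (ch - 'a' + 26 - key) % 26);
        else if (isupper(ch)) out[i] = (char)('A' + (ch - 'A' + 26 - key) % 26);
        else                  out[i] = (char)ch;
    }
    out[i] = '\0';
}

/* Hypothetical helper: print all 26 candidate plaintexts, one per key. */
void brute_force(const char *ciphertext)
{
    char buf[256];
    int key;
    if (strlen(ciphertext) >= sizeof buf) return;  /* sketch: short inputs only */
    for (key = 0; key < 26; key++) {
        caesar_decrypt(ciphertext, buf, key);
        printf("key %2d: %s\n", key, buf);
    }
}
```

Calling brute_force on a short ciphertext prints one line per key; the reader then has to scan the 26 lines, which is exactly the manual step that the next section tries to remove.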

Automated Attack Using Frequency Analysis

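In the Caesar cipher the mapping of each character is fixed: if the letter a maps to D, then every occurrence of a in the plaintext produces a D in the ciphertext. The probability distribution of individual letters in English is known, and the average frequency of each letter is roughly the same across different texts; the longer the text, the closer the measured frequencies come to the average. Even relatively short texts (only a few dozen words) are already reasonably close to the average distribution.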

The frequency table is given below (stored in a double array; index 0 is a, index 25 is z):

#define LIST_LEN 26

const double p[LIST_LEN] = {0.082,0.015,0.028,0.042,0.127,
                      0.022,0.02, 0.061,0.07, 0.001,
                      0.008,0.04, 0.024,0.067,0.075,
                      0.019,0.001,0.06, 0.063,0.09,
                      0.028,0.01, 0.024,0.02, 0.001,0.001};

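Represent the English letters by the numbers 0 to 25. Let p_i (0 <= i < 26) denote the probability that letter i occurs in ordinary English text. From the known values of p_i it is easy to compute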

$$\sum_{i=0}^{25} p_i^2 \approx 0.065379$$
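
The constant 0.065379 can be checked directly against the frequency table. The sketch below copies the array from this article; the function name sum_of_squares is introduced here only for illustration.

```c
#define LIST_LEN 26

/* Letter frequencies copied from the table in this article (index 0 = 'a'). */
static const double p[LIST_LEN] = {0.082,0.015,0.028,0.042,0.127,
                                   0.022,0.02, 0.061,0.07, 0.001,
                                   0.008,0.04, 0.024,0.067,0.075,
                                   0.019,0.001,0.06, 0.063,0.09,
                                   0.028,0.01, 0.024,0.02, 0.001,0.001};

/* Hypothetical helper: return the sum of freq[i]^2 over all 26 letters. */
double sum_of_squares(const double freq[LIST_LEN])
{
    double sum = 0.0;
    int i;
    for (i = 0; i < LIST_LEN; i++) sum += freq[i] * freq[i];
    return sum;
}
```

Evaluating sum_of_squares(p) and printing it with six decimal places should reproduce the value quoted above.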

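Now suppose we are given some ciphertext, and let q_i denote the frequency of the i-th letter in the ciphertext (q_i is the number of occurrences of the i-th letter divided by the length of the ciphertext). If the key is k, then for every i we expect q_{i+k} to be approximately equal to p_i (here i+k stands for (i+k) mod 26).
For each j from 0 to 25, compute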

$$I_j = \sum_{i=0}^{25} p_i\, q_{i+j} \quad (\approx 0.065379 \text{ when } j = k)$$

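If I_k is found to be approximately 0.065379, then k is the key; for a wrong shift j the frequencies p_i and q_{i+j} are misaligned, and I_j is expected to differ noticeably from 0.065379. Key recovery is therefore easy to automate: compute I_j for all j and output the k whose I_k is closest to 0.065379.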

The attack code is as follows:

#include <ctype.h>
#include <math.h>
#include <stdio.h>

#define TARGET 0.065379
#define LIST_LEN 26

const double p[LIST_LEN] = {0.082,0.015,0.028,0.042,0.127,
                      0.022,0.02, 0.061,0.07, 0.001,
                      0.008,0.04, 0.024,0.067,0.075,
                      0.019,0.001,0.06, 0.063,0.09,
                      0.028,0.01, 0.024,0.02, 0.001,0.001};

/* Count how often each letter occurs in filename and store the relative
   frequencies in the array q */
void count(const char filename[], double q[LIST_LEN])
{
    int i, len = 0;   /* len is the number of letters, roughly the ciphertext length */
    FILE *fin = fopen(filename, "r");
    int ch;           /* int rather than char, so EOF can be detected reliably */
    for (i = 0; i < LIST_LEN; i++) q[i] = 0;
    if (fin == NULL) return;
    while ((ch = fgetc(fin)) != EOF) {
        if (isalpha(ch)) {
            len++;
            ch = tolower(ch);
            q[ch-'a'] += 1;
        }
    }
    fclose(fin);
    if (len == 0) return;   /* no letters: leave q at zero instead of dividing by 0 */
    for (i = 0; i < LIST_LEN; i++) q[i] /= len;
}
/* Recover the key; on return the array q holds the ciphertext letter frequencies */
int analysis(const char filename[], double q[LIST_LEN])
{
    int i, j, key = 0;
    double eps = 1;  /* eps holds the smallest distance to TARGET seen so far */
    count(filename, q);
    for (j = 0; j < LIST_LEN; j++) { /* j runs over all candidate keys */
        double sum = 0, tem;
        for (i = 0; i < LIST_LEN; i++) {
            /* accumulate the sum of p[i]*q[i+j] */
            int t = (i+j) % 26;
            sum += p[i] * q[t];
        }
        tem = fabs(sum-TARGET);
        if (tem < eps) {
            eps = tem; key = j;
        }
    }
    return key;
}

Demonstration

Using the key k = 14, encrypt the following English text: Q is a symmmetric block cipher. It is defined for a block size of 128 bits. It allows arbitrary length passwords. The design is fairly conservative. It consists of a simple substitution-permutation network. In this paper we present the cipher, its design criteria and our analysis. The design is based on both Rjindael and Serpent. It uses an 8-bit s-box from Rijndael with the linear mixing layers replaced with two Serpent style bit-slice s-boxs and a linear permutation. The combination of methods eliminates the high level strcuture inherent in Rjindael while having better speed and avalanche characteristics than Serpent. Speed is improved over Serpent. This version 2.00 contains better analysis, editorial changes, and an improved key scheduling algorithm. The number of recommended rounds is also increased.

The ciphertext is: E wg o gmaaashfwq pzcqy qwdvsf. Wh wg rstwbsr tcf o pzcqy gwns ct 128 pwhg. Wh ozzckg ofpwhfofm zsbuhv doggkcfrg. Hvs rsgwub wg towfzm qcbgsfjohwjs. Wh qcbgwghg ct o gwadzs gipghwhihwcb-dsfaihohwcb bshkcfy. Wb hvwg dodsf ks dfsgsbh hvs qwdvsf, whg rsgwub qfwhsfwo obr cif obozmgwg. Hvs rsgwub wg pogsr cb pchv Fxwbrosz obr Gsfdsbh. Wh igsg ob 8-pwh g-pcl tfca Fwxbrosz kwhv hvs zwbsof awlwbu zomsfg fsdzoqsr kwhv hkc Gsfdsbh ghmzs pwh-gzwqs g-pclg obr o zwbsof dsfaihohwcb. Hvs qcapwbohwcb ct ashvcrg szwawbohsg hvs vwuv zsjsz ghfqihifs wbvsfsbh wb Fxwbrosz kvwzs vojwbu pshhsf gdssr obr ojozobqvs qvofoqhsfwghwqg hvob Gsfdsbh. Gdssr wg wadfcjsr cjsf Gsfdsbh. Hvwg jsfgwcb 2.00 qcbhowbg pshhsf obozmgwg, srwhcfwoz qvobusg, obr ob wadfcjsr ysm gqvsrizwbu ozucfwhva. Hvs biapsf ct fsqcaasbrsr fcibrg wg ozgc wbqfsogsr.

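Trying every candidate key gives the following values of I (for candidate key j, the sum of the products p_i · q_{i+j}):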

key   I
0     0.037009
1     0.042199
2     0.038766
3     0.045997
4     0.037519
5     0.036376
6     0.031870
7     0.035365
8     0.035437
9     0.033578
10    0.043584
11    0.035186
12    0.032596
13    0.040507
14    0.065044
15    0.039813
16    0.028876
17    0.032173
18    0.049036
19    0.034005
20    0.032851
21    0.037881
22    0.035855
23    0.034066
24    0.037038
25    0.046374

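As the results show, only key = 14 gives a value of I close to 0.065; every other value is far off, so the key is 14.
This shows that for English text encrypted with the Caesar cipher, having the ciphertext alone is enough to find the key easily. In the end, the Caesar cipher's key space is simply too small.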
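
For experimenting without files, the same attack can be sketched on an in-memory string. The function name crack_caesar is hypothetical and not part of the code above; the frequency table and the target constant are copied from this article, and the return value -1 for input without letters is a choice made for this sketch.

```c
#include <ctype.h>

#define LIST_LEN 26
#define TARGET 0.065379

/* Letter frequencies copied from the table in this article (index 0 = 'a'). */
static const double p[LIST_LEN] = {0.082,0.015,0.028,0.042,0.127,
                                   0.022,0.02, 0.061,0.07, 0.001,
                                   0.008,0.04, 0.024,0.067,0.075,
                                   0.019,0.001,0.06, 0.063,0.09,
                                   0.028,0.01, 0.024,0.02, 0.001,0.001};

/* Hypothetical in-memory variant of analysis(): return the shift j whose
   I_j = sum of p[i]*q[(i+j)%26] is closest to TARGET, or -1 if the text
   contains no letters. */
int crack_caesar(const char *text)
{
    double q[LIST_LEN] = {0};
    double best = 1.0;
    int i, j, len = 0, key = -1;

    /* letter counts of the ciphertext */
    for (i = 0; text[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)text[i];
        if (isalpha(ch)) { q[tolower(ch) - 'a'] += 1; len++; }
    }
    if (len == 0) return -1;
    for (i = 0; i < LIST_LEN; i++) q[i] /= len;  /* counts -> frequencies */

    /* try every candidate key and keep the one closest to TARGET */
    for (j = 0; j < LIST_LEN; j++) {
        double sum = 0.0, diff;
        for (i = 0; i < LIST_LEN; i++) sum += p[i] * q[(i + j) % 26];
        diff = sum > TARGET ? sum - TARGET : TARGET - sum;
        if (diff < best) { best = diff; key = j; }
    }
    return key;
}
```

Passing the ciphertext from the demonstration to crack_caesar should, going by the table of I values above, select the key 14.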

The Monoalphabetic Substitution Cipher

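The Caesar cipher's key space is too small, and every plaintext letter is mapped to its ciphertext letter by the same shift, so as soon as a single plaintext letter leaks, the Caesar cipher is broken.
The idea of the monoalphabetic substitution cipher is to apply a permutation to the alphabet, so that each plaintext character maps to a ciphertext character in an “arbitrary way” (a one-to-one mapping). The key space has size 26! (approximately 2^88).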
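
A minimal sketch of such a cipher, assuming the key is given as a 26-character array in which key[i] is the lowercase ciphertext letter for the plaintext letter 'a' + i. The function name subst_encrypt and this key representation are illustrative choices, not taken from the article; the sketch does not check that the key really is a permutation.

```c
#include <ctype.h>

/* Hypothetical helper: encrypt a NUL-terminated string with a monoalphabetic
   substitution. key[i] is the lowercase ciphertext letter for plaintext
   'a' + i and is assumed to be a permutation of a..z. `out` needs room for
   strlen(in) + 1 bytes. Case is preserved; non-letters are copied unchanged. */
void subst_encrypt(const char *in, char *out, const char key[26])
{
    int i;
    for (i = 0; in[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)in[i];
        if (islower(ch))      out[i] = key[ch - 'a'];
        else if (isupper(ch)) out[i] = (char)toupper((unsigned char)key[ch - 'A']);
        else                  out[i] = (char)ch;
    }
    out[i] = '\0';
}
```

The Caesar cipher is the special case in which key is the alphabet rotated by k positions, which is why it has only 26 keys instead of 26!.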

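In practice, even though the key space is much larger, the monoalphabetic substitution cipher can still be broken quickly. Frequency counts give a first guess at the mapping: for example, e is the most frequently used letter in English, so one can guess that the most frequent letter in the ciphertext corresponds to the plaintext letter e.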
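
The first step of that guess can be sketched as follows. The function name most_frequent_letter is made up for this illustration, and breaking ties in favor of the alphabetically first letter is an arbitrary choice.

```c
#include <ctype.h>

/* Hypothetical helper: return the most frequent letter of text as a lowercase
   character, or -1 if text contains no letters. Ties go to the letter that
   comes first in the alphabet. */
int most_frequent_letter(const char *text)
{
    int count[26] = {0};
    int i, best = -1;
    for (i = 0; text[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)text[i];
        if (isalpha(ch)) count[tolower(ch) - 'a']++;
    }
    for (i = 0; i < 26; i++) {
        if (count[i] > 0 && (best < 0 || count[i] > count[best])) best = i;
    }
    return best < 0 ? -1 : 'a' + best;
}
```

The returned ciphertext letter is only a first guess for the image of e; on short or unusual texts it may well be wrong, and a full attack would continue with the next most frequent letters and with common letter pairs.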

In short, although the key space of monoalphabetic substitution is large, the cipher is still insecure.
