Vigenere Cipher Crack Report
Vigenere Cipher System
Plaintext $P = (p_1, p_2, \cdots, p_n)$
Ciphertext $C = (c_1, c_2, \cdots, c_n)$
Key $K = (k_1, k_2, \cdots, k_n)$, where the key word is repeated cyclically until it is as long as the message
Encryption: $c_i = E_{k, i}(p_i) = (p_i + k_i) \bmod 26$
Decryption: $p_i = D_{k, i}(c_i) = (c_i - k_i) \bmod 26$
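The two formulas above can be checked with a short round trip. This is a minimal sketch, assuming lowercase letters a to z mapped to 0 to 25 and a key index that advances only on letters; the helper name `vigenere` and its `sign` parameter are illustrative and not part of the report's code.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed alphabet: lowercase a-z only


def vigenere(text, key, sign=1):
    """Shift each letter of `text` by the matching key letter (sign=-1 decrypts)."""
    out = []
    i = 0  # index into the key; advances only when a letter is processed
    for ch in text:
        if ch in ALPHABET:
            k = ord(key[i % len(key)]) - ord("a")
            out.append(chr((ord(ch) - ord("a") + sign * k) % 26 + ord("a")))
            i += 1
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


cipher = vigenere("attackatdawn", "lemon")
plain = vigenere(cipher, "lemon", sign=-1)
print(cipher, plain)  # lxfopvefrnhr attackatdawn
```

Using one function with a sign keeps encryption and decryption visibly symmetric, which mirrors the $+k_i$ and $-k_i$ in the formulas.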
Crack steps
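The Vigenere cipher is a polyalphabetic generalization of the Caesar cipher: all letters encrypted with the same key position form a plain Caesar cipher. It can therefore be cracked using letter frequency analysis. The table below lists the relative frequencies of the letters a to z in English text.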
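import numpy as np
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
MAXLENGTH = 20  # assumption: largest key length tried; the original value is not shown
FREQ = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
        0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
        0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
        0.00978, 0.02360, 0.00150, 0.01974, 0.00074]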
Determine key length
Suppose a language has $n$ letters (English has 26) and the $i$-th letter occurs with probability $P_i$. The coincidence index (CI) is the probability that two randomly chosen letters are the same: $\text{CI}=\sum_{i=1}^{n}{P_{i}^{2}}$. For uniformly random English letters, $\text{CI} = \sum_{i=1}^{26}{\left(\frac{1}{26}\right)^{2}} = \frac{1}{26} \approx 0.0385$. For meaningful English text, however, the frequency table above gives $\text{CI} \approx 0.065$. For a ciphertext of length $L$, the CI is estimated from the letter counts as $\text{CI}=\sum_{i=1}^{26}{\frac{f_i}{L}\cdot\frac{f_i-1}{L-1}}$, where $f_i$ is the number of occurrences of the $i$-th letter.
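def CoincidenceIndex(cipherText):
    # CI estimated from letter counts; assumes cipherText contains only letters in ALPHABET
    N = len(cipherText)
    if N < 2:
        return 0.0  # fewer than two letters: no pair to compare
    fsum = 0
    for letter in ALPHABET:
        num = cipherText.count(letter)
        fsum += num*(num-1)
    CI = fsum/(N*(N-1))
    return CI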
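The two reference values quoted above (about 0.0385 for uniform letters and about 0.065 for English) can be reproduced directly from the definitions. The following is a minimal sketch; `expected_ci` and `sample_ci` are hypothetical helper names, and `FREQ` is copied from the table above so the block runs on its own.

```python
FREQ = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
        0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
        0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
        0.00978, 0.02360, 0.00150, 0.01974, 0.00074]


def expected_ci(probs):
    """Theoretical CI: probability that two independent draws give the same letter."""
    return sum(p * p for p in probs)


def sample_ci(text):
    """Empirical CI from letter counts, ignoring anything outside a-z."""
    letters = [c for c in text if "a" <= c <= "z"]
    n = len(letters)
    if n < 2:
        return 0.0  # no pair of letters to compare
    counts = [letters.count(chr(ord("a") + i)) for i in range(26)]
    return sum(f * (f - 1) for f in counts) / (n * (n - 1))


uniform_ci = expected_ci([1 / 26] * 26)  # exactly 1/26
english_ci = expected_ci(FREQ)           # close to 0.065
print(round(uniform_ci, 4), round(english_ci, 4))
```

The gap between the two values is what the key-length search relies on: a correctly split part looks like English, while a wrongly split part looks closer to uniform.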
We use the coincidence index to determine the key length. If the key has length $K$, we split the ciphertext into $K$ parts, where part $i$ holds every $K$-th letter starting at position $i$, and compute the $\text{CI}$ of each part, which gives $K$ values. Each part is encrypted with a single shift, so for the correct $K$ every part should have a $\text{CI}$ close to that of English. We choose the key length whose mean $\text{CI}$ is closest to 0.065. However, when the true key length is $K$, the candidates $2K$, $3K$, and so on give similar results, so we take the two candidates closest to 0.065 and, if the best one is a multiple of the second-best, return the smaller one. In fact, if we only want to decrypt the ciphertext, it makes no difference whether the key length is $K$, $2K$, or $nK$, because the recovered key is then just the true key repeated.
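def inferenceLength(cipherText):
    # CIs[l-1] holds the mean CI over the l parts for candidate key length l
    CIs = np.zeros(MAXLENGTH, dtype=np.float32)
    for l in range(1, MAXLENGTH+1):
        CI = 0
        for i in range(l):
            CI += CoincidenceIndex(cipherText[i::l])
        CIs[l-1] = CI/l
    # candidate lengths sorted by how close their mean CI is to 0.065
    indices = np.argsort(np.fabs(0.065 - CIs), axis=0)+1
    top_1, top_2 = indices[:2]
    # if the best length is a multiple of the runner-up, prefer the shorter key
    if (top_1%top_2==0):
        return top_2
    return top_1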
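The selection rule at the end of the function can be looked at in isolation. The sketch below restates it without NumPy; `pick_length` and `TARGET_CI` are hypothetical names, and the mean CI values are made-up numbers chosen only to show the case where a key of length 3 and its multiple 6 both score well.

```python
TARGET_CI = 0.065  # CI of typical English text, as quoted above


def pick_length(mean_cis):
    """mean_cis[l-1] is the mean CI of the parts for candidate key length l."""
    ranked = sorted(range(1, len(mean_cis) + 1),
                    key=lambda l: abs(TARGET_CI - mean_cis[l - 1]))
    best, second = ranked[0], ranked[1]
    # If the best candidate is a multiple of the runner-up, keep the shorter key.
    return second if best % second == 0 else best


# Hypothetical mean CIs for candidate lengths 1..6 (illustrative values only).
made_up = [0.040, 0.042, 0.064, 0.041, 0.043, 0.0648]
print(pick_length(made_up))  # 3
```

Here length 6 is marginally closer to 0.065 than length 3, but 6 is a multiple of 3, so the shorter length is returned.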
Determine key
Now that we know the key length, we can recover the key itself with a similar CI method. For a key of length $K$, we split the ciphertext into $K$ parts, each of which is a simple Caesar cipher, so we only need to determine the shift of each part in turn. For a single Caesar cipher, we use $\text{CI} = \sum_{i=1}^{26}{\frac{f_i}{L}F_i}$ to determine the shift, where $f_i$ is the number of occurrences of the $i$-th letter in the part, $L$ is the length of the part, and $F_i$ is the $i$-th letter's frequency in the FREQ table above.
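def CIfreq(sequence, shift=0):
    CI = 0
    # rotate FREQ right by `shift`: freq[j] is the frequency of plaintext letter j-shift
    freq = [*FREQ[-shift:], *FREQ[:-shift]]
    for letter in ALPHABET:
        num = sequence.count(letter)
        CI += num*freq[ord(letter)-ord('a')]
    # not divided by len(sequence); this constant factor does not change the best shift
    return CI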
For each part of the ciphertext, we cyclically shift the FREQ table through all 26 positions and record the corresponding $\text{CI}$. We take the shift with the largest value as that part's offset, which is the key letter. Strictly speaking, we want the value closest to 0.065, but since in theory the value does not exceed 0.065, the largest value is also the closest.
def inferenceKey(cipherText):
    length = inferenceLength(cipherText)
    key = ""
    for i in range(length):
        # score all 26 shifts for the i-th part and keep the best one
        CIs = np.zeros(26, dtype=np.float32)
        for shift in range(26):
            CIs[shift] = CIfreq(cipherText[i::length], shift)
        key += chr(ord("a")+CIs.argmax())
    return key
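To see the shift search on a single Caesar-encrypted part, here is a small self-contained sketch. It does not use real text: the plaintext is synthetic, built so that letter $i$ occurs about $1000 \cdot F_i$ times, which is an assumption made only to keep the example deterministic. The name `best_shift` is hypothetical, and `FREQ` is copied from the table above.

```python
FREQ = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
        0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
        0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
        0.00978, 0.02360, 0.00150, 0.01974, 0.00074]


def best_shift(sequence):
    """Return the shift whose rotated FREQ table matches `sequence` best."""
    counts = [sequence.count(chr(ord("a") + i)) for i in range(26)]
    scores = []
    for shift in range(26):
        # Plaintext letter j shows up as ciphertext letter (j + shift) % 26.
        scores.append(sum(counts[(j + shift) % 26] * FREQ[j] for j in range(26)))
    return scores.index(max(scores))


# Synthetic plaintext whose letter counts follow FREQ (illustrative, not real text).
plain = "".join(chr(ord("a") + i) * round(1000 * f) for i, f in enumerate(FREQ))
# Caesar-encrypt it with shift 7, i.e. key letter "h".
cipher = "".join(chr((ord(c) - ord("a") + 7) % 26 + ord("a")) for c in plain)
print(best_shift(cipher))  # 7
```

On real ciphertext the parts only approximate the FREQ distribution, so short parts can produce a wrong key letter; longer ciphertexts make the maximum more distinct.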
Now we have the key, which means the Vigenere cipher is cracked. We simply apply the decryption operation to recover the plaintext.
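keyLen = len(key)
result = []
i = 0  # position in the key; advances only on letters
for c in cipherText:
    if "a"<=c<="z":
        p = (ord(c) - ord(key[i%keyLen]) + 26)%26 + ord("a")
        result.append(chr(p))
        i += 1
    else:
        result.append(c)  # non-letters are copied unchanged
plainText = "".join(result)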