Copyright © 2018 Joyce_BY
All rights reserved.
Contact by Yagnes126@gmail.com
Background
What is the Vigenère cipher?
The Vigenère cipher is a method of encrypting alphabetic text by using a series of interwoven Caesar ciphers, based on the letters of a keyword. It is a form of polyalphabetic substitution.
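To make the definition concrete, here is a minimal sketch of the cipher itself. The function name vigenere and the restriction to uppercase A-Z text are assumptions made for this illustration only.

```python
from itertools import cycle

A = ord("A")

def vigenere(text, key, decrypt=False):
    """Shift each letter of `text` by the matching key letter.

    Assumes `text` and `key` contain only uppercase letters A-Z.
    """
    sign = -1 if decrypt else 1
    out = []
    # cycle() repeats the keyword so that every text letter gets a key letter
    for ch, k in zip(text, cycle(key)):
        shift = sign * (ord(k) - A)
        out.append(chr((ord(ch) - A + shift) % 26 + A))
    return "".join(out)

ciphertext = vigenere("ATTACKATDAWN", "LEMON")
plaintext = vigenere(ciphertext, "LEMON", decrypt=True)
print(ciphertext)  # LXFOPVEFRNHR
print(plaintext)   # ATTACKATDAWN
```

Each key letter selects one Caesar shift, which is why letters at positions that share a key letter can later be analysed together.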
Start
What do we have? – A string of nonsense ciphertext
What is our aim? – To crack the cipher and recover sensible plaintext
What should we do? – Guess the key length, then the key
Cryptanalysis
How to guess the key length?
There are several ways to do this.
Kasiski examination
The Kasiski examination takes advantage of the fact that repeated words are, by chance, sometimes encrypted using the same key letters, leading to repeated groups in the ciphertext.
If we follow the Kasiski test to guess the key length, the steps are:
- find all repeated sequences and record their positions.
- calculate the intervals between occurrences of each sequence, and the factors of those intervals.
- treat the factors shared by most intervals as candidates for the true length of the key.
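The steps above can be sketched roughly as follows. The function name kasiski_factor_counts, the sequence length of 3 and the upper bound of 20 on the key length are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def kasiski_factor_counts(ciphertext, seq_len=3, max_key_len=20):
    """Count how often each candidate key length divides a repeat interval."""
    # Step 1: record the positions of every sequence of length seq_len
    positions = defaultdict(list)
    for i in range(len(ciphertext) - seq_len + 1):
        positions[ciphertext[i:i + seq_len]].append(i)

    # Steps 2 and 3: take intervals between consecutive occurrences
    # and tally every factor of each interval up to max_key_len
    counts = Counter()
    for pos in positions.values():
        for a, b in zip(pos, pos[1:]):
            interval = b - a
            for f in range(2, max_key_len + 1):
                if interval % f == 0:
                    counts[f] += 1
    return counts

# "ABC" and "BCD" each repeat 8 positions apart
print(kasiski_factor_counts("ABCDEFGHABCD").most_common())
```

On real ciphertext the factors with the highest counts are the candidates worth trying first. Small factors such as 2 tend to score well by accident, so the result is a hint and not a proof.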
Friedman test
The Friedman test uses the index of coincidence, which measures the unevenness of the ciphertext letter frequencies, to break the cipher.
It relies on two known values: the probability (kp) that any two randomly chosen source language letters are the same (around 0.067 for monocase English), and the probability (kr) of a coincidence for a uniform random selection from the alphabet (1/26 = 0.0385 for English).
The key length can then be estimated as: (kp-kr)/(ko-kr)
where ko is the observed coincidence rate
ko = sum(i=1 to c) { ni * (ni-1) } / (N * (N-1))
in which c is the size of the alphabet (26 for English), N is the length of the text and n1 to nc are the observed ciphertext letter counts, as integers.
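As a rough sketch, the estimate can be computed like this. The names K_P, K_R, observed_coincidence and friedman_key_length are made up for this example, and the text is assumed to contain only letters of a single case.

```python
from collections import Counter

K_P = 0.067   # coincidence probability for English text (kp)
K_R = 1 / 26  # coincidence probability for uniformly random letters (kr)

def observed_coincidence(text):
    """ko: the chance that two letters drawn from `text` are equal."""
    n = len(text)
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def friedman_key_length(ciphertext):
    """Estimate the key length as (kp - kr) / (ko - kr)."""
    k_o = observed_coincidence(ciphertext)
    # Note: this divides by zero if ko happens to equal kr exactly
    return (K_P - K_R) / (k_o - K_R)
```

The result is a real number, not an integer, and it is only reliable for fairly long ciphertexts, so it is best used to narrow down the candidates from the other methods.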
Index of Coincidence
A better approach for repeating-key ciphers is to copy the ciphertext into rows of a matrix with as many columns as an assumed key length and then to compute the average index of coincidence with each column considered separately. When that is done for each possible key length, the highest average I.C. then corresponds to the most-likely key length.
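A minimal sketch of this column-wise approach is shown below. The function names and the default limit of 20 on the key length are assumptions for illustration.

```python
from collections import Counter

def index_of_coincidence(text):
    """I.C. of one string; returns 0.0 when it is too short to compare."""
    n = len(text)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(text).values()) / (n * (n - 1))

def average_ic(ciphertext, key_len):
    """Average I.C. over the columns of a matrix with key_len columns."""
    # ciphertext[i::key_len] is column i: every key_len-th letter from i
    columns = [ciphertext[i::key_len] for i in range(key_len)]
    return sum(index_of_coincidence(col) for col in columns) / key_len

def rank_key_lengths(ciphertext, max_key_len=20):
    """Candidate key lengths, highest average I.C. first."""
    scores = {m: average_ic(ciphertext, m) for m in range(1, max_key_len + 1)}
    return sorted(scores, key=scores.get, reverse=True)
```

Multiples of the true key length also give a high average I.C., so when several candidates score well the smallest of them is usually the one to try first.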
How to guess the key once we have the key length?
This is simple.
First, notice that if we write the ciphertext into a matrix with as many columns as the key length, every column is encrypted with the same key letter, so each column is just a Caesar cipher.
Second, it is important to understand that shifting the text is the same as shifting the frequencies of its letters. Therefore, for each column, use the following formula to calculate Mg for every shift g from 0 to 25. The shift whose Mg is nearest to 0.065 is the desired shift, and it gives the key letter for that column.
Formula: Mg = sum(i=0 to 25) pi * f(i+g)
pi is the known frequency of letter i in ordinary English text, taken from published tables; f(i+g) is the relative frequency of letter (i+g) mod 26 in that column.
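The calculation can be sketched as below. The frequency table holds commonly quoted approximate values for English, and the names ENGLISH_FREQ, mg, best_shift and recover_key are assumptions for this example. The shifted index (i + g) mod 26 is what lets one fixed table be compared against every possible shift.

```python
from collections import Counter

# Approximate English letter frequencies for A..Z (assumed reference values)
ENGLISH_FREQ = [
    0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
    0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
    0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001,
]

def mg(column, g):
    """Mg = sum of pi * f(i+g), with f the relative frequency in `column`."""
    n = len(column)
    counts = Counter(column)
    total = 0.0
    for i, p in enumerate(ENGLISH_FREQ):
        shifted = chr((i + g) % 26 + ord("A"))
        total += p * counts.get(shifted, 0)
    return total / n

def best_shift(column):
    """The shift g whose Mg lies nearest to 0.065."""
    return min(range(26), key=lambda g: abs(mg(column, g) - 0.065))

def recover_key(ciphertext, key_len):
    """One key letter per column (uppercase A-Z ciphertext assumed)."""
    return "".join(
        chr(best_shift(ciphertext[i::key_len]) + ord("A"))
        for i in range(key_len)
    )
```

Short columns give noisy frequencies, so with little ciphertext the nearest value may belong to the wrong shift. Checking the decrypted text by eye remains the final test.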
Coding
What approach did I use?
- I roughly followed the Friedman test, but used a little trick to simplify the algorithm: I shift the ciphertext against itself and count how many positions hold the same letter in both copies. Shifts that are multiples of the key length produce noticeably more coincidences. The two copies do not overlap at the start and at the tail; just ignore those parts, since only the overlapping middle part is compared.
- I used the method described above to figure out the key, which is introduced in the book Cryptography: Theory and Practice.
- After getting the key, we apply the decryption rule of the cipher to get the plaintext:
- mi = Dk(ci) = ci - ki (mod 26)
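The shift-and-count trick from the first point might look roughly like this. This is only an illustrative sketch and not the author's own code; the function names and the limit of 20 shifts are assumptions.

```python
def coincidences(ciphertext, shift):
    """Count positions where the text equals itself moved by `shift`."""
    # zip() stops at the shorter string, which drops the parts at the
    # start and tail that do not overlap
    return sum(a == b for a, b in zip(ciphertext, ciphertext[shift:]))

def coincidence_table(ciphertext, max_shift=20):
    """Coincidence count for every shift from 1 to max_shift."""
    return {s: coincidences(ciphertext, s) for s in range(1, max_shift + 1)}

print(coincidences("ABCABCABC", 3))  # 6
print(coincidences("ABCABCABC", 1))  # 0
```

In the table, the shifts with clearly higher counts tend to be multiples of the key length, so the spacing between the peaks suggests the length.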
The code is as follows.
Skeleton:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# python 3.7.0
# environment: windows 10
# author: Joyce_BY, all rights reserved.
# contact by email: Yagnes0126@gmail.com
# decryption function:
de