# Hide and seek (part 2)

In a previous post I wrote about simple substitution ciphers. A cryptanalyst can break such a code using frequency analysis and by checking language-specific patterns (such as articles or common letter sequences).

One way to keep the cryptanalyst confused is to change the encoding from one character to the next. That way the frequency of a letter, say e, is spread over several cipher letters.

Example: the plain text is “Tyger! Tyger! burning bright In the forests of the night, What immortal hand or eye Dare frame thy fearful symmetry?”. Removing spaces and punctuation and making everything lowercase, we obtain

```
tygertygerburningbrightintheforestsofthenight
whatimmortalhandoreyedareframethyfearfulsymmetry
```
The substitution encoding using “williamblake” as the keyword (after throwing away repeated letters) is
```
sykmqsykmqitqjcjkiqckescjsembnqmrsrnbsemjckes
vewschhnqswgewjanqmymawqmbqwhmseybmwqbtgryhhmsqy
```
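
If you want to reproduce the encoding above, here is a minimal Python sketch. The function names (`normalize`, `keyword_alphabet`, `substitute`) are my own, and I am assuming the usual construction of the cipher alphabet: the keyword without repeated letters, followed by the unused letters in alphabetical order.

```python
import string

ALPHABET = string.ascii_lowercase


def normalize(text: str) -> str:
    """Lowercase the text and drop everything that is not a letter a-z."""
    return "".join(ch for ch in text.lower() if ch in ALPHABET)


def keyword_alphabet(keyword: str) -> str:
    """Keyword letters (first occurrence only), then the unused letters in order."""
    # dict.fromkeys keeps the first occurrence of each letter, in order.
    return "".join(dict.fromkeys(normalize(keyword) + ALPHABET))


def substitute(plain: str, keyword: str) -> str:
    """Encode the normalized text with the keyword substitution alphabet."""
    table = str.maketrans(ALPHABET, keyword_alphabet(keyword))
    return normalize(plain).translate(table)


poem = ("Tyger! Tyger! burning bright In the forests of the night, "
        "What immortal hand or eye Dare frame thy fearful symmetry?")

print(keyword_alphabet("williamblake"))       # wilambkecdfghjnopqrstuvxyz
print(substitute(poem, "williamblake")[:10])  # sykmqsykmq
```

With this alphabet a is encoded as w, b as i, c as l and so on, always in the same way, which is exactly why the letter frequencies survive.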

The frequencies are exactly the same: e, r and t each account for more than 11% of the source text, just as m, q and s do in the encrypted text (e and t are the most common letters in English).
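
The percentages are easy to check with a few lines of Python. This is only a sketch: `letter_shares` is a helper name I made up, and the two strings are the plain and encrypted texts from the example.

```python
from collections import Counter


def letter_shares(text: str) -> dict:
    """Map each letter to its share of the text, in percent (one decimal)."""
    counts = Counter(ch for ch in text if ch.isalpha())
    total = sum(counts.values())
    return {ch: round(100 * n / total, 1) for ch, n in counts.items()}


plain = ("tygertygerburningbrightintheforestsofthenight"
         "whatimmortalhandoreyedareframethyfearfulsymmetry")
cipher = ("sykmqsykmqitqjcjkiqckescjsembnqmrsrnbsemjckes"
          "vewschhnqswgewjanqmymawqmbqwhmseybmwqbtgryhhmsqy")

plain_shares = letter_shares(plain)
cipher_shares = letter_shares(cipher)

print([plain_shares[ch] for ch in "ert"])   # [11.8, 11.8, 11.8]
print([cipher_shares[ch] for ch in "mqs"])  # [11.8, 11.8, 11.8]

# The substitution only renames letters, so the list of shares is unchanged.
print(sorted(plain_shares.values()) == sorted(cipher_shares.values()))  # True
```

Each of the three letters appears 11 times in a text of 93 letters, which gives about 11.8%.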

With the Vigenère cipher, a polyalphabetic cipher, consecutive letters are encoded using different Caesar shifts. After choosing a keyword of length n, the text is split into groups of n letters, and the i-th letter of each group is encoded with the Caesar cipher that maps A to the i-th keyword letter.

Example: if the keyword is “TRY” and the text is “cooler”, c is encoded using the T alphabet (a shift of 19), the first o using R (17), the second o using Y (24), and then again l is shifted by 19, e by 17 and r by 24. As you can see, the two o's are encoded as two different letters (f and m).
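
The same rule can be written as a short Python sketch. `vigenere_encrypt` is a name I chose for illustration, and the function assumes the text has already been reduced to the letters a-z.

```python
def vigenere_encrypt(plain: str, keyword: str) -> str:
    """Shift each letter by the position of the matching keyword letter (a=0, b=1, ...)."""
    shifts = [ord(k) - ord("a") for k in keyword.lower()]
    encoded = []
    for i, ch in enumerate(plain.lower()):
        shift = shifts[i % len(shifts)]  # the keyword repeats along the text
        encoded.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
    return "".join(encoded)


print(vigenere_encrypt("cooler", "TRY"))  # vfmevp
```

The two o's of “cooler” come out as f and m because they sit under different keyword letters.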

Back to the main example, using “blake” as the keyword:

```
ujgovujgovcfrxmorbbmhstsrusepsspsdwpqtriotgrx
xsadmnxobxbwhkrezrocfoabigcawiusypibcfeptjmwiucy
```

All frequencies are now below 9%. The most common letter is s, but s does not always stand for the same plaintext letter!
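
To see where the s comes from, the following Python sketch encrypts the example text with “blake” and collects the plaintext letters that end up as s. The helper `vigenere_encrypt` is my own illustrative function, not part of any library.

```python
from collections import Counter


def vigenere_encrypt(plain: str, keyword: str) -> str:
    """Vigenère encoding of a lowercase a-z text (a=0, b=1, ... shifts)."""
    shifts = [ord(k) - ord("a") for k in keyword.lower()]
    return "".join(
        chr((ord(ch) - ord("a") + shifts[i % len(shifts)]) % 26 + ord("a"))
        for i, ch in enumerate(plain)
    )


plain = ("tygertygerburningbrightintheforestsofthenight"
         "whatimmortalhandoreyedareframethyfearfulsymmetry")
cipher = vigenere_encrypt(plain, "blake")

# Which plaintext letters were turned into "s", and how often?
sources = Counter(p for p, c in zip(plain, cipher) if c == "s")
print(sorted(sources.items()))
# [('h', 4), ('i', 1), ('o', 1), ('r', 1), ('s', 1)]

print(round(100 * cipher.count("s") / len(cipher), 1))  # 8.6
```

Five different plaintext letters hide behind the eight occurrences of s, so counting letters over the whole text no longer points to e or t.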

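How can a Vigenère cipher be broken? You can see a weakness for yourself: both occurrences of tyger at the beginning of the text are encoded in the same way (ujgov). This happens because the keyword “blake” is 5 letters long, like “tyger”, so the second tyger lines up with the keyword exactly as the first one does.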

Frequency analysis is still possible once the keyword length is known: in our example the first, sixth, eleventh (and so on) letters share the same alphabet and can be analyzed like a simple substitution code (in fact a single Caesar shift). The key length can be guessed by looking for repeated sequences in the encoded text: if a word such as “the”, “dog” or “tyger” is repeated at a distance that is a multiple of the key length, both occurrences are encoded in the same way.
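
Here is a rough Python sketch of that idea applied to the “blake” ciphertext. The function names are mine, and taking the greatest common divisor of the distances is a simplification: on longer texts some repetitions are accidental and would have to be filtered out first.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeated_sequences(cipher: str, size: int = 3) -> dict:
    """Map every sequence of `size` letters that occurs more than once to its positions."""
    positions = defaultdict(list)
    for i in range(len(cipher) - size + 1):
        positions[cipher[i:i + size]].append(i)
    return {seq: pos for seq, pos in positions.items() if len(pos) > 1}


def distances(repeats: dict) -> list:
    """Distances between consecutive occurrences of each repeated sequence."""
    return sorted(b - a for pos in repeats.values() for a, b in zip(pos, pos[1:]))


cipher = ("ujgovujgovcfrxmorbbmhstsrusepsspsdwpqtriotgrx"
          "xsadmnxobxbwhkrezrocfoabigcawiusypibcfeptjmwiucy")

repeats = repeated_sequences(cipher)
print(repeats)
# {'ujg': [0, 5], 'jgo': [1, 6], 'gov': [2, 7], 'wiu': [73, 88]}

key_length = reduce(gcd, distances(repeats), 0)
print(key_length)  # 5

# Letters 1, 6, 11, ... were all shifted by the same keyword letter.
columns = [cipher[i::key_length] for i in range(key_length)]
print(columns[0])  # uucohuspoxnbefgubtu
```

The repeated wiu is the encoding of “met”, which appears in “frame thy” and in “symmetry” 15 letters apart. Each of the five columns can then be attacked with ordinary frequency analysis.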

If you would like to try the Caesar cipher, substitution and Vigenère for yourself, you can use this Java applet on my webpage.

If you want to learn more about cipher techniques and their history, you can find tons of books on the subject. My favourite is “The Code Book” by Simon Singh.

Edit May 2012: modified applet link.

Content is released under Creative Commons License.