Index of Coincidence

This online calculator computes the index of coincidence (IC, IOC) of a given text.

Creative Commons Attribution/Share-Alike License 3.0 (Unported)

This content is licensed under Creative Commons Attribution/Share-Alike License 3.0 (Unported). That means you may freely redistribute or modify this content under the same license conditions and must attribute the original author by placing a hyperlink from your site to this work. Also, please do not modify any references to the original work (if any) contained in this content.

This calculator computes the index of coincidence, or IOC (IC), of a given text. The index of coincidence is the probability that two letters selected at random from the text are the same. This metric was first proposed by William F. Friedman in Riverbank Publication No. 22, titled "The Index of Coincidence and Its Applications in Cryptography", published in 1922. In 1967, the historian David Kahn wrote:

Riverbank Publication No. 22, written in 1920, when Friedman was 28, must be regarded as the most important single publication in cryptography. It took the science into a new world. [1]

Given the definition above, one can derive the formula for the IOC.
Let N be the length of the text.
Let n be the size of the alphabet.
Let a_i be the i-th letter of the alphabet.
Let F_i be the number of occurrences of the i-th letter in the text.

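Then the probability that two letters selected at random from the text (without replacement) are both a_i is p_i=\frac{F_i*(F_i-1)}{N*(N-1)}
The total probability (which is the IOC) is the sum of these probabilities over all letters:
IOC=\sum^{n}_{i=1}p_i=\frac{1}{N*(N-1)}*\sum^{n}_{i=1}F_i*(F_i-1)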

Note that the IOC is sometimes "normalized". This is usually done by multiplying the result by n, the size of the alphabet.
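The formula and its normalized variant can be sketched in Python as follows. This is only an illustrative sketch, not the calculator's actual code: the function name `index_of_coincidence` and its `alphabet` parameter are assumptions, and the alphabet is assumed to be given in uppercase (the 26 Latin letters by default). Characters outside the alphabet are ignored.

```python
from collections import Counter
import string


def index_of_coincidence(text, alphabet=string.ascii_uppercase):
    """Return (ioc, normalized_ioc) for the letters of `text` found in `alphabet`.

    `alphabet` is assumed to contain uppercase letters only.
    """
    # Keep only characters that belong to the alphabet (case-insensitive).
    letters = [ch for ch in text.upper() if ch in alphabet]
    N = len(letters)
    if N < 2:
        raise ValueError("need at least two letters to pick a pair")

    counts = Counter(letters)  # F_i for every letter that occurs
    # IOC = sum F_i * (F_i - 1) / (N * (N - 1))
    ioc = sum(F * (F - 1) for F in counts.values()) / (N * (N - 1))
    # Normalized IOC: multiply by n, the size of the alphabet.
    return ioc, ioc * len(alphabet)


ioc, normalized = index_of_coincidence("HELLO WORLD")
print(round(ioc, 4), round(normalized, 4))  # 0.0889 2.3111
```

In "HELLO WORLD" there are N = 10 letters, with L occurring three times and O twice, so the sum is 3*2 + 2*1 = 8 and the IOC is 8/90.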

The calculator below parses the text and calculates the IOC using the formulas above. You can also read why it is so important below the calculator.

Why is the Index of Coincidence so important?

It is important because we can calculate the expected index of coincidence for a given language from that language's letter frequencies. Let p_i now denote the relative frequency of the i-th letter in the language (not the pair probability used in the definition earlier). We can then approximate F_i as p_i*N, which gives us the following:
IOC_{expected}=\frac{1}{N*(N-1)}*\sum^{n}_{i=1}F_i*(F_i-1)\\=\frac{1}{N*(N-1)}*\sum^{n}_{i=1}(p_i*N)*(p_i * N - 1)\\=\sum^{n}_{i=1}p_i*\frac{p_i*N-1}{N-1}
If N is large enough, we can approximate the fraction \frac{p_i*N-1}{N-1} as p_i, which gives us
IOC_{expected}=\sum^{n}_{i=1}p_i^2

We can also calculate the expected index of coincidence for completely random text, where all letters have the same frequency 1/n. The sum then consists of n equal terms (1/n)^2, so the expected value is 1/n.
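As a rough numerical check of the reasoning above, the following Python sketch evaluates both the large-N approximation (the sum of squared letter frequencies) and the expression before that approximation is applied. The function names and the 4-letter `skewed` distribution are hypothetical and serve only as an illustration; they are not real language data.

```python
def expected_ioc(probabilities):
    """Large-N approximation: sum of squared letter frequencies."""
    return sum(p * p for p in probabilities)


def expected_ioc_finite(probabilities, N):
    """Expression before the approximation: sum of p_i * (p_i*N - 1) / (N - 1)."""
    return sum(p * (p * N - 1) / (N - 1) for p in probabilities)


n = 26
uniform = [1 / n] * n               # completely random text
skewed = [0.5, 0.25, 0.125, 0.125]  # hypothetical 4-letter "language"

print(round(expected_ioc(uniform), 4))               # 0.0385, i.e. 1/26
print(expected_ioc(skewed))                          # 0.34375
print(round(expected_ioc_finite(skewed, 1000), 4))   # 0.3431
```

For the skewed distribution the finite-N value at N = 1000 already agrees with the approximation to about three decimal places, which suggests the approximation is reasonable for texts of that length.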

Knowing the expected index of coincidence, you can quickly assess a ciphertext that you suspect was produced by one of the "classical" ciphers. If the index of coincidence is high and close to the expected IC for the language, the text was probably encrypted with a transposition cipher or a simple (monoalphabetic) substitution cipher, since both leave the distribution of letter counts unchanged. If the index of coincidence is low and close to the expected IC for random text, the text was probably encrypted with a polyalphabetic cipher.
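This rule of thumb can be sketched as a small Python helper. The function name `guess_cipher_family` and the "whichever reference value is closer" decision rule are assumptions made for illustration; the expected IC of the language must be supplied by the caller. The value 0.0667 used in the example is a commonly quoted approximate figure for English and is treated here as an assumption.

```python
def guess_cipher_family(ioc, language_ioc, alphabet_size=26):
    """Guess the cipher family from a measured (non-normalized) IOC.

    Compares the measured value with the expected IC of the language and
    with the expected IC of random text (1/n), and picks the closer one.
    """
    random_ioc = 1 / alphabet_size
    if abs(ioc - language_ioc) <= abs(ioc - random_ioc):
        return "transposition or monoalphabetic substitution"
    return "polyalphabetic"


ENGLISH_IOC = 0.0667  # assumed approximate value for English

print(guess_cipher_family(0.065, ENGLISH_IOC))  # transposition or monoalphabetic substitution
print(guess_cipher_family(0.040, ENGLISH_IOC))  # polyalphabetic
```

This is only a heuristic: short texts give noisy IOC values, and a result between the two reference values is inconclusive.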

According to Wikipedia,

The index of coincidence is useful both in the analysis of natural-language plaintext and in the analysis of ciphertext (cryptanalysis). Even when only ciphertext is available for testing and plaintext letter identities are disguised, coincidences in ciphertext can be caused by coincidences in the underlying plaintext. This technique is used to cryptanalyze the Vigenère cipher, for example. For a repeating-key polyalphabetic cipher arranged into a matrix, the coincidence rate within each column will usually be highest when the width of the matrix is a multiple of the key length, and this fact can be used to determine the key length, which is the first step in cracking the system. Coincidence counting can help determine when two texts are written in the same language using the same alphabet. (This technique has been used to examine the purported Bible code). The causal coincidence count for such texts will be distinctly higher than the accidental coincidence count for texts in different languages, or texts using different alphabets, or gibberish texts. [2]
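The key-length step described in the quotation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete attack: the function names are hypothetical, only alphabetic characters are considered, and the ciphertext is split into columns by taking every width-th letter.

```python
from collections import Counter


def ioc(letters):
    """Index of coincidence of a sequence of letters (0.0 if fewer than two)."""
    N = len(letters)
    if N < 2:
        return 0.0
    counts = Counter(letters)
    return sum(F * (F - 1) for F in counts.values()) / (N * (N - 1))


def average_column_ioc(ciphertext, width):
    """Arrange the text into `width` columns and average the per-column IOC."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    columns = [letters[i::width] for i in range(width)]
    return sum(ioc(col) for col in columns) / width


def rank_key_lengths(ciphertext, max_width=20):
    """Return candidate widths, highest average column IOC first."""
    scores = {w: average_column_ioc(ciphertext, w) for w in range(1, max_width + 1)}
    return sorted(scores, key=scores.get, reverse=True)


# Toy input with period 3: every column of width 3 holds a single letter.
toy = "ABC" * 50
print(average_column_ioc(toy, 3))  # 1.0
```

On real ciphertext the widths that are multiples of the key length should tend to score near the expected IC of the language, while other widths should score closer to 1/n. Multiples of the true key length score similarly, so the smallest high-scoring width is usually the one to try first.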

  1. David Kahn, The Codebreakers, Macmillan, 1967.

  2. Wikipedia, Index of coincidence.