Index of Coincidence

This online calculator computes the index of coincidence (IC, IOC) of a given text.

This page exists due to the efforts of the following people:

Timur

Here is the calculator, which computes the index of coincidence, or IOC (IC), for the given text. An explanation of what the index of coincidence is and how it is calculated follows below the calculator.


The index of coincidence

The index of coincidence is the probability that two letters selected at random from a text (without replacement) are equal. This metric was first proposed by William F. Friedman in Riverbank Publication No. 22, titled "The Index of Coincidence and Its Applications in Cryptography" and published in 1922. In 1967, the historian David Kahn wrote:

Riverbank Publication No. 22, written in 1920, when Friedman was 28, must be regarded as the most important single publication in cryptography. It took the science into a new world. 1

From this definition, one can derive the formula for the IOC.
Let N be the length of the text.
Let n be the size of the alphabet.
Let a_i be the i-th letter of the alphabet.
Let F_i be the number of occurrences of the i-th letter in the text.

Then the probability that both selected letters are a_i is p_i=\frac{F_i*(F_i-1)}{N*(N-1)}
The total probability (which is the IOC) is the sum of probabilities for each letter:
IOC=\frac{1}{N*(N-1)}*\sum^{n}_{i=1}F_i*(F_i-1)

Note that the IOC is sometimes "normalized". This is usually done by multiplying the result by n, the size of the alphabet, so that text with uniformly distributed letters scores about 1:
IOC_{normalized}=\frac{n}{N*(N-1)}*\sum^{n}_{i=1}F_i*(F_i-1)
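The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculator's actual code: the function names are made up for this example, the alphabet is assumed to be the 26 Latin letters, and the text is assumed to be treated case-insensitively with all other characters ignored.

```python
from collections import Counter
import string


def letter_counts(text, alphabet=string.ascii_uppercase):
    """Return the counts F_i of each alphabet letter, ignoring other characters."""
    return Counter(ch for ch in text.upper() if ch in alphabet)


def index_of_coincidence(text, alphabet=string.ascii_uppercase):
    """IOC = sum(F_i * (F_i - 1)) / (N * (N - 1))."""
    counts = letter_counts(text, alphabet)
    total = sum(counts.values())  # N, the number of letters that were counted
    if total < 2:
        raise ValueError("need at least two letters to pick a pair")
    pairs = sum(f * (f - 1) for f in counts.values())
    return pairs / (total * (total - 1))


def normalized_ioc(text, alphabet=string.ascii_uppercase):
    """Normalized IOC: the plain IOC multiplied by the alphabet size n."""
    return len(alphabet) * index_of_coincidence(text, alphabet)


# "AABB": N = 4, F_A = F_B = 2, so IOC = (2 + 2) / (4 * 3) = 1/3
print(index_of_coincidence("AABB"))
print(normalized_ioc("AABB"))
```

Counting only letters is a design choice for this sketch; a different alphabet (for example one that includes digits or the space character) changes both N and n, and therefore both values.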

The calculator parses the text and calculates the IOC using the formulas above. The next section explains why this metric is so important.

Why is the index of coincidence so important?

It is important because we can calculate the expected index of coincidence for a given language from that language's letter frequencies. Let p_i now denote the relative frequency of the i-th letter in the language (not the pair probability used earlier). Then we can approximate F_i as p_i*N, which gives the following:
IOC_{expected}=\frac{1}{N*(N-1)}*\sum^{n}_{i=1}F_i*(F_i-1)
=\frac{1}{N*(N-1)}*\sum^{n}_{i=1}(p_i*N)*(p_i*N-1)
=\sum^{n}_{i=1}p_i*\frac{p_i*N-1}{N-1}
If N is large enough, we can approximate the fraction \frac{p_i*N-1}{N-1} as p_i, which gives us
IOC_{expected}=\sum^{n}_{i=1}p_i^2

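We can also calculate the expected index of coincidence for completely random text, where all letters have the same frequency 1/n. Substituting p_i = 1/n into the formula above gives n*(1/n)^2 = 1/n, which is about 0.0385 for a 26-letter alphabet.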
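As a rough illustration of the expected values, the sketch below evaluates the sum of squared frequencies for English and for uniformly random text. The frequency table is an assumption of this example: it holds commonly cited approximate percentages for English, and other sources give slightly different numbers. The function names are also made up for this sketch.

```python
# Approximate, commonly cited English letter frequencies in percent (assumed values).
ENGLISH_FREQ_PERCENT = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702, "F": 2.228,
    "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153, "K": 0.772, "L": 4.025,
    "M": 2.406, "N": 6.749, "O": 7.507, "P": 1.929, "Q": 0.095, "R": 5.987,
    "S": 6.327, "T": 9.056, "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150,
    "Y": 1.974, "Z": 0.074,
}


def expected_ioc(frequencies):
    """Large-N limit: the sum of p_i squared, with p_i rescaled to sum to 1."""
    total = sum(frequencies.values())
    return sum((f / total) ** 2 for f in frequencies.values())


def expected_ioc_finite(frequencies, text_length):
    """Finite-N step of the derivation: sum of p_i * (p_i * N - 1) / (N - 1)."""
    total = sum(frequencies.values())
    n_letters = text_length
    return sum(
        (f / total) * ((f / total) * n_letters - 1) / (n_letters - 1)
        for f in frequencies.values()
    )


def random_ioc(alphabet_size):
    """Uniform letters: n * (1/n)^2 = 1/n."""
    return 1 / alphabet_size


print(round(expected_ioc(ENGLISH_FREQ_PERCENT), 4))               # roughly 0.0655 with this table
print(round(expected_ioc_finite(ENGLISH_FREQ_PERCENT, 100), 4))   # slightly lower for short texts
print(round(random_ioc(26), 4))                                   # 0.0385
```

With this table the finite-N value approaches the sum of squares as the text grows, which is why the simplified formula is a reasonable approximation for long texts.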

Given the expected index of coincidence, you can quickly assess a ciphertext if you suspect that it was produced by one of the "classical" ciphers. If the index of coincidence is high and close to the expected IC for the language, the text was probably encrypted with a transposition cipher or a simple (monoalphabetic) substitution cipher, since both preserve the set of letter counts (a substitution only relabels the letters). Otherwise, if the index of coincidence is low and close to the expected IC for random text, the text was probably encrypted with a polyalphabetic cipher.
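The rule of thumb above can be sketched as follows. This is a crude heuristic under stated assumptions, not a reliable classifier: the reference value 0.0655 for English and the midpoint threshold are choices made for this example, the function names are hypothetical, and short texts can easily land on the wrong side. The Caesar shift is included only to show that a monoalphabetic substitution leaves the IOC unchanged.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def index_of_coincidence(text):
    """IOC over the letters A-Z, ignoring case and all other characters."""
    counts = Counter(ch for ch in text.upper() if ch in ALPHABET)
    total = sum(counts.values())
    if total < 2:
        raise ValueError("need at least two letters")
    return sum(f * (f - 1) for f in counts.values()) / (total * (total - 1))


def guess_cipher_family(text, language_ioc=0.0655):
    """Compare the IOC with the language and random references (assumed values)."""
    random_ioc = 1 / len(ALPHABET)
    threshold = (language_ioc + random_ioc) / 2  # midpoint, an arbitrary choice
    if index_of_coincidence(text) >= threshold:
        return "transposition or monoalphabetic substitution"
    return "polyalphabetic"


def caesar_shift(text, shift):
    """Shift every letter by a fixed amount: a simple monoalphabetic substitution."""
    shifted = []
    for ch in text.upper():
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)])
        else:
            shifted.append(ch)
    return "".join(shifted)


sample = "ATTACK AT DAWN"
# The substitution only relabels letters, so both values are identical.
print(index_of_coincidence(sample) == index_of_coincidence(caesar_shift(sample, 3)))
```

A text whose IOC falls between the two references is ambiguous, so in practice the result is better read as a hint about where to start than as a verdict.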

According to Wikipedia,

The index of coincidence is useful both in the analysis of natural-language plaintext and in the analysis of ciphertext (cryptanalysis). Even when only ciphertext is available for testing and plaintext letter identities are disguised, coincidences in ciphertext can be caused by coincidences in the underlying plaintext. This technique is used to cryptanalyze the Vigenère cipher, for example. For a repeating-key polyalphabetic cipher arranged into a matrix, the coincidence rate within each column will usually be highest when the width of the matrix is a multiple of the key length, and this fact can be used to determine the key length, which is the first step in cracking the system. Coincidence counting can help determine when two texts are written in the same language using the same alphabet. (This technique has been used to examine the purported Bible code). The causal coincidence count for such texts will be distinctly higher than the accidental coincidence count for texts in different languages, or texts using different alphabets, or gibberish texts.2
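The key-length technique described in the quote can be sketched like this: write the ciphertext into a matrix of a candidate width, compute the IOC of each column, and average. This is a minimal sketch with hypothetical function names, assuming a 26-letter alphabet. Multiples of the true key length score about as high as the key length itself, so the smallest high-scoring width is usually the one to try first.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def index_of_coincidence(letters):
    """IOC of a sequence of letters; 0.0 if there are fewer than two."""
    counts = Counter(letters)
    total = sum(counts.values())
    if total < 2:
        return 0.0
    return sum(f * (f - 1) for f in counts.values()) / (total * (total - 1))


def average_column_ioc(ciphertext, width):
    """Arrange the letters in rows of `width` and average the IOC of the columns."""
    letters = [ch for ch in ciphertext.upper() if ch in ALPHABET]
    columns = [letters[start::width] for start in range(width)]
    return sum(index_of_coincidence(col) for col in columns) / width


def rank_key_lengths(ciphertext, max_width=20):
    """Return candidate widths ordered from highest to lowest average column IOC."""
    scores = {w: average_column_ioc(ciphertext, w) for w in range(1, max_width + 1)}
    # sorted() is stable, so among equal scores the smaller width comes first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy input: the plaintext "AAAA..." under a Vigenere key "LEMON" simply repeats the key.
toy_ciphertext = "LEMON" * 20
print(rank_key_lengths(toy_ciphertext, max_width=8)[0])  # 5
```

The toy input is deliberately degenerate so that the result is exact. On real ciphertext the column IOCs only approach the expected IC of the language, and each column needs enough letters for the estimate to be meaningful.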


  1. David Kahn, The Codebreakers, Macmillan, 1967.

  2. Index of coincidence, Wikipedia.
