Jaccard index

The Jaccard index is a binary similarity coefficient commonly used to compare the species composition of communities or samples in biological studies. It measures similarity as the size of the intersection of two sets relative to the size of their union. The calculator is designed primarily for biologists, but it applies to any problem that requires estimating the overlap of sets, including non-biological ones.

Created: 2023-11-13 12:16:51, Last updated: 2023-11-13 12:29:28
Creative Commons Attribution/Share-Alike License 3.0 (Unported)

This content is licensed under Creative Commons Attribution/Share-Alike License 3.0 (Unported). That means you may freely redistribute or modify this content under the same license conditions and must attribute the original author by placing a hyperlink from your site to this work https://planetcalc.com/9651/. Also, please do not modify any references to the original work (if any) contained in this content.

The Jaccard coefficient is similar to the Tanimoto similarity coefficient, but the Jaccard index is the one commonly used in biological studies. When the task is not only to compare one's own samples but also to compare them with the data of other researchers, it makes sense to use the same coefficients those researchers used. The sensitivity of the Jaccard coefficient is discussed by Maksimov and Kuznetsova (2013) [1], who justify a reference level of sample similarity for the Jaccard index.

Calculation of the Jaccard index for a series of samples

The calculator asks for the first data cell, that is, the row and column at which the data matrix begins. Rows correspond to species and columns to samples. Cells to the left of and above the data are interpreted as the titles of the corresponding rows and columns. By default, the data are assumed to start at the cell in the second row of the second column.

Typically, species lists in samples, communities, test sites, etc. are analyzed using binary data, with species presence denoted by 1 and absence by 0 for ease of calculation.

This index was proposed by Paul Jaccard in 1901 [2].

The coefficient is the ratio of the number of species common to the two study sites to the total number of species found in at least one of them: the species common to both sites, plus the species found in site A but not in site B, plus the species found in site B but absent from site A. That is, each shared species and each mismatched species is counted once in the denominator, and species absent from both sites are ignored.

K_J = \frac{c}{a + b - c}
where a is the number of species at the first sample site, b is the number of species at the second sample site, and c is the number of species common to both sites.
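
As a minimal sketch of this formula, assuming each site's species list is held as a Python set, the coefficient can be computed as follows. The function name `jaccard_index` and the species lists are illustrative and are not part of the calculator; raising an error for two empty lists is also an assumption, since the formula gives 0/0 in that case.

```python
def jaccard_index(site_a, site_b):
    """Return c / (a + b - c) for two collections of species names."""
    set_a, set_b = set(site_a), set(site_b)
    a = len(set_a)          # number of species at the first site
    b = len(set_b)          # number of species at the second site
    c = len(set_a & set_b)  # number of species common to both sites
    if a + b - c == 0:
        # Assumption: two empty lists have no defined similarity.
        raise ValueError("Jaccard index is undefined for two empty species lists")
    return c / (a + b - c)


# Hypothetical species lists, for illustration only.
site_a = {"Picea abies", "Pinus sylvestris", "Betula pendula", "Sorbus aucuparia"}
site_b = {"Picea abies", "Betula pendula", "Populus tremula"}

# a = 4, b = 3, c = 2, so K_J = 2 / (4 + 3 - 2) = 0.4
print(round(jaccard_index(site_a, site_b), 2))
```

Here the denominator a + b - c is the number of distinct species found across both sites, which is why the common species are subtracted once.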

In case of complete species mismatch, the coefficient is 0, and in case of complete overlap it is 1.
In this calculator, data entry is assumed to be in the form of a data table (matrix), with values entered as binary 1 or 0.
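
The calculation for a series of samples can be sketched in the same way. The following is not the calculator's own code but a minimal illustration, assuming the table is given as a list of rows (species) with one 0/1 value per column (sample). The name `jaccard_matrix`, the `digits` parameter, the example table, and the choice to report a pair of empty samples as `None` are all assumptions.

```python
def jaccard_matrix(table, digits=2):
    """Pairwise Jaccard indices between the columns (samples) of a 0/1 table."""
    n_samples = len(table[0])
    # Represent each sample as the set of row indices where the species is present.
    samples = [
        {i for i, row in enumerate(table) if row[j] == 1}
        for j in range(n_samples)
    ]
    result = []
    for s in samples:
        out_row = []
        for t in samples:
            union = len(s | t)  # equals a + b - c
            # Assumption: a pair of empty samples is reported as None (undefined).
            out_row.append(round(len(s & t) / union, digits) if union else None)
        result.append(out_row)
    return result


# Hypothetical data: 4 species (rows) recorded in 3 samples (columns).
table = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
]
for row in jaccard_matrix(table):
    print(row)
# [1.0, 0.5, 0.0]
# [0.5, 1.0, 0.33]
# [0.0, 0.33, 1.0]
```

The resulting matrix is symmetric with 1 on the diagonal, since every non-empty sample overlaps completely with itself.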

  1. Maksimov V.N., Kuznetsova N.A. Similarity standard: use in comparing the composition and structure of communities. Moscow: Partnership of scientific publications KMK. 2013. 89 p.

  2. Jaccard P. Distribution de la flore alpine dans le Bassin des Dranses et dans quelques regions voisines // Bull. Soc. Vaudoise sci. Natur. 1901. V. 37. Bd. 140. P. 241-272. 
