Alpha diversity metrics summarize the structure of an ecological community with respect to its richness (number of taxonomic groups), evenness (distribution of abundances of the groups), or both. Because many perturbations to a community affect its alpha diversity, summarizing and comparing community structure via alpha diversity is a ubiquitous approach to analyzing community surveys. In microbial ecology, analyzing the alpha diversity of amplicon sequencing data is a common first approach to assessing differences between environments. Unfortunately, determining how to meaningfully estimate and compare alpha diversity is not trivial.
To illustrate, consider the following example, where the alpha diversity metric of interest is the strain-level richness of a microbial community (the total number of strain variants present in the environment). Suppose I conduct an experiment in which I take a sample from Environment A and count the number of different microbial taxa present in my sample. I then take a sample from Environment B, count the number of different taxa in that sample, and compare it to the number of taxa in Environment A. Because rare taxa are more likely to be detected when more reads are sequenced, I am likely to observe more distinct taxa in whichever sample has more microbial reads, even if the two environments are equally rich. In this way, differences in library size can dominate the biology in determining the result of the diversity analysis (Lande, 1996). Rarefaction is a method that adjusts for differences in library sizes across samples to aid comparisons of alpha diversity. First proposed by Sanders (1968), rarefaction involves selecting a specified number of reads that is equal to or less than the number of reads in the smallest sample, and then randomly discarding reads from larger samples until the number of reads remaining in each sample equals this threshold (see Hurlbert, 1971 for a deterministic version).
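The rarefaction procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes each sample is stored as a dict mapping a taxon (strain) label to its read count, and the function names `richness`, `rarefy`, `rarefy_all`, and `expected_richness` are hypothetical. The counts in the usage lines are invented for illustration and are not data. `expected_richness` computes the standard hypergeometric expectation of richness in a subsample of n reads, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], which is the kind of deterministic calculation associated with Hurlbert (1971).

```python
import math
import random
from collections import Counter


def richness(counts):
    """Observed richness: the number of taxa with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def rarefy(counts, depth, seed=None):
    """Randomly keep `depth` reads (without replacement); discard the rest.

    `counts` maps a taxon label to its read count (assumed layout).
    Returns a Counter of the reads retained per taxon.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds library size {total}")
    # Expand the count table into one entry per read, labeled by taxon.
    reads = [taxon for taxon, c in counts.items() for _ in range(c)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


def rarefy_all(samples, depth=None, seed=None):
    """Rarefy every sample to a common depth (default: smallest library size)."""
    if depth is None:
        depth = min(sum(c.values()) for c in samples.values())
    rng = random.Random(seed)
    # Give each sample its own reproducible random stream.
    return {name: rarefy(c, depth, seed=rng.randrange(2**32))
            for name, c in samples.items()}


def expected_richness(counts, depth):
    """Deterministic alternative: expected richness of a random subsample
    of `depth` reads, E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)]."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds library size {total}")
    denom = math.comb(total, depth)
    # A taxon with N_i reads is absent from the subsample with probability
    # C(N - N_i, n) / C(N, n); math.comb returns 0 when k > n.
    return sum(1 - math.comb(total - n_i, depth) / denom
               for n_i in counts.values() if n_i > 0)


# Hypothetical counts: B is sequenced ten times less deeply than A.
samples = {
    "A": {"s1": 500, "s2": 300, "s3": 150, "s4": 40, "s5": 8, "s6": 2},
    "B": {"s1": 50, "s2": 30, "s3": 15, "s4": 4, "s5": 1},
}
print({k: richness(v) for k, v in samples.items()})  # {'A': 6, 'B': 5}
rarefied = rarefy_all(samples, seed=1)  # both samples now hold 100 reads
print({k: richness(v) for k, v in rarefied.items()})
print(round(expected_richness(samples["A"], 100), 2))
```

A single rarefied draw is random, so the richness it reports can change from seed to seed; `expected_richness` instead returns the average richness over all possible subsamples of the chosen depth, without any random discarding of reads.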