Xact analysis

12/28/2023

Increasing use of marker-gene sequencing has been accompanied by increasing data set sizes; this year, we expect thousands of marker-gene studies to generate millions to billions of sequencing reads each. The analysis of marker-gene data customarily begins with the construction of molecular operational taxonomic units (OTUs): clusters of reads that differ by less than a fixed sequence dissimilarity threshold, most commonly 3% (Westcott and Schloss, 2015; Kopylova et al., 2016). The sample-by-OTU feature table serves as the basis for further analysis, and in taxonomic profiling the observation of an OTU is often treated as akin to the observation of a ‘species’.

Many methods for defining molecular OTUs have been proposed, but the most substantive distinction is between closed-reference methods, in which reads sufficiently similar to a sequence in a reference database are recruited into a corresponding OTU, and de novo methods, in which reads are grouped into OTUs as a function of their pairwise sequence similarities.

Recently, new methods have been developed that resolve amplicon sequence variants (ASVs) from Illumina-scale amplicon data without imposing the arbitrary dissimilarity thresholds that define molecular OTUs (Eren et al., 2013; Tikhonov et al., 2015; Eren et al., 2015; Callahan et al., 2016a; Edgar, 2016; Amir et al., 2017). ASV methods infer the biological sequences in the sample prior to the introduction of amplification and sequencing errors, and distinguish sequence variants differing by as little as one nucleotide. A similar class of methods developed for 454-scale data was typically used to ‘denoise’ sequencing data prior to constructing OTUs (Quince et al., 2011), whereas the new ASV methods are explicitly intended to replace OTUs as the atomic unit of analysis.

The higher resolution afforded by ASV methods has self-evident benefits (for example, it is clearly useful to distinguish Neisseria gonorrhoeae from the many other Neisseria species commonly found in the human microbiota), and initial evaluation has focused on that improved resolution. ASV methods have demonstrated sensitivity and specificity as good as or better than OTU methods, and they better discriminate ecological patterns (Eren et al., 2013; Eren et al., 2015; Callahan et al., 2016a; Needham et al., 2017). However, we argue here that the more important, and overlooked, advantage of ASVs is that they combine the benefits of closed-reference and de novo OTUs for subsequent analysis: ASVs are reusable across studies, reproducible in future data sets, and not limited by incomplete reference databases.

De novo OTUs are constructed by clustering sequencing reads that are sufficiently similar to one another. Many methods for constructing these clusters have been developed, but in all cases de novo OTUs are emergent features of a data set, with boundaries and membership that depend on the data set in which they are defined. This data set dependence is not just a practical concern: the delineation of de novo OTUs depends on the relative abundances of the sampled community even in the limit of infinite sequencing depth and zero errors.
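The abundance dependence of de novo OTUs can be illustrated with a minimal Python sketch. It assumes a simple greedy, abundance-ordered centroid clustering on equal-length, pre-aligned, error-free reads with a 3% threshold; this is a stand-in chosen for illustration, not the algorithm of any particular published tool, and the helper names (`hamming_fraction`, `greedy_otus`, `mutate`) and toy sequences are hypothetical. The same three sequences are clustered in two communities that differ only in relative abundance.

```python
from typing import Dict, List


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two sequences.

    Assumption for this sketch: sequences are equal-length and pre-aligned.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, pre-aligned sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(abundances: Dict[str, int], threshold: float = 0.03) -> Dict[str, str]:
    """Hypothetical greedy, abundance-ordered centroid clustering.

    Sequences are visited from most to least abundant; each joins the first
    existing centroid it differs from by less than `threshold`, otherwise it
    founds a new OTU. Returns a map from each sequence to its OTU centroid.
    """
    centroids: List[str] = []
    membership: Dict[str, str] = {}
    for seq in sorted(abundances, key=lambda s: -abundances[s]):
        for centroid in centroids:
            if hamming_fraction(seq, centroid) < threshold:
                membership[seq] = centroid
                break
        else:
            # No centroid was close enough: this sequence starts a new OTU.
            centroids.append(seq)
            membership[seq] = seq
    return membership


def mutate(seq: str, positions: List[int], base: str = "C") -> str:
    """Return a copy of `seq` with the given positions replaced by `base`."""
    chars = list(seq)
    for p in positions:
        chars[p] = base
    return "".join(chars)


# Toy 100-nt sequences (hypothetical): b is 2% from a, c is 2% from b and 4% from a.
seq_a = "A" * 100
seq_b = mutate(seq_a, [0, 1])
seq_c = mutate(seq_b, [2, 3])

# Same sequences, different relative abundances, no sequencing errors.
community_1 = {seq_a: 500, seq_b: 300, seq_c: 200}
community_2 = {seq_a: 200, seq_b: 500, seq_c: 300}

otus_1 = greedy_otus(community_1)
otus_2 = greedy_otus(community_2)

print(len(set(otus_1.values())), len(set(otus_2.values())))  # prints: 2 1
print(otus_1[seq_c] == seq_c, otus_2[seq_c] == seq_b)  # prints: True True
```

In the first community seq_c founds its own OTU; in the second it is absorbed into an OTU centered on seq_b, although no sequence changed and no errors were introduced. Labeling each variant by its exact sequence, as ASV methods do, gives seq_c the same label in both communities.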

High-throughput sequencing of PCR-amplified marker genes has grown explosively over the past decade, especially as a means of taxonomically profiling microbial communities. Recent advances have made it possible to analyze high-throughput marker-gene sequencing data without resorting to the customary construction of OTUs. New methods control errors sufficiently that ASVs can be resolved exactly, down to the level of single-nucleotide differences over the sequenced gene region.

The benefits of finer resolution are immediately apparent, and arguments for ASV methods have focused on their improved resolution. Less obvious, but we believe more important, are the broad benefits that derive from the status of ASVs as consistent labels with intrinsic biological meaning, identified independently from a reference database. Here we discuss how these features grant ASVs the combined advantages of closed-reference OTUs (including computational costs that scale linearly with study size, simple merging between independently processed data sets, and forward prediction) and of de novo OTUs (including accurate measurement of diversity and applicability to communities lacking deep coverage in reference databases). We argue that the improvements in reusability, reproducibility and comprehensiveness are sufficiently great that ASVs should replace OTUs as the standard unit of marker-gene analysis and reporting.
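Simple merging between independently processed data sets can be sketched in a few lines of Python, assuming each study's feature table is a nested dictionary mapping sample names to read counts keyed by exact sequence. The `merge_tables` function, the table layout and the short toy sequences are hypothetical illustrations, not the format of any specific pipeline.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical layout: sample name -> {exact sequence (ASV label) -> read count}.
FeatureTable = Dict[str, Dict[str, int]]


def merge_tables(*tables: FeatureTable) -> FeatureTable:
    """Merge per-study feature tables keyed by exact sequence.

    Because an ASV's label is its own sequence, identical variants from
    independently processed studies share a key and need no re-clustering.
    The merge is one pass over all table entries, so its cost grows linearly
    with the total number of entries.
    """
    merged: Dict[str, Dict[str, int]] = defaultdict(dict)
    for table in tables:
        for sample, counts in table.items():
            row = merged[sample]
            for seq, n in counts.items():
                # Counts for the same sample and sequence are summed.
                row[seq] = row.get(seq, 0) + n
    return {sample: dict(row) for sample, row in merged.items()}


# Two independently processed studies (toy, hypothetical sequences).
study_1 = {"gut_01": {"ACGTACGT": 120, "ACGTACGA": 30}}
study_2 = {"soil_07": {"ACGTACGT": 45, "TTGCACGT": 80}}

merged = merge_tables(study_1, study_2)
print(sorted({seq for row in merged.values() for seq in row}))
# prints: ['ACGTACGA', 'ACGTACGT', 'TTGCACGT']
```

The same shortcut does not carry over to de novo OTUs: an OTU label produced in one study has no meaning in another, so independently processed data sets would have to be pooled and re-clustered together.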
