Benchmarking bioinformatic tools for fast and accurate eDNA metabarcoding species identification

Type Article
Date 2021-10
Language English
Author(s) Mathon Laetitia (1, 2), Valentini Alice (2), Guérin Pierre‐edouard (1), Normandeau Eric (3), Noel Cyril (4), Lionnet Clément (5), Boulanger Emilie (1, 6), Thuillier Wilfried (5), Bernatchez Louis (3), Mouillot David (6, 7), Dejean Tony (2), Manel Stéphanie (1)
Affiliation(s) 1 : CEFE, Univ. Montpellier, CNRS, EPHE‐PSL University, IRD, Univ Paul Valéry Montpellier 3, Montpellier, France
2 : SPYGEN, 17 rue du Lac Saint‐André, Savoie Technolac, 73370 Le Bourget du Lac, France
3 : Université Laval, IBIS (Institut de Biologie Intégrative et des Systèmes), 1030 av. de la Médecine, Québec QC G1V 0A6, Canada
4 : IFREMER ‐ IRSI ‐ Service de Bioinformatique (SeBiMER), 29280 Plouzané, France
5 : Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, Laboratoire d’Écologie Alpine, F‐38000 Grenoble, France
6 : MARBEC, Univ. Montpellier, CNRS, IRD, Ifremer, Montpellier, France
7 : Institut Universitaire de France, IUF, Paris 75231, France
Source Molecular Ecology Resources (1755-098X) (Wiley), 2021-10, Vol. 21, N. 7, P. 2565-2579
DOI 10.1111/1755-0998.13430
WOS© Times Cited 12
Keyword(s) benchmark, bioinformatics, eDNA, metabarcoding, sensitivity, species identification

Bioinformatic analysis of eDNA metabarcoding data is crucial for rigorously assessing biodiversity. Many programs are now available for each step of the required analyses, but their relative ability to provide fast and accurate species lists has seldom been evaluated.

We used simulated mock communities and real fish eDNA metabarcoding data to evaluate the performance of 13 bioinformatic programs and pipelines to retrieve fish occurrence and read abundance using the 12S mt rRNA gene marker. We used four indices to compare the outputs of each program with the simulated samples: sensitivity, F-measure, root-mean-square error (RMSE) on read relative abundances, and execution time.
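The three accuracy indices named above can be illustrated with a minimal sketch. It assumes the standard textbook definitions (sensitivity = TP / (TP + FN), F-measure = 2TP / (2TP + FP + FN), RMSE computed over per-species relative read abundances); the abstract does not give the exact formulas used in the study, so these may differ in detail. The function names and the toy species labels are hypothetical and chosen only for illustration.

```python
import math


def sensitivity(expected, observed):
    """Share of expected species that were recovered: TP / (TP + FN)."""
    expected, observed = set(expected), set(observed)
    tp = len(expected & observed)
    fn = len(expected - observed)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(expected, observed):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    expected, observed = set(expected), set(observed)
    tp = len(expected & observed)
    fp = len(observed - expected)
    fn = len(expected - observed)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def rmse_relative_abundance(expected_reads, observed_reads):
    """RMSE between expected and observed relative read abundances.

    Both arguments map species name -> read count. Species missing from
    one side are treated as having zero reads (an assumption of this sketch).
    """
    exp_total = sum(expected_reads.values())
    obs_total = sum(observed_reads.values())
    species = set(expected_reads) | set(observed_reads)
    if not species:
        return 0.0
    squared_error = 0.0
    for sp in species:
        e = expected_reads.get(sp, 0) / exp_total if exp_total else 0.0
        o = observed_reads.get(sp, 0) / obs_total if obs_total else 0.0
        squared_error += (e - o) ** 2
    return math.sqrt(squared_error / len(species))


# Toy mock community (hypothetical species labels and read counts).
expected = {"species_A": 50, "species_B": 30, "species_C": 20}
observed = {"species_A": 60, "species_B": 20, "species_D": 20}

print("sensitivity:", sensitivity(expected, observed))
print("F-measure:  ", f_measure(expected, observed))
print("RMSE:       ", rmse_relative_abundance(expected, observed))
```

In this toy case two of the three expected species are recovered and one false positive appears, so sensitivity and F-measure both equal 2/3. The fourth index, execution time, would be measured separately for each program run.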

We found marked differences among programs only for the taxonomic assignment step, in terms of sensitivity, F-measure, and RMSE. Running time differed markedly between programs at every step. The fastest programs with the best indices for each step were assembled into a pipeline. We compared this pipeline to pipelines constructed from existing toolboxes (OBITools, Barque, and QIIME 2). Our pipeline and Barque obtained the best performance for all indices and appear to be better alternatives to widely used pipelines for analyzing fish eDNA metabarcoding data with a complete reference database. Real eDNA metabarcoding data also indicated differences for taxonomic assignment and execution time only.

This study reveals major differences between programs during the taxonomic assignment step. The choice of algorithm for the taxonomic assignment can have a significant impact on diversity estimates and should be made according to the objectives of the study.

Full Text
File Pages Size Access
Author's final draft 32 1 MB Open access
54 KB Access on demand
56 KB Access on demand
2 MB Access on demand
15 1 MB Access on demand

How to cite 

Mathon Laetitia, Valentini Alice, Guérin Pierre‐edouard, Normandeau Eric, Noel Cyril, Lionnet Clément, Boulanger Emilie, Thuillier Wilfried, Bernatchez Louis, Mouillot David, Dejean Tony, Manel Stéphanie (2021). Benchmarking bioinformatic tools for fast and accurate eDNA metabarcoding species identification. Molecular Ecology Resources, 21(7), 2565-2579. Publisher's official version: https://doi.org/10.1111/1755-0998.13430