FN Archimer Export Format
PT J
TI A new method to control error rates in automated species identification with deep learning algorithms
BT
AF Villon, Sébastien
   Mouillot, David
   Chaumont, Marc
   Subsol, Gérard
   Claverie, Thomas
   Villéger, Sébastien
AS 1:1,2;2:1,5;3:2,3;4:2;5:1,4;6:1;
FF 1:;2:;3:;4:;5:;6:;
C1 MARBEC, Univ of Montpellier, CNRS, IRD, Ifremer, Montpellier, France
   Research-Team ICAR, LIRMM, Univ of Montpellier, CNRS, Montpellier, France
   University of Nîmes, Nîmes, France
   CUFR Mayotte, Dembeni, France
   Australian Research Council Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, 4811, Australia
C2 UNIV MONTPELLIER, FRANCE
   UNIV MONTPELLIER, FRANCE
   UNIV NIMES, FRANCE
   UNIV MAYOTTE, FRANCE
   UNIV JAMES COOK, AUSTRALIA
UM MARBEC
IN WOS Cotutelle UMR DOAJ copubli-france copubli-univ-france copubli-int-hors-europe
IF 4.379
TC 23
UR https://archimer.ifremer.fr/doc/00640/75244/75406.pdf
   https://archimer.ifremer.fr/doc/00640/75244/75407.docx
LA English
DT Article
AB Processing data from surveys using photos or videos remains a major bottleneck in ecology. Deep Learning Algorithms (DLAs) have been increasingly used to automatically identify organisms on images. However, despite recent advances, it remains difficult to control the error rate of such methods. Here, we proposed a new framework to control the error rate of DLAs. More precisely, for each species, a confidence threshold was automatically computed using a training dataset independent from the one used to train the DLAs. These species-specific thresholds were then used to post-process the outputs of the DLAs, assigning classification scores to each class for a given image including a new class called “unsure”. We applied this framework to a study case identifying 20 fish species from 13,232 underwater images on coral reefs. The overall rate of species misclassification decreased from 22% with the raw DLAs to 2.98% after post-processing using the thresholds defined to minimize the risk of misclassification. This new framework has the potential to unclog the bottleneck of information extraction from massive digital data while ensuring a high level of accuracy in biodiversity assessment.
PY 2020
PD JUN
SO Scientific Reports
SN 2045-2322
PU Springer Science and Business Media LLC
VL 10
IS 1
UT 000546533700003
DI 10.1038/s41598-020-67573-7
ID 75244
ER

EF
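
The abstract describes species-specific confidence thresholds, computed on a dataset independent from the training set, that send low-confidence predictions to an "unsure" class. The record does not give the rule used to compute those thresholds, so the following is only an illustrative sketch under a stated assumption: for each predicted species, take the lowest score cut-off at which the error rate among the retained held-out predictions stays at or below a chosen target. The names `fit_thresholds`, `post_process`, and `max_error` are hypothetical and do not come from the paper.

```python
from collections import defaultdict

UNSURE = "unsure"  # extra class for predictions that fall below the threshold


def fit_thresholds(calibration, max_error=0.05):
    """Compute one score threshold per predicted species.

    calibration: iterable of (predicted_species, top_score, true_species)
    taken from a dataset NOT used to train the classifier.
    ASSUMPTION: the threshold is the lowest cut-off whose retained
    predictions have an error rate <= max_error. The paper's actual
    criterion may differ.
    """
    by_species = defaultdict(list)
    for predicted, score, truth in calibration:
        by_species[predicted].append((score, predicted == truth))

    thresholds = {}
    for species, rows in by_species.items():
        rows.sort(key=lambda r: r[0], reverse=True)  # highest score first
        kept = errors = 0
        threshold = float("inf")  # if no cut-off meets the target, accept nothing
        i = 0
        while i < len(rows):
            score = rows[i][0]
            # Consume all rows tied at this score so the cut-off is well defined.
            while i < len(rows) and rows[i][0] == score:
                kept += 1
                errors += not rows[i][1]
                i += 1
            if errors / kept <= max_error:
                threshold = score
        thresholds[species] = threshold
    return thresholds


def post_process(predicted, score, thresholds):
    """Keep the predicted species only if its score reaches its threshold."""
    # Species absent from the calibration set have no threshold: always unsure.
    if score >= thresholds.get(predicted, float("inf")):
        return predicted
    return UNSURE
```

A lower `max_error` reduces misclassification but sends more images to "unsure", which matches the trade-off the abstract describes between error control and coverage.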