Finding the right fit: Comparative cetacean distribution models using multiple data sources and statistical approaches

Type Article
Date 2018-11
Language English
Author(s) Derville Solene (1, 2, 3, 4), Torres Leigh G. (3), Iovan Corina (1), Garrigue Claire (1, 4)
Affiliation(s) 1 : UMR ENTROPIE (IRD, Université de La Réunion, CNRS), Nouméa Cedex, New Caledonia
2 : Collège Doctoral, Sorbonne Université, Paris, France
3 : Department of Fisheries and Wildlife, Marine Mammal Institute, Oregon State University, HMSC, Newport, OR, USA
4 : Operation Cétacés, Nouméa, New Caledonia
Source Diversity and Distributions (1366-9516) (Wiley), 2018-11, Vol. 24, N. 11, P. 1657-1673
DOI 10.1111/ddi.12782
WOS© Times Cited 56
Keyword(s) citizen science, generalized regression, humpback whales, machine learning, species distribution models, support vector machines
Abstract
Aim: Accurate predictions of cetacean distributions are essential to their conservation but are limited by statistical challenges and a paucity of data. This study aimed to compare the capacity of various statistical algorithms to deal with the biases commonly found in nonsystematic cetacean surveys, and to evaluate the potential of citizen science data to improve habitat modelling and predictions. An endangered population of humpback whales (Megaptera novaeangliae) on its breeding ground was used as a case study.
Location: New Caledonia, Oceania.
Methods: Five statistical algorithms (generalized linear models [GLM], generalized additive models [GAM], boosted regression trees [BRT], support vector machines [SVM] and maximum entropy modelling [MAXENT]) were used to model the habitat preferences of humpback whales from 1,360 sightings collected over 14 years of nonsystematic research surveys. Three different background sampling approaches were tested when developing models from 625 crowdsourced sightings, in order to assess methods for accounting for spatial sampling bias in citizen science data. Models were evaluated through cross-validation and through prediction to an independent satellite tracking dataset.
Results: The algorithms differed in the complexity of the environmental relationships they modelled, in ecological interpretability and in transferability. Parameter tuning strongly affected model performance; GLMs generally had low predictive performance, SVMs were particularly hard to interpret, and BRTs had high descriptive power but showed signs of overfitting. MAXENT and especially GAMs offered a good trade-off in complexity, made accurate predictions and were ecologically intelligible. Models showed that humpback whales favoured cool (22-23°C), shallow (0-100 m deep) waters in both coastal and offshore areas. Citizen science models converged with research survey models, particularly when spatial sampling bias was accounted for.
Main conclusions: Marine megafauna distribution models present specific challenges that may be addressed through integrative evaluation, independent testing and appropriately tuned statistical algorithms. In particular, controlling overfitting is a priority when predicting cetacean distributions from a large-scale conservation perspective. Citizen science data appear to be a powerful tool for describing cetacean habitat.
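The abstract evaluates models by predicting to an independent satellite tracking dataset. As a minimal sketch, assuming a threshold-independent metric such as the area under the ROC curve (AUC) is used (the abstract does not name the metric), the agreement between predicted habitat suitability at tracked whale locations and at background locations could be scored as below. The function name `presence_background_auc` and the example scores are hypothetical, not taken from the study.

```python
from typing import Sequence


def presence_background_auc(presence_scores: Sequence[float],
                            background_scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC for a presence-background evaluation.

    Returns the probability that a randomly chosen presence location
    (e.g. a hypothetical satellite-tracked whale position) receives a higher
    predicted suitability than a randomly chosen background location.
    Ties count as one half. Assumption: AUC is the evaluation metric.
    """
    if len(presence_scores) == 0 or len(background_scores) == 0:
        raise ValueError("both score sequences must be non-empty")
    wins = 0.0
    # Compare every presence score with every background score (O(n * m)).
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability predictions extracted at tracked and background locations.
tracked = [0.9, 0.7, 0.4]
background = [0.8, 0.3, 0.2, 0.1]
print(presence_background_auc(tracked, background))  # prints 0.8333333333333334
```

An AUC of 0.5 means the model ranks tracked locations no better than chance; values closer to 1 indicate better transfer to the independent data. The pairwise loop is chosen for readability; for large datasets a sort-based rank computation would be faster.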
Full Text
Publisher's official version: 17 pages, 1 MB, open access
Appendix S1: 10 pages, 4 MB, open access
Appendix S2: 20 pages, 560 KB, open access
Appendix S3: 8 pages, 6 MB, open access
Appendix S4: 7 pages, 3 MB, open access

How to cite 

Derville Solene, Torres Leigh G., Iovan Corina, Garrigue Claire (2018). Finding the right fit: Comparative cetacean distribution models using multiple data sources and statistical approaches. Diversity and Distributions, 24(11), 1657-1673. Publisher's official version: https://doi.org/10.1111/ddi.12782. Open Access version: https://archimer.ifremer.fr/doc/00860/97181/