A quantitative review of abundance-based species distribution models

The contributions of species to ecosystem functions or services depend not only on their presence in a given community, but also on their local abundance. Progress in predictive spatial modelling has largely focused on species occurrence, rather than abundance. As such, limited guidance exists on the most reliable methods to explain and predict spatial variation in abundance. We analysed the performance of 68 abundance-based species distribution models fitted to 800,000 standardised abundance records for more than 800 terrestrial bird and reef fish species. We found high heterogeneity in performance of abundance-based models. While many models performed poorly, a subset of models consistently reconstructed range-wide abundance patterns. The best predictions were obtained using random forests for frequently encountered and abundant species, and for predictions within the same environmental domain as model calibration. Extending predictions of species abundance outside of the environmental conditions used in model training generated poor predictions. Thus, interpolation of abundances between observations can help improve understanding of spatial abundance patterns, but extrapolated predictions of abundance, e.g. under climate change, have a much greater uncertainty. Our synthesis provides a roadmap for modelling abundance patterns, a key property of species’ distributions that underpins theoretical and applied questions in ecology and conservation.

Here, we aim to provide practical guidance on applying statistical approaches to predict species' abundance, and identify factors most affecting predictive performance. We compare 68 abundance-based species distribution models fitted for two standardised abundance datasets containing more than 800 marine and terrestrial vertebrate species and over 800,000 abundance observations. We test the interpolative (within-sample) and extrapolative (out-of-sample) performance of models. We ask how statistical framework and model complexity, and species' and data characteristics, affect metrics of model accuracy, discrimination, and precision.

Population and patch extinction risk is better predicted by patch abundance than by occupancy alone. Schulz et al. (2020) show abundance in the previous year to be a strong predictor of Glanville fritillary (Melitaea cinxia) butterfly patch occupancy, such that local abundance rather than average abundance determines local extinction risks.
If using a fixed focal area for surveys, species' environmental response curves are better quantified using abundance, which provides more information than presence-absence. Becker et al. (2019) used generalised additive models to model the influx of cetacean individuals to the California Current System during the 2014 heatwave event.
Quantitative changes in abundance within a species' range are more informative than occurrence shifts (i.e., intermediate stages in range shifts, no change in range extent). Fei et al. (2017) found that shifts in the spatial distribution of abundance for tree species in the United States from the 1980s to the 2010s were mostly due to sub-populations increasing in density from low initial abundance.
Abundance is more sensitive than occurrence at detecting impacts on species' distributions. Maxwell et al. (2019) synthesised 698 studied responses to extreme weather events and showed that abundance declines occurred in 100 cases, but local extinction occurred in only 31 cases. Ricart et al. (2018) show that the habitat-forming alga Codium vermilara in the north-west Mediterranean has declined by 95% in abundance but by only 45% in site occupancy.
Trends in abundance and species richness can be disconnected. Antão et al. (2020a) found contrasting patterns in assemblage abundance and species richness in Finnish moth assemblages over 19 years, with abundance declining despite species richness increasing.

Ecosystem function and services
Individuals, rather than species, contribute to ecosystem services. Winfree et al. (2015) found that, in real-world ecosystems, crop pollination was driven by abundance fluctuations of dominant bee species, whereas species richness was driven by rare species that contributed little to ecosystem function.
Interaction strengths depend on the abundance of interacting species. Matías et al. (2019) document how pathogen abundance determines Cork oak (Quercus suber) mortality rates across the species' distribution. More generally, Vázquez et al. (2007) show that asymmetry in interaction strength between hosts and consumers is correlated with abundance, so that rarer species are more negatively affected by abundant partners but pairs of interacting abundant species exhibit reciprocally strong effects.
Geographic differences in abundance evenness exist, such that the contributions of individuals and species to assemblage functional diversity vary at a macroecological scale. Stuart-Smith et al. (2013) show that community evenness is higher in temperate than in tropical reef fish assemblages. This difference in assemblage evenness suggests that each fish species' contribution to reef ecosystem functioning is higher in temperate than in tropical regions.
Productivity depends on the number of individuals in an area, which can map differently to the area suitable for occupancy. Kallasvuo et al. (2017) demonstrate that the most productive areas, with the most individuals, occupy only a small part of the total suitable region for fish stocks in the Baltic Sea.

Management of biodiversity
Management goals are often to maintain abundance (biomass) of individuals rather than just presence. Hutchings and Reynolds (2004) show that breeding population sizes of economically valuable fishes have declined by 83%, undermining profitable fisheries, even though small populations still persist.
Extinction risk is often established based on population abundance change, which can be spatially variable. Sherley et al. (2020) use 40 years of count data for the African penguin (Spheniscus demersus) and model spatially dependent abundance change through time to identify regions of the geographic range at high risk of extinction. The overall decline in abundance was 65% since 1989, indicating that the threshold for the IUCN 'Endangered' Red List category had been crossed.
Spatial mapping of abundance supports the prioritization of areas for conservation. Flores et al. (2018) show how valley areas are important for maintaining high populations of guanaco (Lama guanicoe) in central Tierra del Fuego, and that spatial heterogeneity of abundance is greater in the breeding than in the non-breeding season.
Invader impact curves suggest impacts are threshold dependent. Simulations by Yokomizo et al. (2009) indicate that the impacts of invasive species depend on density, and that the density-impact curve must be correctly identified to prevent overinvestment in management with little reduction in impact, particularly for species whose impact is only realised at high densities.

We obtained standardised estimates of species abundance across large regions for birds and shallow-water reef fishes from the Breeding Bird Survey of the USA (BBS) and Reef Life Survey (RLS) respectively (for detailed sampling schemes see [...] species without full scientific names and fewer than 50 abundance records. We required species absences for two-stage models and abundance-absence models. We generated absences for each species by taking observations where species were present and finding all observations within a 1000 km buffer where species were not present. A lack of observed presence is not necessarily a 'true absence', but instead suggests species were undetectable with a reasonable sampling effort [...] only variables with expected a priori relationships with abundance (see Table S1 for [...] predictive power to ensure our results were more robust to potential multicollinearity. We ran a separate robust-PCA on 19 variables characterising climates across the [...] Table S1). For each dataset, we retained 3 principal components, explaining 87.8% and 77.8% of variation respectively, and used these principal component scores as predictor variables to summarise the dominant climate and biogeochemical regimes of the data in each set of models (3 PCA variables for birds and 3 PCA variables for fishes; Figures S1 and S2). We retained the PCA axes which explained >5% of variation in the PCA-covariate set, which resulted in 3 axes summarising the climatological variation for each dataset. In addition to the climatological variables, we included additional environmental variables as predictors in our models that we expected to act independently. All non-PCA variables were mean-centred, normalised to a variance of 1, and transformed according to Table S1 before modelling.

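The two covariate-preparation rules described in the data section (retaining PCA axes that explain more than 5% of variation, and standardising non-PCA variables to mean 0 and variance 1) might be expressed as in the sketch below. It assumes the component variances (eigenvalues) are already available from a PCA fitted elsewhere; the robust-PCA step itself is not reproduced, and the eigenvalues and function names are hypothetical.

```python
import statistics


def retained_axes(eigenvalues, min_share=0.05):
    """Indices of principal components whose share of the total
    variance exceeds `min_share`."""
    total = sum(eigenvalues)
    return [i for i, ev in enumerate(eigenvalues) if ev / total > min_share]


def standardise(values):
    """Mean-centre values and scale them to unit (sample) variance."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical eigenvalues for a PCA on 19 variables (illustrative only).
eigenvalues = [8.0, 5.0, 3.0, 0.9, 0.6] + [1.5 / 14] * 14
kept = retained_axes(eigenvalues)

# Hypothetical raw values of one non-PCA covariate.
scaled = standardise([2.0, 4.0, 6.0])
```

With these made-up eigenvalues only the first three axes pass the 5% threshold. Any further transformation of a covariate (the text refers to Table S1 for these) would be applied in addition to the centring and scaling shown here.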
Analytical design
We analysed a large diversity of species abundance models that spanned a gradient [...] Table S2 for full model list). Combining models and cross-validations for 1,547 species led to 59,840 models to evaluate. Our full species abundance model set comprises different statistical algorithms, response transformations, error distributions, and formulations of abundance data. We used 24 model variants from common statistical distributions and transformations for abundance data that were available within statistical software packages in R (e.g., Poisson, negative binomial, zero-inflated, Tweedie, multinomial, log10-Gaussian, log-Gaussian; Table S2). We chose statistical treatments of abundance data that are common in the literature and appropriate for the error distribution of abundance. We fitted these 24 model variants using four statistical model fitting [...] abundance of a species well (high accuracy) but poorly discriminate between high and low abundances (low discrimination). We focused our results mostly on discrimination because identifying changes in spatial and temporal variation in abundance, a goal of conservation and wildlife management, depends on good discrimination of abundance values between sites or time-points. Further, accuracy and precision may depend on the quality of sampling, but inaccurate sampling may still provide reasonable estimates of spatial and temporal differences in abundance. We identified an 'optimal model' based on the most discriminatory model for each species. To do so, we rescaled the four discrimination metrics to between 0 and 1, averaged the score across the scaled metrics, and identified the model with the highest average score per species; we report this as the 'optimal model' throughout.
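
The 'optimal model' rule can be illustrated with a short sketch. It rests on two assumptions not stated in the text: each metric is rescaled across the candidate models of a single species, and every metric is oriented so that higher values mean better discrimination. The model names, metric labels (`d1` to `d4`) and scores are hypothetical.

```python
def rescale01(values):
    """Min-max rescale a list of numbers to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread: metric is uninformative
    return [(v - lo) / (hi - lo) for v in values]


def optimal_model(scores):
    """Pick the model with the highest mean rescaled metric score.

    `scores` maps model name -> {metric name: value} for one species,
    with higher values assumed to indicate better discrimination.
    """
    models = sorted(scores)
    metrics = sorted(next(iter(scores.values())))
    scaled = {m: [] for m in models}
    for metric in metrics:
        column = rescale01([scores[m][metric] for m in models])
        for m, value in zip(models, column):
            scaled[m].append(value)
    mean_score = {m: sum(v) / len(v) for m, v in scaled.items()}
    best = max(models, key=lambda m: mean_score[m])
    return best, mean_score


# Hypothetical discrimination scores for three models of one species.
scores = {
    "glm_poisson": {"d1": 0.20, "d2": 0.25, "d3": 0.10, "d4": 0.30},
    "gam_negbin": {"d1": 0.35, "d2": 0.30, "d3": 0.20, "d4": 0.35},
    "random_forest": {"d1": 0.50, "d2": 0.45, "d3": 0.40, "d4": 0.30},
}
best, mean_score = optimal_model(scores)
```

In this example `random_forest` scores highest on three of the four metrics and is selected even though it ties for lowest on `d4`, because the choice depends on the average of the rescaled metrics rather than on any single one.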

Species' and data characteristics
The variation in model performance explained by species and data characteristics varied among performance metrics, and was generally higher for within-sample (R² = 0.04–0.44) than for out-of-sample cross-validations (R² = 0.01–0.33; Tables S5–S8). All six evaluation metrics were affected by species or data characteristics in both birds and fishes (Table S5 to [...]

Successful aspects of species abundance models
A small number of good approaches for predicting species abundance emerged after exploring a large set of models. Correlation values from our optimal models were higher than ~0.3 for more than 75% of species, and higher than ~0.6 for 25% of species (Table 2) [...] Our finding that modelling abundance directly was better than an indirect approach (i.e., comparing our abundance-absence models to two-stage models) for more than 80% of species indicates that spatial abundance and occurrence patterns are [...] Table 1). We argue that spatial abundance models can provide critical biodiversity information with the potential to improve the ecological relevance and species conservation applications of species distribution models.