FN Archimer Export Format
PT J
TI Effects of Ignoring Survey Design Information for Data Reuse
BT
AF Foster, Scott D.
   Vanhatalo, Jarno
   Trenkel, Verena
   Schulz, Torsti
   Lawrence, Emma
   Przeslawski, Rachel
   Hosack, Geoffrey R.
AS 1:1;2:2,3;3:4;4:3;5:5;6:6;7:1;
FF 1:;2:;3:PDG-RBE-EMH;4:;5:;6:;7:;
C1 Data61, CSIRO, Hobart Tasmania, Australia
   Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
   Organismal and Evolutionary Biology Research Program, University of Helsinki, Helsinki, Finland
   Ifremer, Nantes, France
   Data61, CSIRO, Brisbane Queensland, Australia
   Geoscience Australia, Canberra ACT, Australia
C2 CSIRO, AUSTRALIA
   UNIV HELSINKI, FINLAND
   UNIV HELSINKI, FINLAND
   IFREMER, FRANCE
   CSIRO, AUSTRALIA
   GEOSCIENCE AUSTRALIA, AUSTRALIA
SI NANTES
SE PDG-RBE-EMH
IN WOS Ifremer UPR copubli-europe copubli-int-hors-europe
IF 6.105
TC 7
UR https://archimer.ifremer.fr/doc/00691/80339/83422.pdf
   https://archimer.ifremer.fr/doc/00691/80339/83423.pdf
   https://archimer.ifremer.fr/doc/00691/80339/83424.zip
LA English
DT Article
DE bias; data; database; findable; accessible; interoperable; reusable data; Horvitz-Thompson estimator; inclusion probability; model; population density estimate; reuse; survey design
AB Data are currently being used, and reused, in ecological research at an unprecedented rate. To ensure appropriate reuse, however, we need to ask: “Are aggregated databases currently providing the right information to enable effective and unbiased reuse?” We investigate this question with a focus on designs that purposefully favour some sampling locations over others (upweighting their probability of selection). Such designs are common; examples include designs with uneven inclusion probabilities and stratified designs. We perform a simulation experiment by creating datasets with progressively more uneven inclusion probabilities and examine the resulting estimates of the average number of individuals per unit area (density).
   The effect of ignoring the survey design can be profound, with biases of up to 250% in density estimates when naive analytical methods are used. This density estimation bias is not reduced by adding more data. Fortunately, it can be mitigated by using an appropriate estimator, or an appropriate model that incorporates the design information. These remedies are only possible, however, when essential information about the survey design is available: the sample location selection process (e.g. inclusion probabilities) and/or the covariates used to specify it. The results suggest that such information must be stored and served with the data to support meaningful inference and data reuse.
PY 2021
PD SEP
SO Ecological Applications
SN 1051-0761
PU Wiley
VL 31
IS 6
UT 000667599600001
DI 10.1002/eap.2360
ID 80339
ER
EF