Guidance framework to apply good practices in ecological data analysis: Lessons learned from building Galaxy-Ecology
Type | Article |
---|---|
Acceptance Date | 2024-04-11 IN PRESS |
Language | English |
Author(s) | Royaux Coline1, 2, Mihoub Jean-Baptiste3, Jossé Marie4, Pelletier Dominique5, Norvez Olivier6, Reecht Yves7, Fouilloux Anne8, Rasche Helena9, Hiltemann Saskia10, Batut Bérénice11, 12, Eléaume Marc13, 14, Seguineau Pauline13, 14, Massé Guillaume15, Amossé Alan16, Bissery Claire17, 18, Lorrilliere Romain3, Martin Alexis19, Bas Yves3, 20, Virgoulay Thimothée21, 22, Chambon Valentin16, Arnaud Elie2, Michon Elisa23, Urfer Clara2, 24, Trigodet Eloïse21, 24, Delannoy Marie25, Loïs Gregoire3, Julliard Romain3, Grüning Björn26, Le Bras Yvan, The 17 Galaxy-E Community |
Affiliation(s) | 1 : UMR8067 Biologie des Organismes et Ecosystèmes Aquatiques (BOREA, MNHN-CNRS SU-IRD-UCN-UA), Sorbonne Université, Station Marine de Concarneau - Concarneau, France; 2 : Pôle national de données de biodiversité, UAR2006 PatriNat (OFB-MNHN-CNRS-IRD), Muséum National d’Histoire Naturelle, Station Marine de Concarneau - Concarneau, France; 3 : Centre d’Écologie et des Sciences de la Conservation (UMR7204 CESCO, MNHN-CNRS SU), Muséum National d’Histoire Naturelle, Sorbonne Université, Centre National de la Recherche Scientifique - Paris, France; 4 : Data Terra, Centre National de la Recherche Scientifique - Brest, France; 5 : UMR DECOD (Ifremer-Agrocampus Ouest-INRAE) - Lorient, France; 6 : Pôle National de Données de Biodiversité, UAR2006 PatriNat (OFB-MNHN-CNRS-IRD), Fondation pour la Recherche sur la Biodiversité, Muséum national d’Histoire naturelle - Paris, France; 7 : Institute of Marine Research - Bergen, Norway; 8 : Simula Research Laboratory - Oslo, Norway; 9 : Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center - Rotterdam, The Netherlands; 10 : Institute of Pharmaceutical Sciences, Faculty of Chemistry and Pharmacy, University of Freiburg - Freiburg, Germany; 11 : Institut Français de Bioinformatique, CNRS UAR3601 - Évry, France; 12 : Mésocentre, Clermont-Auvergne, Université Clermont Auvergne - Clermont-Ferrand, France; 13 : Institut de Systématique Evolution, Biodiversité (UMR7205 ISYEB, MNHN-CNRS-SU EPHE), Département Origines et Évolution, Muséum national d’Histoire naturelle - Paris, France; 14 : Institut de Systématique Evolution, Biodiversité (UMR7205 ISYEB, MNHN-CNRS-SU EPHE), Département Origines et Évolution, Station Marine de Concarneau - Concarneau, France; 15 : UMR LOCEAN (CNRS-SU-IRD-MNHN), Centre National de la Recherche Scientifique, Station Marine de Concarneau - Concarneau, France; 16 : Muséum National d’Histoire Naturelle, Station Marine de Concarneau - Concarneau, France; 17 : Institut français de recherche pour l’exploitation de la mer (Ifremer) - Brest, France; 18 : Université Claude Bernard Lyon 1 - Lyon, France; 19 : UMR8067 Biologie des Organismes et Ecosystèmes Aquatiques (BOREA, MNHN-CNRS SU-IRD-UCN-UA), Muséum national d'Histoire naturelle - Paris, France; 20 : UAR2006 PatriNat (OFB-MNHN-CNRS-IRD), Muséum national d’Histoire naturelle - Paris, France; 21 : Centre d’Écologie et des Sciences de la Conservation (UMR7204 CESCO, MNHN-CNRS SU), Muséum National d’Histoire Naturelle - Concarneau, France; 22 : Université de Montpellier - Montpellier, France; 23 : Institut des Sciences de la Mer de Rimouski, Université du Québec à Rimouski - Rimouski, Québec, Canada; 24 : Université de Bretagne Occidentale - Brest, France; 25 : Fondation pour la Nature et l'Homme - Boulogne-Billancourt, France; 26 : Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg - Freiburg, Germany |
Source | EcoEvoRxiv (California Digital Library (CDL)) In Press |
DOI | 10.32942/X2G033 |
Note | This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint. |
Keyword(s) | biodiversity, Reproducible analyses, Galaxy, Good practices, Atomisation, Generalisation, workflows, ecoinformatics, Conda, container, Common Workflow Language, RO-CRATE |
Abstract | Numerous conceptual frameworks exist for good practices in research data and analysis (e.g. Open Science and FAIR principles). In practice, there is a need for further progress to improve transparency, reproducibility, and confidence in ecology. Here, we propose a practical and operational framework to achieve good practices for building analytical procedures based on atomisation and generalisation. We introduce the concept of atomisation to identify analytical steps which support generalisation by allowing us to go beyond single analyses. These guidelines were established during the development of the Galaxy-Ecology initiative, a web platform dedicated to data analysis in ecology. Galaxy-Ecology allows us to demonstrate a way to reach higher levels of reproducibility in ecological sciences by increasing the accessibility and reusability of analytical workflows once atomised and generalised. |