Guidance framework to apply good practices in ecological data analysis: Lessons learned from building Galaxy-Ecology

Type Article
Acceptance Date 2024-04-11 IN PRESS
Language English
Author(s) Royaux Coline1, 2, Mihoub Jean-Baptiste3, Jossé Marie4, Pelletier Dominique5, Norvez Olivier6, Reecht Yves7, Fouilloux Anne8, Rasche Helena9, Hiltemann Saskia10, Batut Bérénice11, 12, Eléaume Marc13, 14, Seguineau Pauline13, 14, Massé Guillaume15, Amossé Alan16, Bissery Claire17, 18, Lorrilliere Romain3, Martin Alexis19, Bas Yves3, 20, Virgoulay Thimothée21, 22, Chambon Valentin16, Arnaud Elie2, Michon Elisa23, Urfer Clara2, 24, Trigodet Eloïse21, 24, Delannoy Marie25, Loïs Gregoire3, Julliard Romain3, Grüning Björn26, Le Bras Yvan, The 17 Galaxy-E Community
Affiliation(s) 1 : UMR8067 Biologie des Organismes et Ecosystèmes Aquatiques (BOREA, MNHN-CNRS SU-IRD-UCN-UA), Sorbonne Université, Station Marine de Concarneau - Concarneau, France
2 : Pôle national de données de biodiversité, UAR2006 PatriNat (OFB-MNHN-CNRS-IRD), Muséum National d’Histoire Naturelle, Station Marine de Concarneau - Concarneau, France
3 : Centre d’Écologie et des Sciences de la Conservation (UMR7204 CESCO, MNHN-CNRS SU), Muséum National d’Histoire Naturelle, Sorbonne Université, Centre National de la Recherche Scientifique - Paris, France
4 : Data Terra, Centre National de la Recherche Scientifique - Brest, France
5 : UMR DECOD (Ifremer-Agrocampus Ouest-INRAE) - Lorient, France
6 : Pôle National de Données de Biodiversité, UAR2006 PatriNat (OFB-MNHN-CNRS-IRD), Fondation pour la Recherche sur la Biodiversité, Muséum national d’Histoire naturelle - Paris, France
7 : Institute of Marine Research - Bergen, Norway
8 : Simula Research Laboratory - Oslo, Norway
9 : Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center - Rotterdam, The Netherlands
10 : Institute of Pharmaceutical Sciences, Faculty of Chemistry and Pharmacy, University of Freiburg - Freiburg, Germany
11 : Institut Français de Bioinformatique, CNRS UAR3601 - Évry, France
12 : Mésocentre, Clermont-Auvergne, Université Clermont Auvergne - Clermont-Ferrand, France
13 : Institut de Systématique Evolution, Biodiversité (UMR7205 ISYEB, MNHN-CNRS-SU EPHE), Département Origines et Évolution, Muséum national d’Histoire naturelle - Paris, France
14 : Institut de Systématique Evolution, Biodiversité (UMR7205 ISYEB, MNHN-CNRS-SU EPHE), Département Origines et Évolution, Station Marine de Concarneau - Concarneau, France
15 : UMR LOCEAN (CNRS-SU-IRD-MNHN), Centre National de la Recherche Scientifique, Station Marine de Concarneau - Concarneau, France
16 : Muséum National d’Histoire Naturelle, Station Marine de Concarneau - Concarneau, France
17 : Institut français de recherche pour l’exploitation de la mer (Ifremer) - Brest, France
18 : Université Claude Bernard Lyon 1 - Lyon, France
19 : UMR8067 Biologie des Organismes et Ecosystèmes Aquatiques (BOREA, MNHN-CNRS SU-IRD-UCN-UA), Muséum national d'Histoire naturelle - Paris, France
20 : UAR2006 PatriNat (OFB-MNHN-CNRS-IRD), Muséum national d’Histoire naturelle - Paris, France
21 : Centre d’Écologie et des Sciences de la Conservation (UMR7204 CESCO, MNHN-CNRS SU), Muséum National d’Histoire Naturelle - Concarneau, France
22 : Université de Montpellier - Montpellier, France
23 : Institut des Sciences de la Mer de Rimouski, Université du Québec à Rimouski - Rimouski, Québec, Canada
24 : Université de Bretagne Occidentale - Brest, France
25 : Fondation pour la Nature et l'Homme - Boulogne-Billancourt, France
26 : Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg - Freiburg, Germany
Source EcoEvoRxiv (California Digital Library (CDL)) In Press
DOI 10.32942/X2G033
Note This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
Keyword(s) biodiversity, Reproducible analyses, Galaxy, Good practices, Atomisation, Generalisation, workflows, ecoinformatics, Conda, container, Common Workflow Language, RO-CRATE
Abstract

Numerous conceptual frameworks exist for good practices in research data and analysis (e.g. Open Science and FAIR principles). In practice, there is a need for further progress to improve transparency, reproducibility, and confidence in ecology. Here, we propose a practical and operational framework to achieve good practices for building analytical procedures based on atomisation and generalisation. We introduce the concept of atomisation to identify analytical steps which support generalisation by allowing us to go beyond single analyses. These guidelines were established during the development of the Galaxy-Ecology initiative, a web platform dedicated to data analysis in ecology. Galaxy-Ecology allows us to demonstrate a way to reach higher levels of reproducibility in ecological sciences by increasing the accessibility and reusability of analytical workflows once atomised and generalised.

Licence CC-BY
Full Text
File Pages Size Access
Preprint 34 4 MB Open access

How to cite 

Royaux Coline, Mihoub Jean-Baptiste, Jossé Marie, Pelletier Dominique, Norvez Olivier, Reecht Yves, Fouilloux Anne, Rasche Helena, Hiltemann Saskia, Batut Bérénice, Eléaume Marc, Seguineau Pauline, Massé Guillaume, Amossé Alan, Bissery Claire, Lorrilliere Romain, Martin Alexis, Bas Yves, Virgoulay Thimothée, Chambon Valentin, Arnaud Elie, Michon Elisa, Urfer Clara, Trigodet Eloïse, Delannoy Marie, Loïs Gregoire, Julliard Romain, Grüning Björn, Le Bras Yvan, The 17 Galaxy-E Community. Guidance framework to apply good practices in ecological data analysis: Lessons learned from building Galaxy-Ecology. EcoEvoRxiv IN PRESS. Publisher's official version : https://doi.org/10.32942/X2G033 , Open Access version : https://archimer.ifremer.fr/doc/00887/99844/