The Pangeo Big Data Ecosystem and its use at CNES

Pangeo[1] is a community-driven effort for open-source big data, initially focused on the Earth System Sciences. One of its primary goals is to enable scientists to analyze petascale datasets on both classical high-performance computing (HPC) systems and public cloud infrastructure. In only a few years, Pangeo has grown into a very productive community collaborating on the development of open-source analysis tools for science. It provides a set of example deployments based on open-source Scientific Python packages such as Jupyter[2], Dask[3], and Xarray[4], which bring together scientists and developers around their actual use cases. In this paper, we first describe the Pangeo ecosystem and community. We then present its impact on the work of CNES scientists through the HPC deployment at CNES. We conclude with an outlook for Pangeo in this agency and beyond.

Keyword(s)

Pangeo, Dask, Jupyter, HPC, Cloud, Big Data, Analysis, Open Source

How to cite
Eynard-Bontemps Guillaume, Abernathey Ryan, Hamman Joseph, Ponte Aurelien, Rath Willi (2019). The Pangeo Big Data Ecosystem and its use at CNES. P. Soille, S. Loekken, and S. Albani (Eds.) Proc. of the 2019 conference on Big Data from Space (BiDS’2019), EUR 29660 EN, Publications Office of the European Union, Luxembourg, 2019, ISBN 978-92-76-00034-1, doi:10.2760/848593, 2019. Part. Interactive Processing and Visualisation, pp.49-52. https://archimer.ifremer.fr/doc/00503/61441/
