Ocean Spy's annotation data
To assess and monitor the conservation state of marine ecosystems, scientists are deploying seabed observatories and using mobile devices to acquire temporal and spatialized biodiversity data from imagery, complementing traditional 'stationary' sampling approaches. These monitoring programs gather large amounts of data, particularly underwater images, and the resulting datasets are too voluminous to process easily. Artificial intelligence (AI) has enabled the development of algorithms that facilitate the processing of large datasets. However, the ability of machines to detect and classify objects automatically for scientific purposes depends on a learning phase based on large reference databases that have been built manually by humans, a highly time-consuming task. In this context, involving citizens in collecting training data is a promising solution, thanks to the considerable observation power that the crowd represents (Matabos et al., 2024, accepted with revision).
Launched in 2023, the Ocean Spy project aims to engage citizens in the annotation of marine images. It provides a web platform that gives the general public access to images collected in various marine habitats, from shallow to deep waters. The annotation interface offers tools and functionalities designed to guide users in locating and identifying animals (or other subjects of interest) in the images.
Beyond the development of the online portal and the associated database, such an initiative requires methods for pre-processing and validating the data generated by citizen annotation, in particular to identify common organisms (resulting from multiple annotations) and to clean the database according to a threshold of agreement between participants. These are essential steps in producing reliable, high-quality scientific knowledge and in boosting the performance of AI methods for identifying taxa.
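As an illustration only, the two steps named above (grouping multiple annotations of the same organism, then filtering by an agreement threshold) could be sketched as follows. This is not the Ocean Spy pipeline, whose details are not given here. The `Annotation` record, the greedy distance-based grouping, the 30-pixel radius and the 0.5 threshold are all assumptions chosen for the example, and agreement is taken to be the share of participants who viewed the image and gave the majority label.

```python
from collections import Counter
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one point annotation on one image."""
    user: str    # participant identifier
    x: float     # pixel coordinates of the marked point
    y: float
    label: str   # taxon name chosen by the participant


def centroid(group):
    """Mean position of the annotations in a group."""
    n = len(group)
    return sum(a.x for a in group) / n, sum(a.y for a in group) / n


def group_annotations(annotations, radius):
    """Greedily add each point to the first group whose centroid is within `radius`."""
    groups = []
    for ann in annotations:
        for group in groups:
            cx, cy = centroid(group)
            if hypot(ann.x - cx, ann.y - cy) <= radius:
                group.append(ann)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([ann])
    return groups


def consensus(annotations, n_viewers, radius=30.0, min_agreement=0.5):
    """Return (x, y, label, agreement) for each group that meets the threshold."""
    kept = []
    for group in group_annotations(annotations, radius):
        # One vote per participant: a later label from the same user overrides an earlier one.
        votes = {a.user: a.label for a in group}
        label, count = Counter(votes.values()).most_common(1)[0]
        agreement = count / n_viewers
        if agreement >= min_agreement:
            cx, cy = centroid(group)
            kept.append((round(cx, 1), round(cy, 1), label, agreement))
    return kept


# Made-up example: three participants viewed the image.
annotations = [
    Annotation("u1", 100, 100, "crab"),
    Annotation("u2", 104, 98, "crab"),
    Annotation("u3", 102, 103, "shrimp"),
    Annotation("u1", 400, 250, "fish"),
]
result = consensus(annotations, n_viewers=3)
```

In this made-up example, the three nearby points form one group labelled "crab" by two of three participants and are retained, while the isolated "fish" point, marked by a single participant, falls below the threshold and is discarded. A real pipeline would likely need a more robust grouping method and a threshold tuned against expert-validated images.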
General statistics on the first year of operation, along with the data analysis pipeline currently being fine-tuned, will be presented at the workshop.