getSequenceInfo: a suite of tools allowing to get genome sequence information from public repositories

Type Article
Date 2022-07
Language English
Author(s) Moco Vincent1, Cazenave Damien1, Garnier Maëlle1, Pot Matthieu1, Marcelino Isabel1, Talarmin Antoine1, Guyomard-Rabenirina Stéphanie1, Breurec Sébastien1, 2, 3, Ferdinand Séverine1, Dereeper Alexis1, Reynaud Yann1, Couvin David1
Affiliation(s) 1 : Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France
2 : Faculté de Médecine Hyacinthe Bastaraud, Université des Antilles, Pointe-à-Pitre, France
3 : Centre d’Investigation Clinique Antilles Guyane, Inserm CIC 1424, Pointe-à-Pitre, France
Source BMC Bioinformatics (1471-2105) (Springer Science and Business Media LLC), 2022-07, Vol. 23, N. 1, P. 268 (11p.)
DOI 10.1186/s12859-022-04809-5
WOS© Times Cited 1
Keyword(s) Genome sequences, Nucleotide diversity, Assembly, DNA, Repository, Metadata
Abstract

Background: The volume of biological sequence data is growing exponentially worldwide. Nucleotide sequence databases play an important role in providing meaningful genomic information on a variety of biological organisms.

Results: The getSequenceInfo software tool gives access to sequence information from several public repositories (GenBank, RefSeq, and the European Nucleotide Archive). It runs on Linux, macOS, and Microsoft Windows, either programmatically (command line) or through a graphical user interface. getSequenceInfo (gSeqI) v1.0 helps users retrieve information on queried sequences that may be useful for specific studies (e.g. the country of origin/isolation or the release date of queried sequences). Queries can retrieve sequence data for a given kingdom and species, or from a given date. The program separates chromosomes from plasmids (or other genetic elements/components) by placing each component in a dedicated folder. It also computes some basic statistics (such as the GC content of queried assemblies). An empirically designed nucleotide ratio is calculated from nucleotide information to tentatively provide a “NucleScore” for the studied genome assemblies. Besides the main gSeqI tool, additional tools have been developed to perform various tasks related to sequence analysis.

Conclusion: The aim of this study is to democratize the programmatic use of public repositories and to facilitate sequence data analysis from a pedagogical perspective. Output results are available in FASTA, FASTQ, Excel/TSV, or HTML formats. The program is freely available at: https://github.com/karubiotools/getSequenceInfo. getSequenceInfo and supplementary tools are partly available through the recently released Galaxy KaruBioNet platform (http://calamar.univ-ag.fr/c3i/galaxy_karubionet.html).
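
As an illustration of the basic statistics mentioned in the abstract, the following is a minimal Python sketch of a per-record GC content calculation on FASTA data, together with a naive chromosome/plasmid split. It is not the getSequenceInfo implementation: the function names (`read_fasta`, `gc_content`, `component_type`), the rule that a record is a plasmid when its header contains the word "plasmid", and the toy sequences are all assumptions made for this example. The NucleScore is not sketched because its formula is not given here.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence):
    """Return the GC percentage, counting only unambiguous A/C/G/T bases."""
    counts = {base: sequence.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / total


def component_type(header):
    """Classify a record from its header (hypothetical rule for this sketch)."""
    # Assumption: plasmid records mention "plasmid" in the FASTA header.
    return "plasmid" if "plasmid" in header.lower() else "chromosome"


# Toy input standing in for a downloaded assembly (made-up sequences).
example = StringIO(
    ">seq1 Example organism chromosome\nATGCGC\nNNAT\n"
    ">seq2 Example organism plasmid pX\nGGCC\n"
)

report = [
    (header.split()[0], component_type(header), round(gc_content(seq), 1))
    for header, seq in read_fasta(example)
]

for accession, kind, gc in report:
    print(f"{accession}\t{kind}\tGC={gc}%")
```

Ambiguous bases such as N are excluded from the denominator in this sketch; a tool may instead divide by the full sequence length, which gives slightly lower values for assemblies containing gaps.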

Full Text
File | Pages | Size | Access
Publisher's official version | 11 | 1 MB | Open access
Result dataset obtained using “SRArunInfo.pl” for 3 accessions (SRR7693877, SRR9850824, and SRR9850830). | | 23 KB | Open access
Result datasets corresponding to 2,518 complete genome assemblies belonging to E. coli, K. pneumoniae, and E. cloacae from RefSeq or GenBank repositories. The first ... | | 3 MB | Open access

How to cite 

Moco Vincent, Cazenave Damien, Garnier Maëlle, Pot Matthieu, Marcelino Isabel, Talarmin Antoine, Guyomard-Rabenirina Stéphanie, Breurec Sébastien, Ferdinand Séverine, Dereeper Alexis, Reynaud Yann, Couvin David (2022). getSequenceInfo: a suite of tools allowing to get genome sequence information from public repositories. BMC Bioinformatics, 23(1), 268 (11p.). Publisher's official version: https://doi.org/10.1186/s12859-022-04809-5, Open Access version: https://archimer.ifremer.fr/doc/00788/89999/