KiNext: a portable and scalable workflow for the identification and classification of protein kinases
Background
Protein kinases are a diverse superfamily of proteins common to organisms across the tree of life. They are typically involved in signal transduction, allowing organisms to sense and respond to biotic or abiotic environmental factors. They have important roles in organismal physiology, including development, reproduction and acclimation to environmental stress, while their dysregulation can lead to disease, including several forms of cancer. Identifying the complement of protein kinases (the kinome) of an organism is useful for understanding its physiological capabilities, limitations and adaptations to environmental stress. The increasing availability of genomes now makes it possible to examine and compare kinomes across a broad diversity of organisms. Here we present a pipeline that follows the FAIR principles (findable, accessible, interoperable and reusable), facilitates the search for and identification of protein kinases in a predicted proteome, and classifies them according to the groups of serine/threonine/tyrosine protein kinases present in eukaryotes.
Results
KiNext is a Nextflow pipeline that combines a number of existing bioinformatic tools to search for and classify the protein kinases of an organism in a reproducible manner, starting from a set of amino acid sequences. Conventional eukaryotic protein kinases (ePKs) and atypical protein kinases (aPKs) are identified using Hidden Markov Models (HMMs) generated from the catalytic domains of kinases. KiNext then assigns ePKs to the eight kinase groups using dedicated HMMs tailored to each group. The performance of the KiNext pipeline was validated against previously published kinomes, obtained with other tools, for two marine species: the Pacific oyster Crassostrea gigas and the unicellular green alga Ostreococcus tauri. In both species, KiNext improved on the previous results by finding previously unidentified kinases and by assigning a large proportion of previously unclassified kinases to a group. These results demonstrate improvements in kinase identification and classification, while providing traceability and reproducibility of results in a FAIR pipeline. The default HMMs provided with KiNext are most suitable for eukaryotes, but the pipeline can be easily modified to include HMMs for other taxa of interest.
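The group classification step described above can be illustrated with a minimal Python sketch. It is not KiNext's actual implementation: it assumes that each candidate kinase has already been scored against each group-specific HMM, and it simply keeps the best-scoring group per protein. The function name `assign_groups`, the `min_score` cutoff, the protein identifiers and the scores in the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple

# One hit = (protein_id, group_label, bit_score); this layout is an assumption,
# not the output format of any particular tool.
Hit = Tuple[str, str, float]


def assign_groups(hits: Iterable[Hit], min_score: float = 0.0) -> Dict[str, str]:
    """Assign each protein to the group whose HMM gave the highest score.

    Hits scoring below `min_score` (a hypothetical cutoff) are ignored, so a
    protein with no hit above the cutoff is left out of the result.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for protein_id, group, score in hits:
        if score < min_score:
            continue  # too weak to support a group assignment
        current = best.get(protein_id)
        if current is None or score > current[1]:
            best[protein_id] = (group, score)
    return {protein_id: group for protein_id, (group, _) in best.items()}


# Toy data: identifiers and scores are invented for illustration.
toy_hits = [
    ("prot1", "AGC", 120.5),
    ("prot1", "CAMK", 80.2),
    ("prot2", "TK", 45.0),
    ("prot3", "CMGC", 5.0),
]
groups = assign_groups(toy_hits, min_score=10.0)
```

In this sketch, proteins absent from the returned dictionary would correspond to kinases that remain unclassified; how KiNext itself handles ties and thresholds is not specified here.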
Conclusion
The KiNext pipeline enables efficient and reproducible identification of kinomes based on predicted amino acid sequences (i.e. proteomes). KiNext was designed to be easy to use, automated, portable and scalable.
Keyword(s)
Kinase, Genome annotation, Workflow