Zhi-Sen Wei, Jun Rao, Yao-Jin Lin

A Deep Model for Species-Specific Prediction of Ribonucleic-Acid-Binding Protein with Short Motifs

  • Fluid Flow and Transfer Processes
  • Computer Science Applications
  • Process Chemistry and Technology
  • General Engineering
  • Instrumentation
  • General Materials Science

RNA-binding proteins (RBPs) play an important role in the synthesis and degradation of ribonucleic acid (RNA) molecules. The rapid and accurate identification of RBPs is essential for understanding the mechanisms of cell activity. Since identifying RBPs experimentally is expensive and time-consuming, computational methods have been explored to predict RBPs directly from protein sequences. In this paper, we developed an RBP prediction method named CnnRBP based on a convolution neural network. CnnRBP derived a sparse high-dimensional di- and tripeptide frequency feature vector from a protein sequence and then reduced this vector to a low-dimensional one using the Light Gradient Boosting Machine (LightGBM) algorithm. Then, the low-dimensional vectors derived from both RNA-binding proteins and non-RNA-binding proteins were fed to a multi-layer one-dimensional convolution network. Meanwhile, the SMOTE algorithm was used to alleviate the class imbalance in the training data. Extensive experiments showed that the proposed method can extract discriminative features to identify RBPs effectively. With 10-fold cross-validation on the training datasets, CnnRBP achieved AUC values of 99.98%, 99.69% and 96.72% for humans, E. coli and Salmonella, respectively. On the three independent datasets, CnnRBP achieved AUC values of 0.91, 0.96 and 0.91, outperforming the recent tripeptide-based method (i.e., TriPepSVM) by 8%, 4% and 5%, respectively. Compared with the state-of-the-art CNN-based predictor (i.e., iDRBP_MMC), CnnRBP achieved MCC values of 0.67, 0.68 and 0.73 with significant improvements by 6%, 6% and 15%, respectively. In addition, the cross-species testing shows that CnnRBP has a robust generalization performance for cross-species RBP prediction between close species.

Need a simple solution for managing your BibTeX entries? Explore CiteDrive!

  • Web-based, modern reference management
  • Collaborate and share with fellow researchers
  • Integration with Overleaf
  • Comprehensive BibTeX/BibLaTeX support
  • Save articles and websites directly from your browser
  • Search for new articles from a database of tens of millions of references
Try out CiteDrive

More from our Archive