      IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning

      Bioinformatics
      Oxford University Press (OUP)

          Abstract

          Motivation

          Intrinsically disordered regions (IDRs) are widely distributed in proteins and are involved in many important biological functions. Accurate prediction of IDRs is critical for protein structure and function analysis. However, existing computational methods build their predictive models solely in the sequence space and do not map that space into a ‘semantic space’ that reflects the structural characteristics of proteins. Furthermore, although length-dependent predictors have shown promising results, new fusion strategies are needed to improve their predictive performance and generalization.

          Results

          In this study, we applied Sequence to Sequence Learning (Seq2Seq), a technique derived from natural language processing (NLP), to map protein sequences into a ‘semantic space’ that reflects structural patterns, with the help of predicted residue–residue contacts (CCMs) and other sequence-based features. Furthermore, an attention mechanism was used to capture the global associations between all residue pairs in a protein. Three length-dependent predictors were constructed: IDP-Seq2Seq-L for long disordered region prediction, IDP-Seq2Seq-S for short disordered region prediction and IDP-Seq2Seq-G for both long and short disordered region prediction. Finally, these three predictors were fused into a single predictor, IDP-Seq2Seq, to improve discriminative power and generalization. Experimental results on four independent test datasets and the CASP test dataset showed that IDP-Seq2Seq is insensitive to the ratio of long to short disordered regions and outperforms other competing methods.
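
To make the two ideas in this paragraph concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes plain scaled dot-product self-attention (with queries, keys and values all equal to the input features) as a stand-in for the attention mechanism, and a simple mean of per-residue probabilities as a stand-in for the fusion step; the abstract specifies neither. The function names, feature vectors and probabilities are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with Q = K = V = features.

    features: one feature vector per residue (equal-length lists).
    Returns (contexts, weights). weights[i][j] is how strongly residue i
    attends to residue j, so every residue pair receives a score.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in features]
        w = softmax(scores)
        weights.append(w)
        # Context vector: attention-weighted sum of all residue features.
        contexts.append([sum(w[j] * features[j][d] for j in range(len(features)))
                         for d in range(dim)])
    return contexts, weights


def fuse_by_mean(per_predictor_probs):
    """Average per-residue disorder probabilities across predictors.

    Assumption: mean fusion is illustrative only; the source does not
    state which fusion rule IDP-Seq2Seq uses.
    """
    n = len(per_predictor_probs)
    return [sum(column) / n for column in zip(*per_predictor_probs)]


# Hypothetical 3-residue protein with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts, weights = self_attention(features)

# Hypothetical outputs of the long, short and generic predictors.
probs_long = [0.9, 0.2, 0.6]
probs_short = [0.7, 0.4, 0.6]
probs_generic = [0.8, 0.3, 0.6]
fused = fuse_by_mean([probs_long, probs_short, probs_generic])

# Residues with fused probability >= 0.5 are labelled disordered (1).
labels = [1 if p >= 0.5 else 0 for p in fused]
```

Each row of `weights` sums to 1, which is what lets a residue's context vector draw on every other position in the chain rather than only on a local window.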

          Availability and implementation

          For the convenience of experimental scientists, a user-friendly, publicly accessible web server for the predictor has been established at http://bliulab.net/IDP-Seq2Seq/. It is anticipated that IDP-Seq2Seq will become a useful tool for the identification of IDRs.

          Supplementary information

          Supplementary data are available at Bioinformatics online.

          Author and article information

          Journal
          Bioinformatics
          Oxford University Press (OUP)
          ISSN: 1367-4803
          ISSN: 1460-2059
          November 01 2020
          January 29 2021
          July 23 2020
          Volume: 36
          Issue: 21
          Pages: 5177-5186
          Affiliations
          [1 ]School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
          [2 ]Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
          Article
          DOI: 10.1093/bioinformatics/btaa667
          PMID: 32702119
          © 2020

          https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
