      Is Open Access

      Automatic extraction of ranked SNP-phenotype associations from text using a BERT-LSTM-based method

      research-article


          Abstract

          Extracting associations between single nucleotide polymorphisms (SNPs) and phenotypes from biomedical literature is a vital task in BioNLP. Several methods have recently been developed to extract mutation-disease associations. However, no available method for extracting SNP-phenotype associations from text considers their degree of certainty. In this paper, several machine learning methods were developed to extract ranked SNP-phenotype associations from biomedical abstracts and were then compared with one another. The methods comprise shallow machine learning methods (random forest, logistic regression, and decision tree), two kernel-based methods (subtree and local context), a rule-based method, a deep CNN-LSTM-based method, and two BERT-based methods. The experiments indicated that although the linguistic features used could yield an association extraction method that outperforms its kernel-based counterparts, the deep learning and BERT-based methods performed best. Among all the developed methods, PubMedBERT-LSTM achieved the best performance. Similar experiments were conducted to estimate the degree of certainty of each extracted association, which can be used to assess the strength of the reported association. These experiments revealed that our proposed PubMedBERT–CNN-LSTM method outperformed the other sophisticated methods on this task.


                Author and article information

                Contributors
                bokharaeian@gmail.com
                dehghani.mohammad@ut.ac.ir
                albertodiaz@fdi.ucm.es
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                1471-2105
                12 April 2023
                2023; 24: 144
                Affiliations
                [1] Amol University of Special Modern Technologies, Mazandaran, Iran (GRID grid.495554.c)
                [2] School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran (GRID grid.46072.37, ISNI 0000 0004 0612 7950)
                [3] Facultad Informatica, Complutense University of Madrid, Madrid, Spain (GRID grid.4795.f, ISNI 0000 0001 2157 7667)
                Article
                5236
                10.1186/s12859-023-05236-w
                10099837
                37046202
                5a00b10e-dea0-4b40-8dcd-caa5e990c1ac
                © The Author(s) 2023

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 1 June 2022
                Accepted: 17 March 2023
                Categories
                Research

                Bioinformatics & Computational biology
                SNP, phenotype, biomedical relation extraction, degree of certainty classification
