
      How can natural language processing help model informed drug development?: a review

      review-article


          Abstract

          Objective

          To summarize applications of natural language processing (NLP) in model informed drug development (MIDD) and identify potential areas of improvement.

          Materials and Methods

          Sources included publications found on PubMed and Google Scholar, as well as websites and GitHub repositories for NLP libraries and models. Publications describing applications of NLP in MIDD were reviewed. The applications were stratified into 3 stages: drug discovery, clinical trials, and pharmacovigilance. Key NLP functionalities used for these applications were assessed. Programming libraries and open-source resources for implementing NLP functionalities in MIDD were identified.

          Results

          NLP has been used to aid various processes in the drug development lifecycle, such as gene-disease mapping, biomarker discovery, patient-trial matching, and adverse drug event detection. These applications commonly use the NLP functionalities of named entity recognition, word embeddings, entity resolution, assertion status detection, relation extraction, and topic modeling. The current state of the art for implementing these functionalities in MIDD applications is transformer models that use transfer learning for enhanced performance. Various libraries in Python, R, and Java, such as huggingface, sparkNLP, and KoRpus, as well as open-source platforms such as DisGeNet, DeepEnroll, and Transmol, have enabled convenient implementation of NLP models in MIDD applications.

          Discussion

          Challenges such as reproducibility, explainability, fairness, limited data, limited language support, and security need to be overcome to ensure wider adoption of NLP in the MIDD landscape. There are opportunities to improve the performance of existing models and to expand the use of NLP into newer areas of MIDD.

          Conclusions

          This review provides an overview of the potential and pitfalls of current NLP approaches in MIDD.


                Author and article information

                Contributors
                Journal
                JAMIA Open
                Oxford University Press
                2574-2531
                July 2022
                11 June 2022
                Volume: 5
                Issue: 2
                Article ID: ooac043
                Affiliations
                Data Science, Data Collaboration Center, Critical Path Institute, Tucson, Arizona, USA
                Quantitative Medicine, Critical Path Institute, Tucson, Arizona, USA
                Author notes
                Corresponding Author: Jagdeep T. Podichetty, PhD, 1730 E River Rd #200, Tucson, AZ 85718, USA; jpodichetty@c-path.org
                Article
                DOI: 10.1093/jamiaopen/ooac043
                PMCID: PMC9188322
                PMID: 35702625
                © The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.

                History
                : 23 March 2022
                : 28 April 2022
                : 08 May 2022
                : 26 May 2022
                Page count
                Pages: 14
                Funding
                Funded by: U.S. Department of Health and Human Services, DOI 10.13039/100000016;
                Categories
                Review

                nlp, machine learning, deep learning, drug development
