      SMILES-based deep generative scaffold decorator for de-novo drug design

          Abstract

          Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorially obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described. First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered to allow only those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. We envision that this architecture will become a useful addition to the existing architectures for de novo molecular generation.
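          The slicing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It treats a molecule as a plain graph (atoms are integers, bonds are pairs) instead of parsing SMILES, and all names (`component`, `acyclic_bonds`, `slice_molecule`, `max_cuts`) are hypothetical. It also assumes that a slicing is valid only when one fragment (the scaffold) carries every attachment point and each remaining fragment (a decoration) carries exactly one. The cap on simultaneous cuts is an illustrative choice, not a value taken from the article.

```python
from itertools import combinations


def component(start, bonds):
    """Return the set of atoms reachable from `start` through `bonds`."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        atom = stack.pop()
        for neighbor in adjacency.get(atom, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def acyclic_bonds(bonds):
    """Bonds outside any ring: removing one disconnects its two atoms."""
    result = []
    for bond in bonds:
        others = [b for b in bonds if b != bond]
        if bond[1] not in component(bond[0], others):
            result.append(bond)
    return result


def slice_molecule(n_atoms, bonds, max_cuts=3):
    """Enumerate (scaffold, decorations) pairs over combinations of acyclic bonds.

    Assumption: a slicing is kept only if one fragment holds all attachment
    points and every other fragment holds exactly one.
    """
    pairs = []
    candidates = acyclic_bonds(bonds)
    for k in range(1, max_cuts + 1):
        for cut in combinations(candidates, k):
            kept = [b for b in bonds if b not in cut]
            # Split the molecule into connected fragments after cutting.
            fragments, assigned = [], set()
            for atom in range(n_atoms):
                if atom not in assigned:
                    fragment = frozenset(component(atom, kept))
                    assigned |= fragment
                    fragments.append(fragment)
            # Count attachment points (cut-bond endpoints) in each fragment.
            attach = [sum((a in f) + (b in f) for a, b in cut) for f in fragments]
            for i, scaffold in enumerate(fragments):
                rest = [n for j, n in enumerate(attach) if j != i]
                if attach[i] == k and all(n == 1 for n in rest):
                    decorations = tuple(f for j, f in enumerate(fragments) if j != i)
                    pairs.append((scaffold, decorations))
    return pairs


# Toy molecule: a six-membered ring (atoms 0-5) with substituents 6 and 7.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (3, 7)]
pairs = slice_molecule(8, bonds)
```

          Under these assumptions, a single cut yields two pairs, because either side can play the scaffold, while cutting both substituents leaves the ring as the only scaffold with two attachment points. This shows how one molecule contributes several training examples, which is the data augmentation effect the abstract mentions.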

                Author and article information

                Contributors
                josep.arus@dcb.unibe.ch
                Journal
                Journal of Cheminformatics (J Cheminform)
                Publisher: Springer International Publishing (Cham)
                ISSN: 1758-2946
                Published: 29 May 2020
                Volume: 12
                Article number: 38
                Affiliations
                [1] Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden (GRID grid.418151.8, ISNI 0000 0001 1519 6403)
                [2] Medicinal Chemistry, Respiratory, Inflammation, and Autoimmune (RIA), BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden (GRID grid.418151.8, ISNI 0000 0001 1519 6403)
                [3] Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland (GRID grid.5734.5, ISNI 0000 0001 0726 5157)
                [4] Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou, China
                Author information
                http://orcid.org/0000-0002-9860-2944
                Article
                Publisher article ID: 441
                DOI: 10.1186/s13321-020-00441-8
                PMC ID: 7260788
                PubMed ID: 33431013
                0d8e5194-9c38-4124-802d-59e189b68ff9
                © The Author(s) 2020

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 4 February 2020
                Accepted: 16 May 2020
                Funding
                Funded by: H2020 Marie Skłodowska-Curie Actions (FundRef http://dx.doi.org/10.13039/100010665)
                Award ID: 676434
                Categories
                Research Article

                Subject: Chemoinformatics
                Keywords: deep learning, generative models, SMILES, randomized SMILES, recurrent neural networks, fragment-based drug discovery, data augmentation, RECAP, matched molecular pairs, ligand series
