
      Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes

      research-article


          Abstract

          Bacterial genomes vary extensively in terms of both gene content and gene sequence. This plasticity hampers the use of traditional SNP-based methods for identifying all genetic associations with phenotypic variation. Here we introduce a computationally scalable and widely applicable statistical method (SEER) for the identification of sequence elements that are significantly enriched in a phenotype of interest. SEER is applicable to tens of thousands of genomes by counting variable-length k-mers using a distributed string-mining algorithm. Robust options are provided for association analysis that also correct for the clonal population structure of bacteria. Using large collections of genomes of the major human pathogens Streptococcus pneumoniae and Streptococcus pyogenes, SEER identifies relevant previously characterized resistance determinants for several antibiotics and discovers potential novel factors related to the invasiveness of S. pyogenes. We thus demonstrate that our method can answer important biologically and medically relevant questions.
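The core idea in the abstract, counting k-mers across genomes and testing each one for enrichment in a phenotype, can be illustrated with a minimal sketch. This is not SEER's implementation: it assumes fixed-length k-mers instead of variable-length ones, uses a one-sided Fisher exact test with a Bonferroni threshold as a stand-in for the paper's association models, and makes no correction for clonal population structure. The function names, the toy sequences, and the `alpha` parameter are all hypothetical.

```python
from math import comb


def kmers(seq, k):
    """Return the set of distinct k-mers present in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a = cases with the k-mer,    b = cases without it,
    c = controls with the k-mer, d = controls without it.
    Returns P(X >= a) under the hypergeometric null.
    """
    cases, with_kmer, total = a + b, a + c, a + b + c + d
    upper = min(cases, with_kmer)
    tail = sum(
        comb(cases, x) * comb(total - cases, with_kmer - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(total, with_kmer)


def enriched_kmers(genomes, phenotypes, k, alpha=0.05):
    """Return {k-mer: p-value} for k-mers enriched in phenotype-positive genomes.

    Assumption: no population structure correction is applied here,
    so real clonal data would produce many spurious hits.
    """
    presence = [kmers(g, k) for g in genomes]
    n_cases = sum(phenotypes)
    n_controls = len(phenotypes) - n_cases
    all_kmers = set().union(*presence)
    threshold = alpha / len(all_kmers)  # Bonferroni correction
    hits = {}
    for kmer in all_kmers:
        a = sum(1 for p, y in zip(presence, phenotypes) if y and kmer in p)
        c = sum(1 for p, y in zip(presence, phenotypes) if not y and kmer in p)
        pval = fisher_enrichment_p(a, n_cases - a, c, n_controls - c)
        if pval < threshold:
            hits[kmer] = pval
    return hits


# Toy usage: six phenotype-positive genomes carry an insert, six do not.
genomes = ["AAAAGATTACAAAAA"] * 6 + ["AAAAAAAAAAAAAAA"] * 6
phenotypes = [1] * 6 + [0] * 6
hits = enriched_kmers(genomes, phenotypes, k=7)
```

A presence/absence set per genome keeps the test simple; a method intended for tens of thousands of genomes would need a far more compact index than Python sets.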

          Abstract

          Plasticity and clonal population structure in bacterial genomes can hinder traditional SNP-based genetic association studies. Here, Corander and colleagues present a method to identify variable-length sequence elements enriched in a phenotype of interest, and demonstrate its use in human pathogens.


                Author and article information

                Journal
                Nat Commun
                Nature Communications
                Nature Publishing Group
                ISSN: 2041-1723
                16 September 2016
                Volume: 7
                Article number: 12797
                Affiliations
                [1] Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
                [2] Department of Mathematics and Statistics, University of Helsinki, Helsinki FI-00014, Finland
                [3] Department of Medical and Clinical Genetics, Genome-Scale Biology Research Program, University of Helsinki, Helsinki FI-00014, Finland
                [4] Department of Medicine, University of Cambridge, Cambridge CB2 0SP, UK
                [5] Department of Infectious Disease Epidemiology, Imperial College, London W2 1NY, UK
                [6] Department of Computer Science, Aalto University, Espoo FI-00076, Finland
                [7] Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo FI-00076, Finland
                [8] Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria 3010, Australia
                [9] Centre for International Child Health, Department of Paediatrics, University of Melbourne, Melbourne, Victoria 3052, Australia
                [10] Group A Streptococcal Research Group, Murdoch Children's Research Institute, Parkville, Victoria 3052, Australia
                [11] Menzies School of Health Research, Darwin, Northern Territory 0811, Australia
                [12] Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki FI-00014, Finland
                [13] Department of Biostatistics, University of Oslo, 0317 Oslo, Norway
                Author notes
                [*] These authors contributed equally to this work.

                Author information
                http://orcid.org/0000-0001-5360-1254
                http://orcid.org/0000-0001-9193-8093
                http://orcid.org/0000-0002-7069-5958
                Article
                ncomms12797
                DOI: 10.1038/ncomms12797
                PMCID: PMC5028413
                PMID: 27633831
                dc9d61ff-7497-4391-8205-7a6b3a0e4fbc
                Copyright © 2016, The Author(s)

                This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

                History
                Received: 05 January 2016
                Accepted: 28 July 2016
                Categories
                Article
