
      The rate and role of pseudogenes of the Mycobacterium tuberculosis complex

      Research article


          Abstract

          Whole-genome sequence analyses have significantly contributed to the understanding of the virulence and evolution of the Mycobacterium tuberculosis complex (MTBC), the causative pathogens of tuberculosis. Most MTBC evolutionary studies focus on single nucleotide polymorphisms and deletions. Few studies have evaluated gene content, and none has comprehensively evaluated pseudogenes. Accordingly, we describe an extensive study focused on quantifying MTBC and Mycobacterium canettii pseudogenes and predicting their possible functions. Using pseudogenes detected by NCBI's PGAP, we analysed 25 837 pseudogenes from 158 MTBC and M. canettii strains. We also combined transcriptomics and proteomics of M. tuberculosis H37Rv to gain insights into pseudogene expression. Our results indicate significant variability in the rate and conservation of in silico predicted pseudogenes among different ecotypes and lineages of tuberculous mycobacteria. They also indicate pseudogenization of important virulence factors and of genes involved in metabolism and antimicrobial resistance/tolerance. We show that in silico predicted pseudogenes contribute considerably to MTBC genetic diversity at the population level. Moreover, the transcription machinery of M. tuberculosis can fully transcribe most pseudogenes, indicating intact promoters and a recent evolutionary emergence of these pseudogenes. Proteomics of M. tuberculosis and close evaluation of the mutational lesions driving pseudogenization suggest that a few in silico predicted pseudogenes are likely capable of neofunctionalization, nonsense mutation reversal or phase variation, contradicting the classical definition of pseudogenes. These findings indicate that genome annotation should be accompanied by proteomics and protein function assays to improve its accuracy. While indels and insertion sequences are the main drivers of the observed mutational lesions in these species, population bottlenecks and genetic drift are likely the evolutionary processes underlying pseudogene emergence over time. Our findings offer a new perspective on MTBC evolution and genetic diversity.


                Author and article information

                Journal: Microbial Genomics (Microb Genom; mgen)
                Publisher: Microbiology Society
                ISSN: 2057-5858
                Published: 17 October 2022
                Volume 8, Issue 10, Article mgen000876
                Affiliations
                [1] Laboratory of Applied Research in Mycobacteria, Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, SP, Brazil
                [2] Department of Preventive Veterinary Medicine and Animal Health, College of Veterinary Medicine, University of São Paulo, São Paulo, SP, Brazil
                [3] Functional Proteomics Laboratory, Federal University of São Paulo (UNIFESP), São José dos Campos, SP, Brazil
                [4] Center of Molecular Biology and Genetic Engineering, University of Campinas, Campinas, SP, Brazil
                [5] Institute of Science and Technology, Federal University of São Paulo (UNIFESP), São José dos Campos, SP, Brazil
                [6] Laboratory of Cellular Cycle, Butantan Institute, São Paulo, SP, Brazil
                [7] Department of Comparative Pathobiology, College of Veterinary Medicine, Purdue University
                Author notes
                *Correspondence: Ana Marcia Sá Guimarães, anamarcia@usp.br
                Author information
                https://orcid.org/0000-0002-8261-5863
                Article number: 000876
                DOI: 10.1099/mgen.0.000876
                PMC ID: 9676053
                PubMed ID: 36250787
                ScienceOpen record ID: b5820eb3-7613-408f-9b82-5ec8e6f9f1f1
                © 2022 The Authors

                This is an open-access article distributed under the terms of the Creative Commons Attribution NonCommercial License.

                History
                Received: 04 February 2022
                Accepted: 13 July 2022
                Funding
                Funded by: Morris Animal Foundation
                Award ID: D17ZO-307
                Award recipient: Ana Marcia Sá Guimarães
                Funded by: Fundação de Amparo à Pesquisa do Estado de São Paulo
                Award ID: 2016/26108-0
                Award recipient: Ana Marcia Sá Guimarães
                Funded by: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
                Award ID: 001
                Award recipient: Not applicable
                Funded by: Conselho Nacional de Desenvolvimento Científico e Tecnológico
                Award ID: 88887.508739/2020-00
                Award recipient: Taiana Tainá Silva-Pereira
                Funded by: Fundação de Amparo à Pesquisa do Estado de São Paulo
                Award ID: 2019/03232-6
                Award recipient: Alexandre H. Aono
                Funded by: Fundação de Amparo à Pesquisa do Estado de São Paulo
                Award ID: 2017/04617-3
                Award recipient: Cristina Kraemer Zimpel
                Funded by: Fundação de Amparo à Pesquisa do Estado de São Paulo
                Award ID: 2017/20147-7
                Award recipient: Naila Cristina Soler-Camargo
                Categories
                Research Articles
                Pathogens and Epidemiology

                Keywords: comparative genomics, Mycobacterium tuberculosis complex, pseudogenes, loss-of-function mutations, frameshift, phase variation
