The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus–host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide the first viral sequences for 13 bacterial phyla for which no viruses were previously known (including ecologically abundant phyla), and help taxonomically identify 7–38% of ‘unknown’ sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) these genomes represent 264 new viral genera (doubling the number of known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on the prevalence of extrachromosomal prophages and of coinfection, as well as an evaluation of in silico virus–host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes.
Viruses are infectious particles that can only multiply inside the cells of microbes and other organisms. Little is known about the genetic differences between virus particles (so-called ‘genetic diversity’), especially compared to what we know about the diversity of bacteria, archaea, and other single-celled microbes. This lack of knowledge hampers our understanding of the role viruses play in the evolution of microbial communities and their associated ecosystems.
Studying the genetics of the viruses in these communities is challenging. There is no single ‘marker’ gene that can be used to identify all viruses in environmental samples. Moreover, many of the viral genome fragments identified so far have not yet been linked to their host microbes. Many viruses integrate their genome into the DNA of their host cell, and existing computational tools exploit this ability to identify viruses and link them to their hosts. However, other viruses can persist and multiply inside cells without integrating their genome into the host's DNA.
Earlier in 2015, researchers developed a new computational tool called VirSorter that can predict virus genome sequences within the DNA extracted from microbes. VirSorter identifies viral genome sequences by looking for ‘hallmark’ genes that encode components found in many virus particles and by comparing sequences against a reference database of genomes from many viruses.
Now, Roux et al., including some of the researchers from the earlier work, have used VirSorter to predict viral DNA in publicly available bacterial and archaeal genome data. The study identifies over 12,000 viral genomes and links them to their microbial hosts. These data increase the number of publicly available viral genome sequences tenfold and identify the first viruses associated with 13 new types of bacteria, including species that are abundant in particular environments.
Several different viruses can infect a single cell at the same time. Some viruses are known to exchange DNA, and if this happened frequently among other viruses too, it could strongly influence how viruses evolve. Roux et al.'s findings suggest that although it is common for several different viruses to infect the same cell, it is relatively rare for these viruses to exchange genetic material.
Roux et al.'s findings demonstrate the value of searching publicly available microbial genome data for fragments of viral genomes. These new viral genomes will serve as a useful resource for researchers as they explore the communities of viruses and microbes in natural environments, in the human body and in industrial processes.