Pfam: The protein families database in 2021


Abstract

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.


Author and article information

Contributors

Jaina Mistry:

ORCID: http://orcid.org/0000-0003-2479-5322

Sara Chuguransky:

ORCID: http://orcid.org/0000-0002-0520-0736

Lowri Williams:

ORCID: http://orcid.org/0000-0001-5551-8526

Matloob Qureshi:

ORCID: http://orcid.org/0000-0003-2208-4236

Gustavo A Salazar

Erik L L Sonnhammer

Silvio C E Tosatto:

ORCID: http://orcid.org/0000-0003-4525-7793

Lisanna Paladin:

ORCID: http://orcid.org/0000-0003-0011-9397

Shriya Raj:

ORCID: http://orcid.org/0000-0002-1973-6347

Lorna J Richardson:

ORCID: http://orcid.org/0000-0002-3655-5660

Robert D Finn:

ORCID: http://orcid.org/0000-0002-6982-4660

Alex Bateman:

ORCID: http://orcid.org/0000-0001-8626-2148

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date Collection: 08 January 2021

Publication date (Electronic): 30 October 2020

Publication date PMC-release: 30 October 2020

Volume: 49

Issue: D1

Pages: D412-D419

Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK

Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden

Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy


Author notes

To whom correspondence should be addressed. Tel: +44 1223 494100; Fax: +44 1223 494468; Email: jaina@ebi.ac.uk


Article

Publisher ID: gkaa913

DOI: 10.1093/nar/gkaa913

PMC ID: 7779014

PubMed ID: 33125078

SO-VID: db8cfe38-db7e-4f1b-855c-33112323e1b6

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 06 October 2020

Date revision received: 01 October 2020

Date received: 11 September 2020

Page count

Pages: 8

Funding

Funded by: European Union's Horizon 2020 MSCA-RISE action;

Award ID: 823886

Funded by: Wellcome, DOI 10.13039/100010269;

Award ID: 108433/Z/15/Z

Funded by: BBSRC, DOI 10.13039/501100000268;

Award ID: BB/S020381/1

Funded by: Open Targets;

Funded by: European Molecular Biology Laboratory Core Funds;
