NCBI Reference Sequences: current status, policy and new initiatives


Abstract

NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10⁶ proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.

Related collections: Genome Integrity


Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Journal ID (hwp): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date Collection: January 2009

Publication date (Print): January 2009

Publication date (Electronic): 16 October 2008

Publication date PMC-release: 16 October 2008

Volume: 37

Issue: Database issue

Pages: D32-D36

Affiliations

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rm 4As.47B, 45 Center Drive, Bethesda, MD, USA

Author notes

*To whom correspondence should be addressed. Tel: +1 301 435 5898; Fax: +1 301 480 2918; Email: pruitt@ncbi.nlm.nih.gov

Article

Publisher ID: gkn721

DOI: 10.1093/nar/gkn721

PMC ID: 2686572

PubMed ID: 18927115

SO-VID: 45dbc0f5-47ba-4cf3-9be3-de9face9cb3d

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 15 September 2008

Date revision received: 26 September 2008

Date accepted: 30 September 2008

