
      NCBI Reference Sequences: current status, policy and new initiatives

      research-article


          Abstract

          NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10^6 proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.
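          As a rough illustration of the record types the abstract mentions (genomes, transcripts, proteins, and RefSeqGene genomic regions), the sketch below classifies a RefSeq accession by its two-letter prefix. The prefix table reflects NCBI's general accession conventions rather than anything stated in the abstract, it is deliberately incomplete, and the names `REFSEQ_PREFIXES` and `classify_accession` are hypothetical helpers chosen for this example.

```python
import re

# Assumption: a partial prefix-to-type table based on NCBI's published
# accession conventions; it is not taken from the abstract and omits
# several prefixes (e.g. contig and scaffold records).
REFSEQ_PREFIXES = {
    "NC_": "genomic (complete molecule)",
    "NG_": "genomic region (includes RefSeqGene records)",
    "NM_": "mRNA transcript",
    "NR_": "non-coding RNA transcript",
    "NP_": "protein",
    "XM_": "mRNA transcript (model)",
    "XR_": "non-coding RNA transcript (model)",
    "XP_": "protein (model)",
}

# Two uppercase letters, an underscore, digits, and an optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2}_)(\d+)(?:\.(\d+))?$")


def classify_accession(accession):
    """Return a dict describing the accession, or None if it is not recognized."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    prefix, number, version = match.groups()
    kind = REFSEQ_PREFIXES.get(prefix)
    if kind is None:
        return None
    return {
        "prefix": prefix,
        "number": number,
        "version": int(version) if version is not None else None,
        "kind": kind,
    }


if __name__ == "__main__":
    for acc in ["NM_000546.5", "NG_007073.2", "NP_000537", "not-an-accession"]:
        print(acc, "->", classify_accession(acc))
```

          A lookup table keeps the sketch easy to extend; anything outside the table is returned as `None` instead of being guessed.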

          Most cited references (9)

          Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics.

          Studies of nonsense-mediated mRNA decay in mammalian cells have proffered unforeseen insights into changes in mRNA-protein interactions throughout the lifetime of an mRNA. Remarkably, mRNA acquires a complex of proteins at each exon-exon junction during pre-mRNA splicing that influences the subsequent steps of mRNA translation and nonsense-mediated mRNA decay. Complex-loaded mRNA is thought to undergo a pioneer round of translation when still bound by cap-binding proteins CBP80 and CBP20 and poly(A)-binding protein 2. The acquisition and loss of mRNA-associated proteins accompanies the transition from the pioneer round to subsequent rounds of translation, and from translational competence to substrate for nonsense-mediated mRNA decay.

            Entrez Gene: gene-centered information at NCBI

            Entrez Gene is NCBI's database for gene-specific information. Entrez Gene includes records from genomes that have been completely sequenced, that have an active research community to contribute gene-specific information or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of both curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases and from other databases within NCBI. Records in Entrez Gene are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is provided via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities), and for bulk transfer by ftp.

              Splign: algorithms for computing spliced alignments with identification of paralogs

              Background: The computation of accurate alignments of cDNA sequences against a genome is at the foundation of modern genome annotation pipelines. Several factors such as presence of paralogs, small exons, non-consensus splice signals, sequencing errors and polymorphic sites pose recognized difficulties to existing spliced alignment algorithms. Results: We describe a set of algorithms behind a tool called Splign for computing cDNA-to-Genome alignments. The algorithms include a high-performance preliminary alignment, a compartment identification based on a formally defined model of adjacent duplicated regions, and a refined sequence alignment. In a series of tests, Splign has produced more accurate results than other tools commonly used to compute spliced alignments, in a reasonable amount of time. Conclusion: Splign's ability to deal with various issues complicating the spliced alignment problem makes it a helpful tool in eukaryotic genome annotation processes and alternative splicing studies. Its performance is enough to align the largest currently available pools of cDNA data such as the human EST set on a moderate-sized computing cluster in a matter of hours. The duplications identification (compartmentization) algorithm can be used independently in other areas such as the study of pseudogenes. Reviewers: This article was reviewed by Steven Salzberg, Arcady Mushegian and Andrey Mironov (nominated by Mikhail Gelfand).

                Author and article information

                Journal
                Nucleic Acids Res
                nar
                Nucleic Acids Research
                Oxford University Press
                0305-1048
                1362-4962
                January 2009
                16 October 2008
                Volume 37, Database issue, pages D32-D36
                Affiliations
                National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rm 4As.47B, 45 Center Drive, Bethesda, MD, USA
                Author notes
                *To whom correspondence should be addressed. Tel: +1 301 435 5898; Fax: +1 301 480 2918; Email: pruitt@ncbi.nlm.nih.gov
                Article
                gkn721
                DOI: 10.1093/nar/gkn721
                PMC ID: 2686572
                PubMed ID: 18927115
                © 2008 The Author(s)

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 15 September 2008
                : 26 September 2008
                : 30 September 2008
                Categories
                Articles

                Genetics
