
      GenBank

      research-article


          Abstract

          GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 15.3 trillion base pairs from over 2.5 billion nucleotide sequences for 504 000 formally described species. Recent updates include resources for data from the SARS-CoV-2 virus, including a SARS-CoV-2 landing page, NCBI Datasets, NCBI Virus and the Submission Portal. We also discuss upcoming changes to GI identifiers, a new data management interface for BioProject, and advice for providing contextual metadata in submissions.


          Most cited references (10)


          Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

          The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

            BLAST: a more efficient report with usability improvements

            The Basic Local Alignment Search Tool (BLAST) website at the National Center for Biotechnology Information (NCBI) is an important resource for searching and aligning sequences. A new BLAST report allows faster loading of alignments, adds navigation aids, allows easy downloading of subject sequences and reports, and has improved usability. Here, we describe these improvements to the BLAST report, discuss design decisions, describe other improvements to the search page and database documentation, and outline plans for future development. The NCBI BLAST URL is http://blast.ncbi.nlm.nih.gov.

              Darwin Core: An Evolving Community-Developed Biodiversity Data Standard

              Biodiversity data derive from myriad sources stored in various formats on many distinct hardware and software platforms. An essential step towards understanding global patterns of biodiversity is to provide a standardized view of these heterogeneous data sources to improve interoperability. Fundamental to this advance are definitions of common terms. This paper describes the evolution and development of Darwin Core, a data standard for publishing and integrating biodiversity information. We focus on the categories of terms that define the standard, differences between simple and relational Darwin Core, how the standard has been implemented, and the community processes that are essential for maintenance and growth of the standard. We present case-study extensions of the Darwin Core into new research communities, including metagenomics and genetic resources. We close by showing how Darwin Core records are integrated to create new knowledge products documenting species distributions and changes due to environmental perturbations.

                Author and article information

                Contributors
                Journal
                Nucleic Acids Research (Nucleic Acids Res)
                Oxford University Press
                ISSN (print): 0305-1048
                ISSN (electronic): 1362-4962
                07 January 2022
                01 December 2021
                01 December 2021
                Volume: 50
                Issue: D1
                Pages: D161-D164
                Affiliations
                National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
                Author notes
                To whom correspondence should be addressed. Tel: +1 301 496 2475; Fax: +1 301 480 9241; Email: sayers@ncbi.nlm.nih.gov
                Author information
                https://orcid.org/0000-0001-8394-3802
                https://orcid.org/0000-0003-1839-5322
                https://orcid.org/0000-0002-0289-7101
                Article
                Publisher ID: gkab1135
                DOI: 10.1093/nar/gkab1135
                PMCID: PMC8690257
                PMID: 34850943
                21fbbb07-804a-4c02-a2f7-4a9b63c2472a
                Published by Oxford University Press on behalf of Nucleic Acids Research 2021.

                This work is written by (a) US Government employee(s) and is in the public domain in the US.

                History
                Accepted: 08 November 2021
                Revised: 20 October 2021
                Received: 18 September 2021
                Page count
                Pages: 4
                Funding
                Funded by: Intramural Research Program of the National Library of Medicine;
                Funded by: National Institutes of Health, DOI 10.13039/100000002;
                Categories
                AcademicSubjects/SCI00010
                Database Issue

                Genetics
