4
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Physics of Language Models: Part 3.1, Knowledge Storage and Extraction

      Preprint
      ,

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Large language models can store extensive world knowledge, often extractable through question-answering (e.g., "What is Abraham Lincoln's birthday?"). However, it's unclear whether the model answers questions based on exposure to exact/similar questions during training, or if it genuinely extracts knowledge from the source (e.g., Wikipedia biographies). In this paper, we conduct an in-depth study of this problem using a controlled set of semi-synthetic biography data. We uncover a relationship between the model's knowledge extraction ability and different diversity measures of the training data. We conduct (nearly) linear probing, revealing a strong correlation between this relationship and whether the model (nearly) linearly encodes the knowledge attributes at the hidden embedding of the entity names, or across the embeddings of other tokens in the training text.

          Related collections

          Author and article information

          Journal
          25 September 2023
          Article
          2309.14316
          2ab41171-df0a-4997-8194-9b80f8b3feec

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          History
          Custom metadata
          cs.CL cs.AI cs.LG

          Theoretical computer science,Artificial intelligence
          Theoretical computer science, Artificial intelligence

          Comments

          Comment on this article