"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 01, 2024

LLM Discussions - Good Read


Limits of Transformers on Compositionality

  • First, transformers solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
  • This contrasts with systematic multi-step reasoning, which learns to apply the underlying computational rules required for building correct answers [71, 37, 27].
  • Shortcut learning [29] via pattern matching may yield fast correct answers when similar compositional patterns are available during training, but it does not allow robust generalization to uncommon or complex examples.
  • Second, due to error propagation, transformers may have inherent limitations on solving high-complexity compositional tasks that exhibit novel patterns. Errors in the early stages of the computational process can compound in subsequent steps, preventing models from finding correct solutions; a rough numeric sketch of this compounding effect follows this list.
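A back-of-the-envelope sketch (my own illustration, not from the paper): if each step in a compositional chain is correct independently with probability p, a fully correct n-step solution occurs with probability p^n, so even strong per-step accuracy erodes quickly with depth. A minimal Python sketch under that independence assumption:

    # Illustrative only: assumes each reasoning step is independently correct
    # with probability p, so a fully correct n-step chain has probability p ** n.
    def chain_accuracy(p: float, n: int) -> float:
        """Probability that all n steps of a compositional chain are correct."""
        return p ** n

    for p in (0.99, 0.95, 0.90):
        for n in (2, 5, 10, 20):
            print(f"per-step accuracy {p:.2f}, depth {n:2d} -> "
                  f"full-chain accuracy {chain_accuracy(p, n):.3f}")

Even at 95% per-step accuracy, a 20-step chain is fully correct only about 36% of the time, which is the compounding effect described above.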

Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve

  • Humans and large language models (LLMs) have some shared properties and some properties that differ. If LLMs are analyzed using tests designed for humans, we risk identifying only the shared properties and missing the properties that are unique to LLMs (the dotted region of the paper's diagram). The authors argue that to identify the properties in that region, we must approach LLMs on their own terms, by considering the problem they were trained to solve: next-word prediction over Internet text.

On the Measure of Intelligence

  • The paper describes intelligence as skill-acquisition efficiency, highlighting scope, generalization difficulty, priors, and experience as the critical pieces to account for when characterizing intelligent systems; a toy sketch of this framing follows the list below.
  • Intelligence as a collection of task-specific skills
  • Intelligence as a general learning ability
  • Skill-based, narrow AI evaluation
  • The spectrum of generalization: robustness, flexibility, generality
  • System-centric generalization: This is the ability of a learning system to handle situations it has not itself encountered before.
  • Developer-aware generalization: This is the ability of a system, either learning or static, to handle situations that neither the system nor the developer of the system has encountered before.
  • Local generalization, or “robustness”: This is the ability of a system to handle new points from a known distribution for a single task or a well-scoped set of known tasks.
  • Broad generalization, or “flexibility”: This is the ability of a system to handle a broad category of tasks and environments without further human intervention.
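A minimal toy sketch (my own framing, not Chollet's formal measure, which also accounts for priors and generalization difficulty): under the skill-acquisition-efficiency view, two systems that end at the same skill level are not equally intelligent; the one that got there with less experience scores higher. The learning curves and the experience_to_reach helper below are hypothetical.

    # Toy proxy only: score learners by how little experience (number of
    # training examples) they need to reach a target skill level on one task.
    def experience_to_reach(curve, target):
        """curve: (num_examples, accuracy) pairs sorted by num_examples.
        Returns the smallest experience at which accuracy >= target, else None."""
        for num_examples, accuracy in curve:
            if accuracy >= target:
                return num_examples
        return None

    # Hypothetical learning curves for two systems on the same task.
    learner_a = [(10, 0.55), (100, 0.80), (1000, 0.92), (10000, 0.95)]
    learner_b = [(10, 0.70), (100, 0.91), (1000, 0.94), (10000, 0.95)]

    for name, curve in [("A", learner_a), ("B", learner_b)]:
        needed = experience_to_reach(curve, 0.90)
        print(f"Learner {name} reaches 90% accuracy after {needed} examples")

Both learners plateau near the same skill, but learner B crosses the 90% threshold with roughly a tenth of the experience; that gap between skill and skill-acquisition efficiency is the distinction the paper emphasizes.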

Keep Exploring!!! 
