PAIR Interpretability
January 24, 2025
This repo contains code and articles on PAIR interpretability projects.
Scalable Influence and Fact Tracing for Large Language Model Pretraining (ICLR'25)
See the blog post for a light introduction to the paper. There is also a public demo and a dedicated GitHub repo. The full paper is Scalable Influence and Fact Tracing for Large Language Model Pretraining -- Tyler Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, Ian Tenney (RH)
Racing Thoughts: Explaining Large Language Model Contextualization Errors (NAACL'25)
Racing Thoughts: Explaining Contextualization Errors Within Large Language Models -- Michael A. Lepori, Mike Mozer, Asma Ghandeharioun (RH)
Who's asking? User personas and the mechanics of latent misalignment (NeurIPS'24)
Who's asking? User personas and the mechanics of latent misalignment -- Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon, at NeurIPS'24.
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models (ICML'24)
The Patchscopes mini-site and the interactive explorable give a brief introduction to the longer paper (ICML'24) by Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva.
Visualizing and Measuring the Geometry of BERT
bert-tree and context-atlas are repos for two interactive blog posts/visualizations for the paper Visualizing and Measuring the Geometry of BERT:
- Language, trees, and geometry in neural networks explores the geometry of syntactic information in BERT (bert-tree).
- Language, Context, and Geometry in Neural Networks explores semantics and context in BERT. See the accompanying tool, Context Atlas, for more details (context-atlas).
Deep dreaming on text
text-dream contains experiments and tools for deep dreaming on text.
LinguisticLens
data-synth-syntax contains LinguisticLens, a tool for visualizing generated text data.