Neuron explainer

May 9, 2023

This directory contains a version of our code for generating, simulating and scoring explanations of neuron behavior.

Setup

pip install -e .

Usage

For example usage, see the demos folder:

  • Generating and scoring activation-based explanations
  • Generating and scoring explanations based on tokens with high average activations
  • Generating explanations for human-written neuron puzzles
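
As a rough illustration of the scoring step, the sketch below compares a neuron's real activations with the activations a simulator predicts from an explanation, and uses their Pearson correlation as the score. This is a minimal, self-contained sketch and does not use this package's API: the function name `correlation_score`, the sample activation values, and the choice to return 0.0 for constant inputs are all assumptions made for illustration. See the demos for the actual workflow.

```python
import math
from typing import Sequence


def correlation_score(real: Sequence[float], simulated: Sequence[float]) -> float:
    """Pearson correlation between real and simulated activations.

    Hypothetical helper for illustration; not part of this package's API.
    """
    if len(real) != len(simulated):
        raise ValueError("activation sequences must have the same length")
    if len(real) < 2:
        raise ValueError("need at least two activations")

    mean_real = sum(real) / len(real)
    mean_sim = sum(simulated) / len(simulated)

    cov = sum((r - mean_real) * (s - mean_sim) for r, s in zip(real, simulated))
    var_real = sum((r - mean_real) ** 2 for r in real)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)

    # Correlation is undefined when either sequence is constant;
    # returning 0.0 here is a convention chosen for this sketch.
    if var_real == 0 or var_sim == 0:
        return 0.0
    return cov / math.sqrt(var_real * var_sim)


# Made-up per-token activations for a single neuron (illustrative only).
real_activations = [0.0, 0.1, 0.9, 0.0, 0.8]
simulated_activations = [0.0, 0.0, 1.0, 0.0, 1.0]

score = correlation_score(real_activations, simulated_activations)
print(f"explanation score: {score:.3f}")
```

A score near 1 would suggest the explanation predicts the neuron's behavior well on these tokens, while a score near 0 would suggest it predicts little.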