NEMI: New England Mechanistic Interpretability

NEMI 2024 Schedule

9:00am-10:00am

Breakfast

10:00am-10:15am

Welcome Remarks (Max Tegmark)

10:15am-10:20am

Program Overview (Koyena Pal)

10:20am-12:15pm

Morning Session

10:20-10:35	Opening Keynote	Martin Wattenberg
10:35-10:50	The Platonic Representation Hypothesis	Brian Cheung
10:50-11:05	NNsight: A Transparent API for blackbox AI	Jaden Fiotto-Kaufman
11:05-11:15	Coffee Break
11:15-11:30	Instruction Drift	Kenneth Li
11:30-11:45	Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks	Hidenori Tanaka
11:45-12:15	Panel Session I: How mechanistic interpretability can help keep AI safe and beneficial, w. Angie Boggust, David Krueger & Hadas Orgad	Max Tegmark (Moderator)

12:15pm-2:00pm

Lunch / Presenters Round Tables

2:00pm-3:00pm

Poster Session

3:00pm-4:55pm

Afternoon Session

3:00-3:15	Opening Keynote	Sarah Schwettmann
3:15-3:30	Multilevel Interpretability of Artificial Neural Networks: Leveraging Framework and Methods from Neuroscience	Zhonghao He
3:30-3:45	Closing Keynote	Sam Marks
3:45-4:00	Group Photo
4:00-4:10	Coffee Break
4:10-4:25	Summative Talk	David Bau
4:25-4:55	Panel Session II: Mechanistic interpretability: state of play and promising directions, w. Martin Wattenberg, Byron Wallace & Yonatan Belinkov	David Bau (Moderator)

4:55pm-5:00pm

Closing Remarks (David Bau)

Posters

Xiaochen Li	Brown University	Preference Tuning For Toxicity Mitigation Generalizes Across Languages
Hadas Orgad	Technion	Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
Sonja Johnson-Yu	Harvard University	Understanding biological active sensing behaviors by interpreting learned artificial agent policies
Oliver Daniels	UMass Amherst	Hypothesis Testing Edge Attribution Patching
Nikhil Prakash	Northeastern University	How do Language Models Bind Human Beliefs?
Kenneth Li	Harvard University	Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Josh Engels	MIT	Not All Language Model Features Are Linear
Shashata Sawmya and Linghao Kong	Massachusetts Institute of Technology	Neuronal Disentanglement and Sparse Expansion
Eric Todd	Northeastern University	Showing vs. Telling in LLMs
Sumedh Hindupur	Harvard University	Designing an interpretable neural network layer
Satpreet H Singh	Harvard Medical School	Emergent behaviour and neural dynamics in artificial agents tracking odour plumes
Xu Pan	Harvard University	Dissecting Query-Key Interaction in Vision Transformers
Arnab Sen Sharma	Northeastern University	Locating and Editing Factual Associations in Mamba
Yongyi Yang	NTT Research	Understanding the Concept Learning Dynamics
Binxu Wang	Kempner Institute, Harvard University	Raise one and infer three: Does generative models generalize on abstract rules for reasoning?
Eric Bigelow	Harvard University	In-Context Learning Dynamics as a Window Into the Mind of an LLM
David Baek	MIT	Generalization from Starvation: Representations in LLM Knowledge Graph Learning
Bhavya Vasudeva	University of Southern California	Towards a Control Theory of Language Models: Understanding When Instructions Override Priors
Shivam Raval	Harvard University	Sparse autoencoders find highly visualizable features in toy datasets
Core Francisco Park	Harvard University / NTT	Emergence of In-context learning beyond Bayesian retrieval: A mechanistic study
Tal Haklay	Technion	Automating position-aware circuit discovery