New England Mechanistic Interpretability Workshop Series


NEMI 2024 Schedule
9:00am-10:00am Breakfast
10:00am-10:15am Welcome Remarks (Max Tegmark)
10:15am-10:20am Program Overview (Koyena Pal)
10:20am-12:15pm Morning Session
10:20-10:35 Opening Keynote (Martin Wattenberg)
10:35-10:50 The Platonic Representation Hypothesis (Brian Cheung)
10:50-11:05 NNsight: A Transparent API for blackbox AI (Jaden Fiotto-Kaufman)
11:05-11:15 Coffee Break
11:15-11:30 Instruction Drift (Kenneth Li)
11:30-11:45 Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks (Hidenori Tanaka)
11:45-12:15 Panel Session I: How mechanistic interpretability can help keep AI safe and beneficial, with Angie Boggust, David Krueger, and Dylan Hadfield-Menell; moderated by Max Tegmark
12:15pm-2:00pm Lunch / Presenters' Round Tables
2:00pm-3:00pm Poster Session
3:00pm-4:55pm Afternoon Session
3:00-3:15 Opening Keynote (Sarah Schwettmann)
3:15-3:30 Multilevel Interpretability of Artificial Neural Networks: Leveraging Framework and Methods from Neuroscience (Zhonghao He)
3:30-3:45 Closing Keynote (Sam Marks)
3:45-4:00 Group Photo
4:00-4:10 Coffee Break
4:10-4:25 Summative Talk (David Bau)
4:25-4:55 Panel Session II: Mechanistic interpretability: state of play and promising directions, with Martin Wattenberg, Byron Wallace, and Yonatan Belinkov; moderated by David Bau
4:55pm-5:00pm Closing Remarks (David Bau)
Posters
Xiaochen Li (Brown University): Preference Tuning For Toxicity Mitigation Generalizes Across Languages
Hadas Orgad (Technion): Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
Sonja Johnson-Yu (Harvard University): Understanding biological active sensing behaviors by interpreting learned artificial agent policies
Oliver Daniels (UMass Amherst): Hypothesis Testing Edge Attribution Patching
Nikhil Prakash (Northeastern University): How do Language Models Bind Human Beliefs?
Kenneth Li (Harvard University): Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Josh Engels (MIT): Not All Language Model Features Are Linear
Shashata Sawmya and Linghao Kong (Massachusetts Institute of Technology): Neuronal Disentanglement and Sparse Expansion
Eric Todd (Northeastern University): Showing vs. Telling in LLMs
Sumedh Hindupur (Harvard University): Designing an interpretable neural network layer
Satpreet H Singh (Harvard Medical School): Emergent behaviour and neural dynamics in artificial agents tracking odour plumes
Xu Pan (Harvard University): Dissecting Query-Key Interaction in Vision Transformers
Arnab Sen Sharma (Northeastern University): Locating and Editing Factual Associations in Mamba
Yongyi Yang (NTT Research): Understanding the Concept Learning Dynamics
Binxu Wang (Kempner Institute, Harvard University): Raise one and infer three: Do generative models generalize on abstract rules for reasoning?
Eric Bigelow (Harvard University): In-Context Learning Dynamics as a Window Into the Mind of an LLM
David Baek (MIT): Generalization from Starvation: Representations in LLM Knowledge Graph Learning
Bhavya Vasudeva (University of Southern California): Towards a Control Theory of Language Models: Understanding When Instructions Override Priors
Shivam Raval (Harvard University): Sparse autoencoders find highly visualizable features in toy datasets
Core Francisco Park (Harvard University / NTT): Emergence of In-context learning beyond Bayesian retrieval: A mechanistic study
Tal Haklay (Technion): Automating position-aware circuit discovery