| Xiaochen Li | Brown University | Preference Tuning For Toxicity Mitigation Generalizes Across Languages |
| Hadas Orgad | Technion | Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines |
| Sonja Johnson-Yu | Harvard University | Understanding biological active sensing behaviors by interpreting learned artificial agent policies |
| Oliver Daniels | UMass Amherst | Hypothesis Testing Edge Attribution Patching |
| Nikhil Prakash | Northeastern University | How do Language Models Bind Human Beliefs? |
| Kenneth Li | Harvard University | Measuring and Controlling Instruction (In)Stability in Language Model Dialogs |
| Josh Engels | MIT | Not All Language Model Features Are Linear |
| Shashata Sawmya and Linghao Kong | Massachusetts Institute of Technology | Neuronal Disentanglement and Sparse Expansion |
| Eric Todd | Northeastern University | Showing vs. Telling in LLMs |
| Sumedh Hindupur | Harvard University | Designing an interpretable neural network layer |
| Satpreet H Singh | Harvard Medical School | Emergent behaviour and neural dynamics in artificial agents tracking odour plumes |
| Xu Pan | Harvard University | Dissecting Query-Key Interaction in Vision Transformers |
| Arnab Sen Sharma | Northeastern University | Locating and Editing Factual Associations in Mamba |
| Yongyi Yang | NTT Research | Understanding the Concept Learning Dynamics |
| Binxu Wang | Kempner Institute, Harvard University | Raise one and infer three: Does generative models generalize on abstract rules for reasoning? |
| Eric Bigelow | Harvard University | In-Context Learning Dynamics as a Window Into the Mind of an LLM |
| David Baek | MIT | Generalization from Starvation: Representations in LLM Knowledge Graph Learning |
| Bhavya Vasudeva | University of Southern California | Towards a Control Theory of Language Models: Understanding When Instructions Override Priors |
| Shivam Raval | Harvard University | Sparse autoencoders find highly visualizable features in toy datasets |
| Core Francisco Park | Harvard University / NTT | Emergence of In-context learning beyond Bayesian retrieval: A mechanistic study |
| Tal Haklay | Technion | Automating position-aware circuit discovery |