| Xiaochen Li | Brown University | Preference Tuning For Toxicity Mitigation Generalizes Across Languages |
| Hadas Orgad | Technion | Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines |
| Sonja Johnson-Yu | Harvard University | Understanding biological active sensing behaviors by interpreting learned artificial agent policies |
| Oliver Daniels | UMass Amherst | Hypothesis Testing Edge Attribution Patching |
| Nikhil Prakash | Northeastern University | How do Language Models Bind Human Beliefs? |
| Kenneth Li | Harvard University | Measuring and Controlling Instruction (In)Stability in Language Model Dialogs |
| Josh Engels | MIT | Not All Language Model Features Are Linear |
| Shashata Sawmya and Linghao Kong | Massachusetts Institute of Technology | Neuronal Disentanglement and Sparse Expansion |
| Eric Todd | Northeastern University | Showing vs. Telling in LLMs |
| Sumedh Hindupur | Harvard University | Designing an interpretable neural network layer |
| Satpreet H Singh | Harvard Medical School | Emergent behaviour and neural dynamics in artificial agents tracking odour plumes |
| Xu Pan | Harvard University | Dissecting Query-Key Interaction in Vision Transformers |
| Arnab Sen Sharma | Northeastern University | Locating and Editing Factual Associations in Mamba |
| Yongyi Yang | NTT Research | Understanding the Concept Learning Dynamics |
| Binxu Wang | Kempner Institute, Harvard University | Raise one and infer three: Does generative models generalize on abstract rules for reasoning? |
| Eric Bigelow | Harvard University | In-Context Learning Dynamics as a Window Into the Mind of an LLM |
| David Baek | MIT | Generalization from Starvation: Representations in LLM Knowledge Graph Learning |
| Bhavya Vasudeva | University of Southern California | Towards a Control Theory of Language Models: Understanding When Instructions Override Priors |
| Shivam Raval | Harvard University | Sparse autoencoders find highly visualizable features in toy datasets |
| Core Francisco Park | Harvard University / NTT | Emergence of In-context learning beyond Bayesian retrieval: A mechanistic study |
| Tal Haklay | Technion | Automating position-aware circuit discovery |