Accepted Work
Poster Session 1 (11:45 AM - 1:00 PM)
Listed in random order.
- Does FLUX Know What It’s Writing?
- “Describe Yourself in Three Words:” Disentangling Polysemantic Activations through Recursive Decoding
- Detecting and characterizing planning in language models
- Shared Global and Local Geometry of Language Model Embeddings
- Filter Heads in Transformer LMs
- The attention mechanism underlying relational object generation in text-to-image diffusion transformers
- One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
- SAEs Are Good for Steering - If You Select the Right Features
- Steering Large Language Models for Machine Translation Personalization
- Cross-Modal Interaction Quantification and Analysis for Vision–Language Models
- An Analysis of Feature Hierarchies in VGG16 and Contextual Limitations in a Transformer
- Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics
- Pinpointing Attention-Causal Communication in Language Models
- The Trilemma of Truth in Large Language Models
- Investigating Conjugation in Multilingual Models
- Showing vs. Telling: Distributed and Reusable Mechanisms in LLMs
- Understanding Spatial Reasoning in VLMs
- Binary sparse coding for interpretability
- Mechanisms of In-Context Syntactic Generalization in Language Models
- On the Complexity of Neural Computation in Superposition
- Confirmation Bias in Vision-Language Numerical Reasoning
- Interventional Feature Steering on Deterministic Code Tasks: Towards Practical Benchmarks for Mechanistic Interpretability
- Unified Approaches to Cross-Modality Interpretability in Deep Learning Systems
- Discovering Interpretable Concepts in Large Generative Music Models
- MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools
- Quiet Feature Learning in Algorithmic Tasks
- Mechanistic Comparison of Protein Language Models and Profile HMMs for Biological Knowledge Alignment
- Emergence and Effectiveness of Task Vectors in In-Context Learning: An Encoder Decoder Perspective
- Vector Arithmetic in Concept and Token Subspaces
- Exploiting Frontier Vision-Language Models with Low-Cost Transferable Adversarial Image Attacks
- AutoPsych: Automated Psychophysics for Interpretability and Diversity Benchmarking
- Reward-Informed Sparse Autoencoders Reveal Evolving Patterns of Latent Reasoning in Large Language Models
- How Do Vision-Language Models Process Conflicting Information Across Modalities?
- Density estimation with LLMs: a geometric investigation of in-context learning trajectories
- From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
- On the Predictive Power of Representation Dispersion in Language Models
- Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models
- Attendome: Universal Embedding Space for Attention Heads Across Transformer Models
Poster Session 2 (2:00 PM - 3:15 PM)
Listed in random order.
- Using Sparsity to Safely Scope Models
- Converting MLPs into Polynomials in Closed Form
- Vision Transformers Don't Need Trained Registers
- Using geometries of truth to identify factual recall in LLMs
- Activation Steering in Generative Settings via Contrastive Causal Mediation Analysis
- Counterfactual Resampling Locates Reasoning Errors
- SAEfarer: Exploring Text Classification Models with Sparse Autoencoders
- The Interpretable Geometry of Writing Style: Using Rotations and Scaling to Align LLaMA, Mistral, and Gemma
- Internal states before wait modulate reasoning patterns
- ModelBatch: A Lightweight Package for Accelerating Small Model Training
- Multi-scale Graph Skeletonization for Circuit Analysis
- Actionable Interpretability for Human Translation Workflows
- Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
- Effects of Language Similarity on Large Language Model Translation
- Ensemble Circuit Analysis
- To Bind or Not to Bind: A Layer-Wise Dissection of Binding Information in ESM
- eDIF: A European Deep Inference Fabric for Remote Mechanistic Interpretability
- Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline
- Circuit-tracer: A New Library for Feature Circuits
- The Role of Mechanistic Interpretability in Explaining How Neural Networks Generalize
- Language Models use Lookbacks to Track Beliefs
- Towards combinatorial interpretability of neural computation
- I Have No Mouth and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2
- Mapping Layer Similarity in Large Language Models with RSA
- Hidden Breakthroughs in Language Model Training
- Sparse Autoencoder Features for Classifications and Transferability
- All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens
- Priors in Time: A Generative View of Sparse Autoencoders for Sequential Representations
- Automating Subnetworks Analysis with a Mechanistic Interpretability Agent
- The Hedonic Neuron: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
- Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers
- Challenges in Understanding Modality Conflict in Vision-Language Models
- Discovering Transformer Circuits via a Hybrid Attribution and Pruning Framework
- Backdoor Detection using Logit Distribution Sequences
- Shared Causal Mechanisms for Factual and Moral Value Comparisons in Language Models
- Superposition yields robust neural scaling
- Mechanistic Understanding of Entity Tracking with Operations and Chain-of-Thought