The 2nd New England Mechanistic Interpretability (NEMI) workshop

August 22, 2025, Northeastern University, Boston

Accepted Work

Poster Session 1 (11:45 AM - 1:00 PM)

Listed in random order.

  1. Does FLUX Know What It’s Writing?
    Presenter: Adrian Chang
  2. “Describe Yourself in Three Words:” Disentangling Polysemantic Activations through Recursive Decoding
    Presenter: Alexis Fox
  3. Detecting and characterizing planning in language models
    Presenter: Alice Rigg
  4. Shared Global and Local Geometry of Language Model Embeddings
    Presenter: Andrew Lee
  5. Filter Heads in Transformer LMs
    Presenter: Arnab Sen Sharma
  6. The attention mechanism underlying relational object generation in text-to-image diffusion transformers
    Presenter: Binxu Wang
  7. One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
    Presenter: Chris Wendler
  8. SAEs Are Good for Steering - If You Select the Right Features
    Presenter: Dana Arad
  9. Steering Large Language Models for Machine Translation Personalization
    Presenter: Daniel Scalena
  10. Cross-Modal Interaction Quantification and Analysis for Vision–Language Models
    Presenter: Divya Appapogu
  11. An Analysis of Feature Hierarchies in VGG16 and Contextual Limitations in a Transformer
    Presenter: Donald Winkelman
  12. Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics
    Presenter: Eric Bigelow
  13. Pinpointing Attention-Causal Communication in Language Models
    Presenter: Gabriel Franco
  14. The Trilemma of Truth in Large Language Models
    Presenter: Germans Savcisens
  15. Investigating Conjugation in Multilingual Models
    Presenter: Isabella Gidi
  16. Showing vs. Telling: Distributed and Reusable Mechanisms in LLMs
    Presenter: Jacob Li
  17. Understanding Spatial Reasoning in VLMs
    Presenter: Kelly Cui
  18. Binary sparse coding for interpretability
    Presenter: Lucia Quirke
  19. Mechanisms of In-Context Syntactic Generalization in Language Models
    Presenter: Meng Lu
  20. On the Complexity of Neural Computation in Superposition
    Presenter: Micah Adler
  21. Confirmation Bias in Vision-Language Numerical Reasoning
    Presenter: Michal Golovanevsky
  22. Interventional Feature Steering on Deterministic Code Tasks: Towards Practical Benchmarks for Mechanistic Interpretability
    Presenter: Nathan Clark
  23. Unified Approaches to Cross-Modality Interpretability in Deep Learning Systems
    Presenter: Nevasini Sasikumar
  24. Discovering Interpretable Concepts in Large Generative Music Models
    Presenter: Nikhil Singh
  25. MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools
    Presenter: Nishant Subramani
  26. Quiet Feature Learning in Algorithmic Tasks
    Presenter: Prudhviraj Naidu
  27. Mechanistic Comparison of Protein Language Models and Profile HMMs for Biological Knowledge Alignment
    Presenter: Saishradha Mohanty
  28. Emergence and Effectiveness of Task Vectors in In-Context Learning: An Encoder Decoder Perspective
    Presenter: Seungwook Han
  29. Vector Arithmetic in Concept and Token Subspaces
    Presenter: Sheridan Feucht
  30. Exploiting Frontier Vision-Language Models with Low-Cost Transferable Adversarial Image Attacks
    Presenter: Stanislav Fort
  31. AutoPsych: Automated Psychophysics for Interpretability and Diversity Benchmarking
    Presenter: Sunny Liu
  32. Reward-Informed Sparse Autoencoders Reveal Evolving Patterns of Latent Reasoning in Large Language Models
    Presenters: Tanvi Nagilla, Alex Jameson, Daniel Manta, Shayaan Uddin, Ryan Lagasse
  33. How Do Vision-Language Models Process Conflicting Information Across Modalities?
    Presenter: Tianze (Etha) Hua
  34. Density estimation with LLMs: a geometric investigation of in-context learning trajectories
    Presenter: Toni Jianbang Liu
  35. From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
    Presenter: Valerie Costa
  36. On the Predictive Power of Representation Dispersion in Language Models
    Presenter: Yanhong Li
  37. Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models
    Presenter: Yik Siu Chan
  38. Attendome: Universal Embedding Space for Attention Heads Across Transformer Models
    Presenter: Zhuofan (Josh) Ying

Poster Session 2 (2:00 PM - 3:15 PM)

Listed in random order.

  1. Using Sparsity to Safely Scope Models
    Presenter: Adriano Hernandez
  2. Converting MLPs into Polynomials in Closed Form
    Presenter: Alice Rigg
  3. Vision Transformers Don't Need Trained Registers
    Presenter: Amil Dravid
  4. Using geometries of truth to identify factual recall in LLMs
    Presenter: Angelos Poulis
  5. Activation Steering in Generative Settings via Contrastive Causal Mediation Analysis
    Presenter: Aruna Sankaranarayanan
  6. Counterfactual Resampling Locates Reasoning Errors
    Presenter: Christopher A Merck
  7. SAEfarer: Exploring Text Classification Models with Sparse Autoencoders
    Presenter: Daniel Kerrigan
  8. The Interpretable Geometry of Writing Style: Using Rotations and Scaling to Align LLaMA, Mistral, and Gemma
    Presenter: David Turturean
  9. Internal states before wait modulate reasoning patterns
    Presenter: Dmitrii Troitskii
  10. ModelBatch: A Lightweight Package for Accelerating Small Model Training
    Presenter: Enyan Zhang
  11. Multi-scale Graph Skeletonization for Circuit Analysis
    Presenter: Fateme Hashemi Chaleshtori
  12. Actionable Interpretability for Human Translation Workflows
    Presenter: Gabriele Sarti
  13. Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
    Presenter: Helena Casademunt
  14. Effects of Language Similarity on Large Language Model Translation
    Presenter: Jacob Brinton
  15. Ensemble Circuit Analysis
    Presenter: John Bowlan
  16. To Bind or Not to Bind: A Layer-Wise Dissection of Binding Information in ESM
    Presenter: Kevin Lu
  17. eDIF: A European Deep Inference Fabric for Remote Mechanistic Interpretability
    Presenter: Marc Guggenberger
  18. Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline
    Presenters: Meng Lu, Ruochen Zhang
  19. Circuit-tracer: A New Library for Feature Circuits
    Presenter: Michael Hanna
  20. The Role of Mechanistic Interpretability in Explaining How Neural Networks Generalize
    Presenter: Nathan Stringham
  21. Language Models use Lookbacks to Track Beliefs
    Presenter: Nikhil Prakash
  22. Towards combinatorial interpretability of neural computation
    Presenter: Nir Shavit
  23. I Have No Mouth and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2
    Presenter: Oliver McLaughlin
  24. Mapping Layer Similarity in Large Language Models with RSA
    Presenter: Ritik Bompilwar
  25. Hidden Breakthroughs in Language Model Training
    Presenter: Sara Kangaslahti
  26. Sparse Autoencoder Features for Classifications and Transferability
    Presenter: Shan Chen
  27. All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens
    Presenter: Siddarth Mamidanna
  28. Priors in Time: A Generative View of Sparse Autoencoders for Sequential Representations
    Presenter: Sumedh Hindupur
  29. Automating Subnetworks Analysis with a Mechanistic Interpretability Agent
    Presenter: Tal Haklay
  30. The Hedonic Neuron: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
    Presenter: Tanya Chowdhury
  31. Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers
    Presenter: Todd Nief
  32. Challenges in Understanding Modality Conflict in Vision-Language Models
    Presenter: Trang Nguyen
  33. Discovering Transformer Circuits via a Hybrid Attribution and Pruning Framework
    Presenter: Vibhas Nair
  34. Backdoor Detection using Logit Distribution Sequences
    Presenter: Vinith Suriyakumar
  35. Shared Causal Mechanisms for Factual and Moral Value Comparisons in Language Models
    Presenter: Yik Siu Chan
  36. Superposition yields robust neural scaling
    Presenter: Yizhou Liu
  37. Mechanistic Understanding of Entity Tracking with Operations and Chain-of-Thought
    Presenter: Zilu (Peter) Tang