Accepted Work
Poster Session 1 (11:45 AM - 1:00 PM)
Listed in random order.
- Does FLUX Know What It’s Writing?
- “Describe Yourself in Three Words:” Disentangling Polysemantic Activations through Recursive Decoding
- Detecting and characterizing planning in language models
- Shared Global and Local Geometry of Language Model Embeddings
- Filter Heads in Transformer LMs
- The attention mechanism underlying relational object generation in text-to-image diffusion transformers
- One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
- SAEs Are Good for Steering - If You Select the Right Features
- Steering Large Language Models for Machine Translation Personalization
- Cross-Modal Interaction Quantification and Analysis for Vision–Language Models
- An Analysis of Feature Hierarchies in VGG16 and Contextual Limitations in a Transformer
- Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics
- Pinpointing Attention-Causal Communication in Language Models
- The Trilemma of Truth in Large Language Models
- Investigating Conjugation in Multilingual Models
- Showing vs. Telling: Distributed and Reusable Mechanisms in LLMs
- Understanding Spatial Reasoning in VLMs
- Binary sparse coding for interpretability
- Mechanisms of In-Context Syntactic Generalization in Language Models
- On the Complexity of Neural Computation in Superposition
- Confirmation Bias in Vision-Language Numerical Reasoning
- Interventional Feature Steering on Deterministic Code Tasks: Towards Practical Benchmarks for Mechanistic Interpretability
- Unified Approaches to Cross-Modality Interpretability in Deep Learning Systems
- Discovering Interpretable Concepts in Large Generative Music Models
- MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools
- Quiet Feature Learning in Algorithmic Tasks
- Mechanistic Comparison of Protein Language Models and Profile HMMs for Biological Knowledge Alignment
- Emergence and Effectiveness of Task Vectors in In-Context Learning: An Encoder Decoder Perspective
- Vector Arithmetic in Concept and Token Subspaces
- Exploiting Frontier Vision-Language Models with Low-Cost Transferable Adversarial Image Attacks
- AutoPsych: Automated Psychophysics for Interpretability and Diversity Benchmarking
- Reward-Informed Sparse Autoencoders Reveal Evolving Patterns of Latent Reasoning in Large Language Models
- How Do Vision-Language Models Process Conflicting Information Across Modalities?
- Density estimation with LLMs: a geometric investigation of in-context learning trajectories
- From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
- On the Predictive Power of Representation Dispersion in Language Models
- Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models
- Attendome: Universal Embedding Space for Attention Heads Across Transformer Models
Poster Session 2 (2:00 PM - 3:15 PM)
Listed in random order.
- Using Sparsity to Safely Scope Models
- Converting MLPs into Polynomials in Closed Form
- Vision Transformers Don't Need Trained Registers
- Using geometries of truth to identify factual recall in LLMs
- Activation Steering in Generative Settings via Contrastive Causal Mediation Analysis
- Counterfactual Resampling Locates Reasoning Errors
- SAEfarer: Exploring Text Classification Models with Sparse Autoencoders
- The Interpretable Geometry of Writing Style: Using Rotations and Scaling to Align LLaMA, Mistral, and Gemma
- Internal states before wait modulate reasoning patterns
- ModelBatch: A Lightweight Package for Accelerating Small Model Training
- Multi-scale Graph Skeletonization for Circuit Analysis
- Actionable Interpretability for Human Translation Workflows
- Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
- Effects of Language Similarity on Large Language Model Translation
- Ensemble Circuit Analysis
- To Bind or Not to Bind: A Layer-Wise Dissection of Binding Information in ESM
- eDIF: A European Deep Inference Fabric for Remote Mechanistic Interpretability
- Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline
- Circuit-tracer: A New Library for Feature Circuits
- The Role of Mechanistic Interpretability in Explaining How Neural Networks Generalize
- Language Models use Lookbacks to Track Beliefs
- Towards combinatorial interpretability of neural computation
- I Have No Mouth and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2
- Mapping Layer Similarity in Large Language Models with RSA
- Hidden Breakthroughs in Language Model Training
- Sparse Autoencoder Features for Classifications and Transferability
- All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens
- Priors in Time: A Generative View of Sparse Autoencoders for Sequential Representations
- Automating Subnetworks Analysis with a Mechanistic Interpretability Agent
- The Hedonic Neuron: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
- Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers
- Challenges in Understanding Modality Conflict in Vision-Language Models
- Discovering Transformer Circuits via a Hybrid Attribution and Pruning Framework
- Backdoor Detection using Logit Distribution Sequences
- Shared Causal Mechanisms for Factual and Moral Value Comparisons in Language Models
- Superposition yields robust neural scaling
- Mechanistic Understanding of Entity Tracking with Operations and Chain-of-Thought