Principled Interpretability in Vision Models
CVPR 2026 Tutorial
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Overview
As deep learning systems are increasingly deployed in high-stakes applications, understanding their behavior is critical for ensuring trust and safety. Interpretability provides essential tools to explain, debug, and improve these models. However, the field remains fragmented, spanning a wide range of methods and assumptions, while lacking standardized evaluation protocols.
This tutorial aims to provide a unified overview of interpretability in deep learning, bridging post-hoc mechanistic understanding and methods to design inherently interpretable deep learning models.
By the end of this tutorial, attendees will gain a solid understanding of modern interpretability methods for deep learning models, how to rigorously evaluate them, and open research directions in this critical area.
Tutorial Outline
- Post-hoc mechanistic interpretability: Methods that analyze model internals at different levels of granularity (neurons, layers, circuits), along with their strengths and limitations (see the activation-hook sketch after this list).
- Faithfulness and reliability evaluation: Protocols and standardized metrics for assessing interpretability methods and producing actionable explanations (see the deletion-curve sketch after this list).
- Interpretable DNN models by design: Concept bottleneck models and related approaches that align internal representations with human-understandable concepts (see the concept-bottleneck sketch after this list).
- Applications: Debugging, model editing, and safety auditing in practical settings.
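
As a taste of the neuron-level analyses in the first part, here is a minimal PyTorch sketch that records a unit's activations with a forward hook and ranks inputs by how strongly they excite it. This is not tutorial code; the ResNet-18 backbone, the layer4 choice, the random input batch, and the unit index are illustrative assumptions.

```python
# Minimal sketch: probe a single unit in a vision model via a forward hook.
# Backbone, layer, and inputs are illustrative assumptions, not tutorial code.
import torch
import torchvision.models as models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
activations = {}

def hook(module, inputs, output):
    # Spatially average each channel: one scalar per unit per image.
    activations["layer4"] = output.mean(dim=(2, 3)).detach()

handle = model.layer4.register_forward_hook(hook)

images = torch.randn(16, 3, 224, 224)  # stand-in for a real image batch
with torch.no_grad():
    model(images)
handle.remove()

unit = 42  # hypothetical unit of interest
scores = activations["layer4"][:, unit]
top_images = scores.argsort(descending=True)[:5]
print(f"Images most strongly activating unit {unit}: {top_images.tolist()}")
```

In practice, one would run a real probing dataset through the model and inspect the top-activating images to hypothesize what the unit encodes.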
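For the evaluation part, one widely used faithfulness protocol is the deletion test: occlude pixels in decreasing order of attributed importance and check how quickly the target class score falls; a faithful saliency map makes the score drop fast. The sketch below illustrates the idea under assumptions: the deletion_curve helper is hypothetical, the zero-baseline occlusion and ten-step schedule are one choice among many, and the model is assumed to expect a normalized (3, H, W) input.

```python
# Hedged sketch of a deletion-style faithfulness curve: occlude pixels in
# decreasing importance order and track the class probability after each step.
# Zero occlusion and 10 steps are assumed choices; inputs assumed normalized.
import torch

def deletion_curve(model, image, saliency, target, steps=10):
    """image: (3, H, W); saliency: (H, W) importance map; target: class index."""
    order = saliency.flatten().argsort(descending=True)
    x = image.clone()
    scores = []
    chunk = order.numel() // steps
    for i in range(steps):
        idx = order[i * chunk:(i + 1) * chunk]
        x.view(3, -1)[:, idx] = 0.0  # occlude the next most important pixels
        with torch.no_grad():
            prob = model(x.unsqueeze(0)).softmax(-1)[0, target]
        scores.append(prob.item())
    return scores  # the area under this curve summarizes faithfulness
```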
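And for interpretability by design, a concept bottleneck model routes the prediction through an explicit concept layer, so the label depends only on human-readable concept scores. The following is a minimal, untrained sketch; the module and attribute names (ConceptBottleneck, to_concepts, to_labels) and all dimensions are hypothetical, and real CBMs additionally supervise the concept layer during training.

```python
# Minimal sketch of a concept bottleneck model (CBM). Names and sizes are
# illustrative assumptions; real CBMs train the concept layer with labels.
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, backbone, feat_dim, n_concepts, n_classes):
        super().__init__()
        self.backbone = backbone                         # image -> feature vector
        self.to_concepts = nn.Linear(feat_dim, n_concepts)
        self.to_labels = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        feats = self.backbone(x)
        concepts = self.to_concepts(feats).sigmoid()     # human-readable units
        logits = self.to_labels(concepts)                # labels read concepts only
        return logits, concepts  # inspecting `concepts` explains the prediction
```

Because the label head reads only the concept vector, simple interventions (for example, zeroing a column of to_labels.weight to suppress a spurious concept) directly change model behavior, which is one way the debugging and model-editing applications above can operate on such models.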
Intended Audience
This tutorial is intended for researchers and practitioners working on computer vision and modern deep learning systems, as well as graduate students entering interpretability research. No prior experience in interpretability is required.
Materials
Please stay tuned for the tutorial schedule and agenda! Slides and supplementary materials will be posted here after the tutorial.
Contact
✉️ Lily Weng (lweng@ucsd.edu), Tuomas Oikarinen (toikarinen@ucsd.edu)