Principled Interpretability in Vision Models

CVPR 2026 Tutorial

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
📅 Date: June 3 or 4, 2026 · 🌍 Location: Denver, CO, USA


Overview

As deep learning systems are increasingly deployed in high-stakes applications, understanding their behavior is critical for ensuring trust and safety. Interpretability provides essential tools to explain, debug, and improve these models. However, the field remains fragmented, spanning a wide range of methods and assumptions, while lacking standardized evaluation protocols.


Tutorial Outline

  1. Post-hoc mechanistic interpretability: Methods that analyze model internals at different levels of granularity (neurons, layers, circuits), along with their strengths and limitations.
  2. Faithfulness and reliability evaluation: Protocols and standardized metrics for assessing interpretability methods and producing actionable explanations.
  3. Interpretable DNN models by design: Concept bottleneck models and related approaches that align internal representations with human-understandable concepts.
  4. Applications: Debugging, model editing, and safety auditing in practical settings.

Intended Audience

This tutorial is intended for researchers and practitioners working on computer vision and modern deep learning systems, as well as graduate students entering interpretability research. No prior experience in interpretability is required.

Materials

Stay tuned for the tutorial schedule and agenda! Slides and supplementary materials will be posted here after the tutorial.


Contact

✉️ Lily Weng (lweng@ucsd.edu), Tuomas Oikarinen (toikarinen@ucsd.edu)