Interpretable Generative Models through
Post-hoc Concept Bottlenecks

UC San Diego

* indicates equal contribution
CVPR 2025

Abstract

  • Recent work on Concept Bottleneck Models (CBMs) [1-5] focuses solely on the image classification task, while we focus on the image generation task.
  • CBGM [6], the existing approach to designing interpretable generative models based on CBMs, is neither efficient nor scalable: it requires expensive generative model training from scratch as well as real images with labor-intensive concept supervision.
    • To address this, we propose efficient post-hoc concept bottleneck training for frozen pretrained generative models with our novel concept bottleneck autoencoder (CB-AE) and concept controller (CC), providing interpretability at minimal training cost.
    • Our approach enables efficient and scalable training by using generated images, and it works with minimal to no concept supervision.
  • We demonstrate the superior interpretability and steerability of our methods on standard datasets such as CelebA, CelebA-HQ, and CUB, with large improvements (~25% on average) over prior work, while being 4-15x faster to train.

Figure 1. Comparison with prior work on concept bottleneck generative models [6]

Characteristic Comparison

  • Our CB-AE enables more efficient training than CBGM [6] and transforms a frozen pretrained generative model into a CBM.
  • On the other hand, our CC trades off inherent interpretability for better steerability, image quality, and faster training.

Table 1. Characteristic comparisons of our CB-AE and CC with prior work CBGM [6]


Method

1. Post-hoc Concept Bottleneck Autoencoder (CB-AE)

Our training involves three objectives (see the combined sketch after this list):

  • Objective 1. Reconstruction losses for auto-encoding at the latent and image levels.
  • Figure 2A. Reconstruction losses for CB-AE training (only \(E, D\) are trainable)

  • Objective 2. Concept alignment loss with pseudo-label supervision from zero-shot CLIP or off-the-shelf classifiers.
  • Figure 2B. Concept alignment loss for CB-AE training (only \(E\) is trainable)

  • Objective 3. Intervention losses simulate interventions at training time and encourage intervened concept alignment.
  • Figure 2C. Intervention losses for CB-AE training (only \(E, D\) are trainable)
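
Below is a minimal PyTorch-style sketch of the three objectives combined, written under stated assumptions rather than as the authors' actual implementation: the frozen generator is split into hypothetical halves g_pre/g_post around the CB-AE insertion point, \(E\) predicts K binary concept logits from the intermediate latent, \(D\) decodes logits back to a latent, and pseudo-labels in {0, 1} come from zero-shot CLIP or an off-the-shelf classifier. Loss weights are omitted.

```python
import torch
import torch.nn.functional as F

def cbae_losses(z, g_pre, g_post, E, D, pseudo_labels, k):
    # Hypothetical names throughout; only E and D are trainable.
    with torch.no_grad():                     # the generator stays frozen
        h = g_pre(z)                          # intermediate latent
        x = g_post(h)                         # image from the original latent

    logits = E(h)                             # concept bottleneck
    h_rec = D(logits)

    # Objective 1 (Fig. 2A): reconstruct at the latent and image levels.
    loss_rec = F.mse_loss(h_rec, h) + F.mse_loss(g_post(h_rec), x)

    # Objective 2 (Fig. 2B): align predicted concepts with pseudo-labels.
    loss_align = F.binary_cross_entropy_with_logits(logits, pseudo_labels)

    # Objective 3 (Fig. 2C): simulate a training-time intervention on
    # concept k, then require the re-encoded concept to match the target.
    target = 1.0 - pseudo_labels[:, k]              # flip a binary concept
    intervened = logits.clone()
    intervened[:, k] = (2.0 * target - 1.0) * 10.0  # confident flipped logit
    loss_int = F.binary_cross_entropy_with_logits(E(D(intervened))[:, k], target)

    return loss_rec, loss_align, loss_int
```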

2. CB-AE Interventions
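
At test time, an intervention on the CB-AE bottleneck amounts to overwriting the predicted value of a target concept and decoding the modified bottleneck before the remaining generator layers. A minimal sketch reusing the hypothetical names from the training sketch above:

```python
import torch

@torch.no_grad()
def cbae_intervene(z, g_pre, g_post, E, D, k, target):
    """Generate an image with binary concept k forced to `target` (0 or 1)."""
    h = g_pre(z)                              # frozen generator, first half
    logits = E(h)                             # predicted concept bottleneck
    logits[:, k] = 10.0 if target else -10.0  # overwrite concept k
    return g_post(D(logits))                  # decode and finish generation
```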

3. Optimization-based Interventions

We propose a novel intervention method inspired by adversarial attacks [7].
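
As a hedged illustration of the attack-style idea (not the authors' exact procedure): PGD-like signed-gradient updates [7] push the intermediate latent until the concept predictor \(E\) assigns the target value to concept k; the optimized latent is then rendered by the remaining generator layers (g_post above).

```python
import torch
import torch.nn.functional as F

def optimization_intervention(h, E, k, target, steps=30, lr=0.05):
    """PGD-style updates on latent h until concept k flips to `target`."""
    h_adv = h.clone().requires_grad_(True)
    tgt = torch.full((h.shape[0],), float(target), device=h.device)
    for _ in range(steps):
        loss = F.binary_cross_entropy_with_logits(E(h_adv)[:, k], tgt)
        grad, = torch.autograd.grad(loss, h_adv)
        with torch.no_grad():
            h_adv -= lr * grad.sign()    # signed-gradient step, as in PGD
    return h_adv.detach()                # render with g_post(h_adv)
```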

4. Post-hoc Concept Controller (CC) for Steering
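
The page is brief here; per Table 1, the CC trades inherent interpretability for better steerability, image quality, and faster training. One reading consistent with that, sketched below with hypothetical names, is a lightweight concept probe on the frozen generator's intermediate latent: generation never passes through a bottleneck (so image quality is untouched), and steering can reuse the optimization-based intervention above with the probe in place of \(E\).

```python
import torch.nn as nn

class ConceptController(nn.Module):
    """Hypothetical lightweight probe: predicts concept logits from the
    frozen generator's intermediate latent without altering generation."""
    def __init__(self, latent_dim: int, num_concepts: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_concepts),
        )

    def forward(self, h):    # h: (B, latent_dim) flattened latent
        return self.head(h)
```

Trained with the same pseudo-label supervision as Objective 2, steering would then call optimization_intervention(h, cc, k, target) and decode the result.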


Experiments

1. Concept Prediction

Figure 3. Qualitative evaluation of concept prediction

2. Steerability

Figure 4. Qualitative evaluation of concept steerability with CB-AE

Figure 5. Qualitative evaluation of concept steerability with optimization-based interventions

Table 2. Quantitative evaluation of concept steerability, i.e., concept interventions (train time on 1 V100 GPU)

3. Generation Quality

Table 3. Generation quality evaluation using FID (train time in V100 GPU-hours)


Conclusion


Cite this work

A. Kulkarni, G. Yan, C. Sun, T. Oikarinen, and T.-W. Weng, Interpretable Generative Models through Post-hoc Concept Bottlenecks, CVPR 2025.
@inproceedings{kulkarni2025interpretable,
    title={Interpretable Generative Models through Post-hoc Concept Bottlenecks},
    author={Kulkarni, Akshay and Yan, Ge and Sun, Chung-En and Oikarinen, Tuomas and Weng, Tsui-Wei},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2025},
}
