CI-CBM: Class-Incremental Concept Bottleneck Model for Interpretable Continual Learning

UC San Diego
TMLR 2026

Abstract

  • CI-CBM addresses catastrophic forgetting in class-incremental learning while keeping the model’s decisions interpretable through human-understandable concepts.
  • The method learns new classes without storing old training samples, using concept regularization and pseudo-concept generation to preserve previous knowledge.
  • Across seven datasets, CI-CBM outperforms prior interpretable continual learning methods by an average of 36% in accuracy, approaching black-box performance while remaining explainable.


Method

CI-CBM builds on label-free concept bottleneck models and adapts them to exemplar-free class-incremental learning: new classes arrive over time, old training data is not stored, and predictions stay grounded in human-readable concepts.

  • Step 1: Unique concept set expansion. For each new phase, CI-CBM prompts a language model for class-related concepts, removes near-duplicates and class-name-like terms with text-embedding similarity, and updates the running concept vocabulary \(C_t \leftarrow C_{t-1} \cup \{\text{filtered new concepts}\}\).
  • Step 2: Embedding calculation based on \(X_t\). Given the current images \(X_t\) and expanded concept set \(C_t\), CI-CBM computes the image-text alignment matrix \(P^t\), where \(P^t[i,j] = E_I(x_i)^\top E_T(c_j)\). Here, \(E_I\) is the image encoder and \(E_T\) is the text encoder.
  • Step 3: CBL (\(W_C^t\)) learning. The concept bottleneck layer maps frozen backbone features into concept activations, \(f_c(x)=W_C^t f(x)\), while a distillation regularizer keeps previously learned concept neurons from drifting:
    \[ L(W_C^t)=\sum_{i=1}^{M_t}-\frac{\overline{q_i^t}^{\,3}\cdot \overline{P^t_{:,i}}^{\,3}}{\|\overline{q_i^t}^{\,3}\|_2\|\overline{P^t_{:,i}}^{\,3}\|_2}+\beta\sum_{i=1}^{M_{t-1}}-\frac{\overline{q_i^t}^{\,3}\cdot \overline{q_i^{t-1}}^{\,3}}{\|\overline{q_i^t}^{\,3}\|_2\|\overline{q_i^{t-1}}^{\,3}\|_2}. \]
    Here, \(q_i^t\) is the activation of concept neuron \(i\) at phase \(t\), \(P^t_{:,i}\) is the target image-text alignment for concept \(i\), \(M_t\) is the number of concepts after expansion, \(M_{t-1}\) is the previous concept count, and \(\beta\) controls the distillation strength.
  • Step 4: Pseudo-feature and pseudo-concept generation. For each past class \(c_p\), CI-CBM finds the nearest new class \(c_n\), shifts features from that new-class distribution toward the stored past-class centroid, and projects the generated pseudo-features into concept space:

    Figure 1. Pseudo-feature generation: CI-CBM shifts the nearest new-class feature distribution toward each past-class centroid to synthesize pseudo-features for old classes.

    \[ \hat{f}(c_p) = f(c_n) - \mu(c_n) + \mu(c_p), \qquad \hat{f}_c(c_p) = W_C^t \hat{f}(c_p). \]
  • Step 5: Sparse FC layer (\(W_F^t\)) learning with actual and pseudo-concepts. The sparse final classifier is trained with actual concepts for the current phase and pseudo-concepts for previous phases:
    \[ \min_{W_F^t,b_F^t} \sum_{(x_i,y_i)\in D_1 \cup \dots \cup D_{t-1}} L_{ce}(W_F^t \hat{f}_c(x_i)+b_F^t,y_i) + \sum_{(x_i,y_i)\in D_t} L_{ce}(W_F^t f_c(x_i)+b_F^t,y_i) + \lambda R_\alpha(W_F^t). \]
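The core quantities in Steps 3 and 4 can be sketched in a few NumPy functions. This is a minimal illustration, not the authors' implementation: it assumes the bar notation denotes per-concept standardization of activations across the batch (as in label-free CBM training), and all function names, shapes, and hyperparameter values below are illustrative.

```python
import numpy as np

def standardize(a, eps=1e-6):
    """Column-wise standardization over the batch (assumed meaning of the bar)."""
    return (a - a.mean(axis=0)) / (a.std(axis=0) + eps)

def neg_cos_cubed(u, v, eps=1e-12):
    """Sum over concept columns of the negative cosine similarity between
    cubed, standardized activations -- one term of the Step-3 loss."""
    u3, v3 = standardize(u) ** 3, standardize(v) ** 3
    num = (u3 * v3).sum(axis=0)
    den = np.linalg.norm(u3, axis=0) * np.linalg.norm(v3, axis=0) + eps
    return -(num / den).sum()

def cbl_loss(q_t, P_t, q_prev, beta=0.1):
    """Step 3: align all M_t concept neurons with the image-text matrix P^t,
    and distill the first M_{t-1} neurons toward their phase-(t-1) activations."""
    m_prev = q_prev.shape[1]
    return neg_cos_cubed(q_t, P_t) + beta * neg_cos_cubed(q_t[:, :m_prev], q_prev)

def pseudo_features(f_new, mu_new, mu_past):
    """Step 4: shift the nearest new-class feature cloud onto the stored
    past-class centroid, f_hat(c_p) = f(c_n) - mu(c_n) + mu(c_p)."""
    return f_new - mu_new + mu_past

# Toy demo: generate pseudo-features for one past class, then project them
# into concept space with the current CBL weights (Step 4's second equation).
rng = np.random.default_rng(0)
f_new = rng.normal(size=(16, 8))                     # new-class backbone features
f_hat = pseudo_features(f_new, f_new.mean(axis=0), np.ones(8))
W_C = rng.normal(size=(5, 8))                        # current CBL weights W_C^t
pseudo_concepts = f_hat @ W_C.T                      # f_hat_c(c_p) = W_C^t f_hat(c_p)
```

The pseudo-concepts produced this way stand in for the unavailable old-phase data in the Step-5 classifier objective, so the sparse FC layer sees both current-phase concepts and synthesized past-phase concepts without any stored exemplars.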

Figure 2. CI-CBM pipeline overview.


Experiments

1. Comparison to interpretable methods


Table 1. Comparison to other interpretable class-incremental methods.

2. Comparison to non-pretrained and non-interpretable methods


Table 2. Comparison to exemplar-free methods without an interpretability constraint (ResNet, non-pretrained setting).


Table 3. Comparison to prompt-based EFCIL methods with a DeiT backbone trained from first-phase data (non-pretrained setting).


Figure 3. Average accuracy over incremental phases vs. unrestricted ResNet-based methods.

3. Comparison to pretrained ViT-based methods


Table 4. Pretrained ViT-based exemplar-free continual learning (average incremental accuracy).

4. Interpretability and insights on model reasoning


Figure 4. Global view of concept-to-class weights for Tree Swallow on CUB: which interpretable concepts support or oppose the class, and how that structure appears under a multi-phase class-incremental setup.


Figure 5. Local view of concept-level contributions for one image across incremental phases on ImageNet-Subset: how salient concepts for the prediction shift as more classes are learned.


Conclusion


Cite this work

A. Javadi, T. Oikarinen, T. Javidi, and T.-W. Weng, CI-CBM: Class-Incremental Concept Bottleneck Model for Interpretable Continual Learning, TMLR 2026.
@article{javadi2026ci,
    title={CI-CBM: Class-Incremental Concept Bottleneck Model for Interpretable Continual Learning},
    author={Javadi, Amirhosein and Oikarinen, Tuomas and Javidi, Tara and Weng, Tsui-Wei},
    journal={Transactions on Machine Learning Research},
    year={2026},
}