CI-CBM addresses catastrophic forgetting in class-incremental learning while keeping the model’s decisions interpretable through human-understandable concepts.
The method learns new classes without storing old training samples, using concept regularization and pseudo-concept generation to preserve previous knowledge.
Across seven datasets, CI-CBM outperforms prior interpretable continual learning methods by an average accuracy gain of 36%, approaching black-box performance while remaining explainable.
Method
CI-CBM builds on label-free concept bottleneck models and adapts them to exemplar-free class-incremental learning: new classes arrive over time, old training data is not stored, and predictions stay grounded in human-readable concepts.
Step 1: Unique concept set expansion. For each new phase, CI-CBM prompts a language model for class-related concepts, removes near-duplicates and class-name-like terms with text-embedding similarity, and updates the running concept vocabulary \(C_t \leftarrow C_{t-1} \cup \{\text{filtered new concepts}\}\).
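As a concrete illustration, here is a minimal sketch of this dedup-and-filter step, assuming a generic text-embedding function `embed` that returns one L2-normalized vector per string; the function name and both thresholds are hypothetical, not from the paper:

```python
import numpy as np

def filter_new_concepts(candidates, vocab, class_names, embed,
                        dup_thresh=0.9, class_thresh=0.85):
    """Filter LLM-proposed concepts before merging them into the vocabulary C_t.

    `embed(texts)` is assumed to return one L2-normalized embedding per string,
    so a dot product is a cosine similarity. Both thresholds are illustrative.
    """
    kept = list(vocab)                                       # start from C_{t-1}
    kept_emb = embed(kept) if kept else np.zeros((0, 1))
    class_emb = embed(class_names) if class_names else np.zeros((0, 1))
    for concept in candidates:
        e = embed([concept])[0]
        # Drop terms that read like a class name (they would leak the label).
        if class_emb.size and np.max(class_emb @ e) > class_thresh:
            continue
        # Drop near-duplicates of concepts already in the vocabulary.
        if kept_emb.size and np.max(kept_emb @ e) > dup_thresh:
            continue
        kept.append(concept)
        kept_emb = np.vstack([kept_emb, e[None, :]]) if kept_emb.size else e[None, :]
    return kept  # C_t = C_{t-1} ∪ {filtered new concepts}
```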
Step 2: Embedding calculation based on \(X_t\). Given the current images \(X_t\) and expanded concept set \(C_t\), CI-CBM computes the image-text alignment matrix \(P^t\), where \(P^t[i,j] = E_I(x_i)^\top E_T(c_j)\). Here, \(E_I\) is the image encoder and \(E_T\) is the text encoder.
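A minimal sketch of this computation with a CLIP-style encoder pair; the encoder names stand in for \(E_I\) and \(E_T\), and the L2 normalization (which makes each entry a cosine similarity) is an assumption, not stated above:

```python
import torch

@torch.no_grad()
def alignment_matrix(images, concepts, image_encoder, text_encoder):
    """Compute P^t with P^t[i, j] = E_I(x_i)^T E_T(c_j).

    `image_encoder` / `text_encoder` stand in for E_I / E_T (e.g., the two
    towers of a CLIP model, with tokenization handled inside `text_encoder`).
    """
    img = image_encoder(images)                      # (N, d)
    txt = text_encoder(concepts)                     # (M_t, d)
    img = img / img.norm(dim=-1, keepdim=True)       # normalize so entries are
    txt = txt / txt.norm(dim=-1, keepdim=True)       # cosine similarities
    return img @ txt.T                               # P^t, shape (N, M_t)
```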
Step 3: CBL (\(W_C^t\)) learning. The concept bottleneck layer maps frozen backbone features into concept activations, \(f_c^t(x)=W_C^t f(x)\), while a distillation regularizer keeps previously learned concept neurons from drifting.
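The loss itself is not reproduced above; a plausible form consistent with the notation defined next, assuming a similarity-based fitting term (as in label-free CBMs) and a squared-error distillation term against the frozen previous-phase activations \(q_i^{t-1}\), would be:

\[
\mathcal{L}_{\mathrm{CBL}}^{t} \;=\; -\sum_{i=1}^{M_t} \operatorname{sim}\!\left(q_i^{t},\, P^{t}_{:,i}\right) \;+\; \beta \sum_{i=1}^{M_{t-1}} \left\lVert q_i^{t} - q_i^{t-1} \right\rVert_2^{2}
\]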
Here, \(q_i^t\) is the activation of concept neuron \(i\) at phase \(t\), \(P^t_{:,i}\) is the target image-text alignment for concept \(i\), \(M_t\) is the number of concepts after expansion, \(M_{t-1}\) is the previous concept count, and \(\beta\) controls the distillation strength.
Step 4: Pseudo-feature and pseudo-concept generation. For each past class \(c_p\), CI-CBM finds the nearest new class \(c_n\), shifts features from that new-class distribution toward the stored past-class centroid, and projects the generated pseudo-features into concept space.
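In symbols, a sketch of this shift-and-project step, writing \(\mu_{c_p}\) and \(\mu_{c_n}\) for the stored feature centroids of the past class and its nearest new class (notation introduced here for illustration):

\[
\hat{f}(c_p) \;=\; f(x) + \mu_{c_p} - \mu_{c_n} \quad \text{for } x \in X_{c_n}, \qquad \hat{q}(c_p) \;=\; W_C^{t}\, \hat{f}(c_p)
\]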
Figure 1. Pseudo-feature generation: CI-CBM shifts the nearest new-class feature distribution toward each past-class centroid to synthesize pseudo-features for old classes.
Step 5: Sparse FC layer (\(W_F^t\)) learning with actual and pseudo-concepts. The sparse final classifier is trained with actual concepts for the current phase and pseudo-concepts for previous phases.
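One way to write this objective, assuming standard cross-entropy on both streams and a sparsity penalty \(\lambda R(\cdot)\) (e.g., elastic net, following label-free CBM practice; the exact penalty is an assumption):

\[
\mathcal{L}_{\mathrm{FC}}^{t} \;=\; \underbrace{\mathrm{CE}\!\left(W_F^{t} f_c^{t}(x),\, y\right)}_{\text{actual concepts, current classes}} \;+\; \underbrace{\mathrm{CE}\!\left(W_F^{t}\, \hat{q}(c_p),\, c_p\right)}_{\text{pseudo-concepts, past classes}} \;+\; \lambda\, R\!\left(W_F^{t}\right)
\]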
1. Comparison to other interpretable class-incremental methods
CI-CBM substantially improves average incremental accuracy over prior interpretable continual learning methods (ICICLE, IN2, CONCIL) across CIFAR-10/100, CUB, TinyImageNet, Places365, and ImageNet, while staying close to an impractical full-rehearsal upper bound that stores all past data.
Table 1. Comparison to other interpretable class-incremental methods.
2. Comparison to non-pretrained and non-interpretable methods
With a ResNet backbone trained from scratch in the first phase, CI-CBM remains competitive with strong unrestricted baselines; we report tabular comparisons and average-accuracy curves over incremental phases on CIFAR-100, TinyImageNet, and ImageNet-Subset.
Table 2. Comparison to exemplar-free methods without an interpretability constraint (ResNet, non-pretrained setting).
Table 3. Comparison to prompt-based EFCIL methods with a DeiT backbone trained from first-phase data (non-pretrained setting).
Figure 3. Average accuracy over incremental phases vs. unrestricted ResNet-based methods.
3. Comparison to pretrained ViT-based methods
Using a ViT-Base/16 backbone pretrained on ImageNet-21k, CI-CBM achieves competitive accuracy relative to state-of-the-art prompt-based continual learning methods while retaining a concept bottleneck.
4. Interpretability and insights on model reasoning
Global interpretability. CI-CBM explains each learned class through the sparse final-layer weights \(W_F^t\), which connect human-readable concept neurons to class logits. For a class \(i\), the row \(W_F^t[i,:]\) shows which concepts support the class with positive weights and which concepts oppose it with negative weights. In the visualization, line width is proportional to \(|W_F^t[i,j]|\), and negative concepts are shown as “NOT” concepts. This makes the model’s class-level decision rule inspectable across incremental phases.
Figure 4. Global view of concept-to-class weights for Tree Swallow on CUB: which interpretable concepts support or oppose the class, and how that structure appears under a multi-phase class-incremental setup.
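A sketch of how such a global explanation can be read directly off the weights; `concept_names` and the top-\(k\) cutoff are illustrative, not part of the method:

```python
import torch

def explain_class(W_F, class_idx, concept_names, k=5):
    """List the concepts most strongly supporting / opposing one class.

    W_F is the sparse final-layer weight matrix of shape
    (n_classes, n_concepts); row `class_idx` is the class's decision rule.
    """
    row = W_F[class_idx]
    top = torch.topk(row, k)                  # largest positive weights
    bottom = torch.topk(-row, k)              # most negative weights
    supports = [(concept_names[j], row[j].item())
                for j in top.indices if row[j] > 0]
    opposes = [(f"NOT {concept_names[j]}", row[j].item())
               for j in bottom.indices if row[j] < 0]
    return supports, opposes
```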
Local interpretability. CI-CBM explains an individual prediction by decomposing the class score into concept contributions. For input image \(x_k\), class \(i\), and concept \(j\), the contribution is computed as \(\text{Contrib}(x_k,i,j)=W_F^t[i,j]\times f_c^t(x_k)[j]\). Here, \(f_c^t(x_k)[j]\) is the image’s activation on concept \(j\), and \(W_F^t[i,j]\) is how strongly that concept affects class \(i\). Large positive contributions support the prediction, while negative contributions push against it. This shows why the same old-class image can remain correctly classified as new classes are introduced.
Figure 5. Local view of concept-level contributions for one image across incremental phases on ImageNet-Subset: how salient concepts for the prediction shift as more classes are learned.
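A sketch of this decomposition; since it is elementwise, the full contribution vector for one class is a single multiply, and it sums (up to any bias term) to that class's logit:

```python
import torch

def concept_contributions(W_F, concept_acts, class_idx):
    """Contrib(x_k, i, j) = W_F[i, j] * f_c(x_k)[j], for every concept j.

    `concept_acts` is f_c^t(x_k), the concept-activation vector of one image.
    The returned vector sums (up to any bias term) to the class-i logit.
    """
    return W_F[class_idx] * concept_acts      # (n_concepts,)

# Usage: rank concepts by how much they pushed a prediction.
# contrib = concept_contributions(W_F, f_c(x_k), pred_class)
# top = torch.topk(contrib, 5)                # strongest supporting concepts
```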
Conclusion
CI-CBM shows that interpretability and strong continual learning performance can coexist, reducing the usual trade-off between accuracy and transparency.
By regularizing the concept bottleneck and generating pseudo-concepts for old classes, CI-CBM learns new classes without storing past training data.
The results demonstrate that CI-CBM is practical and flexible, working well in both pretrained and non-pretrained settings across diverse benchmarks.
@article{javadi2026ci,
title={CI-CBM: Class-Incremental Concept Bottleneck Model for Interpretable Continual Learning},
author={Javadi, Amirhosein and Oikarinen, Tuomas and Javidi, Tara and Weng, Tsui-Wei},
journal={Transactions on Machine Learning Research},
year={2026},
}