Quantitative Evaluations
1. Final Layer Evaluation
We follow CLIP-Dissect [1] to quantitatively
analyze description quality on final-layer neurons, whose
ground-truth labels (i.e., the class names) are known, allowing
us to evaluate the quality of neuron descriptions automatically.
Our results show that DnD outperforms MILAN [2], with an
average CLIP cosine similarity higher by 0.0518, an average
mpnet cosine similarity higher by 0.18, and an average
BERTScore higher by 0.008.
Textual similarity between DnD/MILAN labels and ground truths on ResNet-50 (ImageNet).
We can see that DnD outperforms MILAN.
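To make the metric concrete, the sketch below computes the average mpnet cosine similarity between predicted neuron labels and ground-truth class names. It is a minimal illustration with placeholder label lists, not the exact evaluation script; the CLIP-similarity and BERTScore variants follow the same pattern, and the all-mpnet-base-v2 checkpoint is an assumption about the specific mpnet encoder.

    # Minimal sketch: average sentence-embedding similarity between predicted neuron
    # labels and ground-truth class names. The label lists below are placeholders.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-mpnet-base-v2")

    pred_labels = ["goldfish", "striped feline"]      # descriptions from DnD or MILAN
    gt_class_names = ["goldfish", "tiger cat"]        # ImageNet class names of the same neurons

    pred_emb = model.encode(pred_labels, convert_to_tensor=True, normalize_embeddings=True)
    gt_emb = model.encode(gt_class_names, convert_to_tensor=True, normalize_embeddings=True)

    # Cosine similarity of each label with its own ground truth, averaged over neurons.
    cos_sim = (pred_emb * gt_emb).sum(dim=1)
    print("average mpnet cosine similarity:", cos_sim.mean().item())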
2. MILANNOTATIONS
We also performed a quantitative evaluation by calculating the textual similarity between a method's label
and the corresponding MILANNOTATIONS. Our analysis found that
describing every neuron with the same constant concept,
'depictions', achieves better results than any
explanation method on this dataset, even though this is neither a useful
nor a meaningful description. The dataset is therefore unreliable as ground
truth and cannot be used to compare different methods.
Textual similarity between descriptions produced by each method and MILANNOTATIONS.
Simply labeling every neuron as "depictions" outperforms all other methods, demonstrating the unreliability
of MILANNOTATIONS as ground truth for evaluation.
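For illustration, the constant-label baseline can be scored the same way as any method. The sketch below uses the bert-score package with illustrative annotation and label strings; the actual MILANNOTATIONS loading and the specific strings are assumptions, not the evaluation code.

    # Minimal sketch: BERTScore of a constant "depictions" label vs. method labels,
    # both scored against (illustrative) MILANNOTATIONS reference strings.
    from bert_score import score

    annotations = ["dog faces and fur", "text on signs", "round green objects"]   # references (placeholders)
    method_labels = ["animal faces", "lettering", "tennis balls"]                 # labels from some method

    constant_labels = ["depictions"] * len(annotations)

    _, _, f1_constant = score(constant_labels, annotations, lang="en", rescale_with_baseline=True)
    _, _, f1_method = score(method_labels, annotations, lang="en", rescale_with_baseline=True)

    print("constant 'depictions' BERTScore F1:", f1_constant.mean().item())
    print("method BERTScore F1:", f1_method.mean().item())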
3. Crowdsourced Experiment
Our experiment compares the quality of labels produced by DnD against three
baselines: CLIP-Dissect, MILAN, and Network Dissection [3]. For both ResNet-50 and ResNet-18, we evaluated
four intermediate layers (the end of each residual block),
with 200 randomly chosen neurons per layer for ResNet-50
and 50 per layer for ResNet-18. Each neuron's description is evaluated by three different workers.
We outline specifics of the experiment below:
- Workers are presented with the top 10 highest activating images of a neuron.
- Four separate descriptions are given, each corresponding to a label produced by one of the four methods compared.
- Workers select the description that best represents the 10 highly activating images presented.
- Descriptions are rated on a 1-5 scale, where a rating of 1 means the worker "strongly disagrees" with the
given description and a rating of 5 means the worker "strongly agrees" with it.
Our results show that DnD performs over 2× better than all baseline methods when dissecting ResNet-50
and over 3× better when dissecting ResNet-18, with its description selected as the best of the four 63.21%
of the time.
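The reported numbers reduce to simple aggregation of the collected worker responses. The sketch below shows the idea on a hypothetical response table; the column names and values are illustrative and not the actual study data.

    # Minimal sketch: aggregate crowdsourced responses per method.
    # Each row is one worker's response for one neuron (values are placeholders).
    import pandas as pd

    responses = pd.DataFrame({
        "method":   ["DnD", "MILAN", "CLIP-Dissect", "Network Dissection", "DnD"],
        "rating":   [5, 3, 4, 2, 4],                       # 1-5 agreement with the description
        "selected": [True, False, False, False, True],     # chosen as best of the four descriptions
    })

    summary = responses.groupby("method").agg(
        mean_rating=("rating", "mean"),
        selected_fraction=("selected", "mean"),
    )
    print(summary)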
Results for individual layers of ResNet-50.
We observe that DnD is the best method across all layers of ResNet-50.
Results for individual layers of ResNet-18.
DnD performs significantly better than the baseline methods across every layer of ResNet-18.
4. Use Case
To showcase a potential use case for neuron descriptions,
we experimented with using them to find a good classifier for a class missing from
the training set. We search the neurons of Layer 4 of ResNet-50 (ImageNet) for those that
could serve as the best classifiers for an unseen class, specifically the classes of the CIFAR-10 and CIFAR-100 datasets.
Our setup is as follows:
- Explain all neurons in Layer 4 of ResNet-50 (ImageNet) using different methods.
- Find the neuron whose description is closest to the CIFAR class name in a text embedding space
(an ensemble of the CLIP ViT-B/16 and mpnet text encoders).
- Measure the neuron's average activation to determine how well it performs as a single-class
classifier on the CIFAR validation set, evaluated by the area under the ROC curve (AUC); a code sketch of this procedure is given after the list.
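A minimal sketch of this procedure is shown below. For brevity it uses only the mpnet text encoder rather than the CLIP/mpnet ensemble, and the descriptions, activations, and labels are placeholders rather than values actually computed from ResNet-50.

    # Minimal sketch: pick the Layer-4 neuron whose description best matches a CIFAR class
    # name, then score that neuron's average activation as a one-vs-rest classifier.
    import torch
    from sentence_transformers import SentenceTransformer, util
    from sklearn.metrics import roc_auc_score

    encoder = SentenceTransformer("all-mpnet-base-v2")

    descriptions = ["striped fur", "airplane wings", "ship hulls on water"]   # one label per neuron (placeholders)
    class_name = "airplane"                                                   # a CIFAR-10 class

    desc_emb = encoder.encode(descriptions, convert_to_tensor=True, normalize_embeddings=True)
    cls_emb = encoder.encode([class_name], convert_to_tensor=True, normalize_embeddings=True)
    best_neuron = int(util.cos_sim(cls_emb, desc_emb).argmax())

    # Spatially averaged activation of the chosen neuron on each validation image,
    # and a binary label marking whether the image belongs to `class_name` (placeholders).
    activations = torch.tensor([0.2, 1.7, 0.4, 2.1])
    labels = torch.tensor([0, 1, 0, 1])

    auc = roc_auc_score(labels.numpy(), activations.numpy())
    print(f"neuron {best_neuron} as a '{class_name}' classifier: AUC = {auc:.3f}")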
Average classification AUC on out-of-distribution datasets
when using neurons with similar descriptions as classifiers.
We can see that DnD clearly outperforms MILAN, the only
other generative description method.
5. Ablation Studies
For further insight, we conduct comprehensive ablation studies analyzing the importance of each step in the DnD pipeline.
We perform the following experiments with additional details shown here:
- Attention Cropping Ablation (Step 1)
- Image Captioning with Fixed Concept Sets (Step 2)
- Image-to-Text Model Ablation (Step 2)
- Effects of GPT Summarization (Step 2)
- Effects of Concept Selection (Step 3)