We introduce the robust radius (also called input-space margin) as a powerful confidence metric.
Figure 1: Overview of our approach. In RAT, we train the model to enlarge the robust radius (the minimum L∞ distance needed to change a prediction) for correct examples and to shrink it for wrong examples. At inference time, we measure the robust radius as a confidence metric for detecting potential misclassifications.
Modern neural networks often produce high-confidence predictions even when they are wrong. Misclassification detection [1] therefore aims to alert users to uncertain decisions that are likely to be incorrect. Traditional approaches rely on the model's softmax probabilities, but these can be overconfident, hindering reliable detection of errors.
In this work, we propose an alternative perspective by focusing on how "close" an input lies to the model's decision boundary. Specifically, we study the robust radius: the smallest adversarial perturbation required to alter the model's predicted label. Empirically, we find that correct predictions generally have a much larger robust radius than incorrectly classified ones. Based on this, we introduce two practical algorithms, RR-BS and RR-Fast, that efficiently estimate the robust radius in real time and thus help detect potential misclassifications.
Further, we propose a novel training scheme called Radius Aware Training (RAT) to enhance the model's ability to separate correct and incorrect predictions, all without extra data. Our experiments show that RAT substantially reduces detection errors compared to existing methods, achieving up to a 29.3% reduction in AURC and a 21.6% reduction in false positive rate at 95% true positive rate (FPR@95TPR) across various benchmarks.
Misclassification Detection (MisD): The goal is to distinguish correct predictions from misclassified inputs. A typical approach uses a confidence score function C(x) and sets a threshold τ; inputs with C(x) < τ are flagged as likely misclassified.
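As a concrete illustration, here is a minimal sketch of this thresholding scheme using the standard maximum softmax probability (MSP) baseline as the confidence score C(x); the model, inputs, and threshold value are placeholders:

```python
# A minimal MisD sketch: flag inputs whose MSP confidence falls below tau.
import torch
import torch.nn.functional as F

def flag_misclassifications(model, x, tau=0.9):
    """Return a boolean mask, True where the prediction is flagged as risky."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    confidence = probs.max(dim=1).values  # C(x) = maximum softmax probability
    return confidence < tau
```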
Robust Radius: Formally, the robust radius of an input x is the minimal L∞ adversarial perturbation needed to change the model's prediction: \[ RR(x) = \min_{\delta} \|\delta\|_{\infty} \quad \text{s.t.} \quad \hat{y}(x + \delta) \neq \hat{y}(x), \] where \(\hat{y}(x)\) is the predicted label of \(x\).
Intuitively, an input far from the decision boundary (and thus with a large robust radius) is less likely to be misclassified than an input near the boundary. Empirically, we find that correct predictions generally have a much larger robust radius than incorrect ones, which motivates using the robust radius as a reliable confidence metric for misclassification detection.
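For intuition, consider a binary linear classifier, where the L∞ robust radius has a standard closed form (a textbook fact included here for illustration, not a result from the paper): \[ f(x) = w^\top x + b, \qquad RR(x) = \frac{|w^\top x + b|}{\|w\|_1}, \] since over an L∞ ball of radius \(t\) the score can change by at most \(t\|w\|_1\) (the dual-norm bound). Deep networks admit no such closed form, so the radius must be estimated.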
To this end, we propose two practical algorithms, RR-BS and RR-Fast, to estimate the robust radius efficiently.
RR-BS is a simple yet effective algorithm that binary-searches for the minimal adversarial perturbation. It can be paired with any adversarial attack; in this work we use FGSM [2] for its simplicity and speed. The detailed algorithm is shown in the left part of Figure 2.
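A minimal sketch of the idea in PyTorch (our reading of the procedure, not the authors' reference implementation; the search range and step count are placeholders):

```python
# RR-BS sketch: binary-search the smallest FGSM step size that flips the label.
import torch
import torch.nn.functional as F

def rr_bs(model, x, eps_max=1.0, n_steps=10):
    """Estimate the robust radius of one input x of shape [1, C, H, W]."""
    model.eval()
    with torch.no_grad():
        y_hat = model(x).argmax(dim=1)

    # FGSM attack direction: sign of the loss gradient w.r.t. the input.
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_hat)
    direction = torch.autograd.grad(loss, x_adv)[0].sign()

    lo, hi = 0.0, eps_max  # search interval for the perturbation size
    for _ in range(n_steps):
        mid = (lo + hi) / 2
        with torch.no_grad():
            flipped = model(x + mid * direction).argmax(dim=1) != y_hat
        if flipped.item():
            hi = mid  # prediction already flips: try a smaller perturbation
        else:
            lo = mid  # prediction unchanged: need a larger perturbation
    return hi  # radius upper bound (eps_max if no flip found in range)
```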
Figure 2: The RR-BS (left) and RR-Fast (right) algorithms.
RR-Fast is a faster algorithm that estimates the robust radius via linear approximation. The key idea is to avoid the repeated forward passes of RR-BS's binary search by approximating the model's output with a linear function along the attack direction, which yields a closed-form radius estimate. The detailed algorithm is shown in the right part of Figure 2.
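The sketch below illustrates this one-step linearization using the logit margin between the top-2 classes; it is an illustrative reconstruction rather than the paper's exact procedure:

```python
# RR-Fast sketch: linearize the top-2 logit margin along the attack direction.
import torch

def rr_fast(model, x):
    """One-step radius estimate for one input x of shape [1, C, H, W]."""
    model.eval()
    x = x.clone().requires_grad_(True)
    logits = model(x)
    top2 = logits.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]  # logit gap: predicted vs runner-up class

    grad = torch.autograd.grad(margin.sum(), x)[0]
    # Along the steepest L-inf descent direction d = -sign(grad), the margin
    # decreases at rate ||grad||_1: margin(x + t*d) ≈ margin - t * ||grad||_1.
    slope = grad.abs().flatten(1).sum(dim=1)

    # Radius estimate: the step size t at which the linearized margin hits zero.
    return (margin / slope.clamp_min(1e-12)).item()
```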
Inspired by the effectiveness of the robust radius in misclassification detection, we propose a novel training scheme called Radius Aware Training (RAT) to further enhance it. Rather than blindly enlarging the robust radius of all examples (as in standard adversarial training), RAT applies different adversarial objectives depending on whether an example is predicted correctly: \[L_{\text{RAT}}(\Theta; x, y) = \begin{cases} \min_{\|x' - x\|_\infty \leq \epsilon} \ell(\Theta; x', y) & \text{if } \hat{y}(x) \neq y;\\ \max_{\|x' - x\|_\infty \leq \epsilon} \ell(\Theta; x', y) & \text{if } \hat{y}(x) = y, \end{cases} \] where \(\ell\) is the classification loss. RAT thus enlarges the radius of correct examples while encouraging a smaller radius for wrong ones.
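A minimal sketch of one RAT-style training step, approximating the inner min/max with a single FGSM step (an illustration of the objective above, not the authors' exact training recipe; the optimizer and ε are placeholders):

```python
# RAT-style step: ascend the loss for correct examples, descend it for wrong ones.
import torch
import torch.nn.functional as F

def rat_step(model, optimizer, x, y, eps=8 / 255):
    model.train()
    with torch.no_grad():
        correct = model(x).argmax(dim=1) == y  # which case of L_RAT applies

    # One-step approximation of the inner min/max over the L-inf ball.
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    sign = correct.float().view(-1, 1, 1, 1) * 2 - 1  # +1 correct, -1 wrong
    x_prime = (x + sign * eps * grad.sign()).clamp(0, 1).detach()

    # Outer minimization over the model parameters Theta.
    optimizer.zero_grad()
    F.cross_entropy(model(x_prime), y).backward()
    optimizer.step()
```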
Figure 3: Illustration of RAT. Correctly classified points are pushed away from decision boundaries, while misclassified points are allowed to remain near the boundary, creating a margin gap between right and wrong predictions.
We evaluate our approach on the CIFAR10, CIFAR100, and ImageNet datasets across various network architectures (e.g., ResNet, WideResNet). Below are key highlights:
Table 1: Comparison of MisD metrics (AURC, FPR@95TPR, and AUROC) on CIFAR10. RAT+RR-BS consistently outperforms baseline methods.
Table 2: Comparison of MisD metrics (AURC, FPR@95TPR, and AUROC) on ImageNet. RAT+RR-BS consistently outperforms baseline methods.
We presented a simple but effective perspective on misclassification detection: the robust radius, the minimal adversarial perturbation that changes a model's prediction. We proposed RR-BS and RR-Fast to estimate the robust radius efficiently, and a new training scheme, RAT, to increase the separability between correct and misclassified examples, all without extra data. We hope our findings inspire future work on bridging adversarial robustness and confidence estimation, enabling safer real-world deployment of deep learning models.
Ge Yan and Tsui-Wei Weng. "RAT: Boosting Misclassification Detection Ability without Extra Data." arXiv preprint arXiv:2503.14783, 2025.
@article{yan2025rat,
  title   = {RAT: Boosting Misclassification Detection Ability without Extra Data},
  author  = {Yan, Ge and Weng, Tsui-Wei},
  journal = {arXiv preprint arXiv:2503.14783},
  year    = {2025}
}