We introduce the robust radius (also called input-space margin) as a powerful confidence metric.
Figure 1: Overview of our approach. In RAT, we train the model to enlarge the robust radius (the minimum L∞ distance needed to change a prediction) for correct examples and to shrink it for wrong examples. At inference time, we measure the robust radius as a confidence metric for detecting potential misclassifications.
Modern neural networks often produce high-confidence predictions even when they are wrong. Misclassification detection [1] therefore aims to alert users to uncertain decisions that are likely to be incorrect. Traditional approaches rely on the model's softmax probabilities, but these can be overconfident, hindering reliable detection of errors.
In this work, we propose an alternative perspective by focusing on how "close" an input lies to the model's decision boundary. Specifically, we study the robust radius: the smallest adversarial perturbation required to alter the model's predicted label. Empirically, we find that correct predictions generally have a much larger robust radius than incorrectly classified ones. Based on this, we introduce two practical algorithms, RR-BS and RR-Fast, that efficiently estimate the robust radius in real time and thus help detect potential misclassifications.
Further, we propose a novel training scheme called Radius Aware Training (RAT) to enhance the model's ability to separate correct and incorrect predictions, all without extra data. Our experiments show that RAT substantially reduces detection errors compared to existing methods, achieving up to a 29.3% reduction in AURC and a 21.6% reduction in false positive rate at 95% true positive rate (FPR@95TPR) across various benchmarks.
Misclassification Detection (MisD): The goal is to distinguish correct predictions from misclassified inputs. A typical approach uses a confidence score function C(x) and sets a threshold τ; inputs with C(x) < τ are flagged as likely misclassified.
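As a concrete illustration, here is a minimal sketch of this thresholding scheme using the standard maximum softmax probability (MSP) baseline as the confidence score C(x); the model, inputs, and threshold value are placeholders:

```python
# A minimal MisD sketch: flag inputs whose MSP confidence falls below tau.
import torch
import torch.nn.functional as F

def flag_misclassifications(model, x, tau=0.9):
    """Return a boolean mask, True where the prediction is flagged as risky."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    confidence = probs.max(dim=1).values  # C(x) = maximum softmax probability
    return confidence < tau
```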
Robust Radius: Formally, the robust radius of an input x is the minimal L∞ adversarial perturbation needed to change the model's prediction: \[ RR(x) = \min_{\delta} \|\delta\|_{\infty} \quad \text{s.t.} \quad \hat{y}(x + \delta) \neq \hat{y}(x), \] where \(\hat{y}(x)\) is the predicted label of \(x\).
Intuitively, an input far from the decision boundary (and thus with a large robust radius) is less likely to be misclassified than an input near the boundary. Empirically, we find that correct predictions generally have a much larger robust radius than incorrect ones, which motivates using the robust radius as a reliable confidence metric for misclassification detection.
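For intuition, consider a binary linear classifier, where the L∞ robust radius has a standard closed form (a textbook fact included here for illustration, not a result from the paper): \[ f(x) = w^\top x + b, \qquad RR(x) = \frac{|w^\top x + b|}{\|w\|_1}, \] since over an L∞ ball of radius \(t\) the score can change by at most \(t\|w\|_1\) (the dual-norm bound). Deep networks admit no such closed form, so the radius must be estimated.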
To this end, we propose two practical algorithms, RR-BS and RR-Fast, to estimate the robust radius efficiently.
RR-BS is a simple yet effective algorithm that binary-searches for the minimal adversarial perturbation. It can be paired with any adversarial attack; in this work we use FGSM [2] for its simplicity and speed. The detailed algorithm is shown in the left part of Figure 2.
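A minimal sketch of the idea in PyTorch (our reading of the procedure, not the authors' reference implementation; the search range and step count are placeholders):

```python
# RR-BS sketch: binary-search the smallest FGSM step size that flips the label.
import torch
import torch.nn.functional as F

def rr_bs(model, x, eps_max=1.0, n_steps=10):
    """Estimate the robust radius of one input x of shape [1, C, H, W]."""
    model.eval()
    with torch.no_grad():
        y_hat = model(x).argmax(dim=1)

    # FGSM attack direction: sign of the loss gradient w.r.t. the input.
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_hat)
    direction = torch.autograd.grad(loss, x_adv)[0].sign()

    lo, hi = 0.0, eps_max  # search interval for the perturbation size
    for _ in range(n_steps):
        mid = (lo + hi) / 2
        with torch.no_grad():
            flipped = model(x + mid * direction).argmax(dim=1) != y_hat
        if flipped.item():
            hi = mid  # prediction already flips: try a smaller perturbation
        else:
            lo = mid  # prediction unchanged: need a larger perturbation
    return hi  # radius upper bound (eps_max if no flip found in range)
```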
Figure 2: The RR-BS (left) and RR-Fast (right) algorithms.
RR-Fast is a faster algorithm that estimates the robust radius via linear approximation. The key idea is to avoid the repeated forward passes of RR-BS's binary search by approximating the model's output with a linear function along the attack direction, which yields a closed-form radius estimate. The detailed algorithm is shown in the right part of Figure 2.
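The sketch below illustrates this one-step linearization using the logit margin between the top-2 classes; it is an illustrative reconstruction rather than the paper's exact procedure:

```python
# RR-Fast sketch: linearize the top-2 logit margin along the attack direction.
import torch

def rr_fast(model, x):
    """One-step radius estimate for one input x of shape [1, C, H, W]."""
    model.eval()
    x = x.clone().requires_grad_(True)
    logits = model(x)
    top2 = logits.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]  # logit gap: predicted vs runner-up class

    grad = torch.autograd.grad(margin.sum(), x)[0]
    # Along the steepest L-inf descent direction d = -sign(grad), the margin
    # decreases at rate ||grad||_1: margin(x + t*d) ≈ margin - t * ||grad||_1.
    slope = grad.abs().flatten(1).sum(dim=1)

    # Radius estimate: the step size t at which the linearized margin hits zero.
    return (margin / slope.clamp_min(1e-12)).item()
```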
Inspired by the effectiveness of the robust radius in misclassification detection, we propose a novel training scheme called Radius Aware Training (RAT) to further enhance it. Rather than blindly enlarging the robust radius of all examples (as in standard adversarial training), RAT applies different adversarial objectives depending on whether an example is predicted correctly: \[L_{\text{RAT}}(\Theta; x, y) = \begin{cases} \min_{\|x' - x\|_\infty \leq \epsilon} \ell(\Theta; x', y) & \text{if } \hat{y}(x) \neq y;\\ \max_{\|x' - x\|_\infty \leq \epsilon} \ell(\Theta; x', y) & \text{if } \hat{y}(x) = y, \end{cases} \] where \(\ell\) is the classification loss. RAT thus enlarges the radius of correct examples while encouraging a smaller radius for wrong ones.
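A minimal sketch of one RAT-style training step, approximating the inner min/max with a single FGSM step (an illustration of the objective above, not the authors' exact training recipe; the optimizer and ε are placeholders):

```python
# RAT-style step: ascend the loss for correct examples, descend it for wrong ones.
import torch
import torch.nn.functional as F

def rat_step(model, optimizer, x, y, eps=8 / 255):
    model.train()
    with torch.no_grad():
        correct = model(x).argmax(dim=1) == y  # which case of L_RAT applies

    # One-step approximation of the inner min/max over the L-inf ball.
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    sign = correct.float().view(-1, 1, 1, 1) * 2 - 1  # +1 correct, -1 wrong
    x_prime = (x + sign * eps * grad.sign()).clamp(0, 1).detach()

    # Outer minimization over the model parameters Theta.
    optimizer.zero_grad()
    F.cross_entropy(model(x_prime), y).backward()
    optimizer.step()
```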
Figure 3: Illustration of RAT. Correctly classified points are pushed away from decision boundaries, while misclassified points are allowed to remain near the boundary, creating a margin gap between right and wrong predictions.
We evaluate our approach on the CIFAR10, CIFAR100, and ImageNet datasets across various network architectures (e.g., ResNet, WideResNet). Below are key highlights:
Table 1: Comparison of MisD metrics (AURC, FPR@95TPR, and AUROC) on CIFAR10. RAT+RR-BS consistently outperforms baseline methods.
Table 2: Comparison of MisD metrics (AURC, FPR@95TPR, and AUROC) on ImageNet. RAT+RR-BS consistently outperforms baseline methods.
We presented a simple but effective perspective on misclassification detection: the robust radius, the minimal adversarial perturbation that changes a model's prediction. We proposed RR-BS and RR-Fast to estimate the robust radius efficiently, and a new training scheme, RAT, to increase the separability between correct and misclassified examples, all without extra data. We hope our findings inspire future work on bridging adversarial robustness and confidence estimation, enabling safer real-world deployment of deep learning models.
Ge Yan and Tsui-Wei Weng. "RAT: Boosting Misclassification Detection Ability without Extra Data." arXiv preprint arXiv:2503.14783, 2025.
@article{yan2025rat,
  title   = {RAT: Boosting Misclassification Detection Ability without Extra Data},
  author  = {Yan, Ge and Weng, Tsui-Wei},
  journal = {arXiv preprint arXiv:2503.14783},
  year    = {2025}
}