Recent studies show that LLMs with chain-of-thought (CoT) reasoning achieve impressive problem-solving performance. However, they sometimes produce overly short reasoning, which lowers accuracy even on simple problems. We identify that reasoning length is encoded as a linear direction in the hidden space, and propose ThinkEdit, a lightweight weight-editing method that suppresses overly short reasoning by modifying only 0.1% of model parameters. By targeting a small subset of attention heads (~2%), ThinkEdit improves accuracy on short-reasoning cases (+5.44%) and enhances overall performance (+2.43%) across math benchmarks. Our work offers new insights into controlling reasoning behavior inside LLMs.
Figure 1: Overview of ThinkEdit.
We observe a consistent issue across DeepSeek-distilled reasoning models: significantly lower accuracy when the reasoning length is short. This pattern holds across datasets such as GSM8K and MATH-Level5. As shown in the following figure, cumulative accuracy drops sharply for responses with reasoning length below 2000 tokens, contrary to the intuition that shorter reasoning should correspond to easier problems. Instead of solving simple tasks efficiently, models often fail when generating overly brief chains of thought.
Figure 2: The performance of all DeepSeek-distilled reasoning models degrades significantly when the reasoning length is too short. The x-axis represents the cutoff threshold on reasoning length, and the y-axis shows the corresponding cumulative accuracy.
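Concretely, cumulative accuracy at a cutoff T is the accuracy computed over only those responses whose reasoning length is at most T tokens. A minimal Python sketch of this analysis (variable names are illustrative, not from the official codebase):

```python
def cumulative_accuracy(lengths, correct, cutoff):
    """Accuracy over responses whose reasoning is at most `cutoff` tokens.

    lengths: list of reasoning lengths in tokens, one per response.
    correct: list of 0/1 correctness flags, one per response.
    """
    kept = [c for l, c in zip(lengths, correct) if l <= cutoff]
    return sum(kept) / len(kept) if kept else float("nan")

# e.g. accuracy of responses with reasoning below 2000 tokens:
# acc = cumulative_accuracy(lengths, correct, cutoff=2000)
```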
Building on our discovery of the reasoning-length direction, we propose ThinkEdit, which removes the short-reasoning direction from the output projections of the identified attention heads as follows:
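The exact editing rule is given in the paper; below is a minimal PyTorch sketch of the idea, assuming `d` is the unit-norm short-reasoning direction in the residual-stream (hidden) space and `W` is the slice of a layer's `o_proj` weight that maps one identified head's output into the residual stream (the [hidden_dim, head_dim] layout follows the Hugging Face convention and is our assumption, not the official implementation):

```python
import torch

def edit_head_o_proj(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Remove the short-reasoning direction from one head's output projection.

    W: [hidden_dim, head_dim] o_proj slice for a short-reasoning head.
    d: [hidden_dim] short-reasoning direction.
    Returns (I - d d^T) W, so the head can no longer write along d.
    """
    d = d / d.norm()                  # ensure unit norm
    return W - torch.outer(d, d @ W)  # project d out of every column
```

Because only the o_proj slices of the ~2% identified heads are rewritten, the edit touches roughly 0.1% of the model's parameters.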
Experiments demonstrate that ThinkEdit effectively mitigates overly short reasoning in reasoning models, boosting accuracy on short-reasoning cases by up to +5.44% and improving overall performance by +2.43% across multiple math benchmarks.
Figure 5: Heatmap illustrating the short reasoning contribution for each attention head. Heads with higher values (in red) show stronger alignment with short reasoning behavior. The short-reasoning heads are sparse in reasoning models.
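A natural way to obtain such a heatmap, sketched below under our assumptions, is to score each head by how strongly its average contribution to the residual stream aligns with the short-reasoning direction; this is an illustrative reconstruction, not the paper's exact formula (in particular, we use a single direction `d`, whereas the direction may be extracted per layer):

```python
import torch

def short_reasoning_scores(head_outputs: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Score every attention head's alignment with the short-reasoning direction.

    head_outputs: [num_layers, num_heads, hidden_dim] mean per-head
                  contribution to the residual stream (assumed layout,
                  e.g. averaged over short-reasoning responses).
    d: [hidden_dim] short-reasoning direction.
    Returns a [num_layers, num_heads] grid; large values (red in the
    heatmap) mark heads that push the model toward short reasoning.
    """
    d = d / d.norm()
    return head_outputs @ d
```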
Table 1: Overall accuracy (%) of each model before and after applying ThinkEdit. ThinkEdit improves the reasoning models across all benchmarks.
Table 2: Accuracy (%) of the top 5% / 10% / 20% shortest reasoning responses. ThinkEdit significantly improves correctness when the reasoning is short.
Table 3: Average reasoning length (in tokens) for the top 5% / 10% / 20% shortest responses. ThinkEdit slightly increases reasoning length when the reasoning is overly short.
ThinkEdit demonstrates that small-scale weight editing can correct overly short reasoning in LLMs, leading to substantial improvements in accuracy. This work provides new mechanistic insights into reasoning-length control and opens up avenues for further fine-grained model interventions.
Chung-En Sun, Ge Yan, Tsui-Wei Weng. "ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models", arXiv Preprint 2025.
@article{thinkedit,
title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models},
author={Sun, Chung-En and Yan, Ge and Weng, Tsui-Wei},
journal={arXiv preprint},
year={2025},
url={https://github.com/Trustworthy-ML-Lab/ThinkEdit}
}