Math operations and closed-book fact retrieval shared pathways with memorization, dropping to between 66 and 86 percent of baseline performance after editing. The researchers found that arithmetic was particularly fragile: even when the models generated identical chains of reasoning, they failed at the calculation step after low-curvature components were removed.
The team suggests this may be because arithmetic problems themselves are memorized at the 7B scale, or because they rely on narrowly used components of the network to perform precise calculations. Open-book question answering, which relies on provided context rather than internal knowledge, proved far more robust to the editing procedure, maintaining nearly full performance.
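To make the editing step concrete, the sketch below shows one way a K-FAC-style curvature ranking could be applied to a single weight matrix: estimate the two Kronecker factors from activations and output gradients, rotate the weights into their joint eigenbasis, and zero the components with the lowest estimated curvature. The toy layer, the 50 percent cutoff, and all names are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of K-FAC-style curvature editing on one linear layer.
# Everything here (toy data, threshold, variable names) is an assumption
# for illustration; it is not the paper's implementation.
import torch

torch.manual_seed(0)

# A toy linear layer standing in for one transformer weight matrix.
d_in, d_out, n = 64, 32, 2048
layer = torch.nn.Linear(d_in, d_out, bias=False)
x = torch.randn(n, d_in)                      # stand-in input activations
y = torch.randn(n, d_out)                     # stand-in regression targets

# One forward/backward pass to collect the two Kronecker factors:
# A approximates the input covariance, G the output-gradient covariance.
out = layer(x)
loss = torch.nn.functional.mse_loss(out, y)
grad_out = torch.autograd.grad(loss, out)[0]
A = (x.T @ x) / n
G = (grad_out.T @ grad_out) / n

# In the K-FAC approximation, the curvature of each weight component in the
# joint eigenbasis is the product of one eigenvalue of G and one of A.
eva, Ua = torch.linalg.eigh(A)
evg, Ug = torch.linalg.eigh(G)
curvature = evg[:, None] * eva[None, :]        # shape (d_out, d_in)

# Rotate the weights into the eigenbasis, zero the lowest-curvature
# components (the ones hypothesized to carry memorization), rotate back.
W = layer.weight.data                          # (d_out, d_in)
W_eig = Ug.T @ W @ Ua
threshold = torch.quantile(curvature.flatten(), 0.5)   # arbitrary cutoff
W_eig[curvature < threshold] = 0.0
layer.weight.data = Ug @ W_eig @ Ua.T
```

In the paper's setting this kind of edit would be applied across the layers of a full language model, with the cutoff chosen so that low-curvature, memorization-linked components are removed while high-curvature, widely shared components are kept.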
Interestingly, the degree of separation varied with the type of information. Common facts, such as country capitals, barely changed after editing, while rare facts, such as company CEOs, fell by 78 percent. This suggests that the models allocate different neural resources depending on how frequently information appears in training data.
The K-FAC technique outperformed existing memorization-removal methods without requiring training examples of the memorized content. On unseen historical quotes, K-FAC achieved 16.1 percent memorization recall versus 60 percent for the previous best method, BalancedSubnet; lower recall is better here, since it measures how much memorized text the model still reproduces.
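As a rough illustration of how a recall figure like that could be scored, the helper below prompts a model with the opening of each quote and counts verbatim reproductions of the held-out remainder. The function name, the 50/50 prompt split, and the exact-match criterion are assumptions; the paper's actual evaluation harness is not reproduced here.

```python
# Illustrative sketch (not the paper's harness) of "memorization recall":
# prompt with the start of each quote and count exact continuations.
from typing import Callable, Sequence

def memorization_recall(
    quotes: Sequence[str],
    generate: Callable[[str], str],   # assumed: maps a prompt to a completion
    prompt_fraction: float = 0.5,
) -> float:
    """Fraction of quotes whose held-out continuation is reproduced verbatim."""
    hits = 0
    for quote in quotes:
        words = quote.split()
        cut = max(1, int(len(words) * prompt_fraction))
        prompt = " ".join(words[:cut])
        target = " ".join(words[cut:])
        completion = generate(prompt)
        if target and completion.strip().startswith(target):
            hits += 1
    return hits / len(quotes) if quotes else 0.0
```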
Vision transformers showed similar patterns. When trained with intentionally mislabeled images, the models developed different pathways for memorizing incorrect labels versus learning correct patterns. Removing the memorization pathways restored 66.5 percent accuracy on previously mislabeled images.
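One way to check a number like that 66.5 percent is to evaluate the edited model on exactly the subset of training images whose labels were flipped, scoring against the original correct labels rather than the corrupted ones. The sketch below assumes hypothetical tensors of predictions, corrupted labels, and true labels; it is not the authors' evaluation code.

```python
# Hypothetical sketch: accuracy on the previously mislabeled subset, scored
# against the *true* labels (not the corrupted ones used during training).
import torch

def restored_accuracy(
    predictions: torch.Tensor,   # model predictions after editing, shape (N,)
    noisy_labels: torch.Tensor,  # labels used during training, shape (N,)
    true_labels: torch.Tensor,   # original correct labels, shape (N,)
) -> float:
    flipped = noisy_labels != true_labels            # the mislabeled subset
    if flipped.sum() == 0:
        return float("nan")
    correct = predictions[flipped] == true_labels[flipped]
    return correct.float().mean().item()

# Example with dummy data: 3 of 10 labels were flipped during training.
preds = torch.tensor([1, 0, 2, 2, 1, 0, 0, 1, 2, 2])
noisy = torch.tensor([1, 0, 2, 0, 1, 0, 2, 1, 2, 1])
clean = torch.tensor([1, 0, 2, 2, 1, 0, 0, 1, 2, 2])
print(restored_accuracy(preds, noisy, clean))        # -> 1.0 on this toy case
```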
Memory removal limits
However, the researchers acknowledge that their technique is not perfect. Removed memories could return if the model receives further training, since other studies have shown that current unlearning methods only suppress information rather than fully erasing it from the neural network’s weights. That means “forgotten” content can be reactivated with just a few training steps targeting those suppressed areas.
The researchers also can’t fully explain why some abilities, like math, break down so easily when memorization is removed. It is not clear whether the models actually memorized all of their arithmetic or whether math simply relies on neural circuits that overlap with those used for memorization. Additionally, some sophisticated capabilities may look like memorization to their detection method even when they are actually complex reasoning patterns. Finally, the mathematical tools used to measure the model’s loss “landscape” can become unreliable at the extremes, although this does not affect the actual editing process.
This article was updated on November 11, 2025 at 9:16 am to clarify an explanation regarding the classification of weights by curvature.