SELECTIVE NEURON LEVEL KNOWLEDGE UNLEARNING IN PLMS

Authors

  • Hanzla Farooq
  • Hammad Majeed
  • Ahmad Raza

Abstract

Large Language Models (LLMs) have achieved state-of-the-art performance across numerous tasks, from text generation to classification. However, their ability to memorize sensitive information during training introduces significant privacy risks, particularly when such data is exposed or misused. Existing approaches, such as retraining models from scratch without the sensitive data, are computationally expensive and impractical for real-world scenarios. Moreover, current unlearning methods, including gradient-based and shard-based approaches, are limited by high computational costs, performance degradation, and an inability to fully remove sensitive knowledge. To address these challenges, this study proposes the Selective Neuron-Level Knowledge Unlearning (SNKU) framework, which targets the specific neurons responsible for encoding sensitive knowledge within Pre-trained Language Models (PLMs). By employing Correlation Clustering to identify interdependent neuron groups and Gradient × Activation to rank neuron importance, SNKU selectively edits high-impact neurons. This targeted approach diminishes the influence of sensitive data on model predictions while preserving overall performance, offering an efficient alternative to traditional, more computationally intensive unlearning methods.
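The abstract's Gradient × Activation ranking can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the toy model, layer choice, and averaging over the batch are assumptions made only to show how the per-neuron score |gradient × activation| is computed and used to pick high-impact neurons.

```python
import torch
import torch.nn as nn

# Illustrative sketch (assumptions, not the paper's code): rank hidden
# neurons of a tiny network by the |gradient x activation| attribution
# of a loss computed on some "sensitive" examples.
torch.manual_seed(0)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
x = torch.randn(4, 8)              # stand-in for sensitive inputs
target = torch.tensor([0, 1, 0, 1])

# Capture the hidden-layer activations and keep their gradients.
acts = {}
def fwd_hook(module, inp, out):
    out.retain_grad()              # non-leaf tensor: keep its grad
    acts["hidden"] = out

handle = model[1].register_forward_hook(fwd_hook)
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()
handle.remove()

h = acts["hidden"]
# Gradient x Activation score per neuron, averaged over the batch.
scores = (h.grad * h).abs().mean(dim=0)   # shape: (16,)
top = torch.topk(scores, k=3).indices     # candidate high-impact neurons
print(top.tolist())
```

In a full pipeline, the selected indices would feed a subsequent editing step (e.g., zeroing or perturbing those neurons' weights), applied per interdependent cluster rather than per isolated neuron.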

Published

2026-02-20

How to Cite

Hanzla Farooq, Hammad Majeed, & Ahmad Raza. (2026). SELECTIVE NEURON LEVEL KNOWLEDGE UNLEARNING IN PLMS. Spectrum of Engineering Sciences, 4(2), 354–368. Retrieved from https://thesesjournal.com/index.php/1/article/view/2053