A cluster of recent research papers proposes that freezing or selectively tuning a small fraction of neurons inside large language models can, in reported benchmark evaluations, reduce unsafe outputs without retraining billions of parameters. These neuron-level safety interventions depart from conventional approaches that rely on broad post-training fine-tuning. But a […]
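The core mechanism described above, updating only a small, selected subset of neurons while freezing everything else, can be illustrated with a minimal sketch. This is a hypothetical toy example, not any specific paper's method: the relevance scores, the selection function, and the update rule are all illustrative assumptions.

```python
# Toy sketch of neuron-level selective tuning (hypothetical, illustrative only).
# Instead of updating every parameter, update only a small "relevant" subset
# and freeze the rest.

def select_neurons(scores, fraction):
    """Pick indices of the top `fraction` of neurons by a relevance score."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

def masked_update(weights, grads, trainable, lr=0.1):
    """Apply a gradient step only to unfrozen (trainable) neurons."""
    return [w - lr * g if i in trainable else w
            for i, (w, g) in enumerate(zip(weights, grads))]

weights = [0.5, -1.2, 0.8, 0.3, -0.7]
grads   = [0.2,  0.1, -0.4, 0.05, 0.3]
scores  = [0.9,  0.1,  0.8, 0.2,  0.05]  # hypothetical per-neuron relevance

trainable = select_neurons(scores, fraction=0.4)  # tune only 2 of 5 neurons
updated = masked_update(weights, grads, trainable)
```

Only the selected neurons move under the gradient step; the frozen majority are untouched, which is why such interventions sidestep retraining billions of parameters.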