Devoured - April 22, 2026
Critical Bits in Neural Networks (6 minute read)

Researchers demonstrate that flipping just 1-2 sign bits in critical neural network parameters can reduce model accuracy from over 75% to near zero across vision and language tasks.

What: Deep Neural Lesion (DNL) is a data-free attack method that identifies highly vulnerable parameters in neural networks where targeted bit flips cause catastrophic failure. The technique requires only write access to stored weights and works across architectures including ResNets, Vision Transformers, YOLO, and large language models like Qwen and Nemotron.
Why it matters: This reveals a fundamental security vulnerability in deployed ML systems that can be exploited through physical attacks like Rowhammer, DMA attacks, or firmware compromises—threat vectors that bypass traditional security measures. The attack requires minimal computation and no training data, making it practical for real-world attackers, while common defenses like quantization and pruning prove ineffective.
Takeaway: If deploying models in adversarial environments, consider implementing selective hardening by protecting the top 0.1-1% most vulnerable parameters, which the research shows provides substantial resilience without major performance overhead.
Deep dive
  • Neural networks exhibit extreme sensitivity to sign-bit flips in specific parameters, where changing just 1-2 bits can reduce accuracy from 76% to 0% in ResNet-50 and collapse reasoning in 30B parameter language models
  • Early-layer parameters have disproportionate impact because corrupted feature maps propagate through the entire network, fundamentally altering all downstream representations
  • The attack is data-free and computation-light, requiring only magnitude-based heuristics to identify critical parameters with zero forward passes (or optionally one pass with random inputs for refinement)
  • Attack surface spans all major architectures: CNNs show 95%+ accuracy drops with 3 flips, Vision Transformers follow similar patterns, and Mixture-of-Experts models amplify damage when targeting different experts
  • Object detection and segmentation systems collapse completely with 1-2 backbone flips, reducing both detection and mask AP to zero in Mask R-CNN and YOLOv8
  • Language models degenerate into repetitive nonsensical text rather than near-miss errors, indicating catastrophic failure modes rather than graceful degradation
  • The vulnerability is realistic under existing threat models: attackers with storage access via firmware exploits, rootkits, DMA attacks, or Rowhammer can execute without training data or significant compute
  • Traditional defenses fail: DNL bypasses weight quantization, pruning, and simple checksumming, since its magnitude-based targeting needs neither training data nor significant compute
  • Magnitude-based parameter selection combined with early-layer targeting significantly outperforms random flips and matches computationally expensive top-k selection while remaining lightweight
  • Practical defense exists through selective hardening: protecting only 0.1-1% of the most vulnerable weights provides substantial resilience, and defense costs scale better than attack identification for large models
  • The pattern holds universally across domains: same early-layer criticality appears in encoders (BERT, RoBERTa), decoder-only models (Qwen, Nemotron), and vision systems
  • Attribution and detection are exceptionally challenging because the attack leaves minimal forensic traces and requires no unusual computational activity or data access
Decoder
  • Sign-bit flip: Changing the single bit that determines whether a number is positive or negative, instantly negating a weight value without needing to modify magnitude bits
  • DMA attack: Direct Memory Access attack where malicious code bypasses the CPU to directly read or modify memory contents
  • Rowhammer: Hardware vulnerability where repeatedly accessing memory rows causes bit flips in adjacent rows through electrical interference
  • MoE (Mixture-of-Experts): Architecture where different specialized sub-networks (experts) handle different inputs, with a routing mechanism deciding which expert processes each token
  • AP (Average Precision): Standard metric for object detection that measures both detection accuracy and localization quality across confidence thresholds
  • Early layers: Initial network layers that process raw inputs (edge detectors in vision, embedding layers in language), whose outputs feed all subsequent computation
Original article
DNL effect on convolutional kernel

One sign-bit flip in an early-layer edge detector kernel fundamentally alters learned representations. The transformed kernel generates corrupted feature maps that propagate through the network, severely impairing recognition. This single-bit perturbation illustrates the critical vulnerability of early-layer parameters.
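The mechanics of such a flip are simple to show at the bit level. The snippet below is a minimal illustrative sketch (not the paper's code): it XORs bit 31 of an IEEE 754 float32, which negates the stored weight without touching any magnitude bits.

```python
import struct

def flip_sign_bit(weight: float) -> float:
    """Flip bit 31 (the sign bit) of a float32, negating the value."""
    # Reinterpret the float32 as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", weight))[0]
    # XOR the most significant bit to invert the sign.
    flipped = bits ^ 0x80000000
    return struct.unpack("<f", struct.pack("<I", flipped))[0]

print(flip_sign_bit(0.75))   # -0.75
print(flip_sign_bit(-1.5))   # 1.5
```

Because only one fixed bit position changes, this is exactly the kind of perturbation that fault-injection techniques like Rowhammer can realize in memory.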

Overview

Deep neural networks are vulnerable to catastrophic failure from flipping just a few sign bits in model parameters. We present Deep Neural Lesion (DNL), a data-free method that identifies and exploits critical parameters across vision and language domains.

Our approach requires only write access to stored weights—no training data, no optimization, minimal computation. This makes it practical under realistic threat models where attackers compromise model storage through firmware exploits, rootkits, DMA attacks, or Rowhammer vulnerabilities.

  • ResNet-50: 2 sign flips → 99.8% accuracy drop
  • Mask R-CNN / YOLOv8-seg: 1–2 flips collapse detection and segmentation
  • Qwen3-30B & Nemotron 8B: Few flips reduce reasoning and task accuracy to near-zero

Methodology

Attack Variants

Pass-Free DNL: Identifies critical parameters using magnitude-based heuristics and early-layer targeting with zero additional computation.

Enhanced 1-Pass DNL: Refines parameter selection with a single forward and backward pass on random inputs, achieving stronger attacks with minimal overhead.
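The selection logic of the pass-free variant can be sketched roughly as follows, assuming weights are available as per-layer numpy arrays. The layer names, the early-layer fraction, and the exact heuristic are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_targets(layers, n_flips=2, early_fraction=0.25):
    """Pick the highest-magnitude weights from the earliest layers.

    layers: list of (name, ndarray) pairs in forward order.
    Returns (layer_name, flat_index) pairs -- no forward pass needed.
    """
    # Restrict to the early part of the network, where corrupted
    # features propagate through all downstream computation.
    cutoff = max(1, int(len(layers) * early_fraction))
    candidates = []
    for name, w in layers[:cutoff]:
        mags = np.abs(w).ravel()
        # Highest-magnitude weights are the most disruptive to negate.
        top = np.argsort(mags)[-n_flips:]
        candidates.extend((name, int(i), float(mags[i])) for i in top)
    # Keep the globally largest magnitudes across the early layers.
    candidates.sort(key=lambda t: t[2], reverse=True)
    return [(name, idx) for name, idx, _ in candidates[:n_flips]]

rng = np.random.default_rng(0)
layers = [(f"conv{i}", rng.normal(size=(8, 8))) for i in range(8)]
print(select_targets(layers, n_flips=2))
```

The 1-pass refinement would then re-rank these candidates using gradients from a single forward/backward pass on random inputs.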

Why Sign-Bit Flips Matter

  • Clean disruption: Flipping the sign bit instantly negates weights, maximizing feature map corruption
  • Hardware feasibility: Bit flips in fixed positions are more reliably achievable in physical attacks
  • Early-layer criticality: High-magnitude weights in early layers have outsized impact on all downstream representations
  • Universal vulnerability: Pattern holds across CNNs, Transformers, and MoE architectures

Image Classification Results

Evaluated on 60 classifiers including 48 ImageNet models from timm and Torchvision repositories across diverse architectures.

Model Vulnerability Hierarchy

  • ResNet-50: 2 flips → 99.8% drop (76.1% → 0.0%)
  • EfficientNet-B7: 3 flips → 95%+ drop; scales worse than ResNet
  • Vision Transformer: early blocks critical; similar pattern to CNNs

Model size vs. vulnerability: accuracy reduction (AR) as a function of model scale across diverse architectures. All models remain highly vulnerable, with early-layer targeting dominating architecture choice in determining susceptibility.

Detection & Segmentation Results

Object detection and instance segmentation systems collapse dramatically with just 1-2 parameter flips in backbone networks.

Clean Mask R-CNN (baseline): the model correctly detects and segments objects with high confidence.

Attacked Mask R-CNN (after 1–2 flips): detection and segmentation collapse to random outputs.

  • YOLOv8-seg: 1–2 early flips collapse detection and segmentation
  • Mask R-CNN: 1–2 flips in the backbone drive both box AP and mask AP to 0
  • Key finding: the backbone is the critical target; the head can recover

Language Models Results

Reasoning and generation models exhibit severe vulnerability to targeted parameter bit flips. Both MoE and dense architectures are affected.

  • Qwen3-30B-A3B: 2 flips (in different experts) cut reasoning accuracy from 78% to 0%
  • Qwen3-4B: 14 flips across all layers → 100% accuracy reduction
  • Nemotron 8B: 32 flips in the first 5 blocks → complete collapse
  • BERT (text encoder): early layers critical; encoders are vulnerable too
  • RoBERTa: early layers still critical; the pattern is consistent

Language Model Attack Patterns

Decoder-only models (Qwen, Nemotron): Sign-bit attacks are highly effective, especially when targeting the first 5 blocks. Two targeted flips can reduce Qwen3-30B accuracy from 78% to 0%.

MoE routing: Targeting different experts in Mixture-of-Experts models amplifies the attack impact, as each token's routing path becomes compromised.

Encoder models (BERT, RoBERTa): Early-layer sign-bit flips remain highly destructive across diverse architectures.

Generation behavior: Attacked models degenerate into repetitive, nonsensical text rather than near-miss errors—indicating catastrophic failure rather than graceful degradation.

Attack Strategy Comparison

Comparison of attack strategies: performance of different targeting heuristics across 48 ImageNet models. Magnitude-based selection combined with early-layer targeting significantly outperforms random flips and matches top-k magnitude selection while remaining data-free and computationally lightweight.
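The qualitative gap between magnitude-targeted and untargeted flips can be reproduced on a toy model. The sketch below is illustrative numpy code (not the paper's benchmark): it trains a two-weight logistic regression on synthetic data, then contrasts negating the highest-magnitude weight against negating the smaller one.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
w_true = np.array([3.0, 0.3])           # one dominant feature
y = (X @ w_true > 0).astype(float)

# Train a bare-bones logistic regression with gradient descent.
w = np.zeros(2)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)

def accuracy(weights):
    return float(((X @ weights > 0) == (y > 0.5)).mean())

flip_top = w.copy()
flip_top[np.argmax(np.abs(w))] *= -1    # magnitude-targeted sign flip
flip_small = w.copy()
flip_small[np.argmin(np.abs(w))] *= -1  # flip the less important weight

print(f"clean:        {accuracy(w):.2f}")
print(f"flip top-|w|: {accuracy(flip_top):.2f}")
print(f"flip small:   {accuracy(flip_small):.2f}")
```

On this synthetic task, negating the dominant weight typically collapses accuracy while negating the minor one barely moves it, mirroring the magnitude-targeting result at toy scale.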

Defense & Implications

While the vulnerability is severe, we demonstrate that selective hardening of critical parameters provides practical defense. By protecting only the top 0.1-1% most vulnerable weights, models achieve substantial resilience without major performance overhead.
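One way such hardening could be realized, sketched below under the assumption that the defender keeps a redundant golden copy of the protected slice (the function names and scheme are illustrative, not the paper's defense):

```python
import numpy as np

def harden(weights, protect_frac=0.001):
    """Record golden copies of the highest-magnitude weights."""
    flat = weights.ravel()
    k = max(1, int(len(flat) * protect_frac))
    # Protect the top-|w| parameters -- the ones sign-flip attacks target.
    idx = np.argsort(np.abs(flat))[-k:]
    return idx, flat[idx].copy()

def verify_and_restore(weights, idx, golden):
    """Count tampered entries in the protected slice and roll them back."""
    flat = weights.ravel()                 # view into the original array
    tampered = int((flat[idx] != golden).sum())
    flat[idx] = golden                     # restore in place
    return tampered

rng = np.random.default_rng(1)
w = rng.normal(size=1000)
idx, golden = harden(w, protect_frac=0.01)
w[idx[-1]] *= -1                           # simulate one sign-bit flip
print(verify_and_restore(w, idx, golden))  # → 1 (detected and repaired)
```

Because only 0.1–1% of parameters are covered, the golden copy and the periodic check stay cheap even for large models.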

DNL easily bypasses common defenses such as weight quantization, pruning, and simple checksumming schemes. Its data-free nature and magnitude-based targeting make it robust against defenses that assume attackers need training data, detailed model knowledge, or significant computational resources.

Key Takeaways

  • Critical parameters are universally identifiable across architectures and domains
  • Defense cost scales better than attack identification for large models
  • Once attackers gain parameter write access, minimal computation suffices for catastrophic failure
  • Data-free nature makes detection and attribution exceptionally challenging

Citation

@article{galil2025maximal,
  title   = {Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips},
  author  = {Galil, Ido and Kimhi, Moshe and El-Yaniv, Ran},
  journal = {Transactions on Machine Learning Research},
  year    = {2025},
  url     = {https://arxiv.org/pdf/2502.07408}
}