GERNE
Gradient Extrapolation for Debiased Representation Learning

Ihab Asaad, Maha Shadaydeh, Joachim Denzler
Computer Vision Group, Friedrich Schiller University Jena, Germany
🎉 ICCV 2025 🎉

Abstract

Machine learning classification models trained with empirical risk minimization (ERM) often inadvertently rely on spurious correlations. When these unintended associations between non-target attributes and target labels are absent from the test data, they lead to poor generalization. This paper addresses this problem from a model optimization perspective and proposes a novel method, Gradient Extrapolation for Debiased Representation Learning (GERNE), designed to learn debiased representations whether the spurious attribute is known or unknown during training. GERNE uses two distinct batches with different amounts of spurious correlations and defines the target gradient as a linear extrapolation of the gradients computed from each batch's loss. Our analysis shows that when the extrapolated gradient points toward the gradient of the batch with fewer spurious correlations, it effectively guides training toward learning a debiased model. GERNE serves as a general framework for debiasing, encompassing methods such as ERM, reweighting, and resampling as special cases. We derive the theoretical upper and lower bounds of the extrapolation factor employed by GERNE. By tuning this factor, GERNE can adapt to maximize either Group-Balanced Accuracy (GBA) or Worst-Group Accuracy (WGA). We validate the proposed approach on five vision benchmarks and one NLP benchmark, demonstrating competitive and often superior performance compared to state-of-the-art baselines.

Method

Overview of the GERNE method.

(a) Sample images from the Waterbirds classification task. Most landbird images appear with land backgrounds (i.e., y=1, a=1), while most waterbird images appear with water backgrounds (i.e., y=2, a=2). This correlation between bird class and background introduces spurious correlations in the dataset.
(b) Visualization of batch construction. B_b shows a biased batch where the majority of images from class y=1 (top row) have attribute a=1 (yellow), and most images from class y=2 (bottom row) have attribute a=2 (light blue). B_lb represents a less biased batch with a more balanced attribute distribution within each class, controlled by c (here c=1/2). B_rs depicts a group-balanced distribution and refers to a batch sampled using the resampling method. B_ext simulates GERNE's batch with c · (β + 1) > 1, where the dataset's minority group appears as the majority in the batch.
(c) A simplified 2D representation of gradient extrapolation where θ ∈ ℝ². ∇θ_b is the gradient computed on B_b; training with this gradient is equivalent to training with the ERM objective. ∇θ_lb is computed on B_lb. ∇θ_rs is the gradient computed on B_rs, which is equivalent to an extrapolated gradient with c · (β + 1) = 1. Finally, ∇θ_ext is our extrapolated gradient, with the extrapolation factor β modulating the degree of debiasing in conjunction with the strength of spurious correlations in the dataset.
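
To make the extrapolation step concrete, below is a minimal PyTorch-style sketch of one update. It assumes the target loss takes the extrapolated form ℒ_ext = ℒ_lb + β(ℒ_lb − ℒ_b), so that by linearity of the gradient ∇θ_ext = ∇θ_lb + β(∇θ_lb − ∇θ_b). The model, optimizer, batches, and the gerne_step helper are illustrative placeholders, not the reference implementation.

import torch
import torch.nn.functional as F

def gerne_step(model, optimizer, batch_b, batch_lb, beta):
    """One hypothetical GERNE update from a biased batch B_b and a less biased batch B_lb.

    Assumes the extrapolated objective L_ext = L_lb + beta * (L_lb - L_b);
    by linearity of the gradient this realizes
    grad_ext = grad_lb + beta * (grad_lb - grad_b).
    """
    x_b, y_b = batch_b      # biased batch (majority groups dominate)
    x_lb, y_lb = batch_lb   # less biased batch (attribute distribution controlled by c)

    loss_b = F.cross_entropy(model(x_b), y_b)     # L_b
    loss_lb = F.cross_entropy(model(x_lb), y_lb)  # L_lb

    # Extrapolated target loss: beta = 0 trains on B_lb alone,
    # beta = -1 reduces to the biased (ERM-like) loss L_b,
    # larger beta pushes further toward the less biased direction.
    loss_ext = loss_lb + beta * (loss_lb - loss_b)

    optimizer.zero_grad()
    loss_ext.backward()
    optimizer.step()
    return loss_b.item(), loss_lb.item(), loss_ext.item()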

Ablation Study

Ablation of the extrapolation factor β.

We show the impact of tuning β on debiasing the model for β ∈ {−1, 0, 1, 1.2}. The left column plots the training losses ℒ_b and ℒ_lb and the target loss ℒ_ext. The right column plots the average accuracy of the minority and majority groups in both training and validation, as well as the average accuracy on the unbiased test set. Each plot shows the mean and standard deviation over three runs with different random seeds.
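
For intuition, assuming the extrapolated loss has the form ℒ_ext = ℒ_lb + β(ℒ_lb − ℒ_b) as in the sketch above: β = −1 reduces ℒ_ext to the biased-batch loss ℒ_b (ERM-like training), β = 0 trains on the less biased batch alone, and with c = 1/2 as in the batch-construction figure, β = 1 gives c · (β + 1) = 1, the resampling-equivalent setting, while β = 1.2 gives c · (β + 1) = 1.1 > 1, i.e., extrapolation beyond group balance toward the minority groups.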

Acknowledgments

This work was funded by the Carl Zeiss Foundation within the project Sensorized Surgery, Germany (P2022-06-004). Maha Shadaydeh is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through Individual Research Grant SH 1682/1-1.

We thank Tim Büchner, Niklas Penzel, and Jan Blunk for their manuscript feedback and advice throughout the project.

BibTeX (arXiv Preprint)

@misc{asaad2025gradientextrapolationdebiasedrepresentation,
      title={Gradient Extrapolation for Debiased Representation Learning}, 
      author={Ihab Asaad and Maha Shadaydeh and Joachim Denzler},
      year={2025},
      eprint={2503.13236},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.13236}, 
}