Domain Generalization

Definition

Domain Generalization (DG) trains models to perform well on unseen target domains without using any data from those domains during training. In eKYC, that means models that keep working on phones, attack types, lighting conditions, and demographics they have never encountered.


DG Approaches

  • Domain alignment: make feature distributions similar across the source domains. Key methods: DANN, CORAL, MMD. (A minimal MMD sketch follows this list.)
  • Domain-adversarial training: train the feature extractor against a domain discriminator so its features become domain-invariant. Key methods: SSDG, MADDG. (See the gradient reversal sketch below.)
  • Meta-learning: simulate domain shift during training by splitting the source domains into meta-train and meta-test sets, MAML-style. Key method: MAML for FAS. (See the meta-step sketch below.)
  • Data augmentation: create domain variation synthetically. Key methods: RandAugment, style transfer, adversarial augmentation. (See the pipeline sketch below.)
  • Self-supervised pre-training: learn domain-agnostic representations from unlabeled data, then fine-tune. Key methods: MAE, contrastive pre-training + fine-tuning.
  • Ensemble: combine multiple models trained on different domains. Key methods: majority voting, model averaging.
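To make the alignment idea concrete, here is a minimal sketch of MMD-based feature alignment in PyTorch. The `encoder`, the batch names, and the weighting factor `lambda_mmd` are illustrative placeholders, not part of any specific method above.

```python
import torch

def gaussian_mmd(x, y, sigma=1.0):
    """Biased estimate of squared MMD between feature batches x and y (RBF kernel)."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2            # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Inside a training step (all names illustrative):
# feats_a = encoder(batch_from_domain_a)   # features from source domain A
# feats_b = encoder(batch_from_domain_b)   # features from source domain B
# loss = task_loss + lambda_mmd * gaussian_mmd(feats_a, feats_b)
```

Minimising the MMD term pushes the two source domains toward a shared feature distribution, which is the core mechanism behind the alignment family.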
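Domain-adversarial methods hinge on a gradient reversal layer (GRL). The sketch below shows the standard GRL trick in PyTorch; `encoder`, `domain_head`, and the loss names in the usage comment are hypothetical.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient flows into the encoder; None for the lambd input.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage (illustrative): the domain head learns to classify domains, while the
# reversed gradient pushes the encoder toward domain-invariant features.
# feats = encoder(images)
# domain_logits = domain_head(grad_reverse(feats, lambd=0.5))
# loss = task_loss + domain_criterion(domain_logits, domain_labels)
```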
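For the meta-learning row, this MLDG-style sketch simulates domain shift on every iteration by holding one source domain out as a meta-test set. It is a simplified illustration under assumed placeholders (`model`, `domains`, `sample_batch`, `criterion`), not a faithful reproduction of any published method.

```python
import random
import torch
from torch.func import functional_call

def meta_step(model, domains, sample_batch, criterion, inner_lr=1e-3):
    held_out = random.choice(domains)                 # simulated "unseen" domain
    meta_train = [d for d in domains if d != held_out]

    # Task loss on the held-in source domains.
    x, y = sample_batch(meta_train)
    params = dict(model.named_parameters())
    train_loss = criterion(functional_call(model, params, (x,)), y)

    # One virtual SGD step, kept differentiable so the meta-test loss
    # backpropagates through the update (MAML-style).
    grads = torch.autograd.grad(train_loss, tuple(params.values()), create_graph=True)
    updated = {name: p - inner_lr * g
               for (name, p), g in zip(params.items(), grads)}

    # Loss on the held-out domain with the virtually updated weights.
    xt, yt = sample_batch([held_out])
    test_loss = criterion(functional_call(model, updated, (xt,)), yt)

    return train_loss + test_loss   # optimise this sum with the outer optimizer
```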
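And a minimal augmentation pipeline using torchvision's built-in RandAugment (available in torchvision 0.11+); the transform values are illustrative and should be tuned per component.

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandAugment(num_ops=2, magnitude=9),        # random geometric/photometric ops
    transforms.ColorJitter(brightness=0.4, contrast=0.4),  # crude lighting-shift simulation
    transforms.ToTensor(),
])
```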

Why DG Matters for eKYC

  • Face liveness: new phone cameras, new attack instruments, new lighting conditions.
  • Face recognition: new demographics, cross-quality matching (ID photo vs. selfie).
  • Document OCR: new document designs, new fonts, damaged documents.
  • Document forensics: new editing tools, new manipulation techniques.

Key Takeaways

Summary

  • DG is the central research challenge for production eKYC — lab performance ≠ field performance
  • Self-supervised pre-training + fine-tuning is currently the best approach
  • Diverse training data remains the most impactful practical strategy
  • DG must be evaluated with cross-dataset protocols; intra-dataset results are misleading (a leave-one-domain-out sketch follows below)
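A minimal sketch of the leave-one-domain-out protocol behind that last point. The `train_model` and `evaluate` helpers and the dataset dictionary layout are hypothetical; the dataset names in the comment are the commonly used face anti-spoofing benchmarks.

```python
def leave_one_domain_out(datasets, train_model, evaluate):
    """datasets: {domain_name: {"train": ..., "test": ...}} (layout assumed)."""
    results = {}
    for held_out in datasets:
        sources = [v["train"] for k, v in datasets.items() if k != held_out]
        model = train_model(sources)                 # train on the other domains only
        results[held_out] = evaluate(model, datasets[held_out]["test"])
    return results

# e.g. the common four-dataset cross-dataset benchmark:
# datasets = {"OULU-NPU": ..., "CASIA-MFSD": ..., "Replay-Attack": ..., "MSU-MFSD": ...}
```

Because each domain is held out in turn and never touched during training, the reported numbers reflect generalization to unseen domains rather than memorization of one dataset's quirks.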