Skip to content

Model Drift & Retraining

Definition

Model drift occurs when a production model's performance degrades over time due to changes in input data distribution, new attack types, or evolving user demographics.


Types of Drift

Type Cause Example in eKYC
Data drift Input distribution changes New phone models with different cameras
Concept drift Relationship between input and label changes New attack type not in training data
Label drift Class distribution changes Spoof ratio increases due to new tool availability
Upstream drift Changes in preprocessing SDK update changes image quality/format

Detection Methods

Method How It Works
Score distribution monitoring Track model output distribution — shift indicates drift
Performance monitoring Track accuracy metrics (requires delayed ground truth)
Population Stability Index (PSI) Compare current vs baseline feature distributions
Statistical tests KS test, chi-squared test on feature/score distributions

Retraining Strategies

Strategy When to Use
Scheduled Retrain monthly/quarterly regardless
Trigger-based Retrain when drift metric exceeds threshold
Continuous Ongoing training on new data
Incremental Fine-tune on new data only

Key Takeaways

Summary

  • Model drift is inevitable in eKYC — new phones, new attacks, new demographics
  • Score distribution monitoring is the fastest drift indicator (no labels needed)
  • Trigger-based retraining balances freshness with cost
  • Always validate retrained model on held-out test sets before deployment