eKYC Encyclopedia

Active Learning

Definition

Active learning selects the most informative unlabeled examples for human annotation, so that each labeled sample improves the model as much as possible. This matters for eKYC, where labeling is expensive, especially for liveness and forensics.

Query Strategies

Strategy	How It Works	Best For
Uncertainty sampling	Select samples where the model is least confident	General use; the simplest approach
Query-by-committee	Select samples on which a committee of models disagrees most	Ensemble models
Expected model change	Select samples that would most change the model weights if labeled	Maximum learning per sample
Diversity sampling	Select diverse samples that cover the feature space	Avoiding redundant labels
Core-set	Select samples that best represent the unlabeled data distribution	Large unlabeled pools
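
Several of the strategies in the table reduce to a per-sample score followed by a ranking. The sketch below is one possible illustration in standard-library Python, not a reference implementation. The function names (`least_confidence`, `margin`, `entropy`, `vote_entropy`, `top_k`, `k_center_greedy`) and the toy `pool_probs` values are assumptions made for this example. Greedy k-center selection stands in here for diversity and core-set style selection, which is one common choice among several.

```python
import math

def least_confidence(probs):
    """1 - max class probability; higher means the model is less confident."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most likely classes; lower means less confident."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def entropy(probs):
    """Shannon entropy in nats; higher means less confident."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def vote_entropy(votes, n_classes):
    """Query-by-committee disagreement: entropy of the members' hard votes."""
    counts = [votes.count(c) for c in range(n_classes)]
    return entropy([c / len(votes) for c in counts])

def top_k(scores, k):
    """Indices of the k highest scores, most informative first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def k_center_greedy(points, k, first=0):
    """Diversity selection: repeatedly pick the point farthest from all chosen points."""
    chosen = [first]
    dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        # Each point keeps its distance to the nearest chosen center.
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen

# Toy pool: class probabilities for three unlabeled samples (made-up values).
pool_probs = [[0.98, 0.02], [0.55, 0.45], [0.70, 0.30]]
selected = top_k([entropy(p) for p in pool_probs], 2)
print(selected)  # sample 1 (closest to 50/50) ranks first, then sample 2
```

Uncertainty scores only need the current model's predicted probabilities, which is why uncertainty sampling is usually the cheapest strategy to try first. `k_center_greedy` needs feature embeddings instead and ignores model confidence entirely.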

Active Learning Loop for eKYC

graph TD
    A[Unlabeled eKYC Data<br/>1M+ images] --> B[Model Predicts<br/>Score each sample]
    B --> C[Query Strategy<br/>Select most informative 1000]
    C --> D[Human Annotation<br/>Expert labels selected samples]
    D --> E[Retrain Model<br/>Updated with new labels]
    E --> F{Performance sufficient?}
    F -->|No| B
    F -->|Yes| G[Deploy Model]
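
The loop in the diagram can be sketched as a small driver function. This is a minimal sketch under stated assumptions, not a production pipeline: `active_learning_loop` and the callables `fit`, `predict_proba`, `annotate`, and `evaluate` are hypothetical placeholders, and the toy one-dimensional threshold "model" with its hidden boundary `TRUE_T` exists only to make the example runnable. In a real eKYC system, `annotate` would be a human expert and `fit` a full training job.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def active_learning_loop(pool, labeled, fit, predict_proba, annotate, evaluate,
                         batch_size=1000, target=0.95, max_rounds=20):
    """Score -> select -> annotate -> retrain until `target` is reached."""
    pool, labeled = list(pool), list(labeled)
    model = fit(labeled)
    for _ in range(max_rounds):
        # F: stop when performance is sufficient (or nothing is left to label).
        if evaluate(model) >= target or not pool:
            break
        # B: score every unlabeled sample with the current model.
        scores = [entropy(predict_proba(model, x)) for x in pool]
        # C: query strategy (uncertainty sampling here) picks one batch.
        ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
        picked = ranked[:batch_size]
        # D: annotation of the selected samples only.
        labeled += [(pool[i], annotate(pool[i])) for i in picked]
        picked_set = set(picked)
        pool = [x for i, x in enumerate(pool) if i not in picked_set]
        # E: retrain with the enlarged labeled set.
        model = fit(labeled)
    return model, labeled, pool

# --- Toy stand-ins (all hypothetical) ---
TRUE_T = 0.62  # hidden decision boundary that the "annotator" knows

def annotate(x):
    return int(x >= TRUE_T)

def fit(labeled):
    # The model is a single threshold halfway between the two classes.
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (max(neg) + min(pos)) / 2

def predict_proba(t, x):
    p = 1 / (1 + math.exp(-20 * (x - t)))
    return [1 - p, p]

val = [i / 200 for i in range(201)]  # held-out validation points

def evaluate(t):
    return sum(int(x >= t) == annotate(x) for x in val) / len(val)

pool = [i / 1000 for i in range(1, 1000)]
seed = [(0.0, 0), (1.0, 1)]
model, labeled, rest = active_learning_loop(
    pool, seed, fit, predict_proba, annotate, evaluate,
    batch_size=5, target=0.99, max_rounds=20)
print(len(labeled), round(evaluate(model), 3))
```

Because each batch is drawn from the region where the current model is least certain, the labels concentrate around the decision boundary instead of being spread across the whole pool. In practice the batch size, stopping target, and round limit would be set from the annotation budget.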

Key Takeaways

Summary

Active learning can reduce labeling cost by 3-10x, because only the most informative samples are labeled
Uncertainty sampling is the simplest strategy and often the most effective
Active learning is especially valuable for liveness (expert annotation needed) and forensics (subtle ground truth)
Combine with self-supervised pre-training for maximum data efficiency