
Face Recognition Overview

Definition

Face recognition in eKYC is the process of generating a mathematical representation (embedding) of a face and comparing it with the embedding of another face to determine whether both belong to the same person. Its primary use is matching a live selfie against the photo on an identity document.


The Face Recognition Pipeline

```mermaid
graph LR
    A[Face Image] --> B[Face Detection<br/>SCRFD]
    B --> C[Alignment<br/>Affine transform]
    C --> D[Feature Extraction<br/>Backbone CNN/ViT]
    D --> E[Embedding<br/>512-d vector]
    E --> F[Comparison<br/>Cosine similarity]

    G[ID Photo] --> H[Same pipeline]
    H --> I[Embedding<br/>512-d vector]
    I --> F

    F --> J{Similarity > threshold?}
    J -->|Yes| K[✅ Same person]
    J -->|No| L[❌ Different person]

    style E fill:#6A1B9A,color:#fff
    style I fill:#6A1B9A,color:#fff
    style K fill:#2E7D32,color:#fff
```
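The alignment step can be sketched as estimating a similarity transform that maps five detected landmarks onto a fixed reference template. A similarity transform is a constrained affine transform with rotation, uniform scale and translation only. This is a minimal pure-Python sketch under stated assumptions:

- The 112x112 template coordinates are those commonly used in ArcFace-style preprocessing.
- The landmark order is assumed to be left eye, right eye, nose tip, left mouth corner, right mouth corner.
- All helper names are hypothetical.

The image warp itself is left to an image library.

```python
# Hypothetical helpers; pure Python, no image library required.

# Assumed 5-point reference template for 112x112 aligned crops, in the order:
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
# Use the template your recognition model was actually trained with.
ARCFACE_TEMPLATE_112 = [
    (38.2946, 51.6963),
    (73.5318, 51.5014),
    (56.0252, 71.7366),
    (41.5493, 92.3655),
    (70.7299, 92.2041),
]


def estimate_similarity(src, dst):
    """Least-squares similarity transform (rotation, uniform scale,
    translation) mapping src landmarks onto dst landmarks.

    Points are treated as complex numbers, so the transform is w = a*z + b:
    abs(a) is the scale factor and the phase of a is the rotation angle.
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    # Closed-form least-squares solution on mean-centred points.
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def to_affine_matrix(a, b):
    """2x3 matrix [[p, -q, tx], [q, p, ty]] for an image-warping routine."""
    return [[a.real, -a.imag, b.real], [a.imag, a.real, b.imag]]


def apply_transform(a, b, points):
    """Map (x, y) points through w = a*z + b."""
    out = []
    for x, y in points:
        w = a * complex(x, y) + b
        out.append((w.real, w.imag))
    return out


# Usage: pretend detector output (the template scaled by 2 and shifted).
detected = [(2 * x + 10, 2 * y + 5) for x, y in ARCFACE_TEMPLATE_112]
a, b = estimate_similarity(detected, ARCFACE_TEMPLATE_112)
M = to_affine_matrix(a, b)
aligned_points = apply_transform(a, b, detected)
```

With OpenCV and NumPy available, `cv2.warpAffine(image, numpy.array(M), (112, 112))` would then produce the aligned crop. Restricting to a similarity transform keeps the face from being sheared.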

Face Embeddings

A face embedding is a dense vector (typically 512 dimensions) that encodes the identity-relevant features of a face:

| Property | Details |
|---|---|
| Dimensionality | 512 (standard); 128 or 256 (lightweight models) |
| Content | Facial geometry, texture patterns, and other identity features |
| Same person | Embeddings lie close together (high cosine similarity) |
| Different person | Embeddings lie far apart (low cosine similarity) |
| Largely invariant to | Lighting, expression, and minor pose changes (within the training distribution) |
| Not invariant to | Extreme pose (profile), heavy occlusion, large age gaps |

Cosine Similarity

The standard metric for comparing embeddings:

$$\text{similarity} = \cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert\mathbf{a}\rVert \cdot \lVert\mathbf{b}\rVert}$$
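Here is a minimal sketch of the formula, using made-up 4-dimensional vectors in place of real 512-d embeddings. It also checks two identities behind common implementation choices:

- After L2 normalization, cosine similarity equals the dot product.
- Squared Euclidean distance equals $2 - 2\cos\theta$, so ranking by either metric gives the same order.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Toy 4-d vectors stand in for real 512-d embeddings.
a = [0.2, 0.9, -0.1, 0.4]
b = [0.25, 0.8, 0.0, 0.5]

cos_ab = cosine_similarity(a, b)

# After L2 normalization, cosine similarity is just the dot product...
na, nb = l2_normalize(a), l2_normalize(b)
dot_ab = sum(x * y for x, y in zip(na, nb))

# ...and squared Euclidean distance is a monotonic function of it:
# ||na - nb||^2 = 2 - 2 * cos(theta)
sq_dist = sum((x - y) ** 2 for x, y in zip(na, nb))
```

This is why many systems store pre-normalized embeddings and search them with a plain inner-product (or L2) index.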

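| Score | Interpretation |
|---|---|
| > 0.80 | Very high confidence: same person |
| 0.65 to 0.80 | High confidence: likely same person |
| 0.50 to 0.65 | Moderate: possible match, needs review |
| 0.30 to 0.50 | Low confidence: likely different person |
| < 0.30 | Very low: almost certainly different |

These bands are illustrative. Score distributions differ between models, alignment pipelines, and datasets, so production thresholds should be calibrated on held-out genuine and impostor pairs against a target false match rate.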
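Because score distributions vary by model and data, one common way to choose an operating threshold is a two-step calibration:

1. Fix a target false match rate (FMR) on held-out impostor pairs (different people).
2. Measure the resulting false non-match rate (FNMR) on genuine pairs (same person).

This sketch assumes the decision rule "accept if score > threshold". The scores are made-up toy values, not results from any model, and the function names are hypothetical.

```python
import math


def threshold_at_fmr(impostor_scores, target_fmr):
    """Threshold such that at most target_fmr of impostor pairs score
    strictly above it (decision rule: accept if score > threshold)."""
    if not 0.0 <= target_fmr < 1.0:
        raise ValueError("target_fmr must be in [0, 1)")
    ranked = sorted(impostor_scores, reverse=True)
    allowed = math.floor(target_fmr * len(ranked))  # impostors allowed to pass
    # Only scores strictly above ranked[allowed] are accepted, so at most
    # `allowed` impostor pairs pass (fewer if there are ties).
    return ranked[allowed]


def error_rates(genuine_scores, impostor_scores, threshold):
    """False match rate and false non-match rate at a given threshold."""
    fmr = sum(s > threshold for s in impostor_scores) / len(impostor_scores)
    fnmr = sum(s <= threshold for s in genuine_scores) / len(genuine_scores)
    return fmr, fnmr


# Toy scores; real calibration needs many thousands of labelled pairs.
impostor = [0.05, 0.12, 0.18, 0.22, 0.27, 0.31, 0.35, 0.41, 0.48, 0.56]
genuine = [0.44, 0.58, 0.66, 0.71, 0.75, 0.79, 0.83, 0.86, 0.90, 0.93]

t = threshold_at_fmr(impostor, target_fmr=0.1)
fmr, fnmr = error_rates(genuine, impostor, t)
```

Lowering the target FMR raises the threshold and, in turn, the FNMR. The right trade-off depends on whether a false match or a rejected genuine customer is more costly.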

Training Face Recognition Models

Loss Functions

The key innovation in modern face recognition is the margin-based loss, which enforces a margin between identities in angular (hyperspherical) embedding space:

| Loss function | Paper/Year | Key innovation |
|---|---|---|
| Softmax | Baseline | Standard classification; poor discriminability |
| Center Loss | 2016 | Minimizes intra-class distance |
| SphereFace | 2017 | Multiplicative angular margin (A-Softmax) |
| CosFace | 2018 | Additive cosine margin; stable training |
| ArcFace | 2019 | Additive angular margin; strong discriminability |
| AdaFace | 2022 | Quality-adaptive margin; handles quality variation |
| ElasticFace | 2022 | Elastic (randomly sampled) margin for flexible class boundaries |
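The margin-based losses in the table differ mainly in how they penalize the target-class term $\cos\theta_{y_i}$ before the softmax. This sketch compares those terms at a single angle. Two assumptions apply:

- The margin values (4 for SphereFace, 0.35 for CosFace, 0.5 for ArcFace) are commonly used defaults.
- The SphereFace term is simplified. The paper uses a piecewise extension so the term stays monotonic beyond $\theta = \pi/m$.

```python
import math


def penalized_target_cosine(theta, loss, m):
    """Target-class cosine term after applying each loss's margin.

    theta: angle (radians) between the normalized embedding and the
    normalized weight vector of the ground-truth class.
    """
    if loss == "softmax":      # normalized softmax reference, no margin
        return math.cos(theta)
    if loss == "sphereface":   # multiplicative angular margin, cos(m * theta)
        # Simplified: only valid for theta <= pi / m (see lead-in).
        return math.cos(m * theta)
    if loss == "cosface":      # additive cosine margin, cos(theta) - m
        return math.cos(theta) - m
    if loss == "arcface":      # additive angular margin, cos(theta + m)
        return math.cos(theta + m)
    raise ValueError(f"unknown loss: {loss}")


theta = 0.5  # radians
margins = {"softmax": 0.0, "sphereface": 4, "cosface": 0.35, "arcface": 0.5}
terms = {name: penalized_target_cosine(theta, name, m) for name, m in margins.items()}
for name, value in terms.items():
    print(f"{name:>10}: {value:+.4f}")
```

Every margin lowers the target term below the plain cosine. To reach the same loss, the network must therefore shrink $\theta_{y_i}$, which pulls each identity's embeddings closer to its class center.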

ArcFace Loss (Most Widely Used)

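$$L = -\log \frac{e^{s \cos(\theta_{y_i} + m)}}{e^{s \cos(\theta_{y_i} + m)} + \sum_{j \neq y_i} e^{s \cos \theta_j}}$$

Here $\theta_j$ is the angle between the L2-normalized embedding of sample $i$ and the L2-normalized weight vector (class center) of identity $j$, and $y_i$ is the ground-truth identity of sample $i$. The margin $m$ is added only to the ground-truth angle.
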
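
| Parameter | Typical value | Purpose |
|---|---|---|
| $s$ (scale) | 64 | Rescales the normalized logits; acts as an inverse softmax temperature |
| $m$ (margin) | 0.5 | Additive angular margin (radians) applied to the ground-truth angle $\theta_{y_i}$ |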
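Below is a minimal single-sample sketch of the ArcFace loss above, written in pure Python for readability. It assumes one weight vector per training identity. It omits what production implementations add: batching, GPU tensors, and special handling when $\theta_{y_i} + m$ exceeds $\pi$. The function and variable names are hypothetical.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_loss(embedding, class_weights, label, s=64.0, m=0.5):
    """ArcFace loss for a single sample.

    embedding: feature vector of the sample.
    class_weights: one weight vector (class center) per identity.
    label: index of the ground-truth identity (y_i).
    """
    x = _normalize(embedding)
    cosines = []
    for w in class_weights:
        w = _normalize(w)
        c = sum(a * b for a, b in zip(x, w))
        cosines.append(max(-1.0, min(1.0, c)))  # guard acos against rounding
    logits = [s * c for c in cosines]
    theta_y = math.acos(cosines[label])
    logits[label] = s * math.cos(theta_y + m)  # margin on the target class only
    # -log softmax of the target logit, via log-sum-exp for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


# Usage: a 2-d toy embedding and two identity centers.
x = [0.6, 0.8]
W = [[1.0, 0.0], [0.0, 1.0]]
loss_arc = arcface_loss(x, W, label=0)           # s=64, m=0.5
loss_plain = arcface_loss(x, W, label=0, m=0.0)  # no margin
```

Setting `m=0.0` recovers the normalized softmax loss with scale `s`, which isolates the effect of the margin. For the same embedding, the margin version reports a larger loss.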

Training Data

| Dataset | Images | Identities | Use |
|---|---|---|---|
| MS1MV2 (MS-Celeb-1M, cleaned) | 5.8M | 85K | Standard training set |
| Glint360K | 17M | 360K | One of the largest cleaned public training sets |
| WebFace260M | 260M | 4M | Massive web-crawled set (noisy) |
| VGGFace2 | 3.3M | 9.1K | Diverse pose and age |
| CASIA-WebFace | 500K | 10.5K | Smaller, clean |

Backbone Architectures

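| Architecture | Params | GFLOPs | LFW accuracy | Use case |
|---|---|---|---|---|
| ResNet-100 (R100) | 65M | 24 | 99.80%+ | Server (standard) |
| ResNet-50 (R50) | 44M | 12 | 99.78% | Server (efficient) |
| ResNet-18 (R18) | 28M | 4 | 99.60% | Edge/mobile |
| MobileNetV3 | 5M | 0.5 | 99.50% | Mobile (lightweight) |
| ViT-S | 22M | 4.6 | 99.80% | Emerging (transformer-based) |
| EdgeNeXt | 5M | 1.0 | 99.55% | Mobile (efficient) |

LFW accuracy is close to saturated for all of these backbones. Harder benchmarks such as IJB-C (true accept rate at a fixed false accept rate) separate them more clearly, and reported figures also depend on the training set and loss.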

1:1 Verification vs 1:N Identification

| Mode | eKYC use | Process | Speed |
|---|---|---|---|
| 1:1 Verification | Match selfie to ID photo | Compare 2 embeddings | < 1 ms for the comparison itself |
| 1:N Identification | Deduplication: check whether the face already exists in the database | Compare 1 embedding against N | Depends on N (ANN search) |
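In its simplest form, 1:N identification is an exact scan: score the query against every enrolled embedding and keep the best matches. This sketch assumes a small in-memory gallery keyed by hypothetical record IDs, with toy 3-d vectors standing in for real embeddings. Its cost grows linearly with N, which is what ANN indexes are designed to avoid. Exact search remains useful as the ground truth when measuring an ANN index's recall.

```python
import heapq
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def search_1_to_n(query, gallery, k=3, threshold=None):
    """Exact 1:N search: score the query against every enrolled embedding.

    gallery: dict mapping a (hypothetical) record id to an embedding.
    Returns up to k (score, record_id) pairs, best first, optionally
    keeping only scores at or above `threshold`.
    """
    q = l2_normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, l2_normalize(emb))), rid)
        for rid, emb in gallery.items()
    )
    top = heapq.nlargest(k, scored)
    if threshold is not None:
        top = [(s, rid) for s, rid in top if s >= threshold]
    return top


# Toy 3-d "embeddings" for three enrolled customers.
gallery = {
    "cust-001": [0.9, 0.1, 0.2],
    "cust-002": [0.1, 0.95, 0.1],
    "cust-003": [0.2, 0.1, 0.97],
}
result = search_1_to_n([0.88, 0.12, 0.25], gallery, k=2, threshold=0.5)
```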

1:N Search Technologies

For large-scale deduplication (millions of faces):

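| Technology | Approach | Indicative speed (1M database) |
|---|---|---|
| FAISS (Facebook) | Exact and approximate nearest-neighbor search on CPU or GPU | < 10 ms |
| Milvus | Vector database with ANN indexing | < 20 ms |
| Annoy (Spotify) | Random projection trees | < 50 ms |
| ScaNN (Google) | Anisotropic vector quantization | < 10 ms |

Actual latency depends heavily on the index type, hardware, batch size, and recall target.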
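The libraries above use their own index structures, which are not reproduced here. As a stand-in, this sketch uses random-hyperplane locality-sensitive hashing, a different and simpler technique. It shows the basic ANN trade-off: each query scores only the candidates that share a hash bucket, giving up some recall for far fewer comparisons. The class name, parameters and random data are illustrative assumptions.

```python
import math
import random
from collections import defaultdict


class RandomHyperplaneLSH:
    """Approximate cosine-similarity search with random-hyperplane LSH.

    Each table hashes a vector to the sign pattern of its projections onto
    `n_bits` random Gaussian hyperplanes; vectors separated by a small angle
    tend to share a bucket. Queries only score candidates from their own
    buckets instead of the whole gallery.
    """

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    @staticmethod
    def _normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    @staticmethod
    def _key(planes, v):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes)

    def add(self, record_id, vector):
        v = self._normalize(vector)
        self.vectors[record_id] = v
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, v)].append(record_id)

    def search(self, query, k=1):
        q = self._normalize(query)
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, q), []))
        # Exact cosine scoring, but only over the bucketed candidates.
        scored = [
            (sum(a * b for a, b in zip(q, self.vectors[rid])), rid) for rid in candidates
        ]
        return sorted(scored, reverse=True)[:k]


# Usage: index 200 random 16-d vectors, then query with one of them.
rng = random.Random(42)
index = RandomHyperplaneLSH(dim=16, n_bits=6, n_tables=4)
gallery = {f"face-{i}": [rng.gauss(0.0, 1.0) for _ in range(16)] for i in range(200)}
for rid, emb in gallery.items():
    index.add(rid, emb)
hits = index.search(gallery["face-7"], k=1)
```

More bits per table make buckets smaller, which is faster but lowers recall. More tables raise the chance that a true neighbor shares at least one bucket, which improves recall at the cost of memory.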

Face Recognition Challenges in eKYC

| Challenge | Impact | Mitigation |
|---|---|---|
| Cross-age | ID photo may be 5-15 years old | Age-invariant models, lower thresholds |
| Cross-quality | Low-quality ID photo vs HD selfie | AdaFace, quality-aware matching |
| Twins | Identical twins have very similar embeddings | Liveness and document data checks must also pass |
| Cosmetic changes | Surgery, weight change, facial hair | Robust models trained on diverse variations |
| Cross-ethnicity | Performance varies across demographics | Balanced training data, per-demographic thresholds |
| Extreme expressions | Distorts facial geometry | Expression-invariant training |

Key Takeaways

Summary

  • Face recognition converts faces to 512-d embeddings and compares via cosine similarity
  • ArcFace is the most widely used loss function — additive angular margin for discriminability
  • ResNet-100 is the standard server backbone; MobileNetV3 for mobile
  • 1:1 verification (selfie vs ID) is the primary eKYC use case — sub-millisecond comparison
  • 1:N deduplication uses ANN search (FAISS) for large-scale face database matching
  • Key challenges: cross-age, cross-quality, twins, demographic bias
  • AdaFace specifically addresses the quality mismatch between ID photos and selfies