
Face Detection

Definition

Face detection is the task of locating and extracting human faces from an image or video frame. In eKYC, it is the first step in both the face recognition and liveness detection pipelines — everything downstream depends on accurate, fast face detection.


What Face Detection Outputs

```mermaid
graph LR
    A[Input Image] --> B[Face Detector]
    B --> C[Bounding Box<br/>x, y, w, h]
    B --> D[Confidence Score<br/>0.0 - 1.0]
    B --> E[Facial Landmarks<br/>5 or 68+ points]
    B --> F[Face Count]

    style B fill:#4051B5,color:#fff
```
| Output | Details | Use in eKYC |
|---|---|---|
| Bounding box | Rectangle coordinates around each face | Crop the face for recognition/liveness |
| Confidence score | Detector's confidence that the region is a face (0-1) | Filter out false detections |
| Landmarks (5-point) | Left eye, right eye, nose tip, left mouth corner, right mouth corner | Face alignment for recognition |
| Landmarks (68/106-point) | Detailed facial structure | Quality assessment, 3D pose estimation |
| Face count | Number of faces detected | Ensure exactly 1 face in a selfie; locate the face on the document |
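The outputs listed above map naturally onto a small record type. The sketch below is a minimal illustration, assuming NumPy arrays and `[x1, y1, x2, y2]` pixel boxes; the `Detection` class, the `crop_face` helper and its 25% default margin are hypothetical choices, not part of any particular detector's API.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Detection:
    """One detected face, in original-image pixel coordinates (hypothetical record type)."""
    bbox: np.ndarray                        # [x1, y1, x2, y2]
    score: float                            # detector confidence in [0, 1]
    landmarks: Optional[np.ndarray] = None  # (5, 2) array of (x, y) points, if predicted

    @property
    def width(self) -> float:
        return float(self.bbox[2] - self.bbox[0])

    @property
    def height(self) -> float:
        return float(self.bbox[3] - self.bbox[1])


def crop_face(image: np.ndarray, det: Detection, margin: float = 0.25) -> np.ndarray:
    """Crop the face, enlarging the box by `margin` x its size on each side, clipped to the image."""
    img_h, img_w = image.shape[:2]
    x1, y1, x2, y2 = det.bbox
    dx, dy = det.width * margin, det.height * margin
    left = max(int(np.floor(x1 - dx)), 0)
    top = max(int(np.floor(y1 - dy)), 0)
    right = min(int(np.ceil(x2 + dx)), img_w)
    bottom = min(int(np.ceil(y2 + dy)), img_h)
    return image[top:bottom, left:right]


# Example: a 300x200 (width x height) frame with one 80x100 face box
frame = np.zeros((200, 300, 3), dtype=np.uint8)
det = Detection(bbox=np.array([100.0, 50.0, 180.0, 150.0]), score=0.92)
crop = crop_face(frame, det)
print(crop.shape)  # (150, 120, 3)
```

A margin around the raw box is a common design choice because many recognition and liveness models expect some context (forehead, chin, background) rather than a tight crop; the right amount depends on the downstream model.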

Modern Face Detection Architectures

SCRFD (Sample and Computation Redistribution for Face Detection)

| Aspect | Details |
|---|---|
| Paper | SCRFD: Sample and Computation Redistribution for Efficient Face Detection (2021) |
| Architecture | Single-stage anchor-based detector with an efficient backbone |
| Key innovation | Sample redistribution (augmenting training samples for the detection stages that need them most) and computation redistribution (reallocating compute across backbone, neck and head via a search procedure) |
| Accuracy (SCRFD-2.5G) | WIDER Face: 93.78% (Easy), 92.16% (Medium), 77.87% (Hard) |
| Speed | ~2ms on GPU (SCRFD-2.5G), ~10ms on mobile |
| Landmarks | 5 keypoints |
| Why it's popular for eKYC | Best accuracy-speed tradeoff, multiple model sizes |

SCRFD model variants:

| Model | GFLOPs | WIDER Face Easy | Speed (GPU) | Use Case |
|---|---|---|---|---|
| SCRFD-500M | 0.5 | 90.57% | ~1ms | Mobile/edge, real-time |
| SCRFD-2.5G | 2.5 | 93.78% | ~2ms | Server, balanced |
| SCRFD-10G | 10 | 95.16% | ~4ms | Server, highest accuracy |
| SCRFD-34G | 34 | 96.06% | ~12ms | Offline batch processing |

RetinaFace

| Aspect | Details |
|---|---|
| Paper | RetinaFace: Single-shot Multi-level Face Localisation in the Wild (2020) |
| Architecture | Single-stage with FPN + multi-task heads (box, landmarks, 3D face) |
| Key innovation | Joint extra-supervised (landmark) and self-supervised multi-task learning; a self-supervised mesh decoder branch predicts a dense 3D face |
| Accuracy | WIDER Face: 96.3% (Easy) with ResNet-152 backbone |
| Landmarks | 5 keypoints; the paper also predicts a dense 3D face via the mesh decoder |
| Why it's popular | Proven, widely used, excellent landmark accuracy |

BlazeFace

| Aspect | Details |
|---|---|
| Origin | Google MediaPipe (2019) |
| Architecture | Lightweight SSD-based, optimized for mobile |
| Speed | ~1ms on mobile GPU |
| Accuracy | Good for frontal faces, less robust at extreme angles |
| Use case | Real-time mobile face tracking, camera preview guidance |

Comparison

| Detector | Accuracy (approx.) | Speed (GPU) | Speed (Mobile) | Landmarks | Best For |
|---|---|---|---|---|---|
| SCRFD-2.5G | 93.8% | 2ms | 10-30ms | 5-point | eKYC server pipeline |
| SCRFD-500M | 90.6% | 1ms | 5-15ms | 5-point | eKYC mobile SDK |
| RetinaFace-R50 | 94.9% | 5ms | 50-100ms | 5-point | High-accuracy server |
| BlazeFace | ~90% | 0.5ms | 1-3ms | 6-point | Real-time camera preview |
| MTCNN | ~91% | 15ms | 100-300ms | 5-point | Legacy systems |
| YOLO-Face | ~92% | 3ms | 20-50ms | 5-point | General purpose |

Face Detection in the eKYC Pipeline

Selfie Face Detection

```mermaid
graph TD
    A[Selfie Image] --> B[Face Detection]
    B --> C{How many faces?}
    C -->|0 faces| D[❌ Reject: No face detected]
    C -->|1 face| E[✅ Continue pipeline]
    C -->|2+ faces| F[❌ Reject: Multiple faces]

    E --> G{Face size check}
    G -->|Too small < 80px| H[❌ Reject: Face too far]
    G -->|Too large > 90% frame| I[❌ Reject: Face too close]
    G -->|Appropriate size| J[✅ Proceed to alignment]

    J --> K{Landmark quality}
    K -->|All 5 landmarks detected| L[✅ Face alignment]
    K -->|Landmarks missing/poor| M[⚠️ Quality warning]

    style E fill:#2E7D32,color:#fff
    style D fill:#e53935,color:#fff
```
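The selfie checks in the flowchart can be expressed as a single gating function. This is a minimal sketch, assuming boxes in `[x1, y1, x2, y2]` pixels and optional `(5, 2)` landmark arrays; `check_selfie` and `landmarks_ok` are hypothetical names, and reading "> 90% frame" as "box wider or taller than 90% of the frame" is an assumption (an area-based rule would be equally reasonable).

```python
from typing import Optional, Sequence, Tuple

import numpy as np

MIN_FACE_PX = 80          # "Too small < 80px" branch of the flowchart
MAX_FRAME_FRACTION = 0.9  # "Too large > 90% frame" branch (assumed: per axis)


def landmarks_ok(bbox: np.ndarray, kps: Optional[np.ndarray]) -> bool:
    """True if all 5 landmarks are present, finite, and inside the face box."""
    if kps is None:
        return False
    kps = np.asarray(kps, dtype=float)
    if kps.shape != (5, 2) or not np.isfinite(kps).all():
        return False
    x1, y1, x2, y2 = bbox
    inside = (kps[:, 0] >= x1) & (kps[:, 0] <= x2) & (kps[:, 1] >= y1) & (kps[:, 1] <= y2)
    return bool(inside.all())


def check_selfie(faces: Sequence[Tuple[np.ndarray, Optional[np.ndarray]]],
                 image_shape: Tuple[int, ...]) -> Tuple[str, str]:
    """faces: one (bbox, kps-or-None) pair per detection. Returns (status, reason)."""
    if len(faces) == 0:
        return "reject", "No face detected"
    if len(faces) > 1:
        return "reject", "Multiple faces"
    bbox, kps = faces[0]
    bbox = np.asarray(bbox, dtype=float)
    img_h, img_w = image_shape[:2]
    box_w, box_h = bbox[2] - bbox[0], bbox[3] - bbox[1]
    if min(box_w, box_h) < MIN_FACE_PX:
        return "reject", "Face too far"
    if box_w > MAX_FRAME_FRACTION * img_w or box_h > MAX_FRAME_FRACTION * img_h:
        return "reject", "Face too close"
    if not landmarks_ok(bbox, kps):
        return "warn", "Landmarks missing/poor"  # quality warning, not a hard reject
    return "ok", "Proceed to face alignment"
```

Returning a status plus a human-readable reason makes it straightforward to drive on-screen capture guidance ("move closer", "only one person in frame") from the same logic.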

Document Face Detection

| Challenge | Details | Solution |
|---|---|---|
| Small face | The face on an ID card is often only 100-200px | Use high-resolution capture and a detector suited to small faces |
| Print artifacts | Printed photos show halftone dots and moiré | Preprocess to reduce print noise |
| Occlusion | Holograms or lamination reflections overlap the face | Quality check plus multiple capture angles |
| Old/damaged photo | Faded, scratched, low contrast | Image enhancement before detection |
| Multiple faces | Many IDs carry a smaller secondary ("ghost") portrait; some documents include group photos | Select the face inside the document's primary portrait region |
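For documents with several detected faces, one way to select the face matching the document's face region is sketched below. It assumes the expected portrait region is already known in image coordinates (for example from a per-document-type layout template, which is hypothetical here) and falls back to the largest confident face when no region is given or nothing overlaps it. The function names and the 0.5 score threshold are illustrative assumptions.

```python
from typing import Optional, Sequence

Box = Sequence[float]  # [x1, y1, x2, y2] in pixels


def box_area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def fraction_inside(face: Box, region: Box) -> float:
    """Fraction of the face box's area that lies inside `region`."""
    iw = max(0.0, min(face[2], region[2]) - max(face[0], region[0]))
    ih = max(0.0, min(face[3], region[3]) - max(face[1], region[1]))
    area = box_area(face)
    return iw * ih / area if area > 0 else 0.0


def select_document_face(bboxes: Sequence[Box], scores: Sequence[float],
                         portrait_region: Optional[Box] = None,
                         min_score: float = 0.5) -> Optional[int]:
    """Index of the face to use for matching, or None if no confident face exists."""
    candidates = [i for i, s in enumerate(scores) if s >= min_score]
    if not candidates:
        return None
    if portrait_region is not None:
        best = max(candidates, key=lambda i: fraction_inside(bboxes[i], portrait_region))
        if fraction_inside(bboxes[best], portrait_region) > 0:
            return best
    # Fallback: assume the primary portrait is the largest confident face
    return max(candidates, key=lambda i: box_area(bboxes[i]))
```

The largest-face fallback relies on the assumption that the main portrait dominates the card; a layout template, when available, is the more reliable signal.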

Implementation Details

Preprocessing for Detection

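| Step | Purpose | Implementation |
|---|---|---|
| Resize | Normalize input to the detector's expected resolution | Scale the longest side to 640px, maintaining aspect ratio |
| Color normalization | Match the input format the model was trained on | Convert BGR (e.g. from OpenCV) to RGB if the model expects RGB; normalize to [0,1] or [-1,1], or with the model's mean/std |
| Padding | Handle non-square images | Pad to a square (e.g. 640×640) with a constant border value |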
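The resize, normalize and pad steps can be sketched in NumPy as below. This is a minimal sketch, assuming a uint8 H×W×C (or H×W) image, [0, 1] normalization and padding only at the bottom/right; the function name `letterbox` and its parameters are hypothetical, and the actual channel order and mean/std should come from the specific model's documentation.

```python
from typing import Tuple

import numpy as np


def letterbox(image: np.ndarray, size: int = 640, pad_value: float = 0.0,
              bgr_to_rgb: bool = False) -> Tuple[np.ndarray, float]:
    """Resize the longest side to `size`, pad bottom/right to size x size, scale to [0, 1].

    Returns (float32 array of shape (size, size[, C]), scale). Because padding is only
    added at the bottom/right, detector coordinates map back to the original via coords / scale.
    """
    if bgr_to_rgb and image.ndim == 3:
        image = image[..., ::-1]  # e.g. OpenCV BGR -> RGB
    h, w = image.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbour resize via index arithmetic keeps the sketch NumPy-only;
    # a production pipeline would normally use cv2.resize with bilinear interpolation.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows[:, None], cols[None, :]]
    canvas = np.full((size, size) + image.shape[2:], pad_value, dtype=np.float32)
    canvas[:new_h, :new_w] = resized.astype(np.float32) / 255.0
    return canvas, scale


# A tall 320x160 image is upscaled by 2 and padded on the right
img = np.full((320, 160, 3), 255, dtype=np.uint8)
tensor, scale = letterbox(img)
box_on_canvas = np.array([100.0, 50.0, 300.0, 250.0])
box_in_original = box_on_canvas / scale  # back to original-image pixels
```

Padding only at the bottom/right keeps the inverse mapping to a single division; centered padding also works but then the pad offset must be subtracted as well.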

Post-Processing

| Step | Purpose | Details |
|---|---|---|
| NMS (Non-Maximum Suppression) | Remove duplicate detections of the same face | IoU threshold 0.4-0.5 |
| Confidence filtering | Remove low-confidence detections | Threshold 0.5-0.7; usually applied before NMS to shrink the candidate set |
| Size filtering | Remove too-small/too-large detections | Minimum face size 20-80px |
| Landmark validation | Ensure landmarks fall within the bounding box | Sanity check on keypoint positions |
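The four post-processing steps can be chained as in the sketch below. It is a minimal NumPy sketch, assuming raw detector outputs already mapped to image pixels as `(N, 4)` boxes, `(N,)` scores and `(N, 5, 2)` landmarks; the function names are hypothetical and the default thresholds are simply values inside the ranges in the table. It uses classic greedy NMS and runs confidence filtering first.

```python
import numpy as np


def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.4) -> np.ndarray:
    """Greedy NMS over (N, 4) [x1, y1, x2, y2] boxes; returns kept indices, best first."""
    x1, y1, x2, y2 = boxes.T
    areas = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(i)
        iw = np.clip(np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]), 0, None)
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping the kept one too much
    return np.array(keep, dtype=int)


def postprocess(boxes, scores, kps, conf_thresh=0.5, iou_thresh=0.4, min_size=20.0):
    """Confidence filter -> NMS -> size filter -> landmark sanity check."""
    mask = scores >= conf_thresh
    boxes, scores, kps = boxes[mask], scores[mask], kps[mask]
    keep = nms(boxes, scores, iou_thresh)
    boxes, scores, kps = boxes[keep], scores[keep], kps[keep]
    sizes = np.minimum(boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1])
    inside = ((kps[..., 0] >= boxes[:, None, 0]) & (kps[..., 0] <= boxes[:, None, 2]) &
              (kps[..., 1] >= boxes[:, None, 1]) & (kps[..., 1] <= boxes[:, None, 3])).all(axis=1)
    ok = (sizes >= min_size) & inside
    return boxes[ok], scores[ok], kps[ok]


# Duplicate pair, a tiny box, a low-score box, and a box whose landmarks fall outside it
raw_boxes = np.array([[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 210, 210],
                      [300, 300, 400, 400], [500, 500, 600, 600]], dtype=float)
raw_scores = np.array([0.9, 0.8, 0.95, 0.3, 0.7])
centers = (raw_boxes[:, :2] + raw_boxes[:, 2:]) / 2
raw_kps = np.repeat(centers[:, None, :], 5, axis=1)  # stand-in landmarks at box centers
raw_kps[4] = 0.0                                     # invalid: outside its box
out_boxes, out_scores, out_kps = postprocess(raw_boxes, raw_scores, raw_kps)
```

In the example only the first box survives: its near-duplicate is suppressed by NMS, the 10px box fails the size filter, the 0.3 box fails the confidence filter, and the last box fails landmark validation.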

Common Integration (Python)

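```python
# Using InsightFace's FaceAnalysis; the buffalo_l model pack bundles an SCRFD detector
import cv2
from insightface.app import FaceAnalysis

# Falls back to CPU if CUDA is unavailable
app = FaceAnalysis(name='buffalo_l', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread('selfie.jpg')  # BGR uint8 array, as InsightFace expects (path is illustrative)
faces = app.get(image)

if len(faces) != 1:
    raise ValueError(f'Expected exactly 1 face, found {len(faces)}')

face = faces[0]
bbox = face.bbox            # [x1, y1, x2, y2]
score = face.det_score      # detection confidence
kps = face.kps              # 5 landmarks, shape (5, 2): [[x, y], ...]
embedding = face.embedding  # 512-d vector (present when a recognition model is loaded, as in buffalo_l)
```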
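The 5 landmarks returned by the detector are typically used to align the face before recognition: a similarity transform (rotation, uniform scale, translation) maps them onto a fixed template. The sketch below estimates that transform with the Umeyama least-squares method, assuming the 112×112 ArcFace-style template coordinates used by InsightFace's `face_align` module. InsightFace already provides `insightface.utils.face_align.norm_crop` for this, so treat the code as an illustration of the idea; `estimate_similarity` and `apply_transform` are hypothetical helper names.

```python
import numpy as np

# 5-point reference template for 112x112 crops (left eye, right eye, nose tip,
# left mouth corner, right mouth corner), as used by InsightFace's face_align module.
ARCFACE_DST = np.array([[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
                        [41.5493, 92.3655], [70.7299, 92.2041]])


def estimate_similarity(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares similarity transform (Umeyama, 1991) mapping src -> dst; returns 2x3."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[1, 1] = -1.0  # force a proper rotation (no reflection)
    R = U @ D @ Vt
    scale = np.sum(S * np.diag(D)) / (src_c ** 2).sum(axis=1).mean()
    t = mu_d - scale * R @ mu_s
    return np.hstack([scale * R, t[:, None]])


def apply_transform(M: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to (N, 2) points."""
    return pts @ M[:, :2].T + M[:, 2]


# Synthetic landmarks: the template rotated, scaled and shifted into a larger image
theta, s, shift = 0.3, 1.7, np.array([25.0, -12.0])
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
kps = ARCFACE_DST @ (s * R_true).T + shift
M = estimate_similarity(kps, ARCFACE_DST)  # maps detected landmarks onto the template
```

The resulting 2×3 matrix can be passed to `cv2.warpAffine(image, M, (112, 112))` to produce the aligned crop. A similarity transform (rather than a full affine one) is used so the face is not sheared or stretched non-uniformly.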

Benchmarks

WIDER Face Dataset

The standard benchmark for face detection in the wild:

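| Difficulty | Description | Top Performers |
|---|---|---|
| Easy | Large, clear faces | SCRFD-34G (96.06%), RetinaFace-R152 (96.3%) |
| Medium | Medium-sized faces, some occlusion | SCRFD-10G (93.87%), RetinaFace-R50 (94.0%) |
| Hard | Tiny faces, heavy occlusion, extreme pose | SCRFD-34G (85.29%), TinaFace (92.4%) |

Reported figures are not always directly comparable: papers differ in evaluation split (validation vs. test) and in test-time settings (single-scale vs. multi-scale testing or test-time augmentation).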

Key Takeaways

Summary

  • Face detection is the critical first step: all downstream processing depends on it
  • SCRFD offers the best accuracy-speed tradeoff for eKYC (SCRFD-2.5G: ~2ms on a server GPU, 10-30ms on mobile)
  • RetinaFace with large backbones offers high accuracy and strong landmark quality when speed is less critical
  • BlazeFace is ideal for real-time mobile camera preview/guidance
  • Selfie verification requires exactly 1 detected face; on documents, the portrait face must be located and extracted
  • 5-point landmarks (eyes, nose, mouth corners) are essential for face alignment
  • Post-processing (confidence filtering, NMS, size filtering) is critical for clean results