Face Detection¶

Definition¶

Face detection is the task of locating and extracting human faces from an image or video frame. In eKYC, it is the first step in both the face recognition and liveness detection pipelines — everything downstream depends on accurate, fast face detection.

What Face Detection Outputs¶

graph LR
    A[Input Image] --> B[Face Detector]
    B --> C[Bounding Box<br/>x, y, w, h]
    B --> D[Confidence Score<br/>0.0 - 1.0]
    B --> E[Facial Landmarks<br/>5 or 68+ points]
    B --> F[Face Count]

    style B fill:#4051B5,color:#fff

Output	Details	Use in eKYC
Bounding box	Rectangle coordinates around each face	Crop face for recognition/liveness
Confidence score	Probability that detection is a face (0-1)	Filter false detections
Landmarks (5-point)	Left eye, right eye, nose, left mouth, right mouth	Face alignment for recognition
Landmarks (68/106-point)	Detailed facial structure	Quality assessment, 3D pose estimation
Face count	Number of faces detected	Ensure exactly 1 face for selfie, detect face on document

Modern Face Detection Architectures¶

SCRFD (Sample and Computation Redistribution for Face Detection)¶

Aspect	Details
Paper	SCRFD: Sample and Computation Redistribution for Efficient Face Detection (2021)
Architecture	Single-stage anchor-based detector with efficient backbone
Key innovation	Redistributes computation across scales and samples for optimal efficiency
Accuracy	WIDER Face: 93.78% (Easy), 92.16% (Medium), 77.87% (Hard)
Speed	~2ms on GPU (SCRFD-10GF), ~10ms on mobile
Landmarks	5 keypoints
Why it's popular for eKYC	Best accuracy-speed tradeoff, multiple model sizes

SCRFD model variants:

Model	GFLOPs	WIDER Face Easy	Speed (GPU)	Use Case
SCRFD-500M	0.5	90.57%	~1ms	Mobile/edge, real-time
SCRFD-2.5G	2.5	93.78%	~2ms	Server, balanced
SCRFD-10G	10	95.16%	~4ms	Server, highest accuracy
SCRFD-34G	34	96.06%	~12ms	Offline batch processing

RetinaFace¶

Aspect	Details
Paper	RetinaFace: Single-shot Multi-level Face Localisation in the Wild (2020)
Architecture	Single-stage with FPN + multi-task (box, landmarks, 3D face)
Key innovation	Joint extra-supervised learning with mesh decoder for 3D face
Accuracy	WIDER Face: 96.3% (Easy) with ResNet-152 backbone
Landmarks	5 keypoints (lightweight) or dense landmarks
Why it's popular	Proven, widely used, excellent landmark accuracy

BlazeFace¶

Aspect	Details
Origin	Google MediaPipe (2019)
Architecture	Lightweight SSD-based, optimized for mobile
Speed	~1ms on mobile GPU
Accuracy	Good for frontal faces, less robust for extreme angles
Use case	Real-time mobile face tracking, camera preview guidance

Comparison¶

Detector	Accuracy	Speed (GPU)	Speed (Mobile)	Landmarks	Best For
SCRFD-2.5G	93.8%	2ms	10-30ms	5-point	eKYC server pipeline
SCRFD-500M	90.6%	1ms	5-15ms	5-point	eKYC mobile SDK
RetinaFace-R50	94.9%	5ms	50-100ms	5-point	High-accuracy server
BlazeFace	~90%	0.5ms	1-3ms	6-point	Real-time camera preview
MTCNN	~91%	15ms	100-300ms	5-point	Legacy systems
YOLO-Face	~92%	3ms	20-50ms	5-point	General purpose

Face Detection in the eKYC Pipeline¶

Selfie Face Detection¶

graph TD
    A[Selfie Image] --> B[Face Detection]
    B --> C{How many faces?}
    C -->|0 faces| D[❌ Reject: No face detected]
    C -->|1 face| E[✅ Continue pipeline]
    C -->|2+ faces| F[❌ Reject: Multiple faces]

    E --> G{Face size check}
    G -->|Too small < 80px| H[❌ Reject: Face too far]
    G -->|Too large > 90% frame| I[❌ Reject: Face too close]
    G -->|Appropriate size| J[✅ Proceed to alignment]

    J --> K{Landmark quality}
    K -->|All 5 landmarks detected| L[✅ Face alignment]
    K -->|Landmarks missing/poor| M[⚠️ Quality warning]

    style E fill:#2E7D32,color:#fff
    style D fill:#e53935,color:#fff

Document Face Detection¶

Challenge	Details	Solution
Small face	Face on ID card is often only 100-200px	Use high-res capture, specialized detector
Print artifacts	Printed photo has halftone dots, moiré	Preprocessing to reduce print noise
Occlusion	Hologram, lamination reflection overlaps face	Quality check + multiple capture angles
Old/damaged photo	Faded, scratched, low contrast	Image enhancement before detection
Multiple faces	Group photo on some documents	Select face matching the document's face region

Implementation Details¶

Preprocessing for Detection¶

Step	Purpose	Implementation
Resize	Normalize input to expected resolution	Longest side to 640px (maintaining aspect ratio)
Color normalization	Handle different lighting	Convert to RGB, normalize to [0,1] or [-1,1]
Padding	Handle non-square images	Pad to square with border value

Post-Processing¶

Step	Purpose	Details
NMS (Non-Maximum Suppression)	Remove duplicate detections	IoU threshold 0.4-0.5
Confidence filtering	Remove low-confidence detections	Threshold 0.5-0.7
Size filtering	Remove too-small/too-large detections	Min face size 20-80px
Landmark validation	Ensure landmarks are within bounding box	Sanity check on keypoint positions

Common Integration (Python)¶

# Using InsightFace's SCRFD
from insightface.app import FaceAnalysis

app = FaceAnalysis(name='buffalo_l', providers=['CUDAExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

faces = app.get(image)

for face in faces:
    bbox = face.bbox           # [x1, y1, x2, y2]
    score = face.det_score     # confidence
    kps = face.kps             # 5 landmarks [[x,y], ...]
    embedding = face.embedding  # 512-d vector (if recognition model loaded)

Benchmarks¶

WIDER Face Dataset¶

The standard benchmark for face detection in the wild:

Difficulty	Description	Top Performers
Easy	Large, clear faces	SCRFD-34G (96.06%), RetinaFace-R152 (96.3%)
Medium	Medium faces, some occlusion	SCRFD-10G (92.16%), RetinaFace-R50 (94.0%)
Hard	Tiny faces, heavy occlusion, extreme pose	SCRFD-34G (78.68%), TinaFace (92.4%)

Key Takeaways¶

Summary

Face detection is the critical first step — all downstream processing depends on it
SCRFD offers the best accuracy-speed tradeoff for eKYC (2ms server, 10-30ms mobile)
RetinaFace provides highest accuracy when speed is less critical
BlazeFace is ideal for real-time mobile camera preview/guidance
Must detect exactly 1 face for selfie verification, and extract face from document
5-point landmarks (eyes, nose, mouth corners) are essential for face alignment
Post-processing (NMS, confidence filtering, size filtering) is critical for clean results