Face Alignment & Preprocessing

Definition

Face alignment transforms a detected face into a standardized, normalized representation, correcting for rotation, scale, and position, so that downstream models (recognition, liveness) receive consistent inputs regardless of how the face appeared in the original image.

Why Alignment Matters

graph LR
    A["Raw Detection<br/>(tilted, off-center, varied size)"] --> B["Alignment<br/>(affine transform)"]
    B --> C["Normalized Face<br/>(upright, centered, 112×112)"]
    C --> D["Recognition / Liveness<br/>(consistent input = better accuracy)"]

    style C fill:#2E7D32,color:#fff

Without alignment: Recognition accuracy can drop by roughly 5-15% (the exact figure depends on the model and dataset), because the model sees the same face under different rotations, scales, and offsets.

With alignment: The model receives the same standardized geometry every time, so the features it extracts are comparable across images.

The Alignment Pipeline

Step 1: Landmark-Based Affine Transform

Using the 5 facial landmarks from detection:

| Landmark | Standard position (x, y) in 112×112 | Purpose |
| --- | --- | --- |
| Left eye center | (38.29, 51.70) | Horizontal alignment reference |
| Right eye center | (73.53, 51.50) | Horizontal alignment reference |
| Nose tip | (56.03, 71.74) | Vertical centering |
| Left mouth corner | (41.55, 92.37) | Scale reference |
| Right mouth corner | (70.73, 92.20) | Scale reference |

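The transform is applied in three steps:

1. Estimate a similarity transform (rotation, uniform scale, and translation, a restricted form of affine transform) that maps the detected landmarks onto the standard landmarks
2. Warp the entire image with that transform
3. Crop to 112×112 (or the model's expected input size)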

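import cv2
import numpy as np
from skimage.transform import SimilarityTransform

# Standard reference landmarks (x, y) for a 112x112 crop
REF_SIZE = 112
ref_landmarks = np.array([
    [38.2946, 51.6963],  # left eye
    [73.5318, 51.5014],  # right eye
    [56.0252, 71.7366],  # nose tip
    [41.5493, 92.3655],  # left mouth corner
    [70.7299, 92.2041],  # right mouth corner
], dtype=np.float32)

def align_face(image, landmarks, output_size=112):
    """Warp `image` so its 5 detected landmarks land on the reference positions."""
    src = np.asarray(landmarks, dtype=np.float32).reshape(5, 2)
    # Scale the reference points when the model expects a size other than 112
    dst = ref_landmarks * (output_size / REF_SIZE)
    tform = SimilarityTransform()
    tform.estimate(src, dst)
    # A failed fit (e.g. coincident landmarks) leaves NaNs in the matrix
    if np.isnan(tform.params).any():
        raise ValueError("could not estimate a similarity transform from the landmarks")
    M = tform.params[0:2, :]  # top two rows of the 3x3 matrix form the 2x3 warp
    # warpAffine applies the transform and crops to the output size in one call
    aligned = cv2.warpAffine(image, M, (output_size, output_size))
    return aligned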
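To make the estimation step concrete, the sketch below fits the same kind of transform with plain NumPy least squares. It is an illustration under stated assumptions, not the routine scikit-image uses internally: the helper names `estimate_similarity` and `apply_transform` are hypothetical, and the fit ignores reflections and degenerate landmark sets.

```python
import numpy as np

# Reference landmarks for a 112x112 crop, copied from the block above.
REF_112 = np.array([
    [38.2946, 51.6963],
    [73.5318, 51.5014],
    [56.0252, 71.7366],
    [41.5493, 92.3655],
    [70.7299, 92.2041],
])

def estimate_similarity(src, dst):
    """Least-squares 2x3 matrix [[a, -b, tx], [b, a, ty]] mapping src onto dst."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    # Rows for the x equations: u = a*x - b*y + tx
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    # Rows for the y equations: v = b*x + a*y + ty
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)  # interleaved u0, v0, u1, v1, ...
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])

def apply_transform(M, pts):
    """Apply a 2x3 matrix to an (N, 2) array of points."""
    return np.asarray(pts, dtype=np.float64) @ M[:, :2].T + M[:, 2]

# Synthetic "detection": the reference face rotated 20 degrees, enlarged 2.5x, shifted.
theta = np.deg2rad(20.0)
forward = 2.5 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
detected = REF_112 @ forward.T + np.array([100.0, 60.0])

M = estimate_similarity(detected, REF_112)
scale = np.hypot(M[0, 0], M[1, 0])                   # uniform scale of the fit
roll_deg = np.degrees(np.arctan2(M[1, 0], M[0, 0]))  # in-plane rotation of the fit
```

Because the synthetic detection is an exact similarity of the reference points, the fit undoes it: the recovered scale is the inverse of 2.5 and the recovered rotation is the opposite of 20 degrees. With real detections the residual is non-zero, and a large residual can be a useful hint that the landmarks are unreliable.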

Step 2: Color Normalization

| Method | Formula | Used By |
| --- | --- | --- |
| [0, 1] scaling | pixel / 255.0 | Many models |
| [-1, 1] scaling | (pixel - 127.5) / 128.0 | ArcFace, InsightFace |
| ImageNet normalization | (pixel/255 - mean) / std | ViT-based models |
| Per-image standardization | (pixel - μ) / σ per image | Some liveness models |
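The four formulas in the table can be sketched in NumPy as follows. The function names are hypothetical, the ImageNet mean and standard deviation are the commonly published per-channel RGB values, and the small `eps` guard in the per-image variant is an added assumption to avoid division by zero on constant images.

```python
import numpy as np

# Commonly published ImageNet statistics, in RGB channel order.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def scale_unit(img):
    """[0, 1] scaling."""
    return img.astype(np.float32) / 255.0

def scale_symmetric(img):
    """Approximately [-1, 1] scaling, as in the ArcFace / InsightFace row."""
    return (img.astype(np.float32) - 127.5) / 128.0

def normalize_imagenet(img_rgb):
    """Per-channel ImageNet normalization; expects HxWx3 in RGB order."""
    return (img_rgb.astype(np.float32) / 255.0 - IMAGENET_MEAN) / IMAGENET_STD

def standardize_per_image(img, eps=1e-6):
    """Zero-mean, unit-variance scaling using this image's own statistics."""
    x = img.astype(np.float32)
    return (x - x.mean()) / (x.std() + eps)

# Random stand-in for an aligned 112x112 RGB face crop.
face = np.random.default_rng(0).integers(0, 256, size=(112, 112, 3), dtype=np.uint8)
```

Channel order matters for the per-channel variant: `cv2.imread` returns BGR, so an image loaded with OpenCV would need `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` before `normalize_imagenet`.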

Step 3: Input Sizing

| Model Type | Typical Input Size |
| --- | --- |
| Face recognition (ArcFace) | 112 × 112 |
| Face liveness (CNN) | 224 × 224 or 256 × 256 |
| Face liveness (ViT) | 224 × 224 |
| Face quality | 112 × 112 or 224 × 224 |
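A size mismatch is easiest to catch at the model boundary. The sketch below validates a crop against an expected size and rearranges it into a batch. It assumes a runtime that takes float32 input in NCHW layout, which is common but not universal; the `EXPECTED_SIZE` keys and the `to_model_input` helper are hypothetical names.

```python
import numpy as np

# Typical sizes from the table above; the keys are illustrative names.
EXPECTED_SIZE = {"recognition": 112, "liveness": 224}

def to_model_input(face, model_type):
    """Validate an HxWx3 face crop and return a (1, 3, H, W) float32 batch."""
    size = EXPECTED_SIZE[model_type]
    if face.shape != (size, size, 3):
        raise ValueError(f"{model_type} expects {size}x{size}x3, got {face.shape}")
    chw = np.transpose(face.astype(np.float32), (2, 0, 1))  # HWC -> CHW
    return chw[np.newaxis, ...]                             # add the batch axis

crop = np.zeros((112, 112, 3), dtype=np.uint8)
batch = to_model_input(crop, "recognition")
```

Failing loudly here is a deliberate choice: silently resizing a 112×112 recognition crop to 224×224 would run, but the face would no longer match the geometry the liveness model was trained on.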

Face Quality Assessment (Pre-Filter)

Before passing to recognition/liveness, assess face quality:

| Quality Check | Metric | Acceptance Threshold | Detection Method |
| --- | --- | --- | --- |
| Blur | Laplacian variance | Above a tuned cutoff, typically 50-100 | `cv2.Laplacian(gray, cv2.CV_64F).var()` |
| Brightness | Mean pixel value (0-255 scale) | 40-220 | Histogram analysis |
| Face size | Bounding-box size in pixels | > 80×80 | Bounding box dimensions |
| Pose (yaw) | Degrees from frontal | < 30° | Landmark-based or head pose estimator |
| Pose (pitch) | Degrees from frontal | < 20° | Landmark-based |
| Occlusion | Landmark visibility | All 5 visible | Landmark confidence scores |
| Eye openness | Eye aspect ratio (EAR) | Eyes open (EAR above a tuned cutoff) | Eye-contour landmarks |
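Three of these metrics are short enough to sketch in NumPy. The blur function is a stand-in for the `cv2.Laplacian(...).var()` call in the table: it uses the same 4-neighbour kernel but evaluates interior pixels only, so values near the border can differ slightly from OpenCV. The EAR function assumes six contour points per eye, which the 5-point landmark set does not provide, so a denser landmark model is assumed. All function names are hypothetical.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels (higher = sharper)."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def mean_brightness(gray):
    """Mean grey level on the 0-255 scale."""
    return float(np.mean(gray))

def eye_aspect_ratio(eye):
    """EAR from six eye-contour points p1..p6, where p1 and p4 are the eye corners."""
    p = np.asarray(eye, dtype=np.float64)
    vertical = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
    horizontal = np.linalg.norm(p[0] - p[3])
    return float(vertical / (2.0 * horizontal))

# Synthetic inputs: a featureless patch and a high-contrast checkerboard.
flat = np.full((64, 64), 128, dtype=np.uint8)
checker = (np.indices((64, 64)).sum(axis=0) % 2 * 255).astype(np.uint8)
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
```

The featureless patch has zero Laplacian variance and would fail any blur cutoff, while the checkerboard scores far above the 50-100 range. Real thresholds depend on camera, resolution, and crop size, so they are best tuned on representative captures.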

graph TD
    A[Aligned Face] --> B[Quality Checks]
    B --> C{All checks pass?}
    C -->|Yes| D[✅ Send to recognition + liveness]
    C -->|No| E[⚠️ Reject with guidance]

    E --> F["Too blurry → 'Hold steady'"]
    E --> G["Too dark → 'Improve lighting'"]
    E --> H["Face turned → 'Look directly at camera'"]
    E --> I["Eyes closed → 'Keep eyes open'"]

    style D fill:#2E7D32,color:#fff
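The pass/reject flow above can be sketched as a function that maps measured metrics to user guidance. This is a minimal illustration, assuming the metrics were computed upstream: `FaceMetrics` and `quality_guidance` are hypothetical names, the blur cutoff of 100 is one point in the 50-100 range from the table, and the "Reduce glare" and "Move closer" messages are illustrative additions that do not appear in the diagram. Occlusion is left out for brevity.

```python
from dataclasses import dataclass

@dataclass
class FaceMetrics:
    blur: float        # Laplacian variance
    brightness: float  # mean grey level, 0-255
    width: int         # bounding-box width in pixels
    height: int        # bounding-box height in pixels
    yaw: float         # degrees from frontal
    pitch: float       # degrees from frontal
    eyes_open: bool

def quality_guidance(m, blur_min=100.0):
    """Return user-facing hints; an empty list means every check passed."""
    hints = []
    if m.blur <= blur_min:
        hints.append("Hold steady")
    if m.brightness < 40:
        hints.append("Improve lighting")
    elif m.brightness > 220:
        hints.append("Reduce glare")   # illustrative message, not from the diagram
    if min(m.width, m.height) <= 80:
        hints.append("Move closer")    # illustrative message, not from the diagram
    if abs(m.yaw) >= 30 or abs(m.pitch) >= 20:
        hints.append("Look directly at camera")
    if not m.eyes_open:
        hints.append("Keep eyes open")
    return hints

good = FaceMetrics(blur=250.0, brightness=130.0, width=160, height=160,
                   yaw=5.0, pitch=-3.0, eyes_open=True)
bad = FaceMetrics(blur=20.0, brightness=25.0, width=160, height=160,
                  yaw=40.0, pitch=0.0, eyes_open=False)
```

Returning every failed check, rather than stopping at the first, lets the capture UI decide which hint to show first.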

Special Cases in eKYC

Document Face Alignment

Faces extracted from ID documents have unique challenges:

| Challenge | Impact | Mitigation |
| --- | --- | --- |
| Low resolution | 50-150px face on ID | Super-resolution or quality-aware matching |
| Print artifacts | Halftone dots, moiré patterns | Preprocessing filters |
| Color distortion | Faded, yellowed photos | Color correction, histogram equalization |
| Partial occlusion | Hologram/lamination overlap | Multiple capture attempts, angle guidance |
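As one example of the mitigations listed, the sketch below implements global histogram equalization for a faded greyscale crop in NumPy. It is a minimal version under assumptions: the input is 8-bit single-channel, and `equalize_histogram` is a hypothetical helper name. OpenCV offers `cv2.equalizeHist` for the same purpose and `cv2.createCLAHE` for a locally adaptive variant.

```python
import numpy as np

def equalize_histogram(gray):
    """Global histogram equalization for an 8-bit greyscale image."""
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest level present
    total = gray.size
    if total == cdf_min:       # constant image: nothing to stretch
        return gray.copy()
    # Map each grey level through the normalized CDF onto the full 0-255 range.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

# Synthetic "faded" photo: all grey levels squeezed into 100..139.
faded = np.random.default_rng(0).integers(100, 140, size=(50, 50), dtype=np.uint8)
restored = equalize_histogram(faded)
```

For color ID photos, equalization is usually applied to a luminance channel rather than to each RGB channel separately, since per-channel stretching shifts hues. Whether it helps matching at all depends on the recognition model, so it is worth validating on real document crops.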

Cross-Domain Alignment (ID Photo ↔ Selfie)

The aligned face from an ID document and the one from a selfie will differ in:

- Resolution: ID face ~100px vs selfie ~300px
- Quality: Print artifacts vs camera noise
- Age: ID photo may be years older
- Lighting: Studio flash vs ambient

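Models like AdaFace are designed with this quality mismatch in mind: during training, AdaFace adapts its margin to an estimate of image quality (using the feature norm as a proxy), which changes how much emphasis hard samples receive at each quality level.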

Key Takeaways

Summary

Face alignment via a 5-point landmark transform is essential: skipping it can cost roughly 5-15% in recognition accuracy
Standard target: 112×112 pixels for recognition, 224×224 for liveness
Quality assessment before downstream processing prevents garbage-in-garbage-out
Document face alignment has unique challenges: low resolution, print artifacts, color distortion
Color normalization must match the model's training preprocessing exactly
AdaFace is designed to cope with the quality mismatch between ID photos and selfies