Document Capture & Quality
Definition
Document capture is the process of acquiring a high-quality image of an identity document through a phone camera or scanner. Quality at capture directly determines the accuracy of all downstream processing — OCR, forensics, and face extraction.
Auto-Capture System
graph TD
A[Camera Preview] --> B[Real-Time Analysis]
B --> C{Document detected?}
C -->|No| D["Guide: 'Place document in frame'"]
C -->|Yes| E[Quality Checks]
E --> F{All checks pass?}
F -->|No| G["Guide: 'Move closer' / 'Reduce glare' / 'Hold steady'"]
F -->|Yes| H[Auto-Capture Triggered]
H --> I[High-res image saved]
D --> A
G --> A
style H fill:#2E7D32,color:#fff
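The decision logic in the flow above can be sketched for a single preview frame as follows. This is a minimal illustration, not a reference implementation: the function name `next_step`, the check keys (`distance`, `glare`, `blur`), the pairing of each key with a guidance message, and the fallback message are all assumptions made for the example. Only the guidance strings themselves come from the diagram.

```python
# Guidance strings are taken from the flow diagram; the check names used as
# keys and their pairing with each message are hypothetical.
GUIDANCE = {
    "distance": "Move closer",
    "glare": "Reduce glare",
    "blur": "Hold steady",
}


def next_step(document_detected, check_results):
    """Decide what to do for one preview frame.

    check_results maps a check name to True (passed) or False (failed).
    Returns ("guide", message) or ("capture", None).
    """
    if not document_detected:
        return ("guide", "Place document in frame")
    for name, passed in check_results.items():
        if not passed:
            # Report the first failing check; the fallback text is an assumption.
            return ("guide", GUIDANCE.get(name, "Adjust the document"))
    return ("capture", None)
```

A caller would run this on every preview frame and trigger the high-resolution capture only when it returns `("capture", None)`.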
Quality Checks
| Check | Metric | Threshold | Detection Method |
|---|---|---|---|
| Document presence | Detection confidence | > 0.9 | Object detection model |
| Full document visible | All 4 corners detected | All inside frame | Corner detection |
| Blur | Laplacian variance | > 100 | Laplacian filter |
| Glare/reflection | Specular highlight area | < 5% of doc area | Highlight detection |
| Shadow | Luminance uniformity | Variance < threshold | Regional brightness analysis |
| Resolution | DPI equivalent | > 300 DPI | Pixel density calculation |
| Tilt/skew | Angle from horizontal | < 10° | Corner-based angle computation |
| Distance | Document size in frame | 60-90% of frame | Bounding box ratio |
| Lighting | Mean brightness | 80-220 (on a 0-255 grayscale) | Histogram analysis |
| Occlusion | Finger/thumb overlap | < 3% overlap | Hand detection model |
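Several of the checks in the table reduce to a few lines of array or geometry code. The sketch below uses NumPy only and applies the thresholds from the table. The function names, the highlight cutoff of 250, the bounding-box crop used as the "document area", and the use of the top edge for tilt and DPI are assumptions for illustration. The blur metric applies the 4-neighbour Laplacian kernel to interior pixels only, so its values will be close to, but not identical to, what an OpenCV-based pipeline produces with its own border handling.

```python
import math
import numpy as np


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response; low values suggest blur."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def glare_fraction(doc_gray, highlight_level=250):
    """Share of document pixels at or above highlight_level (assumed cutoff)."""
    return float((np.asarray(doc_gray) >= highlight_level).mean())


def tilt_degrees(corners):
    """Angle of the top edge (TL to TR) from horizontal, in degrees."""
    (x0, y0), (x1, y1) = corners[0], corners[1]
    return abs(math.degrees(math.atan2(y1 - y0, x1 - x0)))


def frame_fill(corners, frame_w, frame_h):
    """Bounding-box area of the document as a fraction of the frame area."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (max(xs) - min(xs)) * (max(ys) - min(ys)) / (frame_w * frame_h)


def dpi_equivalent(corners, card_width_mm=85.6):
    """Pixels per inch along the top edge, assuming an ID-1 (CR80) card width."""
    top_edge_px = math.dist(corners[0], corners[1])
    return top_edge_px / (card_width_mm / 25.4)


def run_checks(frame_gray, corners):
    """frame_gray: 2-D uint8 frame; corners: [TL, TR, BR, BL] as (x, y) pixels."""
    frame_h, frame_w = frame_gray.shape
    xs = [int(x) for x, _ in corners]
    ys = [int(y) for _, y in corners]
    # Bounding-box crop stands in for the document area (an approximation).
    doc = frame_gray[max(min(ys), 0):max(ys), max(min(xs), 0):max(xs)]
    return {
        "blur": laplacian_variance(doc) > 100,
        "glare": glare_fraction(doc) < 0.05,
        "lighting": bool(80 <= doc.mean() <= 220),
        "tilt": tilt_degrees(corners) < 10,
        "distance": 0.60 <= frame_fill(corners, frame_w, frame_h) <= 0.90,
        "resolution": dpi_equivalent(corners) > 300,
    }
```

Because Laplacian variance scales with image resolution and sensor noise, the `> 100` threshold is best treated as a starting point to tune per device rather than a universal constant.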
Perspective Correction
After capture, apply a perspective transform to obtain a flat, rectangular document image:
import cv2
import numpy as np

def perspective_correct(image, corners):
    """
    corners: 4 detected document corners [TL, TR, BR, BL]
    Returns: perspective-corrected rectangular image
    """
    # Target dimensions (standard ID card aspect ratio)
    width, height = 856, 540  # CR80 card (85.6 x 54.0 mm) at 10 px/mm, ~254 DPI
    dst = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=np.float32)
    # Homography mapping the detected corners onto the target rectangle
    M = cv2.getPerspectiveTransform(np.asarray(corners, dtype=np.float32), dst)
    corrected = cv2.warpPerspective(image, M, (width, height))
    return corrected
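`perspective_correct` expects the corners in [TL, TR, BR, BL] order, which a corner detector does not necessarily guarantee. The sketch below shows one common ordering heuristic based on coordinate sums and differences, plus a helper that derives the target size from a chosen DPI. Both function names are hypothetical, and the heuristic assumes the document is only mildly rotated (consistent with the < 10° tilt check); it can mislabel corners as the rotation approaches 45°.

```python
import numpy as np

CR80_MM = (85.60, 53.98)  # ID-1 (CR80) card width and height in millimetres


def order_corners(points):
    """Return four (x, y) points ordered as [TL, TR, BR, BL].

    Heuristic: TL has the smallest x + y and BR the largest;
    TR has the smallest y - x and BL the largest.
    """
    pts = np.asarray(points, dtype=np.float32)
    sums = pts.sum(axis=1)
    diffs = pts[:, 1] - pts[:, 0]
    tl = pts[np.argmin(sums)]
    br = pts[np.argmax(sums)]
    tr = pts[np.argmin(diffs)]
    bl = pts[np.argmax(diffs)]
    return np.array([tl, tr, br, bl], dtype=np.float32)


def target_size(dpi):
    """Output (width, height) in pixels for a CR80 card at the given DPI."""
    return tuple(round(mm / 25.4 * dpi) for mm in CR80_MM)
```

With these helpers, a call such as `perspective_correct(image, order_corners(detected_corners))` receives the corners in the order it expects. `target_size(254)` reproduces the 856 x 540 target used above, which corresponds to 10 px/mm.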
Front vs Back Capture
| Side | What's Extracted | Special Challenges |
|---|---|---|
| Front | Photo, name, DOB, document number, expiry | Hologram glare over photo area |
| Back | MRZ, barcode, additional data, signature | Low contrast text, barcode readability |
Key Takeaways
Summary
- Capture quality limits every downstream step: a poor capture means poor OCR, poor forensics, and poor face extraction
- Auto-capture with real-time guidance improves the first-attempt success rate by correcting framing, blur, and glare before the image is taken
- Critical checks: blur, glare, all corners visible, resolution > 300 DPI
- Perspective correction after capture normalizes the document for downstream processing
- Both front and back capture are typically needed for full data extraction