eKYC Encyclopedia

Text Detection Models

Definition

Text detection locates regions of text within a document image, outputting bounding polygons for each text instance. This is Stage 1 of the OCR pipeline.
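To make the Stage 1 output concrete, here is a minimal sketch of how detected regions might be represented before they are handed to recognition. The `TextRegion` class, the `regions_for_recognition` helper, and the 0.5 score cutoff are hypothetical names and values chosen for illustration; real detectors return their own formats.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TextRegion:
    """One detected text instance (hypothetical structure for illustration)."""
    polygon: List[Point]  # polygon vertices in image pixel coordinates
    score: float          # detector confidence in [0, 1]

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)


def regions_for_recognition(regions: List[TextRegion],
                            min_score: float = 0.5) -> List[TextRegion]:
    """Drop low-confidence regions, then sort top-to-bottom and left-to-right."""
    kept = [r for r in regions if r.score >= min_score]
    return sorted(kept, key=lambda r: (r.bounding_box()[1], r.bounding_box()[0]))
```

Each surviving region would then be cropped (or rectified, for rotated polygons) and passed to the Stage 2 recognizer.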

Key Models

Model	Year	Architecture	Key Innovation	Approx. Latency per Image (GPU)
CRAFT	2019	VGG-16 + U-Net-style decoder	Character-level region and affinity score maps	30-50ms
EAST	2017	PVANet	Direct geometry regression in a single stage; fast	10-20ms
DBNet	2020	ResNet + FPN	Differentiable binarization (learned, per-pixel adaptive thresholding)	20-40ms
DBNet++	2022	DBNet + ASF	Adaptive scale fusion for multi-scale text	25-45ms
PSENet	2019	ResNet + FPN	Progressive scale expansion to separate touching text instances	30-50ms
FAST	2021	Lightweight NAS-searched backbone (TextNet)	Minimalist kernel representation; among the fastest detectors, mobile-friendly	5-15ms
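The differentiable binarization step in DBNet replaces a hard threshold with a steep sigmoid, B = 1 / (1 + exp(-k(P - T))), where P is the predicted probability map, T is the predicted threshold map, and k is an amplification factor (50 in the DBNet paper). The following is a minimal pure-Python sketch of that formula only, using nested lists in place of tensors; the function name is an assumption, and it is not the network or its post-processing.

```python
import math
from typing import List

Map2D = List[List[float]]


def differentiable_binarization(prob_map: Map2D,
                                thresh_map: Map2D,
                                k: float = 50.0) -> Map2D:
    """Approximate binary map: B = 1 / (1 + exp(-k * (P - T))) per pixel.

    prob_map and thresh_map hold values in [0, 1] and must share a shape.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]
```

Pixels whose probability is well above the local threshold map to values near 1, and pixels well below it map to values near 0. Because the function is smooth, the threshold map can be trained jointly with the probability map.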

For ID Documents Specifically

Consideration	Details
Fixed layouts	Most text sits in known positions, so detection can be template-assisted
Small text	Microprint and other fine text on IDs need high-resolution input
Multi-orientation	Some fields are vertical or rotated
MRZ	Fixed-pitch OCR-B font, which suits a specialized detector
Embedded in graphics	Text is often printed over holograms, patterns, and watermarks
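One way to use a fixed layout is to match detector output against the regions where a template expects each field. The sketch below is illustrative only: it assumes the document image has already been cropped and aligned to the template, that boxes are axis-aligned and in normalized coordinates, and that the field names and the 0.3 IoU cutoff are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_fields(template: Dict[str, Box],
                  detections: List[Box],
                  min_iou: float = 0.3) -> Dict[str, Optional[Box]]:
    """Pick, for each template field, the detection that overlaps it best.

    A field maps to None when no detection reaches min_iou, which flags it
    for a fallback such as cropping the expected region directly.
    """
    assigned: Dict[str, Optional[Box]] = {}
    for name, expected in template.items():
        best = max(detections, key=lambda d: iou(expected, d), default=None)
        if best is not None and iou(expected, best) >= min_iou:
            assigned[name] = best
        else:
            assigned[name] = None
    return assigned
```

Fields that come back as `None` are the interesting cases: they may indicate text obscured by a hologram, a misaligned capture, or a document that does not match the assumed template.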

Key Takeaways

Summary

DBNet is a widely used default: differentiable binarization handles diverse text well
CRAFT excels at character-level detection (useful for damaged or irregular text)
For ID documents, template-assisted detection (knowing where text should be) improves accuracy
Detection speed is typically not the bottleneck; recognition (Stage 2) is usually slower