CNNs for eKYC

Definition

Convolutional Neural Networks (CNNs) are the backbone architecture for most vision tasks in eKYC (electronic Know Your Customer): face detection, face recognition, liveness detection, document classification, and OCR.


Key CNN Families Used in eKYC

| Family | Models | Params | Use in eKYC |
|---|---|---|---|
| ResNet | ResNet-18/34/50/100 | 11-65M | Face recognition backbone (IResNet), liveness |
| EfficientNet | B0-B7 | 5-66M | Document classification, liveness (good accuracy/size ratio) |
| MobileNet | V2, V3-S, V3-L | 2-5M | Mobile face detection, on-device liveness |
| MobileOne | S0-S4 | 2-15M | Ultra-fast mobile inference |
| GhostNet | 1.0x, 1.3x | 5-7M | Efficient mobile backbone |
| ConvNeXt | Tiny/Small/Base | 28-89M | Modern CNN competitive with ViT |
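
The parameter ranges in the table above translate fairly directly into weight size, which is often the first constraint when choosing a backbone for on-device use. The following is a minimal sketch, not a recommended selection procedure: it assumes roughly 4 bytes per parameter for FP32 weights (1 byte for INT8), ignores activations and runtime overhead, and uses hypothetical helper names (`Family`, `weight_size_mb`, `shortlist`) introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    name: str
    min_params_m: float  # smallest listed variant, in millions of parameters
    max_params_m: float  # largest listed variant, in millions of parameters


# Parameter ranges copied from the table above.
FAMILIES = [
    Family("ResNet", 11, 65),
    Family("EfficientNet", 5, 66),
    Family("MobileNet", 2, 5),
    Family("MobileOne", 2, 15),
    Family("GhostNet", 5, 7),
    Family("ConvNeXt", 28, 89),
]


def weight_size_mb(params_m: float, bytes_per_param: int = 4) -> float:
    """Approximate weight size in MB (10^6 bytes).

    Millions of parameters times bytes per parameter gives megabytes directly.
    Assumption: 4 bytes per parameter (FP32) unless stated otherwise.
    """
    return params_m * bytes_per_param


def shortlist(budget_mb: float, bytes_per_param: int = 4) -> list[str]:
    """Names of families whose smallest variant fits the weight budget."""
    return [
        f.name
        for f in FAMILIES
        if weight_size_mb(f.min_params_m, bytes_per_param) <= budget_mb
    ]


# A 20 MB FP32 budget keeps only the lightweight families.
print(shortlist(20))
# → ['EfficientNet', 'MobileNet', 'MobileOne', 'GhostNet']

# With INT8 weights (1 byte per parameter) the smallest ResNet also fits.
print(shortlist(20, bytes_per_param=1))
# → ['ResNet', 'EfficientNet', 'MobileNet', 'MobileOne', 'GhostNet']
```

This only filters on weight size; latency and accuracy on the target device still need to be measured before committing to a family.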

Why CNNs Still Dominate eKYC

| Reason | Details |
|---|---|
| Speed | 2-5x faster than an equivalent ViT on mobile |
| Proven | Years of production deployment; well understood |
| Mobile-friendly | Convolution ops run efficiently on mobile hardware |
| Data-efficient training | CNNs need less training data than ViTs to train well |
| Tooling | TensorRT, ONNX, and Core ML are all optimized for CNN ops |
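
One reason mobile CNNs stay small is that MobileNet-style backbones replace standard convolutions with depthwise separable ones. The arithmetic below is an illustrative sketch of that saving, not something taken from a specific model in this page: it assumes a 3x3 kernel, ignores bias terms and batch-norm parameters, and uses hypothetical function names.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv that mixes channels
    return depthwise + pointwise


def conv_macs(params: int, out_h: int, out_w: int) -> int:
    """Multiply-accumulates: each weight is used once per output position."""
    return params * out_h * out_w


c_in, c_out = 128, 128
std = standard_conv_params(c_in, c_out)
sep = depthwise_separable_params(c_in, c_out)

print(std)                    # → 147456
print(sep)                    # → 17536
print(f"{std / sep:.1f}x")    # → 8.4x fewer weights for this layer

# The same ratio applies to MACs at a fixed output resolution.
print(conv_macs(std, 56, 56) / conv_macs(sep, 56, 56) == std / sep)  # → True
```

The saving grows with the number of output channels, which is why the technique pays off most in the wide later layers of a network. Actual on-device speedups depend on the hardware and runtime and are usually smaller than the raw MAC ratio suggests.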

Key Takeaways

Summary

  • ResNet (server) and MobileNet (mobile) remain the workhorse architectures for eKYC
  • EfficientNet offers a strong accuracy-per-FLOP trade-off for many tasks
  • CNNs are faster and more mobile-friendly than ViTs, so they remain dominant in production
  • ConvNeXt shows that modern CNNs can match ViT accuracy while keeping CNN efficiency