Document Classification

Definition

Document classification automatically identifies the type and issuing country of an identity document from a captured image. With 6,000+ document types globally (and hundreds of format variants), a classifier has to answer three questions: which country issued the document, what type of document it is, and which version or generation it belongs to.


Classification Hierarchy

graph TD
    A[Document Image] --> B[Level 1: Document Category]
    B --> B1[Passport]
    B --> B2[National ID]
    B --> B3[Driving License]
    B --> B4[Residence Permit]
    B --> B5[Utility Bill]
    B --> B6[Other]

    B2 --> C[Level 2: Country]
    C --> C1[India]
    C --> C2[USA]
    C --> C3[UK]
    C --> C4[...]

    C1 --> D[Level 3: Document Variant]
    D --> D1[Aadhaar Card]
    D --> D2[PAN Card]
    D --> D3[Voter ID]
    D --> D4[Driving License]

    D1 --> E[Level 4: Version/Generation]
    E --> E1[Aadhaar letter format]
    E --> E2[Aadhaar PVC card]
    E --> E3[mAadhaar digital]

    style B fill:#4051B5,color:#fff
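
The hierarchy above can be held as a nested mapping in which each resolved level narrows the candidate labels for the next. This is a minimal sketch rather than a reference implementation: the contents of `TAXONOMY` mirror only the example nodes in the diagram, and the names `TAXONOMY`, `children`, and `is_valid_path` are hypothetical.

```python
from typing import Dict, List, Sequence

# Nested mapping: category -> country -> variant -> list of versions.
# Contents mirror only the example nodes in the diagram; a real taxonomy
# would hold thousands of entries.
TAXONOMY: Dict[str, Dict[str, Dict[str, List[str]]]] = {
    "National ID": {
        "India": {
            "Aadhaar Card": [
                "Aadhaar letter format",
                "Aadhaar PVC card",
                "mAadhaar digital",
            ],
            "PAN Card": [],
            "Voter ID": [],
            "Driving License": [],
        },
    },
}


def children(path: Sequence[str]) -> List[str]:
    """Return the labels allowed at the next level below `path`."""
    node = TAXONOMY
    for label in path:
        if not isinstance(node, dict) or label not in node:
            raise KeyError(f"unknown label path: {list(path)}")
        node = node[label]
    # Dict levels return their keys; the leaf level is already a list.
    return sorted(node) if isinstance(node, dict) else list(node)


def is_valid_path(path: Sequence[str]) -> bool:
    """Check that every label in `path` is a child of the one before it."""
    try:
        for depth in range(len(path)):
            if path[depth] not in children(path[:depth]):
                return False
    except KeyError:
        return False
    return True
```

Calling `children(["National ID", "India"])` lists only the variants under that branch, which is how a hierarchy keeps each decision small even when the full label space is large.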

Model Architectures

| Approach | Architecture | Accuracy | Speed | Notes |
|---|---|---|---|---|
| CNN classifier | EfficientNet-B0/B2 | 98-99% | 10-30 ms | Standard approach |
| ViT classifier | ViT-Small/Base | 99%+ | 20-50 ms | Better on diverse layouts |
| Hierarchical | Level1 CNN → Level2 CNN | 99%+ | 20-60 ms | Separate models per level |
| Multi-task | Shared backbone + multiple heads | 98%+ | 15-30 ms | Country + type + version jointly |
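
The hierarchical row can be illustrated as a two-stage cascade in which the level-1 result selects which level-2 model to run. The sketch below makes several assumptions: each model is wrapped as a function returning `{label: probability}`, the stub scorers and their probabilities are made up for illustration, and the `min_confidence` threshold of 0.9 is an arbitrary placeholder. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A scorer maps an image to {label: probability}. The image can be any
# object here; a real scorer would wrap a CNN or ViT forward pass.
Scorer = Callable[[object], Dict[str, float]]


def top1(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-scoring label and its probability."""
    label = max(scores, key=scores.get)
    return label, scores[label]


def classify_hierarchical(
    image: object,
    level1: Scorer,
    level2_by_category: Dict[str, Scorer],
    min_confidence: float = 0.9,  # assumed threshold, tune per deployment
) -> List[Tuple[str, float]]:
    """Run level 1, then the level-2 model chosen by the level-1 result.

    Stops early when a level is below `min_confidence` or when no level-2
    model exists for the category, returning the partial path resolved so far.
    """
    path: List[Tuple[str, float]] = []
    category, conf = top1(level1(image))
    if conf < min_confidence:
        return path
    path.append((category, conf))

    level2 = level2_by_category.get(category)
    if level2 is None:
        return path  # no specialised model for this category
    country, conf = top1(level2(image))
    if conf >= min_confidence:
        path.append((country, conf))
    return path


# Stub scorers with made-up probabilities, standing in for trained models.
def stub_level1(_image: object) -> Dict[str, float]:
    return {"Passport": 0.03, "National ID": 0.95, "Other": 0.02}


def stub_national_id(_image: object) -> Dict[str, float]:
    return {"India": 0.97, "UK": 0.03}


result = classify_hierarchical(None, stub_level1, {"National ID": stub_national_id})
print(result)
```

Returning a partial path instead of forcing a guess lets a caller decide what to do with a low-confidence level, for example asking for a recapture.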

Challenges

| Challenge | Details | Mitigation |
|---|---|---|
| 6,000+ classes | Massive classification space | Hierarchical: country first, then type |
| Visual similarity | Many IDs look alike (same template, different country) | Fine-grained features, MRZ/text cues |
| New document versions | Countries update ID formats regularly | Continuous model updates, few-shot learning |
| Poor image quality | Blur, glare, partial capture | Quality-gate before classification |
| Double-sided | Front and back look very different | Separate front/back classifiers |
| Rare documents | Long-tail distribution: some types seen rarely | Data augmentation, few-shot approaches |
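
As one illustration of the MRZ cue listed for visually similar documents: when a document carries a machine-readable zone, the first line encodes the document code and the issuing state, which can confirm or override the visual prediction. The sketch below assumes OCR has already produced the first MRZ line as text and that field positions follow ICAO Doc 9303 (characters 1-2 document code, 3-5 issuing state). The names `MrzHint` and `mrz_hint` are hypothetical, and no check digits are validated.

```python
from typing import NamedTuple, Optional

# First-line lengths of the three ICAO Doc 9303 machine-readable formats.
MRZ_FORMATS = {44: "TD3", 36: "TD2", 30: "TD1"}
ALLOWED_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789<")


class MrzHint(NamedTuple):
    mrz_format: str     # TD1, TD2 or TD3
    document_code: str  # e.g. "P" for passports
    issuing_state: str  # code taken from characters 3-5


def mrz_hint(line1: str) -> Optional[MrzHint]:
    """Extract document code and issuing state from the first MRZ line.

    Returns None when the text does not look like an MRZ line, so the
    caller can fall back to the visual classifier alone.
    """
    line1 = line1.strip().upper()
    mrz_format = MRZ_FORMATS.get(len(line1))
    if mrz_format is None or not set(line1) <= ALLOWED_CHARS:
        return None
    # "<" is the filler character; strip it from short codes.
    document_code = line1[0:2].rstrip("<")
    issuing_state = line1[2:5].rstrip("<")
    if not document_code or not issuing_state:
        return None
    return MrzHint(mrz_format, document_code, issuing_state)


# Specimen-style line for the fictional state code "UTO", padded to 44 chars.
sample = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
hint = mrz_hint(sample)
print(hint)
```

Documents without an MRZ simply yield `None`, so a hint like this can only supplement the image classifier, not replace it.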

Document Coverage by Vendor

| Vendor | Document Types | Countries |
|---|---|---|
| Regula | 14,000+ | 247 |
| Sumsub | 14,000+ | 220 |
| Veriff | 12,000+ | 230 |
| Jumio | 5,000+ | 200 |
| Onfido | 2,500+ | 195 |
| HyperVerge | 1,000+ | 150 |
| Microblink | 2,500+ | 140 |

Key Takeaways

Summary

  • Classification must handle 6,000+ document types across 200+ countries
  • Hierarchical classification (category → country → type → version) is the standard approach
  • EfficientNet and ViT classifiers reach 98-99%+ accuracy on known document types
  • New document versions require continuous model updates, so classification is an ongoing maintenance task
  • Document coverage (number of supported types) is a key vendor differentiator