Document Forensics Overview

Definition

Document forensics in eKYC detects whether an identity document has been tampered with, forged, or digitally manipulated. It answers the critical question: "Is this document authentic, or has it been altered?"


Types of Document Fraud

graph TD
    A[Document Fraud] --> B[Physical Fraud]
    A --> C[Digital Fraud]

    B --> B1[Counterfeit<br/>Complete fake document]
    B --> B2[Forged<br/>Genuine document with altered data]
    B --> B3[Stolen blank<br/>Real blank document with fake data]
    B --> B4[Impostor use<br/>Someone else's genuine document]

    C --> C1[Photo substitution<br/>Replace face photo]
    C --> C2[Text editing<br/>Change name, DOB, ID number]
    C --> C3[Splicing<br/>Combine parts from different documents]
    C --> C4[AI-generated<br/>Fully synthetic fake document]

    style C fill:#e53935,color:#fff
    style B fill:#F57F17,color:#000

Forensic Detection Methods

Error Level Analysis (ELA)

| Aspect | Details |
|---|---|
| How it works | Re-save the JPEG at a known quality and compare it with the original; manipulated regions tend to show a different error level from the rest of the image |
| Detects | Photo splicing, text editing, region replacement |
| Limitation | Ineffective on uncompressed images and on images re-saved at high quality after editing |
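
The re-save-and-compare step can be sketched in a few lines. This is a minimal illustration that assumes the Pillow library is installed; the function names `error_level_map` and `mean_error` and the default quality of 90 are choices made for this example, not part of any standard ELA tool.

```python
import io

from PIL import Image, ImageChops, ImageStat  # assumption: Pillow is available


def error_level_map(image, quality=90):
    """Re-save `image` as JPEG at `quality` and return the per-pixel absolute difference."""
    original = image.convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")
    # Bright pixels in the result are pixels that changed a lot when re-compressed.
    return ImageChops.difference(original, resaved)


def mean_error(ela_map, box):
    """Average error level (0-255) across channels inside box = (left, upper, right, lower)."""
    channel_means = ImageStat.Stat(ela_map.crop(box)).mean
    return sum(channel_means) / len(channel_means)
```

A caller would typically compare `mean_error` for a field of interest (for example the photo or date-of-birth box) against comparable regions elsewhere on the same document. Textured regions naturally show higher error than flat ones, so any threshold is likely to need tuning per region type.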

Noise Analysis

| Aspect | Details |
|---|---|
| How it works | Analyze the sensor noise pattern; manipulated regions have inconsistent noise |
| Detects | Copy-move, splicing from different sources |
| Techniques | Noise level estimation, noise inconsistency maps |
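
A noise inconsistency map can be approximated with the standard library alone. The sketch below is one simple way to do it, not a reference method: it assumes a grayscale image given as a list of rows, estimates noise as the deviation of each pixel from its 3x3 neighborhood mean, and flags blocks whose noise level is a robust outlier. The block size and the 3.5 threshold are illustrative assumptions.

```python
import statistics


def highpass_residual(gray):
    """Subtract the 3x3 neighborhood mean from each pixel (border pixels stay 0.0)."""
    h, w = len(gray), len(gray[0])
    residual = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            local_mean = sum(
                gray[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ) / 9.0
            residual[y][x] = gray[y][x] - local_mean
    return residual


def block_noise_levels(gray, block=8):
    """Map (block_row, block_col) -> standard deviation of the residual in that block."""
    residual = highpass_residual(gray)
    h, w = len(gray), len(gray[0])
    levels = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            values = [
                residual[y][x]
                for y in range(by, by + block)
                for x in range(bx, bx + block)
            ]
            levels[(by // block, bx // block)] = statistics.pstdev(values)
    return levels


def inconsistent_blocks(levels, threshold=3.5):
    """Return blocks whose noise level is an outlier by the median/MAD robust z-score."""
    values = list(levels.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values]) or 1e-9
    return [
        key for key, v in levels.items()
        if 0.6745 * abs(v - median) / mad > threshold
    ]
```

Because the residual also picks up real edges and printed text, a practical version would probably restrict the comparison to regions with similar content, such as background areas only.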

Copy-Move Detection

| Aspect | Details |
|---|---|
| How it works | Find duplicate regions within the document (e.g., cloned background to hide text) |
| Techniques | SIFT/SURF keypoint matching, PatchMatch, deep feature matching |
| Detects | Background cloning to cover original text, replicated security patterns |
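
To make the idea of "duplicate regions" concrete, the sketch below uses exact block matching, which is a deliberately simpler stand-in for the keypoint and PatchMatch techniques listed above. It assumes a grayscale image as a list of rows of integers, and it only finds pixel-identical copies, so it would miss clones that were rescaled, rotated, or recompressed. The function name and all parameters are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def find_copy_move_offsets(gray, block=4, min_matches=8, min_contrast=10):
    """Return {(dy, dx): count} for shift vectors shared by many identical blocks.

    A cloned region produces many block pairs with the same shift vector,
    whereas accidental matches tend to have scattered shifts.
    """
    h, w = len(gray), len(gray[0])
    positions_by_content = defaultdict(list)
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            content = tuple(
                gray[y + dy][x + dx] for dy in range(block) for dx in range(block)
            )
            # Skip nearly flat blocks: uniform background matches itself trivially.
            if max(content) - min(content) < min_contrast:
                continue
            positions_by_content[content].append((y, x))

    offset_counts = defaultdict(int)
    for positions in positions_by_content.values():
        # Positions are stored in scan order, so each offset points "forward".
        for (y1, x1), (y2, x2) in combinations(positions, 2):
            offset_counts[(y2 - y1, x2 - x1)] += 1

    return {off: n for off, n in offset_counts.items() if n >= min_matches}
```

Grouping matches by shift vector is the design choice that matters here: one repeated offset with many supporting blocks is much stronger evidence than the same number of unrelated matches.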

Font Consistency Analysis

| Aspect | Details |
|---|---|
| How it works | Verify that all text uses the expected font; edited text often has different font characteristics |
| Detects | Text field replacement where the attacker uses a different font |
| Techniques | Font classification model, character-level feature comparison |
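
Character-level feature comparison can be illustrated with a very crude proxy. The sketch below assumes a hypothetical upstream OCR or segmentation step that yields, for each field, a list of `(height_px, stroke_width_px)` measurements per character. It then flags fields whose typical glyph size or stroke weight deviates from the rest of the document. The input format, the two features, and the 15% tolerance are all assumptions for illustration; a font classification model would replace this in practice.

```python
import statistics


def inconsistent_fields(char_features, tolerance=0.15):
    """Flag fields whose median glyph height or stroke width is off by more than `tolerance`.

    char_features: {field_name: [(height_px, stroke_width_px), ...]}
    Returns {field_name: relative_deviation} for flagged fields only.
    """
    profiles = {
        field: (
            statistics.median([height for height, _ in chars]),
            statistics.median([stroke for _, stroke in chars]),
        )
        for field, chars in char_features.items()
        if chars
    }
    # Use the median across fields as the reference, so one edited field
    # does not shift the baseline much.
    ref_height = statistics.median([p[0] for p in profiles.values()])
    ref_stroke = statistics.median([p[1] for p in profiles.values()])

    flagged = {}
    for field, (height, stroke) in profiles.items():
        deviation = max(
            abs(height - ref_height) / ref_height,
            abs(stroke - ref_stroke) / ref_stroke,
        )
        if deviation > tolerance:
            flagged[field] = round(deviation, 3)
    return flagged
```

This only works when the genuine document uses one font style across the compared fields; documents that mix sizes or weights by design would need a per-field template instead of a document-wide median.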

Deep Learning Forensics

| Model | Approach | Detects |
|---|---|---|
| ManTraNet | Manipulation tracing network with pixel-level prediction | General manipulation |
| MVSS-Net | Multi-view, multi-scale supervision | Splicing, copy-move |
| CAT-Net | Compression artifact tracing | JPEG double compression from editing |
| Custom CNN | Binary classifier on document regions | Document-specific tampering |

Forensic Pipeline for eKYC

graph TD
    A[Document Image] --> B[Preprocessing<br/>Enhance, normalize]
    B --> C[Parallel Forensic Checks]

    C --> D[ELA Analysis<br/>Compression artifacts]
    C --> E[Noise Analysis<br/>Noise inconsistency]
    C --> F[Copy-Move Detection<br/>Duplicate regions]
    C --> G[Font Consistency<br/>Text uniformity]
    C --> H[Edge Analysis<br/>Splicing boundaries]
    C --> I[Deep Forensic Model<br/>Learned manipulation features]

    D & E & F & G & H & I --> J[Forensic Score Aggregation]
    J --> K{Authenticity Score}
    K -->|High confidence authentic| L[✅ Pass]
    K -->|Suspicious| M[⚠️ Manual review]
    K -->|Clearly tampered| N[❌ Reject]

    style L fill:#2E7D32,color:#fff
    style N fill:#e53935,color:#fff
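
The aggregation and three-way decision at the end of the pipeline could look roughly like the sketch below. It assumes each check reports a suspicion value between 0.0 (looks authentic) and 1.0 (clearly tampered), combines them as a weighted average, and converts that to an authenticity score. The weights, the thresholds (0.8, 0.3), and the single-check alarm rule are hypothetical values chosen for illustration, not recommendations from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    suspicion: float      # 0.0 = looks authentic, 1.0 = clearly tampered
    weight: float = 1.0   # hypothetical per-check weight


def authenticity_score(results):
    """Return 1 minus the weighted mean suspicion, so 1.0 means fully authentic."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one check with positive weight is required")
    weighted = sum(r.suspicion * r.weight for r in results)
    return 1.0 - weighted / total_weight


def decide(results, pass_at=0.8, reject_at=0.3, single_check_alarm=0.9):
    """Map check results to 'pass', 'manual_review', or 'reject'."""
    score = authenticity_score(results)
    if score <= reject_at:
        return "reject"
    # One very confident check should not be averaged away by five quiet ones.
    strongest_signal = max(r.suspicion for r in results)
    if score >= pass_at and strongest_signal < single_check_alarm:
        return "pass"
    return "manual_review"
```

A weighted average is the simplest option; a learned combiner trained on labeled genuine and tampered documents is a common alternative once enough data is available.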

Accuracy Expectations

| Fraud Type | Detection Rate | False Positive Rate |
|---|---|---|
| Obvious text editing (font mismatch, misalignment) | 95%+ | < 1% |
| Photo substitution | 90%+ | < 2% |
| Professional text editing (matching font) | 60-80% | 3-5% |
| High-quality counterfeit | 40-70% | 5-10% |
| AI-generated fake | 30-60% (evolving) | Variable |
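
These rates translate into operational load only once a fraud prevalence is fixed, because most submitted documents are genuine. The short calculation below shows how to estimate the share of flagged documents that are actually fraudulent. The 1% prevalence used in the note that follows is a hypothetical figure, and the rates are taken from the boundary values in the table.

```python
def alert_precision(detection_rate, false_positive_rate, fraud_prevalence):
    """Fraction of flagged documents that are truly fraudulent (Bayes' rule)."""
    true_alerts = detection_rate * fraud_prevalence
    false_alerts = false_positive_rate * (1.0 - fraud_prevalence)
    if true_alerts + false_alerts == 0:
        return 0.0
    return true_alerts / (true_alerts + false_alerts)
```

With an assumed 1% prevalence, `alert_precision(0.95, 0.01, 0.01)` comes to about 0.49, so roughly half of the alerts for obvious text editing would be genuine documents. For a high-quality counterfeit at 40% detection and 10% false positives, the same calculation gives roughly 0.04, which is one reason the "Suspicious" branch goes to manual review rather than automatic rejection.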

Key Takeaways

Summary

  • Document forensics uses multiple complementary methods — no single technique catches everything
  • ELA and noise analysis are effective baselines; deep learning adds learned manipulation patterns
  • Text editing is the most common digital fraud — font consistency analysis is critical
  • Detection accuracy varies widely, from 95%+ for obvious edits down to 30-60% for AI-generated fakes
  • A multi-signal forensic pipeline with score aggregation is the production approach
  • This is an arms race — attackers improve tools, so forensic models need continuous updates