Document Forensics Overview
Definition
Document forensics in eKYC detects whether an identity document has been tampered with, forged, or digitally manipulated. It answers the critical question: "Is this document authentic, or has it been altered?"
Types of Document Fraud
graph TD
A[Document Fraud] --> B[Physical Fraud]
A --> C[Digital Fraud]
B --> B1[Counterfeit<br/>Complete fake document]
B --> B2[Forged<br/>Genuine document with altered data]
B --> B3[Stolen blank<br/>Real blank document with fake data]
B --> B4[Impostor use<br/>Someone else's genuine document]
C --> C1[Photo substitution<br/>Replace face photo]
C --> C2[Text editing<br/>Change name, DOB, ID number]
C --> C3[Splicing<br/>Combine parts from different documents]
C --> C4[AI-generated<br/>Fully synthetic fake document]
style C fill:#e53935,color:#fff
style B fill:#F57F17,color:#000
Forensic Detection Methods
Error Level Analysis (ELA)
| Aspect |
Details |
| How it works |
Re-save JPEG at known quality, compare error levels — manipulated regions show different error patterns |
| Detects |
Photo splicing, text editing, region replacement |
| Limitation |
Ineffective on uncompressed images or high-quality re-saves |
Noise Analysis
| Aspect |
Details |
| How it works |
Analyze sensor noise pattern — manipulated regions have inconsistent noise |
| Detects |
Copy-move, splicing from different sources |
| Techniques |
Noise level estimation, noise inconsistency maps |
Copy-Move Detection
| Aspect |
Details |
| How it works |
Find duplicate regions within the document (e.g., cloned background to hide text) |
| Techniques |
SIFT/SURF keypoint matching, PatchMatch, deep feature matching |
| Detects |
Background cloning to cover original text, replicated security patterns |
Font Consistency Analysis
| Aspect |
Details |
| How it works |
Verify all text uses expected font — edited text often has different font characteristics |
| Detects |
Text field replacement where attacker uses different font |
| Techniques |
Font classification model, character-level feature comparison |
Deep Learning Forensics
| Model |
Approach |
Detects |
| ManTraNet |
Manipulation tracing network — pixel-level prediction |
General manipulation |
| MVSS-Net |
Multi-View Multi-Scale supervision |
Splicing, copy-move |
| CAT-Net |
Compression Artifact Tracing |
JPEG double compression from editing |
| Custom CNN |
Binary classifier on document regions |
Document-specific tampering |
Forensic Pipeline for eKYC
graph TD
A[Document Image] --> B[Preprocessing<br/>Enhance, normalize]
B --> C[Parallel Forensic Checks]
C --> D[ELA Analysis<br/>Compression artifacts]
C --> E[Noise Analysis<br/>Noise inconsistency]
C --> F[Copy-Move Detection<br/>Duplicate regions]
C --> G[Font Consistency<br/>Text uniformity]
C --> H[Edge Analysis<br/>Splicing boundaries]
C --> I[Deep Forensic Model<br/>Learned manipulation features]
D & E & F & G & H & I --> J[Forensic Score Aggregation]
J --> K{Authenticity Score}
K -->|High confidence authentic| L[✅ Pass]
K -->|Suspicious| M[⚠️ Manual review]
K -->|Clearly tampered| N[❌ Reject]
style L fill:#2E7D32,color:#fff
style N fill:#e53935,color:#fff
Accuracy Expectations
| Fraud Type |
Detection Rate |
False Positive Rate |
| Obvious text editing (font mismatch, alignment) |
95%+ |
< 1% |
| Photo substitution |
90%+ |
< 2% |
| Professional text editing (matching font) |
60-80% |
3-5% |
| High-quality counterfeit |
40-70% |
5-10% |
| AI-generated fake |
30-60% (evolving) |
Variable |
Key Takeaways
Summary
- Document forensics uses multiple complementary methods — no single technique catches everything
- ELA and noise analysis are effective baselines; deep learning adds learned manipulation patterns
- Text editing is the most common digital fraud — font consistency analysis is critical
- Detection accuracy varies widely: obvious edits (95%+) to AI-generated fakes (30-60%)
- A multi-signal forensic pipeline with score aggregation is the production approach
- This is an arms race — attackers improve tools, so forensic models need continuous updates
Related Articles