# Performance Optimization
## Definition
Performance optimization here means reducing end-to-end verification latency and maximizing throughput, from SDK capture to final decision.
## Latency Budget
| Stage | Target | Optimization |
|---|---|---|
| SDK capture + upload | < 5s | Image compression, quality pre-check on-device |
| Image ingestion | < 200ms | Direct S3 upload, pre-signed URLs |
| Face processing | < 500ms | GPU batching, TensorRT, INT8 quantization |
| Document processing | < 1s | Parallel OCR + forensics, TensorRT |
| Screening | < 500ms | Cached lists, parallel API calls |
| Decision | < 100ms | In-memory rules engine |
| Webhook delivery | < 200ms | Async, non-blocking |
| Total server | < 3s | Pipeline parallelism (see sketch below) |
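
To make the budget concrete, below is a minimal sketch of the pipeline-parallel core using Python `asyncio`. The stage functions, their names, and the `sleep`-based latencies are illustrative stand-ins for the real GPU and API calls, sized to the per-stage targets above:

```python
import asyncio
import time

# Hypothetical stage functions; names and sleep() latencies are
# illustrative stand-ins sized to the per-stage targets above.
async def process_face(image_key: str) -> dict:
    await asyncio.sleep(0.5)      # stands in for GPU inference (< 500ms)
    return {"face_match": 0.97}

async def process_document(image_key: str) -> dict:
    await asyncio.sleep(1.0)      # stands in for OCR + forensics (< 1s)
    return {"doc_valid": True}

async def run_screening(applicant_id: str) -> dict:
    await asyncio.sleep(0.5)      # stands in for watchlist lookups (< 500ms)
    return {"screening_hit": False}

async def verify(applicant_id: str, image_key: str) -> dict:
    start = time.perf_counter()
    # Independent stages run concurrently, so wall-clock time tracks the
    # slowest stage rather than the sum of all three.
    face, document, screening = await asyncio.gather(
        process_face(image_key),
        process_document(image_key),
        run_screening(applicant_id),
    )
    result = {**face, **document, **screening}
    result["latency_s"] = round(time.perf_counter() - start, 2)
    return result

print(asyncio.run(verify("applicant-123", "uploads/selfie.jpg")))
# -> {'face_match': 0.97, 'doc_valid': True, 'screening_hit': False, 'latency_s': 1.0}
```

Because the three stages are independent, wall-clock time approaches the slowest stage (about 1s for document processing) rather than the roughly 2s sum, which is what keeps the total inside the 3s server budget.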
## Optimization Techniques
| Technique | Impact |
|---|---|
| Pipeline parallelism | Run face + document + screening concurrently (sketch above) |
| GPU batching | Process multiple images in a single GPU call (sketch below) |
| Pre-signed upload | Skip API server for image upload → direct to S3 (sketch below) |
| Connection pooling | Reuse DB/API connections |
| Result caching | Cache screening results for same-day re-checks (sketch below) |
| CDN for SDK | Fast SDK delivery globally |
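
GPU batching amortizes per-call overhead by grouping requests that arrive close together. Below is a micro-batching sketch, assuming an async Python serving layer; `MicroBatcher`, the 10 ms window, and the batch size of 16 are illustrative names and tuning values, not measured ones:

```python
import asyncio

class MicroBatcher:
    """Collect requests that arrive within a short window and process
    them as one batched call (e.g. a single GPU inference)."""

    def __init__(self, batch_fn, max_batch: int = 16, window_ms: float = 10.0):
        self.batch_fn = batch_fn              # takes a list, returns aligned results
        self.max_batch = max_batch
        self.window = window_ms / 1000.0
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut                      # resolves after the batched call

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until a first request
            deadline = loop.time() + self.window
            # Drain further requests until the window closes or batch is full.
            while len(batch) < self.max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            results = self.batch_fn([item for item, _ in batch])  # one GPU call
            for (_, fut), res in zip(batch, results):
                fut.set_result(res)
```

A long-lived worker task runs `run()`; request handlers call `await batcher.submit(image)` and receive their individual result once the shared batched call returns.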
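Pre-signed uploads are straightforward to sketch with boto3, assuming S3 as the object store; the bucket name, key layout, and five-minute expiry below are placeholder choices:

```python
import uuid

import boto3

s3 = boto3.client("s3")

UPLOAD_BUCKET = "verification-uploads"   # placeholder bucket name

def create_upload_url(applicant_id: str) -> dict:
    """Issue a short-lived URL the SDK can PUT the image to directly,
    so the raw bytes never pass through the API server."""
    key = f"{applicant_id}/{uuid.uuid4()}.jpg"
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": key, "ContentType": "image/jpeg"},
        ExpiresIn=300,   # valid for 5 minutes
    )
    return {"upload_url": url, "key": key}
```

The SDK PUTs the image to `upload_url` with a matching `Content-Type: image/jpeg` header; the API server only ever handles the object key, never the image bytes.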
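Result caching for same-day screening re-checks can be as simple as a TTL map. The sketch below is in-process; a multi-instance deployment would need a shared store such as Redis (an assumption, not something this page prescribes):

```python
import time
from typing import Callable, Optional

class TTLCache:
    """Minimal in-process cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]   # evict the stale entry
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

# A 24h TTL approximates "same-day"; tune to your re-check policy.
screening_cache = TTLCache(ttl_seconds=24 * 3600)

def screen(applicant_key: str, call_providers: Callable[[str], dict]) -> dict:
    cached = screening_cache.get(applicant_key)
    if cached is not None:
        return cached                          # fast path: no external calls
    result = call_providers(applicant_key)     # slow path: screening providers
    screening_cache.set(applicant_key, result)
    return result
```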
## Key Takeaways
- Target: < 3 seconds server-side processing (< 5 seconds total including upload)
- Pipeline parallelism is the biggest win: face, document, and screening run concurrently
- GPU optimization (TensorRT, INT8 quantization, batching) directly reduces the processing bottleneck
- Pre-signed uploads bypass the API server, reducing both latency and API server load