GPU Infrastructure

Definition

GPU selection, management, and cost optimization for eKYC ML inference workloads.


GPU Selection for eKYC

| GPU | VRAM | eKYC Throughput | Cost/hr (Cloud) | Best For |
|-----|------|-----------------|-----------------|----------|
| T4 | 16 GB | 50-100 verifications/min | $0.50-1.00 | Cost-optimized, standard workloads |
| A10G | 24 GB | 80-150 verifications/min | $1.00-1.50 | Balanced performance/cost |
| L4 | 24 GB | 100-200 verifications/min | $0.80-1.20 | Newest generation, power-efficient |
| A100 (40 GB) | 40 GB | 200-400 verifications/min | $3.00-4.00 | High-volume, batch processing |
| H100 | 80 GB | 400-800 verifications/min | $8.00-12.00 | Maximum throughput |
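The selection logic implied by the table can be sketched as a small helper: given a target verification rate, pick the cheapest tier whose throughput covers it. The tier list uses the midpoints of the ranges above; `pick_gpu` and its numbers are illustrative assumptions, not a vendor API.

```python
# Illustrative GPU tiers: (name, verifications/min midpoint, $/hr midpoint),
# taken from the selection table above. Numbers are assumptions for the sketch.
GPU_TIERS = [
    ("T4", 75, 0.75),
    ("A10G", 115, 1.25),
    ("L4", 150, 1.00),
    ("A100-40GB", 300, 3.50),
    ("H100-80GB", 600, 10.00),
]

def pick_gpu(target_rate_per_min: float) -> tuple[str, int, float]:
    """Return the cheapest tier whose throughput covers the target rate."""
    candidates = [t for t in GPU_TIERS if t[1] >= target_rate_per_min]
    if not candidates:
        # Beyond single-GPU throughput: shard the load across multiple GPUs.
        raise ValueError("target exceeds single-GPU throughput; shard the load")
    return min(candidates, key=lambda t: t[2])

print(pick_gpu(120))  # L4 covers 120/min at the lowest hourly cost
```

Note that the cheapest adequate tier is not always the smallest one: at 120 verifications/min, the L4 beats the A10G on both throughput and price in this sketch.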

Cost Optimization

| Strategy | Savings / Impact |
|----------|------------------|
| Spot/preemptible instances | 60-80% cost reduction; use for interruption-tolerant batch jobs |
| Reserved instances | 30-60% savings for committed baseline usage |
| Right-sizing | A T4 is often sufficient; don't default to an A100 |
| Multi-model per GPU | Serve liveness, recognition, and OCR models on one GPU via Triton |
| Dynamic scaling | Scale to zero during low-traffic hours |
| INT8 quantization | The same GPU handles 2-4x more requests |
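The first three strategies combine into a simple blended-cost model: reserve capacity for the baseline, buy spot for batch, pay on-demand only for spikes. A minimal sketch, assuming illustrative rates (a T4-class GPU at $0.75/hr on-demand, a 40% reserved discount, a 65% spot discount); all numbers and the `monthly_cost` helper are assumptions for the example, not provider pricing.

```python
HOURS_PER_MONTH = 730  # average hours in a month

def monthly_cost(on_demand_rate: float,
                 baseline_gpus: int,      # always-on, covered by reserved instances
                 spike_gpu_hours: float,  # on-demand hours for traffic spikes
                 batch_gpu_hours: float,  # spot hours for interruptible batch jobs
                 reserved_discount: float = 0.40,
                 spot_discount: float = 0.65) -> float:
    """Estimate monthly GPU spend under a mixed purchasing strategy."""
    reserved = baseline_gpus * HOURS_PER_MONTH * on_demand_rate * (1 - reserved_discount)
    spikes = spike_gpu_hours * on_demand_rate
    batch = batch_gpu_hours * on_demand_rate * (1 - spot_discount)
    return reserved + spikes + batch

# 2 baseline GPUs reserved, 100 spike hours on-demand, 300 batch hours on spot
print(round(monthly_cost(0.75, 2, 100, 300), 2))
```

Running the same workload entirely on-demand would cost `(2 * 730 + 100 + 300) * 0.75`, so the blended strategy saves roughly 40% in this example.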

Key Takeaways

  • T4/L4 are the sweet spot for most eKYC workloads, offering the best cost/performance ratio
  • Multi-model serving (Triton) maximizes GPU utilization; don't dedicate one GPU per model
  • Use spot instances for batch processing, reserved capacity for the baseline, and on-demand for spikes
  • INT8 quantization lets the same GPU serve 2-4x more requests at minimal accuracy cost
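As a concrete illustration of the multi-model point: Triton serves every model it finds in a model repository, so co-locating the eKYC models on one GPU is a matter of layout plus per-model configuration. A minimal sketch, assuming hypothetical model names (`face_liveness`, `face_recognition`, `doc_ocr`) and ONNX-format models; tune batch sizes and queue delays to your own latency budget.

```
model_repository/
├── face_liveness/
│   ├── 1/model.onnx
│   └── config.pbtxt
├── face_recognition/
│   ├── 1/model.onnx
│   └── config.pbtxt
└── doc_ocr/
    ├── 1/model.onnx
    └── config.pbtxt

# Example config.pbtxt for one model, pinned to GPU 0 alongside the others:
name: "face_recognition"
platform: "onnxruntime_onnx"
max_batch_size: 32
instance_group [ { count: 1, kind: KIND_GPU, gpus: [ 0 ] } ]
dynamic_batching { max_queue_delay_microseconds: 500 }
```

With all three models pinned to the same GPU and dynamic batching enabled, utilization stays high even when any single model's traffic is bursty.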