Edge AI Deployment

Definition

Deploying ML models directly on user devices (phones, tablets, IoT hardware) so that eKYC processing runs in real time, preserves privacy, and remains available offline.


Deployment Targets

| Platform | Runtime | Model Format | GPU Access |
| --- | --- | --- | --- |
| iOS | CoreML | .mlmodel | Neural Engine, GPU |
| Android | TFLite / ONNX | .tflite / .onnx | NNAPI, GPU Delegate |
| Android | NCNN | .param + .bin | Vulkan GPU |
| Cross-platform | ONNX Runtime Mobile | .onnx | Platform-specific |
| Web | ONNX Runtime Web | .onnx | WebGL, WASM |

Conversion Pipeline

```mermaid
graph LR
    A[PyTorch Model] --> B[Export to ONNX]
    B --> C{Target Platform}
    C -->|iOS| D[CoreML Tools → .mlmodel]
    C -->|Android| E[TFLite Converter → .tflite]
    C -->|Cross-platform| F[ONNX Runtime → .onnx]
    C -->|Web| G[ONNX Web → WASM]
```

Key Takeaways

Summary

  • ONNX is the most portable format — convert once, deploy everywhere
  • CoreML provides best iOS performance via Neural Engine
  • Quantize to INT8 before mobile deployment — 2-4x faster, 4x smaller
  • Always benchmark on actual target devices — simulator performance differs from real hardware
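The INT8 takeaway follows from simple arithmetic: float32 weights cost 4 bytes each, int8 weights cost 1, hence the roughly 4x size reduction. A minimal sketch of symmetric affine quantization (pure Python, not any framework's implementation) shows the round trip and its bounded error:

```python
# Symmetric INT8 quantization sketch: map floats into [-127, 127] with a
# single scale factor, then recover approximate floats by multiplying back.
def quantize_int8(values):
    """Quantize a list of floats to int8 codes plus a shared scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [qi * scale for qi in q]

weights = [0.52, -1.30, 0.07, 0.99]   # toy stand-in for model weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error is bounded by one quantization step (the scale).
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

Production toolchains (e.g. ONNX Runtime's quantization utilities or TFLite's converter) add per-channel scales and calibration, but the core trade — 1 byte per weight in exchange for bounded rounding error — is exactly this.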