# Risk Scoring Engines

## Definition
A risk scoring engine aggregates multiple signals from verification, device, behavioral, and external data sources into a single risk score that drives the approve/review/reject decision.
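The decision step is typically a simple threshold mapping on top of the aggregated score. A minimal sketch in Python, assuming a 0-100 scale where higher means riskier; the cut-off values are illustrative and in practice are tuned per segment and risk appetite:

```python
def decide(risk_score: float,
           review_threshold: float = 40.0,
           reject_threshold: float = 75.0) -> str:
    """Map an aggregated 0-100 risk score (higher = riskier) to a decision."""
    if risk_score >= reject_threshold:
        return "reject"
    if risk_score >= review_threshold:
        return "review"   # route to a manual review queue
    return "approve"
```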
## Scoring Architecture
```mermaid
graph TD
    A[Input Signals] --> B[Feature Engineering]
    B --> C[Risk Model]
    C --> D[Risk Score 0-100]
    D --> E[Decision Rules]
    A --> A1[Verification scores<br/>Liveness, match, OCR]
    A --> A2[Device signals<br/>Fingerprint, root, emulator]
    A --> A3[Behavioral signals<br/>Session timing, interaction patterns]
    A --> A4[External signals<br/>Email age, phone risk, IP reputation]
    A --> A5[Historical signals<br/>Previous attempts, fraud history]
    style C fill:#4051B5,color:#fff
```
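Conceptually, the feature-engineering stage flattens those five input branches into a single model-ready vector. A rough sketch; the signal names and nesting below are assumptions for illustration, not a fixed schema:

```python
from typing import Any

def build_features(signals: dict[str, Any]) -> dict[str, float]:
    """Flatten raw signals from each input branch into model-ready features."""
    return {
        # Verification scores
        "face_match_score": signals["verification"]["face_match"],
        "liveness_score": signals["verification"]["liveness"],
        # Device signals
        "device_is_rooted": float(signals["device"]["rooted"]),
        "device_is_emulator": float(signals["device"]["emulator"]),
        # Behavioral signals
        "seconds_to_complete": signals["behavior"]["flow_duration_s"],
        "retry_count": signals["behavior"]["retries"],
        # External signals
        "ip_is_vpn": float(signals["network"]["vpn"]),
        "email_age_days": signals["identity"]["email_age_days"],
        # Historical signals
        "prior_failed_attempts": signals["history"]["failed_attempts"],
    }
```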
## Feature Categories
| Category | Example Features |
|---|---|
| Verification | Face match score, liveness score, document forensic score, OCR confidence |
| Device | Device age, root status, emulator detection, virtual camera, multiple accounts from device |
| Behavioral | Time to complete flow, number of retries, interaction velocity, hesitation patterns |
| Network | IP reputation, VPN/proxy detection, geographic consistency, ASN risk |
| Identity | Email age, phone carrier risk, SSN/Aadhaar consistency, credit bureau match |
| Velocity | Verifications per device/hour, IP address frequency, document reuse |
| Historical | Previous failed attempts, fraud flags on linked accounts |
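Velocity features are the one category that cannot be computed from the current session alone; they require shared state across requests. A minimal in-memory sketch (real deployments typically back this with Redis or a feature store so counts are shared across instances); the key names and one-hour window are illustrative:

```python
import time
from collections import defaultdict, deque

# Illustrative in-memory store of attempt timestamps, keyed by entity.
_attempts: dict[str, deque] = defaultdict(deque)

def record_and_count(key: str, window_s: int = 3600) -> int:
    """Record one verification attempt for `key` (e.g. a device fingerprint
    or an IP address) and return the count inside the sliding window."""
    now = time.time()
    events = _attempts[key]
    events.append(now)
    while events and events[0] < now - window_s:
        events.popleft()        # expire events older than the window
    return len(events)

# Velocity features for the current attempt
velocity_features = {
    "device_attempts_per_hour": record_and_count("device:abc123"),
    "ip_attempts_per_hour": record_and_count("ip:203.0.113.7"),
}
```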
## Model Approaches
| Approach | Pros | Cons |
|---|---|---|
| Rules-based | Interpretable, easy to update, no training data needed | Brittle, misses complex patterns |
| Logistic regression | Interpretable, fast, decent performance | Limited to linear relationships |
| Gradient boosting (XGBoost/LightGBM) | High accuracy, handles mixed features well | Less interpretable, needs training data |
| Neural network | Captures complex patterns | Black box, needs large training data |
| Ensemble | Rules + ML model combined | More complex operations |
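A sketch of the rules + ML hybrid, assuming a LightGBM classifier trained on labeled fraud outcomes; the feature list, the hard rule, and the hyperparameters are illustrative choices, not a reference implementation:

```python
import numpy as np
from lightgbm import LGBMClassifier

# Illustrative feature ordering; columns of X must match this at train time.
FEATURE_ORDER = ["face_match_score", "device_is_emulator",
                 "retry_count", "email_age_days"]

def train(X: np.ndarray, y: np.ndarray) -> LGBMClassifier:
    """Fit the scoring model; y = 1 for confirmed fraud, 0 for legitimate."""
    model = LGBMClassifier(n_estimators=300, learning_rate=0.05)
    model.fit(X, y)
    return model

def hybrid_score(model: LGBMClassifier, features: dict[str, float]) -> float:
    """Hard rules first (non-negotiable constraints), ML score otherwise."""
    if features["device_is_emulator"]:              # rule: hard reject signal
        return 100.0
    x = np.array([[features[name] for name in FEATURE_ORDER]])
    fraud_prob = model.predict_proba(x)[0, 1]       # probability of the fraud class
    return float(fraud_prob * 100)                  # scale to the 0-100 convention
```

Keeping the rules outside the model makes hard constraints auditable and instantly updatable, while the model handles the patterns rules cannot express.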
## Key Takeaways
Summary
- Risk scoring combines dozens of signals into a single decision-ready score
- Gradient boosting (XGBoost/LightGBM) is the industry standard for fraud scoring
- Feature engineering (device age, behavioral patterns, velocity) matters more than model choice
- Rules + ML hybrid is the most practical approach — rules for hard constraints, ML for patterns
- Model must be explainable: regulators require justification for rejection decisions (see the reason-code sketch below)
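On the explainability point, tree ensembles can emit per-feature contributions that double as reason codes for a rejection. A sketch assuming a LightGBM model like the one above and its `pred_contrib` output; the helper name and top-k cut are illustrative:

```python
import numpy as np

def rejection_reasons(model, x_row: np.ndarray,
                      feature_names: list[str], top_k: int = 3) -> list[str]:
    """Return the top-k features pushing this case toward fraud.

    With pred_contrib=True, LightGBM returns one SHAP-style contribution per
    feature plus a trailing column for the expected (base) value.
    """
    contribs = model.predict(x_row.reshape(1, -1), pred_contrib=True)[0]
    per_feature = contribs[:-1]               # drop the expected-value column
    order = np.argsort(per_feature)[::-1]     # largest fraud-pushing contribution first
    return [feature_names[i] for i in order[:top_k] if per_feature[i] > 0]
```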