15. Error Analysis¶
Who should read this page¶
This page is mainly for ML engineers, QA teams, fraud analysts, and release owners who need to understand why the system failed, not just how often it failed.
Why this page exists¶
A benchmark number tells you that errors happened.
Error analysis tells you:
- what kind of errors happened
- where they happened
- why they happened
- what should be fixed next
That is why strong teams spend time reviewing failure cases, not only summary metrics.
Start with the two main error families¶
| Error family | Meaning |
|---|---|
| false accept | spoof was accepted as live |
| false reject | genuine user was rejected or routed into unnecessary friction |
Both matter, but the business impact is different.
A practical error-analysis workflow¶
```mermaid
flowchart TB
A[Collect failed<br/>cases] --> B[Group by error<br/>family]
B --> C[Slice by segment]
C --> D[Review examples]
D --> E[Assign root cause]
E --> F[Choose fix<br/>and owner]
```
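The first two steps of the workflow above can be sketched in a few lines. This is a minimal illustration, assuming failed cases arrive as dicts with hypothetical `error_family` and `segment` keys; your real case records will carry more fields.

```python
from collections import defaultdict

def triage(failed_cases):
    """Group failed cases by error family, then slice each family by segment."""
    grouped = defaultdict(lambda: defaultdict(list))
    for case in failed_cases:
        grouped[case["error_family"]][case["segment"]].append(case)
    return grouped

# Illustrative input: two false accepts on web, one false reject on iOS.
cases = [
    {"id": 1, "error_family": "false_accept", "segment": "web"},
    {"id": 2, "error_family": "false_reject", "segment": "ios"},
    {"id": 3, "error_family": "false_accept", "segment": "web"},
]
buckets = triage(cases)
```

From here, each `buckets[family][segment]` list is a reviewable pile small enough for humans to inspect case by case.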
Useful segmentation axes¶
Do not review all failures as one pile.
Good segment views include:
- attack type
- platform and device class
- app vs web
- lighting condition
- blur or quality bucket
- model version
- SDK version
- geography or environment if relevant
- demographic segment if policy allows and it is appropriate
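A segment view is just a failure rate computed per bucket along one axis. The sketch below assumes each reviewed case is a dict with a boolean `failed` flag and a field for the chosen axis (here the hypothetical key `"platform"`); any of the axes listed above would work the same way.

```python
from collections import Counter

def rates_by_segment(cases, axis):
    """Failure rate per segment along one segmentation axis, e.g. "platform"."""
    totals, failures = Counter(), Counter()
    for case in cases:
        key = case.get(axis, "unknown")   # missing metadata becomes its own bucket
        totals[key] += 1
        if case["failed"]:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}

# Illustrative data: half of web traffic failed, none of iOS.
cases = [
    {"platform": "web", "failed": True},
    {"platform": "web", "failed": False},
    {"platform": "ios", "failed": False},
]
rates = rates_by_segment(cases, "platform")
```

Running the same function across several axes quickly shows where failures concentrate, which is the whole point of not reviewing them as one pile.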
Root-cause buckets¶
A simple triage taxonomy helps teams act faster.
| Root-cause bucket | Examples |
|---|---|
| model issue | score too high on replay, weak on deepfake, unstable under blur |
| data issue | missing attack type in training, poor low-light coverage, noisy labels |
| threshold issue | too strict on one channel, retry band too narrow |
| capture UX issue | user guidance weak, face too small, challenge unclear |
| infrastructure issue | timeout, frame drop, browser camera mismatch |
| security issue | injection not blocked, virtual camera not detected |
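Encoding the taxonomy as a fixed set keeps triage labels consistent across reviewers and makes tallying trivial. A minimal sketch, mirroring the six buckets in the table above:

```python
from collections import Counter
from enum import Enum

class RootCause(Enum):
    """Triage taxonomy from the root-cause table; extend deliberately, not ad hoc."""
    MODEL = "model issue"
    DATA = "data issue"
    THRESHOLD = "threshold issue"
    CAPTURE_UX = "capture UX issue"
    INFRASTRUCTURE = "infrastructure issue"
    SECURITY = "security issue"

# Illustrative review session: three cases labeled by a human reviewer.
reviewed = [RootCause.MODEL, RootCause.DATA, RootCause.MODEL]
tally = Counter(reviewed)
```

Because reviewers must pick from the enum, the tally is directly comparable week over week instead of drifting into free-text labels.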
False-accept review template¶
When a spoof is accepted, record:
- attack family and exact attack style
- device and platform
- score from each model
- quality metrics
- whether any security signal fired
- whether the attack is new or already known
- whether this should have been blocked by policy instead of the model
This helps separate model weakness from missing controls.
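The checklist above maps naturally to a structured record, so every reviewed false accept carries the same fields. A sketch using a dataclass; all field names here are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class FalseAcceptRecord:
    """One reviewed false accept, mirroring the review template above."""
    attack_family: str                 # e.g. replay, print, deepfake
    attack_style: str                  # the exact variant observed
    device: str
    platform: str
    model_scores: dict = field(default_factory=dict)   # score from each model
    quality_metrics: dict = field(default_factory=dict)
    security_signal_fired: bool = False
    known_attack: bool = True
    policy_should_have_blocked: bool = False

# Illustrative case: a replay attack on desktop web that no signal caught.
case = FalseAcceptRecord(
    attack_family="replay",
    attack_style="phone screen held to webcam",
    device="macbook",
    platform="web",
    model_scores={"liveness": 0.82},
    known_attack=False,
)
```

The `policy_should_have_blocked` flag is what separates model weakness from a missing control: if it is true, the fix belongs to policy, not training.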
False-reject review template¶
When a real user is rejected, record:
- platform and device
- lighting and blur conditions
- quality gate result
- liveness score and identity-match context if relevant
- retry count
- whether a better user instruction could have fixed the issue
Many false rejects come from poor capture conditions, not from a bad spoof detector.
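Because capture conditions explain so many false rejects, a cheap heuristic pass before human review saves time. The sketch below is illustrative only: the field names (`lux`, `blur`, `quality_gate_passed`) and thresholds are assumptions, not production values.

```python
def false_reject_cause(record, blur_threshold=0.6, lux_threshold=20):
    """Rough first-pass triage for a false reject: rule out capture
    conditions before blaming the spoof detector. Thresholds are
    illustrative, not calibrated values."""
    if record["lux"] < lux_threshold:
        return "capture UX issue: low light"
    if record["blur"] > blur_threshold:
        return "capture UX issue: blur"
    if not record["quality_gate_passed"]:
        return "threshold issue: quality gate"
    return "model issue: needs deeper review"

# Illustrative case: a dim indoor capture.
cause = false_reject_cause({"lux": 5, "blur": 0.1, "quality_gate_passed": True})
```

Only the cases that fall through to the final branch need detailed model-level review, which is where scarce reviewer time should go.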
A simple investigation matrix¶
| Question | Why it matters |
|---|---|
| Was the input genuinely poor quality? | may point to UX or quality gate |
| Did one model disagree strongly with others? | may reveal fusion or calibration issue |
| Is the failure concentrated on one device or channel? | may reveal platform problem |
| Did the same issue increase after a release? | may reveal regression |
| Is the attack new? | may require new data or new security control |
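The second question in the matrix, whether one model disagreed strongly with the others, can be checked automatically across all failures. A minimal sketch, assuming per-model scores live in a dict and that a spread of 0.5 is an illustrative (not tuned) flag threshold:

```python
def strong_disagreement(scores, spread=0.5):
    """Flag cases where model scores diverge widely — a candidate fusion
    or calibration issue worth human review. `spread` is illustrative."""
    values = list(scores.values())
    return max(values) - min(values) >= spread

# Illustrative cases: one divergent, one in agreement.
divergent = strong_disagreement({"liveness": 0.9, "deepfake": 0.2})
agreeing = strong_disagreement({"liveness": 0.5, "deepfake": 0.6})
```

Cases flagged this way feed directly into the calibration and fusion review described in 14. Score Calibration and Thresholding.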
What to save for every reviewed case¶
A good review package should preserve:
- request ID or sample ID
- final decision
- intermediate scores
- quality signals
- device/session metadata
- model and threshold versions
- screenshot or capture reference where policy allows
- human review notes
- fix category and owner
Without this, the same problem gets rediscovered later.
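A review package like the one listed above serializes cleanly to JSON, which makes it easy to store alongside the case and reload months later. All keys and values below are hypothetical placeholders showing the shape, not a required schema:

```python
import json

# Hypothetical review package mirroring the preservation checklist above.
review_case = {
    "request_id": "req-1234",
    "final_decision": "accept",
    "intermediate_scores": {"liveness": 0.91, "deepfake": 0.40},
    "quality_signals": {"blur": 0.12, "lux": 180},
    "device_metadata": {"platform": "web", "browser": "chrome"},
    "model_version": "v3.2.0",
    "threshold_version": "t-2024-06",
    "capture_ref": "storage://reviews/capture-1234.png",  # only where policy allows
    "review_notes": "replay attack, moire pattern visible",
    "fix_category": "model issue",
    "owner": "antispoof-team",
}
serialized = json.dumps(review_case, indent=2)
```

Keeping `model_version` and `threshold_version` in the record is what lets a future reviewer tell a recurring problem from a regression.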
Example error-analysis outputs¶
| Output | Use |
|---|---|
| top false-accept patterns | fraud-risk prioritization |
| top false-reject patterns | user-friction prioritization |
| affected channels | platform or SDK fixes |
| attack gaps | new collection and benchmark plans |
| calibration drift findings | threshold or release policy changes |
Common findings and likely actions¶
| Finding | Likely action |
|---|---|
| replay attacks accepted mostly on desktop web | improve web-specific policy and security controls |
| live users rejected in dim indoor conditions | collect more low-light data and improve capture guidance |
| one model dominates wrong decisions | recalibrate or reduce its weight in fusion |
| failures spike after release | roll back, or hotfix the threshold, SDK, or model |
| same attack family keeps appearing | create focused challenge set and mitigation plan |
Turn error analysis into a regular ritual¶
A useful cadence is:
- weekly failure review during active rollout
- release review before every major model or policy change
- monthly attack-gap and friction review
This keeps the program learning instead of reacting late.
Final takeaway¶
Error analysis should answer:
- what failed
- where it failed
- why it failed
- who owns the fix
- how success will be checked next time
That is how teams turn incidents into improvement.
Need term help?¶
If any technical terms on this page feel dense, use Appendix A1 — Key Terms first and then jump to the relevant appendix page for deeper detail.
Related docs¶
- 08. Evaluation Playbook
- 14. Score Calibration and Thresholding
- 16. Monitoring and Operations
- 21. Troubleshooting