21. Troubleshooting¶
Who should read this page¶
This page is mainly for support teams, QA teams, backend engineers, SDK engineers, and release owners who need a practical playbook when a liveness check starts going wrong in production.
Why this page exists¶
When a liveness issue appears in production, teams usually need quick triage.
The goal of this page is to help answer:
- where the issue is happening
- whether it is model, policy, UX, SDK, or infrastructure related
- what to check first
Fast triage checklist¶
- identify affected platform or channel
- identify affected version: model, policy, SDK, app, or browser
- check pass / retry / fail shifts
- check latency and timeout changes
- inspect top recent failure examples
- see whether a rollback or hotfix path exists
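The checklist above boils down to segmenting outcomes by platform and version and finding where rates moved. A minimal sketch of that comparison, assuming events are dicts with hypothetical `platform`, `version`, and `outcome` fields (none of these names come from a real schema):

```python
# Hypothetical triage sketch: compare pass/retry/fail rates per
# (platform, version) segment against a baseline window.
# Event field names ("platform", "version", "outcome") are assumptions.
from collections import Counter, defaultdict

def outcome_rates(events):
    """events: iterable of dicts with 'platform', 'version', and
    'outcome' in {'pass', 'retry', 'fail'}. Returns rates per segment."""
    counts = defaultdict(Counter)
    for e in events:
        counts[(e["platform"], e["version"])][e["outcome"]] += 1
    rates = {}
    for seg, c in counts.items():
        total = sum(c.values())
        rates[seg] = {k: c[k] / total for k in ("pass", "retry", "fail")}
    return rates

def biggest_shift(baseline, current, metric="retry"):
    """Return the segment (seen in both windows) whose metric moved most."""
    return max(
        (seg for seg in current if seg in baseline),
        key=lambda seg: abs(current[seg][metric] - baseline[seg][metric]),
    )
```

Running `biggest_shift` on a pre-release and post-release window points triage at the segment to inspect first, before any deeper analysis.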
Symptom: retry rate suddenly jumps¶
Likely causes¶
- threshold became too strict
- capture guidance regressed
- camera behavior changed after app release
- web browser issue or permission behavior changed
- low-light traffic increased
First checks¶
- compare by platform and version
- inspect quality metrics
- compare score distributions before and after release
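Comparing score distributions before and after a release does not need heavy tooling: a percentile shift on the two windows is often enough to confirm whether a threshold effectively tightened. A minimal sketch, using a nearest-index percentile (the exact percentile method is an implementation choice, not prescribed by this page):

```python
# Hypothetical sketch: percentile shift between two score windows.
def percentile(scores, p):
    """Nearest-index percentile of a list of scores (p in 0..100)."""
    s = sorted(scores)
    k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[k]

def distribution_shift(before, after, p=50):
    """How far the p-th percentile moved from one window to the other.
    A clearly negative median shift after a release suggests scores
    dropped, which can push more sessions under a fixed threshold."""
    return percentile(after, p) - percentile(before, p)
```

If the median score dropped while the threshold stayed fixed, the retry spike is likely a scoring or calibration change rather than a capture regression.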
Symptom: spoof acceptance incident¶
Likely causes¶
- new attack pattern not covered
- weakened threshold or policy
- security control bypass
- one channel has weaker enforcement
First checks¶
- classify attack family
- inspect intermediate model scores
- check whether client or session security signals fired
- isolate affected channel and version
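Classifying the attack family and isolating the channel can start as a simple tally over confirmed incidents. A minimal sketch, assuming incident records carry hypothetical `channel` and `attack_family` fields:

```python
# Hypothetical sketch: tally confirmed spoof-accept incidents by
# (channel, attack family) to see where enforcement is weakest.
# The field names are assumptions, not a real incident schema.
from collections import Counter

def spoof_breakdown(incidents):
    """incidents: iterable of dicts with 'channel' and 'attack_family'.
    Returns (channel, family) pairs sorted by count, worst first."""
    tally = Counter((i["channel"], i["attack_family"]) for i in incidents)
    return tally.most_common()
```

The top entry of the breakdown tells you whether you face a new attack pattern everywhere or one channel with weaker enforcement.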
Symptom: latency is much worse¶
Likely causes¶
- infrastructure regression
- heavy model or fusion change
- client-side capture slowdown
- network path issue
First checks¶
- compare p50/p95/p99 latency by channel
- identify whether the slowdown sits in the client, API, or model stage
- compare with latest release changes
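Comparing p50/p95/p99 by channel can be sketched directly from raw latency samples. A minimal version using the nearest-rank percentile (the input shape `(channel, latency_ms)` is an assumption for illustration):

```python
# Hypothetical sketch: p50/p95/p99 latency per channel from raw samples.
import math
from collections import defaultdict

def latency_percentiles(samples):
    """samples: iterable of (channel, latency_ms) pairs.
    Returns nearest-rank p50/p95/p99 per channel."""
    by_channel = defaultdict(list)
    for channel, ms in samples:
        by_channel[channel].append(ms)
    out = {}
    for channel, values in by_channel.items():
        values.sort()
        def pct(p):
            # nearest-rank: rank = ceil(p/100 * N), 1-indexed
            return values[max(0, math.ceil(p / 100 * len(values)) - 1)]
        out[channel] = {"p50": pct(50), "p95": pct(95), "p99": pct(99)}
    return out
```

A p99 that moved while p50 stayed flat usually points at queuing or a slow dependency rather than a uniformly heavier model.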
Symptom: web flow behaves much worse than app flow¶
Likely causes¶
- weak webcam quality
- browser media constraints
- virtual camera or replay exposure
- web-specific policy not tuned
First checks¶
- segment by browser family and version
- inspect webcam quality and face-size statistics
- compare threshold and challenge behavior by channel
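The face-size statistics mentioned above reduce to a simple ratio: what fraction of the frame the detected face occupies. A minimal sketch, where the box format and the minimum-ratio threshold are illustrative assumptions, not tuned values:

```python
# Hypothetical sketch: flag webcam frames whose detected face is too
# small a fraction of the image. Box format (x, y, w, h) in pixels and
# the 0.04 threshold are assumptions for illustration.
def face_size_ratio(face_box, frame_w, frame_h):
    """Fraction of the frame area covered by the face box, in [0, 1]."""
    _, _, w, h = face_box
    return (w * h) / (frame_w * frame_h)

def too_small(face_box, frame_w, frame_h, min_ratio=0.04):
    """True when the face is likely too small for reliable liveness."""
    return face_size_ratio(face_box, frame_w, frame_h) < min_ratio
```

If the web channel shows a much higher `too_small` rate than the app channel, the regression is likely capture quality rather than policy.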
Symptom: one model says live and another says spoof¶
Likely causes¶
- model disagreement on edge case
- calibration issue
- channel-specific weakness
- bad fusion weight or policy
First checks¶
- compare calibrated vs raw scores
- check if disagreement is concentrated on one segment
- review recent fusion or threshold changes
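Comparing calibrated against raw scores only makes sense when both models are mapped onto a common probability scale. A minimal sketch of a Platt-style sigmoid calibration and a disagreement check; the `a`/`b` parameters are illustrative placeholders, not real production values:

```python
# Hypothetical sketch: Platt-style sigmoid calibration plus a
# disagreement check. a=6.0, b=-3.0 are made-up example parameters.
import math

def calibrate(raw_score, a=6.0, b=-3.0):
    """Map a raw model score to a calibrated probability of 'live'."""
    return 1.0 / (1.0 + math.exp(-(a * raw_score + b)))

def disagree(score_a, score_b, threshold=0.5):
    """True when two calibrated scores land on opposite sides of the
    decision threshold, i.e. one model says live and the other spoof."""
    return (score_a >= threshold) != (score_b >= threshold)
```

If disagreement disappears after recalibration, the issue is calibration drift; if it persists on one segment, suspect a channel-specific model weakness or a bad fusion weight.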
Symptom: many real users fail in dim light¶
Likely causes¶
- weak low-light data coverage
- quality gate too strict
- device class heavily affecting capture
First checks¶
- compare by lighting bucket and device class
- inspect false rejects with quality metrics
- test whether retry guidance helps significantly
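The lighting-bucket comparison above amounts to a false-reject rate per segment, computed only over sessions later confirmed as genuine users. A minimal sketch, with hypothetical field names:

```python
# Hypothetical sketch: false-reject rate per (lighting, device_class)
# segment, over sessions known to be genuine users.
# Field names ("lighting", "device_class", "rejected") are assumptions.
from collections import defaultdict

def false_reject_rates(sessions):
    """sessions: iterable of dicts with 'lighting', 'device_class',
    and boolean 'rejected'. Returns reject rate per segment."""
    totals, rejects = defaultdict(int), defaultdict(int)
    for s in sessions:
        seg = (s["lighting"], s["device_class"])
        totals[seg] += 1
        rejects[seg] += bool(s["rejected"])
    return {seg: rejects[seg] / totals[seg] for seg in totals}
```

A reject rate concentrated in the dim-light, low-end-device segment supports loosening the quality gate or improving retry guidance there, rather than a global threshold change.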
Symptom: one SDK or app version is much worse¶
Likely causes¶
- camera pipeline change
- image compression change
- integration bug
- metadata missing or malformed
First checks¶
- compare release notes
- verify request schema and captured payload
- inspect sample media from affected version
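Verifying the request schema can start with a required-field check on a captured payload. The field list below is illustrative only, not the real contract; the point is to surface missing or mistyped metadata before blaming the model:

```python
# Hypothetical sketch: check a captured request payload for the fields
# triage depends on. REQUIRED_FIELDS is an illustrative schema, not a
# real one.
REQUIRED_FIELDS = {
    "request_id": str,
    "sdk_version": str,
    "app_version": str,
    "platform": str,
    "image_format": str,
}

def schema_errors(payload):
    """Return a list of missing or mistyped fields in a request payload."""
    errors = []
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing: {field}")
        elif not isinstance(payload[field], ftype):
            errors.append(f"wrong type: {field}")
    return errors
```

An empty error list from a bad version's payloads shifts suspicion toward the camera pipeline or compression change; a non-empty one points at an integration bug.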
A simple triage matrix¶
| Signal | Most likely owner |
|---|---|
| score shift only | ML / calibration / policy |
| latency and timeouts | platform / backend |
| camera or permission failures | SDK / client |
| attack bypass | security + ML |
| one version regressed | release owner + owning team |
When to roll back quickly¶
Consider fast rollback when:
- spoof acceptance is confirmed in a sensitive flow
- retry or failure spikes sharply after release
- one key channel becomes unreliable
- incident impact is already customer-visible
Final takeaway¶
Troubleshooting works best when the system already logs:
- request IDs
- versions
- key intermediate signals
- channel and device metadata
- latency stages
Without that, the team ends up guessing.
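The fields listed above can all travel in one structured record per request, emitted as JSON so triage can filter and join on them later. A minimal sketch; the record layout and field names are assumptions for illustration:

```python
# Hypothetical sketch: one structured log record per liveness request,
# carrying the fields triage needs. The layout is illustrative only.
import json

def liveness_log_record(request_id, versions, signals, device, stage_ms):
    record = {
        "request_id": request_id,          # joins logs across services
        "versions": versions,              # model / policy / SDK versions
        "signals": signals,                # key intermediate scores
        "device": device,                  # channel and device metadata
        "stage_latency_ms": stage_ms,      # per-stage latency breakdown
    }
    return json.dumps(record, sort_keys=True)
```

With records like this in place, every symptom on this page becomes a filter-and-group query instead of guesswork.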
Need term help?¶
If any technical terms on this page feel dense, use Appendix A1 — Key Terms first and then jump to the relevant appendix page for deeper detail.
Read next¶
Go to 22. Case Studies.