09. Common Failures¶
Who should read this page¶
This page is useful for QA teams, ML engineers, product teams, support teams, and anyone trying to understand why a face liveness system behaves well in some cases and poorly in others.
Why this page exists¶
The most important lessons in face liveness often come from failures.
If teams only study successful cases, they miss the exact situations that create friction, support tickets, and fraud risk in production.
A simple way to think about failures¶
Most failures come from one or more of these groups:
- weak input quality
- attack sophistication
- device or platform limitations
- policy or threshold problems
- integration mistakes
Failure group 1: Input quality problems¶
Examples¶
- face too small in frame
- strong blur from hand motion
- low light or strong backlight
- partial face crop
- heavy occlusion from mask, glasses, or hair
Typical result¶
The system may reject a genuine user or produce unstable scores.
What helps¶
- better capture guidance
- quality checks before liveness
- limited guided retries
- minimum face size and framing rules
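The quality checks above can be sketched as a simple pre-liveness gate. This is a minimal illustration, not a production detector: the field names, thresholds, and the blur metric (variance-of-Laplacian style score) are all assumptions chosen for the example.

```python
# Minimal pre-liveness quality gate (illustrative thresholds).
# Frames that fail these checks trigger guided retry instead of a liveness call.

from dataclasses import dataclass

@dataclass
class FrameInfo:
    face_width_px: int    # width of the detected face box
    frame_width_px: int   # width of the full frame
    blur_score: float     # e.g. variance of Laplacian; higher = sharper
    brightness: float     # mean pixel intensity, 0-255

MIN_FACE_RATIO = 0.25               # face must fill >= 25% of frame width (assumed)
MIN_BLUR_SCORE = 100.0              # below this, too blurry (assumed)
BRIGHTNESS_RANGE = (40.0, 220.0)    # reject very dark or blown-out frames (assumed)

def quality_gate(frame: FrameInfo) -> tuple[bool, str]:
    """Return (ok, reason). Run before spending a liveness check."""
    if frame.face_width_px / frame.frame_width_px < MIN_FACE_RATIO:
        return False, "face_too_small"
    if frame.blur_score < MIN_BLUR_SCORE:
        return False, "too_blurry"
    lo, hi = BRIGHTNESS_RANGE
    if not (lo <= frame.brightness <= hi):
        return False, "bad_lighting"
    return True, "ok"
```

The reason string can drive capture guidance ("move closer", "hold still", "find better light") so retries are guided rather than blind.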
Failure group 2: Replay and screen-based attacks¶
Examples¶
- video replay on a phone screen
- replay on a bright tablet
- replay recorded in good lighting with high image quality
Why these are tricky¶
A replay can look visually clean and still be fraudulent. Strong visual quality is not the same as strong liveness evidence.
What helps¶
- evaluate specifically against replay attacks
- test different screens, brightness levels, and camera distances
- use stronger policy in high-risk flows
Failure group 3: Injection and virtual camera attacks¶
Examples¶
- injected image stream instead of live camera feed
- emulator or tampered app path
- virtual camera in browser or desktop workflow
Why these are serious¶
These attacks may bypass many surface-level capture assumptions.
What helps¶
- secure capture path controls
- client integrity checks where possible
- browser and app hardening
- environment-specific testing
Failure group 4: AI-generated or manipulated content¶
Examples¶
- deepfake video
- face swap overlay
- AI-enhanced replay
- synthetic identity media
Why these matter¶
Attack quality is improving quickly. A system that was tested only against older print and replay attacks may not be ready for newer synthetic content.
What helps¶
- include modern attack classes in evaluation
- monitor for new fraud patterns after launch
- retest regularly rather than assuming past results still hold
Failure group 5: Threshold and policy mistakes¶
Examples¶
- threshold too strict, causing too many genuine rejects
- threshold too weak, allowing spoof risk
- unlimited retries
- weak fallback path after repeated uncertain results
What helps¶
- calibrate on local data
- document score bands clearly
- evaluate retry policy, not just model score
- review edge cases before go-live
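A documented score-band policy with a capped retry budget might look like the sketch below. The band boundaries, retry limit, and decision names are illustrative assumptions; they should come from calibration on local data, not from this example.

```python
# Illustrative score-band policy. All values are assumptions for the sketch,
# not recommendations from any specific vendor or model.

ACCEPT_AT = 0.85      # at or above this score: accept
REJECT_BELOW = 0.40   # below this score: reject as likely spoof
MAX_RETRIES = 2       # limited guided retries for the uncertain band

def decide(score: float, attempt: int) -> str:
    """Map a raw liveness score plus attempt count to a documented decision."""
    if score >= ACCEPT_AT:
        return "accept"
    if score < REJECT_BELOW:
        return "reject"
    # Uncertain band: allow a limited number of guided retries,
    # then route to a fallback path instead of retrying forever.
    if attempt < MAX_RETRIES:
        return "retry_with_guidance"
    return "fallback_review"
```

Note that the retry limit is part of the policy being evaluated: the same thresholds behave very differently with unlimited retries than with a capped budget.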
Failure group 6: Device and environment mismatch¶
| Problem | Why it matters |
|---|---|
| low-end phone camera | noisier and less stable input |
| browser webcam | often less consistent than a native app SDK |
| poor network | can affect video workflows and user patience |
| old OS or browser | may break expected capture behavior |
What helps¶
Segment testing and monitoring by device, platform, browser, and app version.
Failure group 7: Integration mistakes¶
Examples¶
- frontend misreads the response payload
- backend treats `success=true` as automatic approve
- score scale changes after a model update but policy is not updated
- retryable quality failure is shown as a hard fraud reject
What helps¶
- stable API envelope
- explicit decision layer after raw model response
- versioned model and policy
- integration tests that include bad-input and error cases
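The "explicit decision layer" idea can be sketched as a small interpreter between the raw payload and the business decision. The field names (`success`, `quality_ok`, `liveness_score`), the threshold, and the policy version string are illustrative assumptions, not any specific vendor's schema.

```python
# Sketch of a decision layer between the raw API payload and the business
# decision. Field names and values are illustrative, not a real schema.

POLICY_VERSION = "policy-v3"   # versioned alongside the model (assumed label)
LIVE_THRESHOLD = 0.80          # illustrative

def interpret(payload: dict) -> dict:
    """Translate a raw model response into an explicit, versioned decision."""
    # success=true only means the call completed; it is never automatic approve.
    if not payload.get("success", False):
        return {"decision": "error", "policy": POLICY_VERSION}
    # A retryable quality failure is not a fraud reject.
    if not payload.get("quality_ok", True):
        return {"decision": "retry_quality", "policy": POLICY_VERSION}
    score = payload.get("liveness_score")
    if score is None:
        return {"decision": "error", "policy": POLICY_VERSION}
    decision = "approve" if score >= LIVE_THRESHOLD else "decline"
    return {"decision": decision, "policy": POLICY_VERSION}
```

Keeping the threshold and policy version inside this layer means a score-scale change after a model update forces exactly one place to be reviewed, and every decision is traceable to a policy version.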
Failure group 8: Human and operational issues¶
Examples¶
- support team cannot explain a failure reason
- review team sees too many uncertain cases
- fraud team cannot trace which policy version was used
- monitoring dashboards hide segment-level spikes
What helps¶
- better logging and diagnostics
- clear reason codes
- segment-level dashboards
- documented escalation paths
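Segment-level dashboards start from segment-level aggregation. The sketch below groups pass rates by platform and device tier so a spike hidden inside a healthy global average becomes visible; the event field names are assumptions for the example.

```python
# Minimal segment-level aggregation sketch. Event field names
# ("platform", "device_tier", "passed") are illustrative.

from collections import defaultdict

def segment_pass_rates(events: list[dict]) -> dict:
    """Return pass rate per (platform, device_tier) segment."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [passes, total]
    for e in events:
        key = (e["platform"], e["device_tier"])
        totals[key][1] += 1
        if e["passed"]:
            totals[key][0] += 1
    return {seg: passes / total for seg, (passes, total) in totals.items()}
```

The same grouping applies to reason codes: a per-segment breakdown of `face_too_small` versus `too_blurry` often localizes a problem to one device class or app version.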
Failure review template¶
When a failure happens, ask:
- Was the input quality acceptable?
- Was the attack type covered in testing?
- Did the device or channel behave as expected?
- Did policy handle uncertainty correctly?
- Was the integration reading the model output correctly?
- Can we explain the failure in logs and analytics?
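One way to make this template operational is to record each review as structured fields rather than free text, so failed checklist items can be aggregated across incidents. The field names below simply mirror the questions above and are illustrative.

```python
# A failure review record mirroring the checklist above as structured fields,
# so reviews can be aggregated later. Field names are illustrative.

from dataclasses import dataclass, asdict

@dataclass
class FailureReview:
    input_quality_ok: bool
    attack_type_tested: bool
    device_behaved_as_expected: bool
    policy_handled_uncertainty: bool
    integration_read_output_correctly: bool
    explainable_in_logs: bool

    def gaps(self) -> list[str]:
        """Return the checklist items that failed, for triage."""
        return [name for name, ok in asdict(self).items() if not ok]
```

Counting `gaps()` across many incidents shows whether failures cluster in testing coverage, policy, or integration, which is more actionable than blaming the model alone.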
This creates a better failure culture than blaming the model alone.
Common examples by symptom¶
| Symptom | Possible causes |
|---|---|
| too many genuine rejects in dim rooms | threshold too strict, low-light weakness, poor guidance |
| spoof passes on web only | browser capture path weakness, virtual camera or injection risk |
| score unstable across retries | weak capture consistency, noisy quality conditions |
| manual review load too high | uncertain band too wide, poor quality gate, policy not tuned |
| support cannot explain failures | weak diagnostics and reason codes |
Final takeaway¶
A strong face liveness program does not aim for zero failures. It aims for failures that are:
- understood
- measured
- limited in impact
- fixable by design, testing, or policy change
That is how systems improve over time.
Related docs¶
- 06. API and Response Examples
- 07. Decision Logic
- 08. Evaluation Playbook
- Appendix A5 — Security and Privacy
Read next¶
Go to 10. Product Guide.