# Appendix A3 — Metrics and Evaluation

## Purpose
This appendix gives a practical overview of how to evaluate face liveness systems.
## Important metrics

### APCER

Attack Presentation Classification Error Rate: how often attack presentations (e.g. prints, screen replays, masks) are incorrectly accepted as genuine.

### BPCER

Bona Fide Presentation Classification Error Rate: how often genuine users are incorrectly rejected.

### ACER

Average Classification Error Rate: the simple mean of APCER and BPCER. Useful as a one-line summary, but not sufficient by itself because it hides the trade-off between the two error types.

### Latency

How long the user waits for the result. Real systems should track more than average latency; percentiles such as p95 and p99 matter, because a slow tail of sessions drives retries and abandonment.

### Retry rate

How often the system asks users to try again.

### Completion rate

The share of genuine users who successfully finish the process.
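The error-rate metrics above can be sketched in a few lines. This is a minimal illustration, not code from this repo; the `results` structure (a list of `(is_attack, accepted)` pairs) is a hypothetical input format.

```python
# Sketch: computing APCER, BPCER, and ACER from labeled decisions.
# `results` is a hypothetical list of (is_attack, accepted) pairs.

def error_rates(results):
    attacks = [accepted for is_attack, accepted in results if is_attack]
    bona_fide = [accepted for is_attack, accepted in results if not is_attack]
    apcer = sum(attacks) / len(attacks)                      # attacks wrongly accepted
    bpcer = sum(not a for a in bona_fide) / len(bona_fide)   # genuine wrongly rejected
    acer = (apcer + bpcer) / 2                               # summary only; report all three
    return apcer, bpcer, acer

results = [
    (True, False), (True, True), (True, False), (True, False),    # attack presentations
    (False, True), (False, True), (False, True), (False, False),  # genuine presentations
]
print(error_rates(results))  # -> (0.25, 0.25, 0.25)
```

Reporting all three values, rather than ACER alone, makes the error trade-off visible.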
## Why one number is not enough
A model can look strong on one summary metric and still fail in production because of:
- hard device segments
- weak lighting
- browser-specific issues
- injection attack gaps
- retry policy problems
Evaluation should always include segmented analysis.
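Segmented analysis just means computing the same metric per slice of traffic instead of once overall. A minimal sketch, assuming sessions are dicts with a `segment` key (device class, lighting bucket, etc.) and an `accepted` flag for genuine users; both field names are illustrative.

```python
from collections import defaultdict

# Sketch: per-segment BPCER for genuine sessions.
# `segment` and `accepted` are hypothetical field names.
def bpcer_by_segment(sessions):
    grouped = defaultdict(list)
    for s in sessions:
        grouped[s["segment"]].append(s["accepted"])
    return {seg: sum(not a for a in accepts) / len(accepts)
            for seg, accepts in grouped.items()}

sessions = [
    {"segment": "high-end", "accepted": True},
    {"segment": "high-end", "accepted": True},
    {"segment": "low-end", "accepted": True},
    {"segment": "low-end", "accepted": False},
]
print(bpcer_by_segment(sessions))  # -> {'high-end': 0.0, 'low-end': 0.5}
```

A system that looks fine in aggregate (overall BPCER 0.25 here) can still be failing badly on one segment.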
## Good evaluation design

Test across:

- device classes
- front camera quality levels
- operating systems and browsers
- lighting conditions
- indoor and outdoor scenes
- glasses / occlusion conditions
- different attack instruments
- network variability where relevant
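One practical way to make these axes concrete is to enumerate their cross-product as a test matrix. The axis values below are illustrative placeholders, not a recommended set.

```python
from itertools import product

# Sketch: enumerating a test matrix from a few evaluation axes.
# The axis names and values are hypothetical examples.
axes = {
    "device": ["low-end", "mid-range", "high-end"],
    "lighting": ["bright", "dim", "backlit"],
    "attack": ["none", "print", "screen-replay"],
}
matrix = [dict(zip(axes, combo)) for combo in product(*axes.values())]
print(len(matrix))  # -> 27
```

Even a few axes multiply quickly, which is why segmented reporting (rather than one pooled number) is the only way to see where coverage is thin.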
## Threshold tuning

Thresholds should be tuned using realistic operational goals, not just lab accuracy.

A good threshold policy usually defines:

- a pass zone
- an uncertain zone
- a fail zone
The uncertain zone supports safer retry or escalation logic.
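A three-zone policy can be sketched as below. The two threshold values are illustrative placeholders; in practice they come from tuning against operational targets on representative traffic.

```python
# Sketch: three-zone decision policy over a liveness score in [0, 1].
# Threshold values are placeholders, not recommendations.
PASS_THRESHOLD = 0.80
FAIL_THRESHOLD = 0.40

def decide(liveness_score: float) -> str:
    if liveness_score >= PASS_THRESHOLD:
        return "pass"
    if liveness_score < FAIL_THRESHOLD:
        return "fail"
    return "uncertain"  # route to retry or manual escalation

print([decide(s) for s in (0.95, 0.60, 0.20)])  # -> ['pass', 'uncertain', 'fail']
```

Keeping the uncertain zone explicit, instead of forcing a binary decision, is what makes safer retry and escalation logic possible.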
## Production monitoring metrics

After launch, keep tracking:

- pass/retry/fail rates
- score distribution shifts
- latency percentiles
- device-wise behavior
- region-wise anomalies
- manual review escalation rate
- confirmed fraud outcomes where available
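Score distribution shift, in particular, can be monitored with a simple drift statistic such as the Population Stability Index (PSI) between a baseline and the live score distribution. A minimal sketch; the bin edges are a conventional choice, not part of any specification from this repo.

```python
import math

# Sketch: Population Stability Index between two score samples in [0, 1].
# Bin edges are an illustrative choice; zero bins are floored to avoid log(0).
def psi(baseline, live, edges=(0.0, 0.25, 0.5, 0.75, 1.0)):
    def hist(scores):
        counts = [sum(lo <= s < hi for s in scores) for lo, hi in zip(edges, edges[1:])]
        counts[-1] += sum(s == edges[-1] for s in scores)  # include the right edge
        total = len(scores)
        return [max(c / total, 1e-6) for c in counts]
    p, q = hist(baseline), hist(live)
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))

baseline = [0.1, 0.3, 0.6, 0.9] * 25  # reference score sample
shifted  = [0.1, 0.1, 0.3, 0.9] * 25  # drifted score sample
print(psi(baseline, baseline))  # -> 0.0
```

A PSI near zero means the distributions match; larger values flag drift worth investigating, for example via the device- and region-wise breakdowns above.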
## Related detailed pages in this repo
- Performance metrics
- Testing methodology
- Red team and penetration testing
- Edge cases
- Demographic performance
- Datasets