Appendix A9 — Data Collection and Labeling

Purpose

This appendix provides practical guidance for collecting, labeling, and packaging liveness data so that training and evaluation remain useful and reproducible.


Data collection principles

A strong collection plan should define:

  • target use case
  • attack families to cover
  • channels and platforms to cover
  • privacy and consent rules
  • storage and retention rules
  • split and version strategy
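As a sketch, such a plan can be captured in a small machine-readable config so it is versioned alongside the data. All field values below are illustrative assumptions, not prescribed defaults:

```python
# Illustrative collection plan as a plain dict; keys mirror the checklist
# above, and every value is an example assumption.
collection_plan = {
    "target_use_case": "mobile onboarding",
    "attack_families": ["print", "replay", "mask"],
    "platforms": ["android", "ios", "web"],
    "privacy": {"consent_required": True, "retention_days": 365},
    "storage": {"raw_media_access": "restricted"},
    "splits": {"train": 0.7, "dev": 0.15, "test": 0.15},
    "schema_version": "1.0",
}

# Basic sanity check: split fractions should cover the whole dataset.
assert abs(sum(collection_plan["splits"].values()) - 1.0) < 1e-9
```

Keeping an explicit `schema_version` here pays off later when label schemas change (see the common-mistakes list at the end of this appendix).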

Useful label fields

At minimum, try to store:

  • sample_id
  • person_id
  • session_id
  • label
  • attack_family
  • attack_type
  • platform
  • device_class
  • capture_type
  • lighting_bucket
  • quality_bucket
  • review_status
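The fields above can be sketched as one record type; the types and example values are assumptions (everything is kept as a plain string for simplicity):

```python
from dataclasses import dataclass, asdict

# One labeled sample; field names follow the list above.
@dataclass
class SampleRecord:
    sample_id: str
    person_id: str
    session_id: str
    label: str            # e.g. "bona_fide" or "spoof"
    attack_family: str    # empty for bona fide samples
    attack_type: str
    platform: str
    device_class: str
    capture_type: str
    lighting_bucket: str
    quality_bucket: str
    review_status: str

rec = SampleRecord(
    sample_id="s0001", person_id="p042", session_id="sess7",
    label="spoof", attack_family="replay", attack_type="screen_replay",
    platform="android", device_class="mid_range", capture_type="video",
    lighting_bucket="indoor_dim", quality_bucket="medium",
    review_status="confirmed",
)
# asdict() yields a flat dict that is easy to write out as CSV or JSON.
row = asdict(rec)
```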

Labeling rules should be written down

A practical labeling guide should answer:

  • what counts as bona fide
  • what counts as spoof
  • how to label uncertain or ambiguous samples
  • how to label multi-problem cases such as spoof plus poor quality
  • how to escalate disagreements between reviewers

Suggested review states

State      Meaning
confirmed  label agreed and accepted
uncertain  label not reliable enough yet
disputed   reviewer disagreement exists
excluded   should not be used for training or main evaluation
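These states map naturally onto a small enum. Which states count as usable for training is a policy choice; treating only confirmed samples as trainable, as below, is an assumption:

```python
from enum import Enum

# Review states from the table above.
class ReviewStatus(Enum):
    CONFIRMED = "confirmed"
    UNCERTAIN = "uncertain"
    DISPUTED = "disputed"
    EXCLUDED = "excluded"

# Policy assumption: only confirmed samples feed training.
TRAINABLE = {ReviewStatus.CONFIRMED}

def usable_for_training(status: ReviewStatus) -> bool:
    """Return True if a sample in this state may enter a training set."""
    return status in TRAINABLE
```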

Reviewer agreement matters

Useful practices:

  • sample a subset for double review
  • track disagreement rate
  • define escalation path for hard cases
  • store review notes for repeated edge cases

This reduces silent label noise.
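A minimal sketch of tracking the disagreement rate over double-reviewed samples; the record format and data are made up for illustration:

```python
# Each entry: (sample_id, first_review_label, second_review_label).
double_reviews = [
    ("s1", "spoof", "spoof"),
    ("s2", "bona_fide", "spoof"),       # disagreement: escalate
    ("s3", "bona_fide", "bona_fide"),
    ("s4", "spoof", "uncertain"),       # disagreement: escalate
]

# Samples where the two reviewers disagree go to the escalation path.
disagreements = [sid for sid, first, second in double_reviews
                 if first != second]
rate = len(disagreements) / len(double_reviews)
print(f"disagreement rate: {rate:.0%}, escalate: {disagreements}")
# prints: disagreement rate: 50%, escalate: ['s2', 's4']
```

Watching this rate over time is a cheap early-warning signal that the labeling guide needs clarification.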


Naming and packaging hygiene

Use consistent identifiers and keep metadata outside filenames when possible.

A simple pattern is:

sample_id, person_id, session_id, label, attack_type, platform, split

This makes joins and audits easier.
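One audit this pattern enables is checking that no person leaks across splits. The manifest contents below are fabricated for illustration, and person-disjoint splits are assumed to be the program's policy:

```python
import csv
import io

# Manifest kept outside filenames: one row per sample, columns following
# the pattern above. The rows themselves are made-up examples.
manifest_csv = """sample_id,person_id,session_id,label,attack_type,platform,split
s0001,p042,sess7,spoof,screen_replay,android,train
s0002,p042,sess7,bona_fide,,android,train
s0003,p099,sess9,spoof,print_photo,ios,test
"""

rows = list(csv.DictReader(io.StringIO(manifest_csv)))

# Audit: a person appearing in more than one split leaks identity
# information from train into evaluation.
person_splits = {}
for r in rows:
    person_splits.setdefault(r["person_id"], set()).add(r["split"])
leaks = {p for p, splits in person_splits.items() if len(splits) > 1}
assert not leaks, f"identity leakage across splits: {leaks}"
```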


Privacy and compliance

Data collection should respect the program's privacy and legal obligations.

Typical considerations include:

  • user consent where required
  • storage minimization
  • retention policy
  • restricted access to raw media
  • audit trail for access

Common mistakes

Mistake                                   Why it hurts
only binary labels                        blocks deeper error analysis
no uncertain bucket                       noisy labels pollute training
no reviewer audit trail                   hard to improve label quality
no device metadata                        segmentation becomes weak
changing label schema without versioning  experiments become hard to compare