# Appendix A9 — Data Collection and Labeling

## Purpose
This appendix provides practical guidance for collecting, labeling, and packaging liveness data so that training and evaluation remain useful and reproducible.
## Data collection principles
A strong collection plan should define:
- target use case
- attack families to cover
- channels and platforms to cover
- privacy and consent rules
- storage and retention rules
- split and version strategy
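The plan fields above can be captured in a simple record so they travel with the dataset. A minimal sketch, assuming Python and illustrative field names (this is not a standard schema):

```python
from dataclasses import dataclass

@dataclass
class CollectionPlan:
    """Hypothetical container for a liveness data collection plan."""
    target_use_case: str
    attack_families: list[str]
    channels: list[str]
    privacy_rules: str
    retention_days: int
    split_strategy: str
    version: str = "1.0"

plan = CollectionPlan(
    target_use_case="mobile onboarding",
    attack_families=["print", "replay", "mask"],
    channels=["ios", "android", "web"],
    privacy_rules="consent required; raw media access restricted",
    retention_days=365,
    split_strategy="by person_id",  # keep identities from leaking across splits
)
```

Writing the plan down as data, not prose, makes it easy to version alongside the dataset it governs.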
## Useful label fields
At minimum, try to store:
- `sample_id`
- `person_id`
- `session_id`
- `label`
- `attack_family`
- `attack_type`
- `platform`
- `device_class`
- `capture_type`
- `lighting_bucket`
- `quality_bucket`
- `review_status`
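A record carrying these fields might look like the following sketch; the field values are illustrative, and the bucket names are assumptions rather than a fixed vocabulary:

```python
from typing import TypedDict

class SampleRecord(TypedDict):
    """One labeled sample; fields mirror the list above."""
    sample_id: str
    person_id: str
    session_id: str
    label: str            # e.g. "bona_fide" or "spoof"
    attack_family: str    # empty for bona fide samples
    attack_type: str
    platform: str
    device_class: str
    capture_type: str
    lighting_bucket: str
    quality_bucket: str
    review_status: str

record: SampleRecord = {
    "sample_id": "s_000123",
    "person_id": "p_0042",
    "session_id": "sess_0007",
    "label": "spoof",
    "attack_family": "replay",
    "attack_type": "screen_replay",
    "platform": "android",
    "device_class": "mid_range_phone",
    "capture_type": "selfie_video",
    "lighting_bucket": "indoor_dim",
    "quality_bucket": "medium",
    "review_status": "confirmed",
}
```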
## Labeling rules should be written down
A practical labeling guide should answer:
- what counts as bona fide
- what counts as spoof
- how to label uncertain or ambiguous samples
- how to label multi-problem cases such as spoof plus poor quality
- how to escalate disagreements between reviewers
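A written guide can be mirrored by a small decision helper so the rules are applied the same way every time. This is a toy sketch under assumed rules: conflicting or missing evidence goes to the uncertain bucket rather than a guessed label, and spoof plus poor quality keeps the spoof label while quality is tracked separately.

```python
def assign_label(spoof_evidence: bool, bona_fide_evidence: bool,
                 quality_ok: bool) -> dict:
    """Toy labeling decision following a written guide (hypothetical rules)."""
    if spoof_evidence and bona_fide_evidence:
        # Conflicting evidence: do not guess, route to review.
        return {"label": "unknown", "review_status": "uncertain"}
    if spoof_evidence:
        # Spoof + poor quality stays a spoof; quality lives in its own field.
        status = "confirmed" if quality_ok else "uncertain"
        return {"label": "spoof", "review_status": status}
    if bona_fide_evidence:
        status = "confirmed" if quality_ok else "uncertain"
        return {"label": "bona_fide", "review_status": status}
    return {"label": "unknown", "review_status": "uncertain"}
```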
## Suggested review states
| State | Meaning |
|---|---|
| confirmed | label agreed and accepted |
| uncertain | label not reliable enough yet |
| disputed | reviewer disagreement exists |
| excluded | should not be used for training or main evaluation |
## Reviewer agreement matters
Useful practices:
- sample a subset for double review
- track disagreement rate
- define escalation path for hard cases
- store review notes for repeated edge cases
This reduces silent label noise.
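Tracking the disagreement rate on the double-reviewed subset is a one-liner; a minimal sketch:

```python
def disagreement_rate(labels_a: list[str], labels_b: list[str]) -> float:
    """Fraction of double-reviewed samples where the two reviewers disagree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    disagreements = sum(a != b for a, b in zip(labels_a, labels_b))
    return disagreements / len(labels_a)

rate = disagreement_rate(
    ["spoof", "spoof", "bona_fide", "spoof"],
    ["spoof", "spoof", "spoof", "spoof"],
)  # one disagreement out of four samples -> 0.25
```

Samples with disagreements are natural candidates for the `disputed` state and the escalation path.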
## Naming and packaging hygiene
Use consistent identifiers and keep metadata outside filenames when possible.
A simple pattern is:
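One illustrative convention (an assumption, not the only option, and the `.mp4`/`.json` extensions are placeholders) is to name the media file by its stable `sample_id` only and keep all other metadata in a JSON sidecar keyed by the same id:

```python
def filenames_for(sample_id: str) -> tuple[str, str]:
    """Return (media filename, metadata sidecar filename) for one sample.

    The filename carries only the stable sample_id; everything else
    lives in the sidecar, so joins use ids rather than filename parsing.
    """
    return f"{sample_id}.mp4", f"{sample_id}.json"

media, sidecar = filenames_for("s_000123")
```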
This makes joins and audits easier.
## Privacy and consent reminders
Data collection should respect the program's privacy and legal obligations.
Typical considerations include:
- user consent where required
- storage minimization
- retention policy
- restricted access to raw media
- audit trail for access
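An access audit trail can be as simple as an append-only log of structured events. The event fields and action names below are hypothetical:

```python
import json
import time

def audit_event(user: str, sample_id: str, action: str) -> str:
    """One line of an append-only access log (illustrative format)."""
    return json.dumps({
        "ts": time.time(),
        "user": user,
        "sample_id": sample_id,
        "action": action,  # e.g. "view_raw", "export", "delete"
    }, sort_keys=True)
```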
## Common mistakes
| Mistake | Why it hurts |
|---|---|
| only binary labels | blocks deeper error analysis |
| no uncertain bucket | noisy labels pollute training |
| no reviewer audit trail | hard to improve label quality |
| no device metadata | segmentation becomes weak |
| changing label schema without versioning | experiments become hard to compare |
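The last mistake in the table is cheap to prevent: stamp every exported record with an explicit schema version. A minimal sketch, where the version constant and field name are assumptions:

```python
LABEL_SCHEMA_VERSION = "2.0"  # hypothetical; bump whenever label fields change

def tag_record(record: dict) -> dict:
    """Stamp a record with the schema version it was labeled under,
    so experiments on different schemas remain comparable."""
    return {**record, "schema_version": LABEL_SCHEMA_VERSION}
```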