Research Plan
Scope our problem
Which subproblem to focus on?
```
Digital Forensic / Media Forensic
|__ Video Fake
    |__ Audio Fake
    |__ Visual Fake
        |__ Video Face Fake
            |__ Deep Fake
                |__ By fake region:
                |   |__ head
                |   |__ face
                |   |__ mouth
                |   |__ ...
                |__ By fake methods:
                |   |__ DFAE
                |   |__ StyleGAN
                |   |__ FSGAN
                |   |__ ...
                |__ By fake quality:
                    |__ low effort, low quality
                    |__ high effort, high quality
```
Possible approaches
- Single-frame Analysis
  - Facial Landmarks Analysis (might be the most invariant to technology change; see the landmark sketch after this list)
  - Signal Analysis (vulnerable to video compression, but worth trying)
  - Direct Deep Learning (vulnerable to perturbation attacks)
- Cross-frame Analysis
  - Usually means using a recurrent neural network
  - Behaviour Analysis (highly customised to the individual)
- Statistical
  - Origin Tracing (not interesting)
- Multi-method (a combination of the above)
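As a starting point for the facial-landmarks direction, the sketch below extracts 68 face landmarks per frame and computes the eye aspect ratio, a simple blink-related feature. The use of dlib and the local model path are assumptions for illustration, not a fixed design decision.

```python
# Sketch: per-frame facial landmark features (assumes dlib and OpenCV are installed
# and the standard shape_predictor_68_face_landmarks.dat model has been downloaded).
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed local path

def eye_aspect_ratio(eye):
    """Ratio of eye height to width; drops sharply during a blink."""
    a = np.linalg.norm(eye[1] - eye[5])
    b = np.linalg.norm(eye[2] - eye[4])
    c = np.linalg.norm(eye[0] - eye[3])
    return (a + b) / (2.0 * c)

def landmark_features(frame_bgr):
    """Return per-face eye-aspect-ratio features for one video frame ([] if no face)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    features = []
    for face in detector(gray):
        shape = predictor(gray, face)
        pts = np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float32)
        right_ear = eye_aspect_ratio(pts[36:42])  # right eye landmarks (68-point scheme)
        left_ear = eye_aspect_ratio(pts[42:48])   # left eye landmarks
        features.append((left_ear + right_ear) / 2.0)
    return features
```

Tracking how this ratio varies over time (e.g., blink frequency) is one example of a landmark-based feature that is cheap to compute and relatively independent of the generation method.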
An ideal detector should
- Generalize well to unseen datasets (not only unseen videos, but also unseen faces, different lighting conditions, and different face angles)
- Have long-lasting effectiveness (focus on features that remain hard to fake even as generation technology improves)
- Be fast and scalable (as an indicator, most participating teams in Facebook's deepfake challenge could classify a video within a few seconds; see the frame-sampling sketch below)
Potential Challenges
- How should we measure detector performance, and which benchmarks should we use? (Facebook's paper has a discussion on this.)
- Model generalization (in the Facebook deepfake challenge, the top team on the public test set achieved 82.56% precision, but the top team on the private test set achieved only 65.18%; this suggests models do not perform well on unseen faces)
- Multi-face issues (more than one face in the video, where one face is faked and another is not, etc.)
- Need to experiment with data augmentation to improve model accuracy (see the augmentation sketch after this list).
- We might have to ignore fake audio, because the public datasets contain very little of it.
- It might be difficult to find long-lasting, hard-to-fake features.
- Explainable AI.
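For the data-augmentation experiments mentioned above, a minimal sketch of train-time corruptions that mimic what uploaded videos typically go through (re-compression, blur, noise, flips). The parameter ranges are illustrative assumptions to be tuned, not values from any paper.

```python
# Sketch: simple train-time augmentations for face crops, using only OpenCV/NumPy.
# Probabilities and parameter ranges are illustrative assumptions, not tuned values.
import cv2
import numpy as np

rng = np.random.default_rng()

def jpeg_recompress(img, quality):
    """Simulate lossy re-encoding with a JPEG round trip at the given quality."""
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, int(quality)])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR) if ok else img

def augment(img):
    """Randomly apply compression, blur, noise, and horizontal flip to one face crop."""
    if rng.random() < 0.5:
        img = jpeg_recompress(img, rng.integers(30, 90))
    if rng.random() < 0.3:
        k = int(rng.choice([3, 5, 7]))          # odd kernel size for Gaussian blur
        img = cv2.GaussianBlur(img, (k, k), 0)
    if rng.random() < 0.3:
        noise = rng.normal(0, 8, img.shape)      # mild additive Gaussian noise
        img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    if rng.random() < 0.5:
        img = cv2.flip(img, 1)                   # horizontal flip
    return img
```

Applying these corruptions during training should help the model cope with the heavy compression present in real uploads and in the private test data.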