Research Plan

Scope our problem

Which subproblem to focus on?

Digital Forensics / Media Forensics
|__ Video Fake
    |__ Audio Fake
    |__ Visual Fake
        |__ Video Face Fake
            |__ Deep Fake
                |__ By fake region:
                    |__ head
                    |__ face
                    |__ mouth
                    |__ ...
                |__ By fake methods:
                    |__ DFAE
                    |__ StyleGAN
                    |__ FSGAN
                    |__ ...
                |__ By fake quality:
                    |__ low effort low quality
                    |__ high effort high quality

Possible approaches

  • Single-frame Analysis
    • Facial Landmarks Analysis (might be the most invariant to technology change; see the sketch after this list)
    • Signal Analysis (vulnerable to video compression, but worth trying)
    • Direct Deep Learning (vulnerable to perturbation attacks)
  • Cross-frame Analysis
    • Usually means using a recurrent neural network over per-frame features
  • Behaviour Analysis (highly customised to the individual)
  • Statistical Analysis
  • Origin Tracing (not interesting)
  • Multi-method (combining several of the above)
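
A minimal sketch of the facial-landmark idea above, assuming dlib and its pre-trained 68-point predictor (the `shape_predictor_68_face_landmarks.dat` model file is downloaded separately); the geometric feature vector here is an illustrative choice, not a settled design:

```python
import numpy as np
import dlib

# Assumption: dlib is installed and the 68-point predictor file is available locally.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmark_features(gray_frame):
    """Return a scale- and translation-normalised 136-d landmark vector,
    or None if no face is detected. Meant as input to a downstream
    classifier; the normalisation is a sketch, not a tuned design."""
    faces = detector(gray_frame)
    if not faces:
        return None
    box = faces[0]
    shape = predictor(gray_frame, box)
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)], dtype=float)
    pts -= pts.mean(axis=0)                   # centre on the face
    pts /= max(box.width(), box.height())     # normalise by face-box size
    return pts.flatten()
```

Because the features are plain geometry rather than pixel statistics, a classifier trained on them should be less sensitive to the specific generation method than a raw-pixel CNN, which is the motivation for this approach.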

An ideal detector should

  • Generalize well to unseen datasets (not only unseen videos, but also unseen faces, different lighting conditions, and different face angles)
  • Have long-lasting effectiveness (focus on features that remain hard to fake even as the technology improves)
  • Be fast & scalable (as an indicator, most participating teams in Facebook's deepfake challenge could classify a video within a few seconds; see the frame-sampling sketch below)
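
Speed largely comes from not scoring every frame. A minimal sketch of sparse frame sampling with OpenCV, assuming a hypothetical `classify_frame` callable that returns a per-frame fake probability (any of the single-frame approaches above could fill that role):

```python
import cv2
import numpy as np

def score_video(path, classify_frame, n_samples=16):
    """Estimate a video-level fake probability from evenly spaced frames.
    Averaging is the simplest aggregation; a robust mean or a max over
    frames are common alternatives."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    scores = []
    for idx in np.linspace(0, max(total - 1, 0), num=n_samples, dtype=int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            scores.append(classify_frame(frame))
    cap.release()
    return float(np.mean(scores)) if scores else 0.5  # no readable frames: report "unsure"
```

With only 16 frames per video, throughput is bounded by the per-frame model rather than by decoding, which is roughly consistent with the few-seconds-per-video figure above.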

Potential Challenges

  • How do we measure detector performance, and which benchmarks should we use? (Facebook's DFDC paper has a discussion on this.)
  • Model generalization (in Facebook's deepfake challenge, the top team on the public test set achieved 82.56% precision, but the top team on the private test set achieved only 65.18% precision, which means models do not perform well on unseen faces)
  • Multi-face issues (more than one face in the video, where one face is faked and the others are not, etc.)
  • Need to experiment with data augmentation to improve model accuracy (see the sketch after this list).
  • We might have to ignore fake audio, because the public datasets contain very little of it.
  • It might be difficult to find long-lasting, hard-to-fake features.
  • Explainable AI (ideally the detector can show why it flagged a video).
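
A minimal sketch of the augmentation idea above, using torchvision on face crops; the specific transforms and parameter values are illustrative assumptions, though re-encoding to simulate compression and jittering lighting directly target the robustness issues listed here:

```python
import io
import random
from PIL import Image
import torchvision.transforms as T

def random_jpeg(img, quality_range=(30, 90)):
    """Simulate compression artefacts with a JPEG re-encode round-trip."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(*quality_range))
    buf.seek(0)
    return Image.open(buf).convert("RGB")

# Illustrative pipeline; parameter values are assumptions to be tuned.
train_transform = T.Compose([
    T.Lambda(random_jpeg),                        # compression artefacts
    T.ColorJitter(brightness=0.3, contrast=0.3),  # lighting variation
    T.RandomHorizontalFlip(),
    T.RandomRotation(10),                         # small pose changes
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),
    T.ToTensor(),
])
```

Training with simulated compression is one plausible way to close part of the public/private test-set gap noted above, since unseen data tends to differ in encoding quality as well as in faces.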

Timeline

Deliverable