Real or Reel? The Bane of Deep Learning!
DeepFakes and Beyond — A survey of Face Manipulation and Fake Detection
Seen a politician say things they could never have said, or your favorite film actors in pornographic movies? Well then, you have seen a DeepFake. Call it an advanced 21st-century Photoshop!
Easy access to large-scale public databases, along with the progress of deep learning techniques, has allowed even amateurs to create fake images and videos. This has led to a proliferation of realistic fake content and considerable confusion in society, which brings us to the concept of the ‘DeepFake’.
The term ‘DeepFake’ refers to a deep learning-based technique used to create fake images or videos by swapping the face of one person with the face of another. Its harmful uses include pornography, fake news, political manipulation, hoaxes and financial fraud.
Traditionally, fake content detection methods relied heavily on the scenarios they were trained on and did not give accurate results under unseen conditions. This did not matter much, because fake images created back then were not as realistic as they are now, so it was easy to identify a fake.
In today’s world, however, fake images look very real, and the robustness of fake detection techniques is of utmost importance. We are yet to reach a stage where fake detection is 100% foolproof. This is because fake images uploaded to social media and websites are subject to compression, noise, resizing and other changes. This makes it difficult for detection algorithms to work, since they are trained only for specific scenarios.
This article provides a complete overview of different facial manipulations and techniques to detect them.
Facial Manipulation Techniques
The four main types of facial manipulation techniques are described as follows:
1. Entire face Synthesis: This involves creating a brand-new face from scratch using powerful GANs such as StyleGAN. This method produces high-quality faces with a high degree of realism.
The video game and 3D modeling industries benefit greatly from it. However, harmful applications include the creation of highly realistic fake social media profiles.
2. Identity Swap: As the name suggests, this technique involves swapping the face of one person with another using approaches such as FaceSwap and DeepFakes.
This facial manipulation technique benefits the film industry. However, from porn to politics, it has many negative applications, including celebrity pornography, hoaxes, financial fraud and political misuse.
3. Attribute Manipulation: This type of manipulation, also known as face editing or face retouching, involves modifying facial attributes such as hair or skin color, gender or age. This is usually achieved using GANs such as StarGAN. A common example is the FaceApp mobile app.
4. Expression Swap: Yet another common form of facial manipulation, this involves modifying the facial expressions of a person. It is also known as face re-enactment. Seen the popular video of Mark Zuckerberg saying “total control of billions of people’s stolen data”, which he never said? Such are the dangerous consequences of Expression Swap.
Detecting facial manipulation
Now that we’ve seen the different methods of facial manipulation, let us next look into ways of detecting them.
1. Entire face Synthesis
As we saw earlier, Entire Face Synthesis uses GAN architectures to create fake images. These can be identified using the GAN fingerprint that the generator leaves behind. To develop detection techniques, researchers need samples of both real and fake images.
Fake images were obtained from 100K-Generated-Images (2019), 100K-Faces (2019), DFFD (2020) and iFakeFaceDB (2020), and real images from public databases such as CelebA, FFHQ, CASIA-WebFace and VGGFace2, among others.
Manipulation Detection Techniques:
Some techniques analyzed the internal GAN pipeline to detect differences between real and fake images, while others focused on color differences.
Another interesting approach, FakeSpotter, captured neuron behavior: the neuron activation patterns across different layers record hidden features that are important for manipulation detection. For classification, the researchers mostly used KNN, SVM and LDA.
Attention mechanisms were also employed to improve detection. The best results were obtained using CNNs with attention mechanisms: with CNN models such as XceptionNet and VGG16, the researchers achieved 100% AUC and 0.1% EER.
A point worth noting is that iFakeFaceDB is an updated version of previous fake image databases in which the GAN fingerprint has been removed using an approach called GANprintR, making detection a much harder problem. Even with the best fake detectors, the lowest EER achieved was 4.5%.
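To make the two metrics quoted above concrete, here is a minimal Python sketch of how they can be computed from detector scores. AUC is the area under the ROC curve, and EER (equal error rate) is the error at the operating point where the rate of real images flagged as fake equals the rate of fakes that are missed. The function names and the toy scores are illustrative assumptions, not taken from any of the papers surveyed.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for every threshold.

    scores: higher means "more likely fake"; labels: 1 = fake, 0 = real.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume every sample that shares this score, so ties move together.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def eer(points):
    """Error rate at the ROC point where FPR and FNR (1 - TPR) are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2


# Toy example: eight images, four fake (1) and four real (0).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_points = roc_points(toy_scores, toy_labels)
print(auc(toy_points))  # 0.9375
print(eer(toy_points))  # 0.25
```

A perfect detector would give an AUC of 1.0 and an EER of 0, which is why a jump from 0.1% to 4.5% EER on iFakeFaceDB is a meaningful drop.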
2. Identity Swap
This is one of the most popular areas of face manipulation detection research, given the effects of DeepFakes on society. Unlike Entire Face Synthesis, which is performed at the image level, the goal of Identity Swap is to generate realistic fake videos.
The databases for Identity Swap detection fall into two generations. The first generation includes UADFV (2018), DeepfakeTIMIT (2018) and FaceForensics++ (2019); the second generation includes DeepFakeDetection (2019), Celeb-DF (2019) and DFDC Preview (2019).
The key differences between the first- and second-generation databases are as follows:
Fake videos from the first-generation databases show low-quality images, color contrast between the synthetic face mask and the skin, visible boundaries of the fake mask and visible elements from the original video, among other flaws. They also cover limited scenarios with respect to camera position and lighting conditions. Most of these flaws have been fixed in the videos of the second-generation databases, making them more challenging to detect.
Manipulation Detection Techniques:
Approaches used to detect fake videos include:
- Detecting inconsistencies between lip movement and audio. Mel-Frequency Cepstral Coefficients (MFCCs) were used as audio features and distances between lip landmarks as visual features. PCA was used for dimensionality reduction, and LSTM-based RNNs were employed to detect fake videos.
- Simple visual features such as eye color, missing reflections and missing details in the eye and teeth areas, used with Logistic Regression and MLP classifiers.
- Features such as facial expressions and head movement. Deep-Vision considered differences in eye blinking patterns between real and fake videos.
- An approach based on mesoscopic and steganalysis features, which achieved an accuracy of 98.4% and was considered the best performing. It was also tested against unseen data and proved to be robust.
More approaches are explained in the research paper.
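To give a feel for the first approach in the list above, here is a minimal Python sketch of the intuition behind it: in a genuine video, how wide the mouth opens should rise and fall with the loudness of the speech. The sketch measures lip opening from two landmarks per frame and correlates it with a per-frame audio energy value. This is a deliberately simplified stand-in and an assumption on my part: the work described above used MFCC audio features, PCA and an LSTM rather than a plain correlation, and the function names and toy numbers here are hypothetical.

```python
import math


def lip_opening(upper_lip, lower_lip):
    """Distance between one upper-lip and one lower-lip landmark, each (x, y)."""
    return math.dist(upper_lip, lower_lip)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def lip_sync_score(upper_track, lower_track, audio_energy):
    """Correlation between per-frame lip opening and per-frame audio energy."""
    openings = [lip_opening(u, l) for u, l in zip(upper_track, lower_track)]
    return pearson(openings, audio_energy)


# Toy example: four frames of landmarks and two candidate audio tracks.
upper = [(0, 0), (0, 0), (0, 0), (0, 0)]
lower = [(0, 1), (0, 3), (0, 2), (0, 5)]
matching_audio = [0.1, 0.3, 0.2, 0.5]    # loud when the mouth is open
mismatched_audio = [0.5, 0.2, 0.3, 0.1]  # loud when the mouth is closed

print(lip_sync_score(upper, lower, matching_audio))    # close to 1
print(lip_sync_score(upper, lower, mismatched_audio))  # negative
```

A low or negative score hints that the mouth and the audio do not belong together. A learned model such as an LSTM can pick up far subtler timing mismatches than this single number.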
3. Attribute manipulation
We’ve learnt that this manipulation includes modifying attributes of the face. Now, let us look into how we can detect them.
Since the code for the GAN approaches used to create attribute-manipulated images is openly available, researchers can use it to create fake databases as needed. As for readily available databases, DFFD is the only known one for this detection task.
Manipulation detection techniques:
Most of the approaches are similar to the ones stated in the entire face synthesis section. Additionally, many deep learning methods were tested that achieved good results.
The best-performing approach used attention mechanisms to improve the feature maps of CNN models. Two sources of fake images were considered: one from the FaceApp software and the other from StarGAN. When tested on the DFFD database, it achieved 99.9% AUC and 1.0% EER.
4. Expression Swap
To recap, this technique modifies the facial expression of a person. Here we look at Face2Face and NeuralTextures, which can, for instance, swap the expressions of two people in different videos.
The only known database for fake content under this technique is FaceForensics++.
Manipulation detection techniques:
Intuitively, this technique sounds very similar to identity swap. Hence, most of the detection techniques are very similar to those discussed for identity swap.
This article highlights the importance of attention mechanisms to further train neural networks for identifying identity and expression swap manipulations.
Other less common face manipulation detection techniques
The first half of this article covers the four main categories of facial manipulation. However, there are other less common but more dangerous methods, such as face morphing, face de-identification and face synthesis based on audio or text. Let us look at them one by one.
1. Face Morphing
Face morphing is a type of facial manipulation that creates an artificial biometric face sample resembling the biometric information of two or more individuals. This is expected to seriously threaten facial recognition systems, since the morphed image can match the facial samples of each of those individuals.
Although face morphing detection research faces a number of challenges and is still in its infancy, a “Morphing Attack Detection” research paper has provided some hope. It provides an interesting framework, a public database and a platform for evaluation and benchmarking.
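In its simplest form, the last step of a morph is a pixel-wise blend of two faces that have already been aligned. The Python sketch below shows only that blending step on tiny grayscale images stored as lists of rows. Real morphing tools also warp both faces so that their landmarks line up first, which is left out here. The function name and the alpha parameter are illustrative assumptions.

```python
def blend_faces(face_a, face_b, alpha=0.5):
    """Pixel-wise blend of two same-sized, pre-aligned grayscale images.

    alpha = 1.0 keeps only face_a, alpha = 0.0 keeps only face_b.
    """
    if len(face_a) != len(face_b) or len(face_a[0]) != len(face_b[0]):
        raise ValueError("images must have the same size")
    return [
        [alpha * a + (1 - alpha) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(face_a, face_b)
    ]


# Toy example: two 1x2 "images" blended in equal parts.
print(blend_faces([[0, 100]], [[200, 100]], alpha=0.5))  # [[100.0, 100.0]]
```

With alpha near 0.5 the result carries features of both people, which is exactly what makes a morphed passport photo able to match either of them.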
2. Face De-Identification
This method removes identity information from an image or video, thus protecting the privacy of the person shown. This is achieved in several ways. The simplest is to blur or pixelate the face. Other methods replace the face with a different identity while keeping all other factors, such as pose, expression and illumination, unchanged.
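Pixelation, the simplest of these methods, is easy to sketch. The Python example below replaces each square tile of a grayscale image (a list of rows of integers) with the average value of that tile. The function name and the block parameter are illustrative assumptions, and in practice you would apply this only to the detected face region of a real image.

```python
def pixelate(image, block):
    """Replace each block x block tile of a grayscale image with its integer mean."""
    height = len(image)
    width = len(image[0])
    result = [row[:] for row in image]  # copy so the input stays untouched
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [image[r][c] for r in rows for c in cols]
            mean = sum(tile) // len(tile)
            for r in rows:
                for c in cols:
                    result[r][c] = mean
    return result


# Toy example: a 4x4 image pixelated with 2x2 tiles.
face = [
    [0, 10, 20, 30],
    [10, 20, 30, 40],
    [100, 100, 200, 200],
    [100, 100, 200, 200],
]
for row in pixelate(face, 2):
    print(row)
# [10, 10, 30, 30]
# [10, 10, 30, 30]
# [100, 100, 200, 200]
# [100, 100, 200, 200]
```

The larger the block, the more detail is thrown away and the harder the face is to recognize.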
3. Face Synthesis (Audio to Video & Text to Video)
This type of video manipulation is also known as a lip-sync deepfake. The well-known Obama video is an example of Audio to Video Face Synthesis. The approach takes several hours of existing video along with a new audio recording as input and uses a recurrent neural network (an LSTM in this case) to learn a mapping from raw audio features to mouth shapes. A new video is then created based on the mouth shape at each frame, mouth texture and 3D pose matching.
For the second type, the synthesis of fake videos from text, a video of a person speaking and the text to be spoken are taken as input. A new video is then synthesized with the person’s mouth synchronized with the new words.
However, as of now, there are no publicly available databases and benchmarks for detection of audio or text to video facial manipulation techniques.
We’ve seen the different types of facial manipulation methods and how such manipulations are detected. However, it is important to note that these detection methods work under controlled circumstances only, that is, when fake detectors are evaluated under the same conditions they were trained for.
Take, for instance, fake videos and images shared across social networks and other websites, where they are subject to resizing, noise and compression. In these situations, fake detectors may not be as effective. Hence, other methods such as fusion techniques (at the score or feature level) could help fake detectors adapt better to different scenarios.
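As a rough illustration of score-level fusion, the Python sketch below combines the fake-probability scores of several detectors using a weighted average. The detectors, the weights and the 0.5 threshold are all hypothetical, since the text above does not prescribe a particular fusion rule.

```python
def fuse_scores(scores, weights=None):
    """Weighted average of per-detector scores, each in [0, 1] (1 = fake)."""
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("need one weight per score")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def is_fake(scores, weights=None, threshold=0.5):
    """Flag the input as fake when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


# Toy example: three hypothetical detectors disagree about one video.
detector_scores = [0.9, 0.4, 0.8]
print(is_fake(detector_scores))  # True

# Trusting the first of two detectors three times as much as the second.
print(is_fake([0.2, 0.8], weights=[3, 1]))  # False
```

The idea is that one detector may cope well with compression while another copes well with resizing, so a combined score can hold up across more conditions than either one alone.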
On a closing note, it is important for each of us to be familiar with DeepFake techniques and with the existence of fake content itself, so that we can do whatever little we can to prevent it from circulating and causing harm to society.
Professor Vijay Eranti — Thank you for your guidance.