Written by Venkatesh Ramamrat
“Creativity is seeing what others see and thinking what no one else ever thought.” – Albert Einstein
The AI painting ‘Edmond de Belamy’. (The image is in the public domain because, as the work of a computer algorithm, it has no human author.) When I saw this painting, I was compelled to understand how AI created art, which led me to the field of Generative Adversarial Networks, or GANs.
The principle behind the GAN was first described in the 2014 paper “Generative Adversarial Networks” by Ian Goodfellow et al. At its most basic level, it describes a system that pits two AI systems (neural networks) against each other to improve the quality of their results.
As an artist, I have been very interested in understanding this revolutionary technology that allows AI to create art. Let me give an art-related explanation of how a GAN works. Imagine a blind forger trying to create copies of paintings by the great masters. To start with, he has no idea what a painting should look like – but he happens to have a friend with a photographic memory of every masterpiece that has ever been painted.
This friend – a detective – has to determine whether the paintings his friend is showing match the features of those created by the real great masters, or are obvious forgeries.
This is the basic idea of how a GAN operates – only as they are AIs, both the forger and his friend can act at super speed, making and detecting thousands of forgeries per second. Both of them then "learn" from the outcome to improve their future performance. As the detective becomes better at detecting forgeries, the forger must become better at creating them.
Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics.
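The forger-and-detective loop above can be sketched in miniature. The toy below is an illustrative sketch, not a real image GAN: the "forger" (generator) learns a single offset `g` for one-dimensional data, and the "detective" (discriminator) is a tiny logistic-regression classifier. All names, learning rates, and data choices are my own assumptions for illustration; the structure – alternating detective and forger gradient steps on opposing objectives – is the part that mirrors an actual GAN.

```python
import math
import random

random.seed(0)
REAL_MEAN = 4.0  # the "masterpieces": samples from N(4, 1)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

g = 0.0          # forger's parameter: fakes are g + noise
w, b = 0.0, 0.0  # detective: D(x) = sigmoid(w*x + b)
lr, batch = 0.05, 64

for step in range(2000):
    real = [random.gauss(REAL_MEAN, 1.0) for _ in range(batch)]
    fake = [g + random.gauss(0.0, 1.0) for _ in range(batch)]

    # Detective update: minimize -log D(real) - log(1 - D(fake)),
    # i.e. push D(real) toward 1 and D(fake) toward 0.
    gw = gb = 0.0
    for x in real:
        d = sigmoid(w * x + b)
        gw += -(1 - d) * x
        gb += -(1 - d)
    for y in fake:
        d = sigmoid(w * y + b)
        gw += d * y
        gb += d
    w -= lr * gw / (2 * batch)
    b -= lr * gb / (2 * batch)

    # Forger update (non-saturating loss): minimize -log D(fake),
    # i.e. move the fakes toward where the detective says "real".
    fake = [g + random.gauss(0.0, 1.0) for _ in range(batch)]
    gg = sum(-(1 - sigmoid(w * y + b)) * w for y in fake) / batch
    g -= lr * gg

print(f"forger's learned offset: g = {g:.2f} (real data mean is {REAL_MEAN})")
```

As the detective gets better at separating real from fake, its gradient tells the forger which direction "real" lies in, and the forger's samples drift toward the real distribution – the same dynamic that, at scale, produces convincing images.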
Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning. GANs have generated a lot of excitement within the field of AI development in recent years, due to their ability to create “new” information following rules established by existing information. Several architectures are illustrated below:
Types of GAN architecture
To understand GANs better, one could refer to the original paper by Goodfellow et al.
Generative models such as GANs provide promising results in multiple domains, including images, video, audio, and text. Video synthesis is still in its early stages compared to other domains such as images. The current state of the art for video GANs suffers from low-quality frames, a low number of frames, or both.
Compared to image GANs, video GANs require different treatment because of the data's complexity: a video consists of multiple images with an additional time dimension. Although progress on GANs in areas other than video is well documented through several review papers, video GAN models have received less attention so far; where included at all, they were only a section in other review papers, despite their broad range. Considering the increasing number of studies on video GANs during the past few years, it is the right time to survey the field, categorize the different models according to their applications, and compare their differences.
Synthetic media (also known as AI-generated media and colloquially as deepfakes) is a catch-all term for the artificial production, manipulation, and modification of data and media by automated means, especially through the use of artificial intelligence algorithms, such as to mislead people or change an original meaning. Synthetic media as a field has grown rapidly since the creation of generative adversarial networks, primarily through the rise of deepfakes as well as music synthesis, text generation, human image synthesis, and speech synthesis.
Before GANs, I had come across the Mandelbrot set: a simple equation that, when plotted, gives an infinite fractal visualization. The Mandelbrot set gives rise to some of the most famous and beautiful patterns I have ever come across – patterns born not of nature but of the iterated equation f(z) = z² + c.
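As a small aside for the curious, the set can be computed directly from that equation: iterate z → z² + c starting from z = 0, and c belongs to the set if |z| never escapes past 2. The function name and rendering parameters below are my own illustrative choices.

```python
def mandelbrot_iterations(c, max_iter=100):
    """Iterate z -> z**2 + c from z = 0; return how many steps it takes
    for |z| to escape past 2 (max_iter means 'likely in the set')."""
    z = 0
    for n in range(max_iter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return max_iter

# Coarse ASCII plot of the region -2 <= Re(c) <= 1, -1 <= Im(c) <= 1:
# points that never escape are drawn as '#'.
for im10 in range(10, -11, -2):
    row = ""
    for re20 in range(-40, 21):
        c = complex(re20 / 20.0, im10 / 10.0)
        row += "#" if mandelbrot_iterations(c) == 100 else " "
    print(row)
```

Even this few dozen lines of code produce the famous cardioid-and-bulb silhouette – a striking example of visual complexity emerging from one short formula.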
“Training algorithms to generate art is, in some ways, the easy part. You feed them data, they look for patterns, and they do their best to replicate what they’ve seen. But like all automatons, AI systems are tireless and produce a never-ending stream of images. The tricky part is knowing what to do with it all.” - German AI artist Mario Klingemann
Right now, GAN technology is limited to 2D content, which is why it might make the most sense to use it for skin texture generation; but in the not-so-distant future, these concepts will be applied to 3D data as well. MetaHuman combined with the power of GANs would truly be a game changer. Over time, the same principles could potentially be applied to body types, facial features, facial hair and hairstyles, and more (e.g. non-humanoid creatures). But as a first step, using GAN techniques to generate unique skin textures would increase the probability that characters created with the tool end up having a more unique look and feel, instead of relying on scanned data that only allows for a predetermined set of possible permutations.
GameGAN, a generative adversarial network trained on 50,000 PAC-MAN episodes, produces a fully functional version of the dot-munching classic without an underlying game engine.
Game Changer: NVIDIA Researcher Seung-Wook Kim and his collaborators trained GameGAN on 50,000 episodes of PAC-MAN.
In 2018, GANs reached the video game modding community as a method of up-scaling low-resolution 2D textures in old video games: recreating them in 4K or higher resolutions via image training, then down-sampling them to fit the game's native resolution.
Known examples of extensive GAN usage include:
Interesting GAN applications:
Digital Parenting GAN
GANs sit at the intersection of technology and art, and as the AI learns over time, the possibilities of where GANs can lead will be something I keep watching closely. Policymakers and governments have to create a framework in which deepfakes and the negative impacts of GANs can be minimized. We at Wranga have an immediate task of engaging with technology companies, parents, schools, and policymakers, so that we can create an environment where children are given a safe and secure digital space – one in which they can be creative and explore freely, with an understanding and awareness of the harms of technology, so that they use technology and are not used by it.
We at Wranga also seek to utilize GANs in reviewing content and for text-to-video applications such as video reviews. The amount of content uploaded every day represents a colossal scale of work. To put the scale in perspective:
At Wranga, our goal is to rate and review content at a scale that human intervention alone cannot achieve, so we are using our proprietary AI technology to rate content, and we envisage using GANs to create reviews. As we discuss the ethics of GANs, AI, and deepfakes further, we also recognize that this technology has the power to scale our work and to give parents guidance on content before they show it to children. We understand that we cannot always stop children from viewing harmful content; but if our GAN-generated review videos can give parents guidance on how to handle a sensitive situation with their children, that is a big win for us. To make a real difference, the tech team at Wranga is exploring how to incorporate GANs and build a review pipeline that can try to match the speed at which new content and videos are added to the internet every day.