Introduction to Generative Adversarial Networks: GAN Is Going to Be a Building Block for Web3

Published by Wranga | October 25, 2022
Generative Adversarial Networks



Written by Venkatesh Ramamrat
“Creativity is seeing what others see and thinking what no one else ever thought.” - Albert Einstein


The AI painting ‘Edmond de Belamy’ is in the public domain because, as the work of a computer algorithm or artificial intelligence, it has no human author. When I saw it, I was compelled to understand how AI created art, which led me to the field of Generative Adversarial Networks, or GANs.

The principle behind the GAN was first proposed in 2014 by Ian Goodfellow et al. in the paper “Generative Adversarial Networks”. At its most basic level, it describes a system that pits two AI systems (neural networks) against each other to improve the quality of their results.

As an artist, I have been keen to understand this revolutionary technology that allows AI to create art. Here is an art-related explanation of a GAN. Imagine a blind forger trying to create copies of paintings by great masters. To start with, he has no idea what a painting should look like – but he happens to have a friend who has a photographic memory of every masterpiece that's ever been painted.

This friend – a detective – has to determine whether the paintings the forger shows him match the features of those created by the real great masters, or are obvious forgeries.

This is the basic idea of how a GAN operates, except that, because they are AIs, both the forger and his friend can act at super speed, making and detecting thousands of forgeries per second. Both of them then "learn" from the outcome to improve their future performance. As the detective becomes better at detecting forgeries, the forger must become better at creating them.
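The forger-and-detective loop can be sketched in a few lines of Python. This is a deliberately tiny toy, not how real art GANs are built: the "paintings" are single numbers drawn from a bell curve, the forger is a straight line `a * noise + b`, and the detective is a one-input logistic classifier. Every name and setting here (`train_toy_gan`, the learning rate, the target mean of 4.0) is my own assumption for illustration.

```python
import math
import random


def sigmoid(s):
    """Logistic function written so that large inputs cannot overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.05, real_mean=4.0, real_std=0.5, seed=0):
    """Alternate one detective update and one forger update per step."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # forger (generator): fake = a * noise + b
    w, c = 0.0, 0.0  # detective (discriminator): P(real) = sigmoid(w * x + c)
    history = []

    for _ in range(steps):
        real = rng.gauss(real_mean, real_std)  # a genuine "masterpiece"
        z = rng.gauss(0.0, 1.0)                # random noise the forger starts from
        fake = a * z + b                       # the forgery

        # Detective's turn: gradient step on -[log D(real) + log(1 - D(fake))].
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        grad_w = (d_real - 1.0) * real + d_fake * fake
        grad_c = (d_real - 1.0) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Forger's turn: gradient step on -log D(fake), i.e. try to be rated "real".
        d_fake = sigmoid(w * fake + c)
        grad_a = (d_fake - 1.0) * w * z
        grad_b = (d_fake - 1.0) * w
        a -= lr * grad_a
        b -= lr * grad_b

        history.append((b, d_fake))  # forger's current centre, detective's verdict

    return {"a": a, "b": b, "w": w, "c": c, "history": history}


if __name__ == "__main__":
    result = train_toy_gan()
    print("forger centre b:", result["b"])
    print("detective's last verdict on a fake:", result["history"][-1][1])
```

Toy setups like this one can circle around the target instead of settling on it, so treat the printed numbers as a feel for the tug-of-war rather than a benchmark.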


Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics.
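For readers who want the formal version, the 2014 paper frames this as a two-player minimax game. In the notation below, G is the generator, D is the discriminator, p_data is the distribution of the training data, and p_z is the distribution of the random noise fed to the generator:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to make both terms large by scoring real samples near 1 and generated samples near 0, while the generator tries to make the second term small by producing samples the discriminator scores near 1.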

Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning and reinforcement learning. GANs have been the cause of a lot of excitement within the field of AI development in recent years, due to their ability to create “new” information following rules established by existing information. Several architectures are illustrated below:

Types of GAN architecture

To understand GAN better one could refer

Video GAN
      Unconditional Video Generation
      Conditional Video Generation

Generative models such as GANs provide promising results in multiple domains including images, video, audio, and text. Video synthesis is still in the early stages compared to other domains such as images. The current state of the art for video GANs suffers from low-quality frames, a low number of frames, or both.

Compared to image GANs, video GANs require different treatment because of the data complexity. A video consists of multiple images with an additional time dimension. Although the progress on GANs in areas other than video is well documented through several review papers, video GAN models have received less attention so far, and when included at all, they were only a section in other review papers despite their broad range. Considering the increasing number of studies on video GANs during the past few years, it is the right time to survey the field, categorize different models according to their applications, and compare their differences.
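To get a feel for what the extra time dimension costs, here is a small back-of-the-envelope sketch. The resolution (256 x 256 RGB) and the clip length (16 frames) are assumptions picked for illustration, not figures from any particular video GAN.

```python
from math import prod

# Assumed sizes, for illustration only.
image_shape = (256, 256, 3)      # height, width, colour channels
video_shape = (16, 256, 256, 3)  # frames, height, width, colour channels


def values_per_sample(shape):
    """Number of values a generator must produce for one sample of this shape."""
    return prod(shape)


print(values_per_sample(image_shape))  # 196608
print(values_per_sample(video_shape))  # 3145728
```

Even this short, low-resolution clip asks the generator for 16 times as many values as a single image, and those values also have to stay consistent from frame to frame.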

Synthetic Media

Synthetic media (also known as AI-generated media and colloquially as deepfakes) is a catch-all term for the artificial production, manipulation, and modification of data and media by automated means, especially through the use of artificial intelligence algorithms, such as to mislead people or change an original meaning. Synthetic media as a field has grown rapidly since the creation of generative adversarial networks, primarily through the rise of deepfakes as well as music synthesis, text generation, human image synthesis, and speech synthesis.

GAN Art

Before GAN, I had come across the Mandelbrot set, where a simple equation, when plotted, gives an infinite fractal visualization. It gives rise to some of the most famous and beautiful patterns I have come across, patterns that come not from nature but from the mathematical equation f(z) = z^2 + c.
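To show how little it takes, here is a minimal sketch that draws the Mandelbrot set as text. It uses the standard escape-time approach: start at zero, repeatedly apply the rule z -> z*z + c for each point c of the complex plane, and mark the points that never run away. The function names, the grid size and the iteration cap are my own choices for illustration.

```python
def escape_time(c, max_iter=50):
    """Count iterations of z -> z*z + c before |z| exceeds 2 (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width=60, height=24, max_iter=50):
    """Return rows of text: '#' for points that stay bounded, '.' for points that escape."""
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)       # imaginary part, top to bottom
        row = ""
        for i in range(width):
            x = -2.1 + 3.0 * i / (width - 1)   # real part, left to right
            inside = escape_time(complex(x, y), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

Raising `max_iter` and the grid size sharpens the boundary, which is where all the fractal detail lives.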


“Training algorithms to generate art is, in some ways, the easy part. You feed them data, they look for patterns, and they do their best to replicate what they’ve seen. But like all automatons, AI systems are tireless and produce a never-ending stream of images. The tricky part is knowing what to do with it all.” - German AI artist Mario Klingemann

Robbie Barrat: AI-Generated Nude Portrait #1, 2018, digital image made with GAN

Still from Holly Herndon and Mat Dryhurst’s Crossing the Interface (DAO) I, AI-generated animation with text by Reza Negarestani, 15 seconds.

Mike Tyka: from the series “Portraits of Imaginary People,” 2017, from left to right: hamidmansoor123, JoshuaSpence88, and chalizzTrt, AI-generated images.

Games

Right now, GAN technology is limited to 2D content, which is why it might make the most sense to use it for skin texture generation, but in the not-so-distant future these concepts will be applied to 3D data as well. Meta-Human combined with the power of GAN would truly be a game changer. Over time, the same principles could potentially be applied to body types, facial features, facial hair, hairstyles, and more (e.g. non-humanoid creatures). As a first step, though, using GAN techniques to generate unique skin textures would make it more likely that characters created with the tool end up with a unique look and feel, instead of relying on scanned data that allows only a predetermined set of possible permutations.

GameGAN, a generative adversarial network trained on 50,000 PAC-MAN episodes, produces a fully functional version of the dot-munching classic without an underlying game engine.

Game Changer: NVIDIA Researcher Seung-Wook Kim and his collaborators trained GameGAN on 50,000 episodes of PAC-MAN.

In 2018, GANs reached the video game modding community as a method of up-scaling low-resolution 2D textures in old video games: the textures are recreated in 4k or higher resolutions via image training and then down-sampled to fit the game's native resolution.

Known examples of extensive GAN usage include:

      Final Fantasy VIII
      Final Fantasy IX
      Resident Evil Remake HD Remaster
      Max Payne

Interesting GAN applications:

      Self Driving GAN
      Music GAN
      Sound Synthesis GAN
      Medical GAN
      Safety and Cybersecurity

Digital Parenting GAN

GAN sits at the intersection of technology and art, and as the AI learns over time, I will be watching what GAN can lead to and what its future implications are. Policymakers and governments have to create a framework in which deepfakes and the negative impact of GAN can be minimized. We at Wranga have an immediate task: interacting with technology companies, parents, schools, and policymakers to create a safe and secure digital environment for children, one in which they can be creative and explore freely, with an understanding and awareness of the harms of technology, so that they use technology rather than being used by it.

We at Wranga also seek to utilize GAN in reviewing content and for text-to-video applications such as video reviews. Given the amount of content being uploaded every day, this is work on a colossal scale. To put the scale in perspective:

      Around 3.7m new videos are uploaded to YouTube every day – that's around 271,330 hours of video content based on an average length of 4.4 minutes.
      More than 300 million photos get uploaded per day. Every minute there are 510,000 comments posted and 293,000 statuses updated.
      Indians played games for 63 billion minutes, against 42 billion minutes in March 2019, recording a 49 percent increase.
      Indians spent 188 billion minutes on various OTT platforms in the month of February, with the most time (69 billion minutes) going to daily soaps, followed by movies at 31 billion minutes.
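A few of the figures above can be sanity-checked with simple arithmetic. The sketch below only re-derives numbers from the rounded values quoted in the list; the helper names are made up for this example.

```python
def hours_of_video(videos_per_day, avg_minutes):
    """Total hours of video uploaded per day."""
    return videos_per_day * avg_minutes / 60


def percent_increase(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100


def per_day(per_minute):
    """Scale a per-minute rate up to a full day (60 minutes x 24 hours)."""
    return per_minute * 60 * 24


print(round(hours_of_video(3_700_000, 4.4)))  # 271333
print(percent_increase(42e9, 63e9))           # 50.0
print(per_day(510_000))                       # 734400000
```

The first result matches the "around 271,330 hours" quoted above. The rounded gaming figures give an increase of 50 percent, so the quoted 49 percent presumably comes from the unrounded numbers (an assumption on my part). At 510,000 comments a minute, a single day adds up to more than 734 million comments.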

At Wranga, our goal is to rate and review content, which cannot be done at this scale by human effort alone. We therefore use our proprietary AI technology to rate content, and we envisage using GAN to create reviews. As we discuss the ethics of GAN, AI, and deepfakes further, we also recognize that technology has the power to scale our work and bring content guidance to parents before they show something to their children. We understand that we cannot always stop children from viewing harmful content, but if our GAN review videos can give parents guidance on how to deal with a sensitive situation with their children, that's a big win for us. To make a difference, the tech team at Wranga is looking at how to incorporate GAN and produce reviews at a scale that can try to match the speed at which new content and videos are added to the internet every day.