The New Creative Machine-Learning World of GANs

GANs, BiGANs and BigGANs and BigBiGANs, oh my!

Posted Jul 10, 2019

Source: ivanovgood/Pixabay

The capabilities of artificial intelligence (AI) are growing rapidly, especially in creating photorealistic synthetic images. In 2014, generative adversarial networks (GANs) were introduced. A few years later, bidirectional GANs (BiGANs) were created. Then came BigGANs, which outperformed state-of-the-art GANs in image synthesis. But wait, there's more: Last week researchers from Alphabet Inc.’s DeepMind debuted BigBiGANs. Here is a gander at the big, big AI machine learning world of GANs, BiGANs, BigGANs and BigBiGANs.

What are GANs?

GANs are a recent innovation in the modern history of artificial intelligence. GAN is an acronym for generative adversarial network, a neural network architecture for training deep learning models. It was introduced in 2014 at the Neural Information Processing Systems conference by Ian Goodfellow, along with Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and CIFAR Senior Fellow Yoshua Bengio.

Goodfellow and his team made AI history with their proposed machine learning framework, the generative adversarial network, which consists of two artificial neural networks (ANNs) that compete and thereby train one another simultaneously. The generative network creates synthetic samples, while the discriminative network tries to detect whether each sample was generated or came from the actual data.

What are Convolutional and Deconvolutional Neural Networks?

Often a Convolutional Neural Network (CNN) is used for the discriminative network, and a Deconvolutional Neural Network (DNN) is used for the generative network. A CNN is a type of deep neural network that is loosely inspired by the biological brain’s visual cortex. A deconvolutional neural network works roughly like a CNN in reverse: instead of condensing an image into compact features, it uses transposed convolutions to expand a compact input into a full-size image.
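To make "operates in reverse" concrete, here is a toy one-dimensional sketch in Python with NumPy. The helper names `conv1d` and `conv_transpose1d` are hypothetical, not a library API, and real GANs use optimized two-dimensional layers. A strided convolution condenses nine "pixels" into four features, while a transposed convolution, the operation usually meant by "deconvolution" in GAN generators, spreads four values back out to nine.

```python
import numpy as np

def conv1d(x, kernel, stride=2):
    """Strided 1-D convolution (cross-correlation, as in most deep learning
    libraries): each output value summarizes a window of the input."""
    k = len(kernel)
    n_out = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride : i * stride + k], kernel)
                     for i in range(n_out)])

def conv_transpose1d(y, kernel, stride=2):
    """Transposed ("deconvolution") 1-D convolution: each input value is
    spread over a window of a larger output, so the signal grows."""
    k = len(kernel)
    out = np.zeros((len(y) - 1) * stride + k)
    for i, value in enumerate(y):
        out[i * stride : i * stride + k] += value * kernel
    return out

kernel = np.array([0.25, 0.5, 0.25])
image_row = np.arange(9, dtype=float)              # 9 "pixels"
features = conv1d(image_row, kernel)               # shrinks: 9 -> 4
upsampled = conv_transpose1d(features, kernel)     # grows: 4 -> 9
print(len(image_row), len(features), len(upsampled))  # 9 4 9
```

Note that the transposed convolution restores the size, not the original values: it is the mathematical transpose of the convolution, not its inverse. In a generator, the kernel weights are learned so that the upsampled output looks realistic.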

What are Artificial Neural Networks?

Artificial Neural Networks (ANNs) consist of interconnected layers of artificial neurons, or nodes, whose connections carry weights that are adjusted during the learning process. At a minimum, an ANN has three layers: an input layer, a processing (hidden) layer, and an output layer. The more hidden layers in between, the deeper the neural network.
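A minimal sketch of that three-layer structure in Python with NumPy. The layer sizes, random weights, and tanh nonlinearity are illustrative assumptions; "learning" would mean adjusting `W1`, `b1`, `W2`, and `b2`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 5 hidden neurons, 1 output neuron.
W1 = rng.normal(size=(4, 5))   # weights: input layer -> hidden layer
b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1))   # weights: hidden layer -> output layer
b2 = np.zeros(1)

def forward(x):
    """Pass a batch of inputs through input -> hidden -> output layers."""
    hidden = np.tanh(x @ W1 + b1)   # hidden (processing) layer with a nonlinearity
    return hidden @ W2 + b2         # output layer

batch = rng.normal(size=(3, 4))     # 3 examples, 4 features each
print(forward(batch).shape)         # (3, 1)
```

Adding more hidden layers between `x` and the output, each with its own weights and nonlinearity, is what makes a network "deeper."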

The conceptual architecture of Artificial Neural Networks is loosely analogous to the neurons of the biological brain, in that information is passed between nodes. ANNs are nonlinear statistical data modeling tools used to model complex relationships and discover patterns in real-world applications such as computer vision, machine translation, game playing, and speech recognition. Because ANNs can be trained incrementally on samples of data rather than requiring a massive, full dataset at once, training is considered relatively computationally efficient.

How do GANs work?

The training goal of the generative network is to create samples that its opponent, the discriminative network, judges to be from the actual data distribution. For example, imagine a new type of reality TV game show called VeGAN, where a vegan chef (generative network) tries to fool a food taster (discriminative network) with chef-generated plant-based samples, such as vegan bratwurst, soy hotdogs, and meat-free burgers made from pea protein and beet-juice extract, designed to resemble real meat dishes (actual data distribution).

The taster (discriminative network) is trained with samples from a training dataset until it reaches a desired level of accuracy. The goal of the taster is to accurately discriminate which dishes are actual meat versus vegan dishes. The food taster samples dishes with real meat (actual data distribution), as well as generated faux-meat dishes produced by the vegan chef.
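In the notation of Goodfellow et al. (2014), the taster is the discriminator D, which outputs the probability that a sample came from the real data, and the chef is the generator G, which turns random noise z into a sample G(z). Training is a two-player minimax game over a value function V(D, G):

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The taster tries to raise V by scoring real dishes near 1 and the chef's dishes near 0; the chef tries to lower it by making D(G(z)) close to 1.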

The taster is given a sample dish and produces a scalar, a single score for how likely the sample is to be real meat rather than vegan. The ultimate goal of the chef (generative network) is to synthesize food samples that trick the taster, driving up the tasting error rate. The chef learns which dishes fool the taster and applies that knowledge toward improving dishes in future rounds of play.
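The taster's scalar has a clean theoretical optimum. For a fixed chef, Goodfellow et al. showed that the best possible discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), where p_g is the distribution of generated samples. Once the chef's dishes are indistinguishable from the real thing, D* = 1/2 everywhere and the GAN value function V(D*, G) reaches its minimum of -log 4. The Python/NumPy sketch below checks this numerically, assuming (purely for illustration) that real samples follow N(0, 1) and the chef's samples follow N(mean_gen, 1); the function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def optimal_taster(p_data, p_gen):
    """Best possible discriminator for a fixed generator:
    D*(x) = p_data(x) / (p_data(x) + p_gen(x))."""
    return p_data / (p_data + p_gen)

def value_at_optimum(mean_gen):
    """Numerically integrate V(D*, G) on a grid, with real samples ~ N(0, 1)
    and generated samples ~ N(mean_gen, 1) (illustrative assumptions)."""
    x = np.linspace(-12.0, 12.0, 20001)
    dx = x[1] - x[0]
    p_data = gaussian_pdf(x, 0.0, 1.0)
    p_gen = gaussian_pdf(x, mean_gen, 1.0)
    log_total = np.log(p_data + p_gen)
    # log D*(x) and log(1 - D*(x)), computed in log space to avoid log(0).
    log_d = np.log(p_data) - log_total
    log_one_minus_d = np.log(p_gen) - log_total
    integrand = p_data * log_d + p_gen * log_one_minus_d
    return float(np.sum(integrand) * dx)

same = gaussian_pdf(0.5, 0.0, 1.0)
print(optimal_taster(same, same))        # 0.5: the taster can only guess
print(round(value_at_optimum(0.0), 4))   # -1.3863, i.e. -log 4
```

A chef whose mean is off, for example `value_at_optimum(2.0)`, yields a value above -log 4. The gap is twice the Jensen-Shannon divergence between the real and generated distributions, which is what the chef is effectively minimizing when it faces an optimal taster.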

Backpropagation (backward propagation) is applied to both dueling neural networks so that the taster is able to discriminate with better accuracy, and the chef generates vegan dishes that are more meat-like.
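A minimal sketch of the alternating duel in Python with NumPy, under heavily simplified, illustrative assumptions: real "meat" samples are numbers drawn from N(3, 0.5²), the chef is the one-line generator x = a·z + b, and the taster is a logistic model D(x) = sigmoid(w·x + c). The gradients are written out by hand with the chain rule, which is the calculation backpropagation automates for deep networks. The function name `train_toy_gan` is hypothetical.

```python
import numpy as np

def sigmoid(s):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * s))

def train_toy_gan(steps=500, batch=64, lr=0.05, seed=0):
    """Alternating updates for a toy 1-D GAN (illustrative assumptions only):
    real samples ~ N(3, 0.5**2); chef/generator x = a*z + b with z ~ N(0, 1);
    taster/discriminator D(x) = sigmoid(w*x + c)."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0          # generator parameters
    w, c = 0.0, 0.0          # discriminator parameters
    eps = 1e-12              # guards log(0) in the reported losses
    history = []
    for _ in range(steps):
        real = rng.normal(3.0, 0.5, batch)
        z = rng.normal(0.0, 1.0, batch)
        fake = a * z + b

        # Taster step: lower the binary cross-entropy of telling real from fake.
        d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
        loss_d = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1 - d_fake + eps))
        # dLoss/dlogit is -(1 - D) for real samples and D for fake samples.
        g_real, g_fake = -(1 - d_real), d_fake
        w -= lr * (np.mean(g_real * real) + np.mean(g_fake * fake))
        c -= lr * (np.mean(g_real) + np.mean(g_fake))

        # Chef step: non-saturating loss -log D(G(z)), using the updated taster.
        d_fake = sigmoid(w * fake + c)
        loss_g = -np.mean(np.log(d_fake + eps))
        grad_x = -(1 - d_fake) * w          # dLoss_g/dx for each fake sample
        a -= lr * np.mean(grad_x * z)       # dx/da = z
        b -= lr * np.mean(grad_x)           # dx/db = 1
        history.append((float(loss_d), float(loss_g)))
    return a, b, history

a, b, history = train_toy_gan()
print(f"chef's mean after training: {b:.2f} (real mean is 3.0)")
```

Because a linear taster can only compare averages, this toy chef can at best learn to match the mean of the real data, and its mean may oscillate around the target rather than settle. Real GANs replace both players with deep networks, but the take-turns structure is the same.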

What is Backpropagation?

Backpropagation is a relatively efficient algorithmic technique used in AI deep learning to train deep neural networks. After an Artificial Neural Network makes a prediction, its error is measured by an error (loss) function, and backpropagation calculates the gradient of that error with respect to each weight, working in a backward direction: starting with the last neural network layer and ending with the first. The calculation uses the chain rule, so the partial results already computed for one layer are reused to calculate the gradients of the layer before it. The weights are then adjusted in the direction that reduces the error.
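A minimal sketch of backpropagation for a two-layer network in Python with NumPy; the architecture, tanh activation, and squared-error loss are illustrative assumptions, and `forward_backward` is a hypothetical name. The backward pass starts at the output, and each earlier layer reuses a partial result (`d_yhat`, then `d_h`) computed for the layer after it. A finite-difference check confirms one of the gradients.

```python
import numpy as np

def forward_backward(x, y, W1, W2):
    """One forward and backward pass for a tiny two-layer network with
    squared-error loss. Returns the loss and the gradients for W1 and W2."""
    # Forward pass: keep intermediate values for reuse in the backward pass.
    h = np.tanh(x @ W1)
    y_hat = h @ W2
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # Backward pass, last layer first.
    d_yhat = y_hat - y              # dL/dy_hat
    grad_W2 = h.T @ d_yhat          # dL/dW2
    d_h = d_yhat @ W2.T             # reuse d_yhat for the earlier layer
    d_pre = d_h * (1 - h ** 2)      # through tanh: d tanh(u)/du = 1 - tanh(u)^2
    grad_W1 = x.T @ d_pre           # dL/dW1
    return loss, grad_W1, grad_W2

rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 3)), rng.normal(size=(5, 2))   # 5 examples
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
loss, grad_W1, grad_W2 = forward_backward(x, y, W1, W2)

# Sanity check one weight against a central finite difference.
step = 1e-5
W1_up, W1_down = W1.copy(), W1.copy()
W1_up[0, 0] += step
W1_down[0, 0] -= step
numeric = (forward_backward(x, y, W1_up, W2)[0]
           - forward_backward(x, y, W1_down, W2)[0]) / (2 * step)
print(np.isclose(numeric, grad_W1[0, 0], rtol=1e-5, atol=1e-6))  # True
```

The finite-difference check needs two extra forward passes per weight, whereas one backward pass yields every gradient at once; that difference is why backpropagation is considered efficient.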

Using the same analogy, the game continues over multiple rounds (iterations), in which both the vegan chef and the taster improve their abilities, each learning from the duel.

What are Bidirectional GANs (BiGANs)?

Researchers Jeff Donahue and Trevor Darrell of the University of California at Berkeley, and Philipp Krähenbühl of the University of Texas at Austin, introduced a new unsupervised feature learning framework called the Bidirectional Generative Adversarial Network (BiGAN) at an annual machine learning conference, the International Conference on Learning Representations (ICLR), in 2017. BiGANs learn feature representations of data, something a standard GAN cannot do on its own because it offers no way to map data back into its latent space. In addition to a generative network and a discriminative network, a BiGAN has an encoder that learns this inverse mapping, from data back to latent codes. The discriminative network's task changes accordingly: it judges pairs rather than single samples, trying to tell real data paired with its encoding from the encoder apart from synthetic data paired with the latent code that generated it.
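A minimal Python/NumPy sketch of the pairs a BiGAN discriminator sees. The linear "networks," sizes, and function names are hypothetical stand-ins chosen for brevity; in the paper all three are learned neural networks. The discriminator is trained to score (x, E(x)) pairs as real and (G(z), z) pairs as fake, while the generator and encoder are trained to fool it.

```python
import numpy as np

rng = np.random.default_rng(0)
data_dim, latent_dim, batch = 8, 2, 4   # illustrative sizes

# Hypothetical linear stand-ins for the three BiGAN networks.
G_weights = rng.normal(size=(latent_dim, data_dim))   # generator: z -> x
E_weights = rng.normal(size=(data_dim, latent_dim))   # encoder:   x -> z
D_weights = rng.normal(size=data_dim + latent_dim)    # discriminator on (x, z)

def generator(z):
    return z @ G_weights

def encoder(x):
    return x @ E_weights

def discriminator(x, z):
    """Scores a (data, latent code) pair; higher means 'looks like a real-data pair'."""
    joint = np.concatenate([x, z], axis=1)   # the discriminator sees both halves
    return 1.0 / (1.0 + np.exp(-(joint @ D_weights)))

real_x = rng.normal(size=(batch, data_dim))
real_pair_scores = discriminator(real_x, encoder(real_x))   # (x, E(x))
z = rng.normal(size=(batch, latent_dim))
fake_pair_scores = discriminator(generator(z), z)           # (G(z), z)
```

The BiGAN paper shows that if this game is played to its optimum, the encoder becomes the inverse of the generator, which is why the encoder's outputs can serve as learned features.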

What is a BigGAN?

Now that we have an understanding of the fundamentals of GANs and BiGANs, what is a BigGAN? In simple terms, a BigGAN is a large GAN with additional bells and whistles to make it outperform ordinary GANs by a huge margin.

Andrew Brock, Jeff Donahue, and Karen Simonyan introduced BigGAN in a conference paper at ICLR, published in February 2019, titled “Large Scale GAN Training for High Fidelity Natural Image Synthesis,” which was first submitted to arXiv in September 2018. BigGAN scales up GAN training to synthesize high-fidelity natural images, producing realistic images that exceeded the performance of existing approaches.

To create BigGAN, the researchers trained models with two to four times as many parameters and eight times the batch size of prior work, and used a “truncation trick” to control the tradeoff between sample fidelity and variety.
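In the BigGAN paper, the truncation trick means sampling the generator's latent input z from a truncated normal distribution: values that fall outside a chosen range are resampled until they fall inside it. A smaller threshold keeps z closer to the center of the distribution, which tends to raise fidelity at the cost of variety. A minimal NumPy sketch (the function name and the latent size of 128 are illustrative assumptions, and no generator is included):

```python
import numpy as np

def truncated_latent(batch, dim, threshold, seed=0):
    """Sample z ~ N(0, I), resampling any entry whose magnitude exceeds
    `threshold` (> 0) until every entry lies in [-threshold, threshold]."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((batch, dim))
    outside = np.abs(z) > threshold
    while outside.any():
        z[outside] = rng.standard_normal(outside.sum())   # redraw only the outliers
        outside = np.abs(z) > threshold
    return z

for threshold in (2.0, 1.0, 0.5):
    z = truncated_latent(batch=1000, dim=128, threshold=threshold)
    print(threshold, round(float(z.std()), 2))   # smaller threshold -> narrower spread
```

Because truncation only changes how z is sampled, the tradeoff can be adjusted after training without retraining the model.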

BigGAN produced images of better quality and diversity than existing GANs. Trained on ImageNet at 128 x 128 resolution, BigGAN achieved an astounding Inception Score (IS) of 166.5, more than three times the previous best IS of 52.52, and a Fréchet Inception Distance (FID) of 7.4, beating the previous record FID of 18.65 (for FID, a lower value is better).
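Both metrics are computed with a pretrained Inception image classifier. The Inception Score rewards generated images that the classifier labels confidently and that together cover many classes (higher is better). FID compares the mean and covariance of Inception features for real versus generated images (lower is better). The sketch below implements the two formulas in NumPy, assuming the class probabilities and feature statistics have already been extracted; it omits the Inception network itself and practical details such as averaging IS over several splits, and the function names are hypothetical.

```python
import numpy as np

def inception_score(probs):
    """IS = exp(mean over samples of KL(p(y|x) || p(y))).
    `probs` is an (n_samples, n_classes) array of class probabilities
    (assumed to come from a pretrained classifier)."""
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0, keepdims=True)                        # p(y)
    log_p = np.log(probs, out=np.zeros_like(probs), where=probs > 0)
    log_m = np.log(marginal, out=np.zeros_like(marginal), where=marginal > 0)
    kl = np.sum(probs * (log_p - log_m), axis=1)    # terms with p = 0 contribute 0
    return float(np.exp(kl.mean()))

def frechet_distance(mu1, cov1, mu2, cov2):
    """FID between Gaussians fitted to two sets of features:
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^(1/2))."""
    # Tr((cov1 cov2)^(1/2)) equals the sum of square roots of the eigenvalues
    # of the symmetric matrix cov1^(1/2) cov2 cov1^(1/2), which eigh can handle.
    vals, vecs = np.linalg.eigh(cov1)
    sqrt_cov1 = (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T
    middle = sqrt_cov1 @ cov2 @ sqrt_cov1
    trace_sqrt = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0, None)))
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2 * trace_sqrt)

# Four confident predictions spread over four classes score about 4;
# identical uncertain predictions score 1.
print(inception_score(np.eye(4)), inception_score(np.full((4, 4), 0.25)))
# Same covariance, means 5 apart: FID is the squared distance, about 25.
print(frechet_distance([0.0, 0.0], np.eye(2), [3.0, 4.0], np.eye(2)))
```

In practice, both scores depend on implementation details such as image resizing and the number of samples, so they are usually compared only between results computed the same way.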

BigGANs are generative adversarial networks trained “at the largest scale yet attempted,” with modifications that produce “the new state of the art in class-conditional image synthesis,” according to the researchers.

What is a BigBiGAN?

What do you get when a BiGAN is combined with a BigGAN generator? Why, you get a BigBiGAN, naturally. On July 4, 2019, Jeff Donahue and Karen Simonyan of Alphabet Inc.’s DeepMind introduced BigBiGAN in a paper submitted to arXiv that takes BiGANs and BigGANs to the next level.

“Our approach, BigBiGAN, builds upon the state-of-the-art BigGAN model, extending it to representation learning by adding an encoder and modifying the discriminator,” wrote the DeepMind researchers. “We extensively evaluate the representation learning and generation capabilities of these BigBiGAN models, demonstrating that these generation-based models achieve the state of the art in unsupervised representation learning on ImageNet, as well as in unconditional image generation.”
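One concrete form of "modifying the discriminator": as described in the BigBiGAN paper (summarized here as a sketch), the discriminator produces three scores per (data, latent) pair instead of one: a score s_x from the data alone, a score s_z from the latent code alone, and a joint score s_xz. It applies a hinge loss, h(t) = max(0, 1 - t), to each. The Python/NumPy sketch below writes out only those per-sample losses; the networks that would produce the scores are omitted, and the function names are hypothetical.

```python
import numpy as np

def hinge(t):
    return np.maximum(0.0, 1.0 - t)

def discriminator_loss(s_x, s_z, s_xz, y):
    """Per-sample discriminator loss: a hinge on each of the three scores.
    y = +1 for (x, E(x)) pairs from the encoder, -1 for (G(z), z) pairs."""
    return hinge(y * s_x) + hinge(y * s_z) + hinge(y * s_xz)

def encoder_generator_loss(s_x, s_z, s_xz, y):
    """Per-sample loss minimized jointly by the encoder and generator:
    the encoder pushes scores of its pairs down, the generator pushes them up."""
    return y * (s_x + s_z + s_xz)

# A confident, correct discriminator pays no hinge penalty on a real pair...
print(discriminator_loss(2.0, 2.0, 2.0, y=1))    # 0.0
# ...but pays a penalty when its scores sit at the decision boundary.
print(discriminator_loss(0.0, 0.0, 0.0, y=1))    # 3.0
```

The unary terms let the discriminator judge the data and the latent codes on their own, in addition to judging them jointly as in the original BiGAN.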

Why do all these flavors of GANs matter?

Generative models such as GANs, BiGANs, BigGANs, and BigBiGANs enable machines to produce and simulate their own novel images or concepts—a de facto form of artificial imagination. By applying cross-disciplinary fields of mathematics, data science, information technology, computer science and statistics, researchers have endowed machines with the ability to create—representing a milestone in innovation by humankind, and a step forward toward achieving artificial general intelligence and technological singularity in the future.

Copyright © 2019 Cami Rosso All rights reserved.