While trying to improve the quality and fidelity of AI-generated images, a group of researchers from China and Australia inadvertently discovered a method to interactively control the latent space of a generative adversarial network (GAN) – the mysterious computational matrix behind the new wave of image synthesis techniques poised to revolutionize movies, games, and social media, as well as many other areas of entertainment and research.
Their discovery, a by-product of the project’s central goal, allows a user to arbitrarily and interactively explore a GAN’s latent space with a mouse, as if scrubbing through a video or leafing through a book.
The method uses “heat maps” to indicate which areas of an image need to be improved as the GAN traverses the same dataset thousands (or hundreds of thousands) of times. The heat maps are intended to improve image quality by telling the GAN where it went wrong, so that its next attempt will be better; but, coincidentally, they also provide a “map” of the entire latent space, which can then be traversed by moving a mouse.
The paper is titled Improving GAN Equilibrium by Raising Spatial Awareness, and comes from researchers at the Chinese University of Hong Kong and the Australian National University. In addition to the paper, videos and other materials are available on the project page.
The work is nascent, and currently limited to low-resolution imagery (256×256), but it is a proof of concept that promises to open the “black box” of latent space, and it arrives at a time when multiple research projects are hammering at that door in pursuit of greater control over image synthesis.
While such images are appealing (and you can see more, at better resolution, in the embedded video at the end of this article), perhaps more importantly the project has found a way to achieve better image quality, and potentially to do so faster, by specifically telling the GAN where things went wrong during training.
But, as the word “adversarial” indicates, a GAN is not a single entity, but rather an unequal conflict between authority and drudgery. To understand what improvements the researchers have made in this regard, let’s look at how this contest has been characterized until now.
The pitiful plight of the generator
If you’ve ever been haunted by the idea that a new item of clothing you bought was made in an overseas sweatshop, or had a boss or client who kept telling you to “do it again!” without ever telling you what was wrong with your last attempt, spare a little pity for the Generator, the workhorse half of a generative adversarial network.
The Generator has been the workhorse of GANs for about five years, helping them create photorealistic people who don’t exist, upscale old video games to 4K resolution, and turn century-old footage into colorized HD output at 60fps, among other wonderful AI novelties.
The Generator runs through all the training data over and over again (for instance, images of faces, to create a GAN that can produce photos of random, non-existent people), one image at a time, for days or even weeks, until it is able to create images as convincing as the authentic photos it studied.
So how does the Generator know it is progressing, each time it tries to create an image better than its previous attempt?
The Generator has an infernal boss.
The ruthless opacity of the discriminator
The Discriminator’s job is to tell the Generator that it has not done well enough at creating an authentic-looking image relative to the original data, and to do it again. The Discriminator never tells the Generator what went wrong in its last attempt; it simply takes a private look, compares the generated image to the source images (again, privately), and gives the image a score.
The score is never quite good enough. The Discriminator keeps saying “do it again” until the researchers turn it off (when they judge that further training will no longer improve performance).
In this way, deprived of any constructive criticism and armed only with a score whose metric is a mystery, the Generator must guess at random which parts or aspects of the image earned a better score than before. The guessing leads it down many more unsatisfactory paths before it changes anything positively enough to be rewarded with a higher score.
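This lopsided feedback loop can be caricatured in a few lines of code. The toy sketch below is an illustration of the problem, not of any real GAN implementation: all names, numbers, and the “scoring” function are invented. The Generator receives nothing but a scalar score and must improve by blind trial and error.

```python
import random

# Toy illustration: the "Generator" only ever sees a scalar score from the
# "Discriminator", never *where* it went wrong, so it must hill-climb by
# randomly perturbing its output and keeping whatever happens to score better.

random.seed(0)

REAL = [0.9, 0.1, 0.5, 0.7]           # the "authentic" data only the Discriminator knows

def discriminator_score(fake):
    # Private comparison: the Generator never sees these internals, only the result.
    return -sum((f - r) ** 2 for f, r in zip(fake, REAL))

def train_generator(steps=2000):
    fake = [0.0] * len(REAL)          # the Generator's first, poor attempt
    best = discriminator_score(fake)
    for _ in range(steps):
        # Blind guess: perturb one random element and see if the score improves.
        i = random.randrange(len(fake))
        candidate = list(fake)
        candidate[i] += random.uniform(-0.1, 0.1)
        score = discriminator_score(candidate)
        if score > best:              # "do it again" until the score rises
            fake, best = candidate, score
    return fake, best

fake, best = train_generator()
print(best)                           # approaches 0.0 as fake approaches REAL
```

Even in this four-number “image”, most attempts are wasted on perturbations that don’t help, because the Generator has no idea which element the Discriminator objected to.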
The discriminator as a tutor and mentor
The innovation in the new research is essentially that the Discriminator now indicates to the Generator which parts of the image were unsatisfactory, so that the Generator can focus on those areas in its next iteration without discarding the sections that scored well. The nature of the relationship has changed from combative to collaborative.
To address the knowledge disparity between the Discriminator and the Generator, the researchers needed a mechanism that could turn the Discriminator’s information into a visual aid for the Generator’s next attempt.
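As a hedged illustration of what such a visual aid can do (an assumed formulation for exposition, not the loss actually used in the paper), a heat map can simply re-weight a per-pixel loss so that the regions the Discriminator flagged dominate the Generator’s next update:

```python
import numpy as np

# Illustrative sketch (assumed formulation, not the paper's exact loss):
# a heat map from the Discriminator re-weights a per-pixel squared error
# so the Generator concentrates its updates on the regions flagged as wrong.

def spatially_weighted_loss(fake, real, heat_map):
    """Per-pixel squared error, weighted by the normalized heat map."""
    weights = heat_map / heat_map.sum()           # emphasise flagged regions
    return float((weights * (fake - real) ** 2).sum())

rng = np.random.default_rng(0)
real = rng.random((8, 8))
fake = real.copy()
fake[:4, :] += 0.5                                # top half is badly generated

heat = np.ones((8, 8))                            # heat map flags the top half
heat[:4, :] = 10.0

flat = np.ones((8, 8))                            # uniform map = plain MSE
print(spatially_weighted_loss(fake, real, heat))  # larger than the flat-map loss:
print(spatially_weighted_loss(fake, real, flat))  # the bad region is penalised harder
```

With a uniform map the error in the top half is diluted across the whole image; with the heat map, the flagged region dominates the loss, which is the intuition behind giving the Generator spatial feedback.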
They used Grad-CAM, a neural network interpretation tool that some of the authors of the new paper had worked with previously, and which had already enabled improved GAN-based face generation in a 2019 project.
The new “equilibrium” training method is called EqGAN. For maximum reproducibility, the researchers incorporated existing techniques and methods at their default settings, including the use of the StyleGAN2 architecture.
Grad-CAM produces heat maps (see images above) that capture the Discriminator’s criticisms of the latest iteration and make them available to the Generator.
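Grad-CAM itself is straightforward to sketch: it pools the gradients of a score with respect to a convolutional layer’s activation maps into one weight per channel, then forms a ReLU-rectified weighted sum of those maps. Below is a minimal NumPy version operating on toy arrays (not the paper’s actual networks):

```python
import numpy as np

# Minimal Grad-CAM sketch. Given a layer's activation maps A (C x H x W) and
# the gradient of a score w.r.t. those maps, Grad-CAM pools each channel's
# gradient into a single weight, then forms a ReLU-rectified weighted sum.

def grad_cam(activations, gradients):
    weights = gradients.mean(axis=(1, 2))              # one weight per channel (C,)
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                         # keep only positive influence
    if cam.max() > 0:
        cam /= cam.max()                               # normalise to [0, 1]
    return cam

# Toy inputs: channel 0 activates in the top-left corner and its gradient is
# positive, so the heat map should highlight the top-left region.
A = np.zeros((2, 4, 4))
A[0, :2, :2] = 1.0
G = np.zeros((2, 4, 4))
G[0] = 1.0

heat = grad_cam(A, G)
print(heat[0, 0], heat[3, 3])   # hot in the top-left corner, cold elsewhere
```

In EqGAN’s setting, the score being differentiated would come from the Discriminator, so the resulting map marks the image regions that drove its verdict.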
Once the model is trained, the mapping remains as an artifact of this cooperative process, and can also be used to explore the final latent code in the interactive way demonstrated in the researchers’ project video (see below).
The project used a number of popular datasets, including LSUN Cat, LSUN Church, and FFHQ. The video below also shows examples of facial and feline manipulation using EqGAN.
All images were resized to 256×256 prior to training EqGAN on the official StyleGAN2 implementation. The model was trained at a batch size of 64 on 8 GPUs until the Discriminator had been exposed to more than 25 million images.
Evaluating the system’s output on selected samples with Fréchet Inception Distance (FID), the authors also established a metric called the disequilibrium indicator (DI): the degree to which the Discriminator retains its knowledge advantage over the Generator, with the goal of reducing this gap.
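For reference, FID measures the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images: FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^½). The univariate special case below makes the formula concrete; real FID is computed on multivariate Inception-v3 embeddings, not on scalars.

```python
import math

# Univariate special case of the Fréchet Inception Distance, for illustration:
#   FID = (mu1 - mu2)^2 + var1 + var2 - 2 * sqrt(var1 * var2)
# Real FID fits multivariate Gaussians to Inception-v3 feature embeddings
# of the real and generated image sets.

def fid_1d(mu1, var1, mu2, var2):
    return (mu1 - mu2) ** 2 + var1 + var2 - 2 * math.sqrt(var1 * var2)

print(fid_1d(0.0, 1.0, 0.0, 1.0))   # identical distributions -> 0.0
print(fid_1d(0.0, 1.0, 2.0, 1.0))   # mean shifted by 2 -> 4.0
```

Lower FID means the generated distribution sits closer to the real one, which is why a drop in FID accompanies the improved equilibrium reported below.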
Across the three datasets tested, the new metric showed a useful drop after spatial awareness was encoded into the Generator, with improved equilibrium demonstrated by both FID and DI.
The researchers conclude:
“We hope that this work can inspire further work on revisiting GAN equilibrium and developing more novel methods to improve the quality of image synthesis by manoeuvring the GAN equilibrium. We will also conduct a more theoretical investigation of this issue in future work.”
And continue:
“The qualitative results show that our method successfully [forces the Generator to] focus on specific regions. Experiments on various datasets validate that our method alleviates the disequilibrium in GAN training and substantially improves the overall quality of image synthesis. The resulting model with spatial awareness also enables interactive manipulation of the output image.”
Take a look at the video below for more details on the project, and for further examples of dynamic and interactive exploration of latent space in a GAN.