The Unexpected Advantage of GAN Latent Space Mapping

While trying to improve the quality and fidelity of AI-generated images, a group of researchers from China and Australia inadvertently discovered a method for interactively controlling the latent space of a generative adversarial network (GAN) – the mysterious computational matrix behind the new wave of image synthesis techniques poised to transform movies, games and social media, as well as many other areas of entertainment and research.

Their discovery, a by-product of the project’s central goal, allows a user to interactively and arbitrarily explore the latent space of a GAN with a mouse, as if scrubbing through a video or leafing through a book.

An excerpt from the researchers’ accompanying video (see the embed at the end of the article for many more examples). Note that the user manipulates the transformations with a “grab” cursor (top left). Source: https://www.youtube.com/watch?v=k7sG4XY5rIc

The method uses “heat maps” to indicate which areas of an image need improvement as the GAN traverses the same dataset thousands (or hundreds of thousands) of times. The heat maps are meant to improve image quality by telling the GAN where it went wrong, so that its next attempt is better; but, coincidentally, they also provide a “map” of the entire latent space that can be traversed by moving a mouse.
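To make the heat-map idea concrete, here is a minimal numpy sketch of how a Grad-CAM-style map is computed from a convolutional layer's activations and gradients. This is a generic illustration of the technique, not the paper's implementation; the array shapes and toy inputs are assumptions.

```python
import numpy as np

def gradcam_heatmap(activations, gradients):
    """Grad-CAM-style heat map: weight each activation channel by its
    spatially averaged gradient, sum over channels, apply ReLU, then
    normalize to [0, 1]. Both inputs are (C, H, W) arrays taken from
    one convolutional layer."""
    weights = gradients.mean(axis=(1, 2))            # (C,) channel importances
    cam = np.einsum('c,chw->hw', weights, activations)
    cam = np.maximum(cam, 0.0)                       # keep positive evidence only
    if cam.max() > 0:
        cam /= cam.max()                             # scale into [0, 1]
    return cam

# Toy example: 4 channels over an 8x8 feature map (random stand-in values).
rng = np.random.default_rng(0)
acts, grads = rng.random((4, 8, 8)), rng.random((4, 8, 8))
heat = gradcam_heatmap(acts, grads)
```

Bright (high-valued) regions of `heat` mark the pixels that most influenced the output – precisely the kind of spatial signal the researchers pass back to the generator.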

Spatial visual attention highlighted via GradCAM, which indicates areas that need attention by overlaying bright colors. These samples were generated in the researchers’ project with a default implementation of StyleGAN2. Source: https://arxiv.org/pdf/2112.00718.pdf

The paper is titled Improving GAN Equilibrium by Raising Spatial Awareness, and comes from researchers at the Chinese University of Hong Kong and the Australian National University. In addition to the paper, videos and other materials are available on the project page.

The work is nascent, and currently limited to low-resolution imagery (256×256), but it is a proof of concept that promises to open the “black box” of latent space, and it arrives at a time when several research projects are hammering at this problem in pursuit of greater control over image synthesis.

While such images are appealing (and you can see more, at better resolution, in the video embedded at the end of this article), perhaps more importantly, the project has found a way to produce better image quality, and potentially to do so faster, by specifically telling the GAN where it went wrong during training.

But “adversarial” indicates that a GAN is not a single entity, but rather an unequal conflict between authority and drudgery. To understand what improvements the researchers have made in this regard, let’s look at how that war has been characterized until now.

The pitiful plight of the Generator

If you’ve ever been haunted by the idea that a new item of clothing you bought was made in an overseas sweatshop, or had a boss or client who kept telling you to “Do it again!” without ever telling you what was wrong with your last attempt, spare a little pity for the Generator half of a generative adversarial network.

The Generator has been our workhorse for about five years, helping GANs create photorealistic people who don’t exist, bring old video games up to 4K resolution, and turn century-old footage into colorized 60fps HD output, among other marvelous AI novelties.

From creating photorealistic faces of unreal people to restoring ancient images and revitalizing archival video games, GAN has been very busy for the past few years.

The Generator works through all of the training data (for instance, images of faces, for a GAN intended to create photos of random, non-existent people) over and over again, one image at a time, for days or even weeks, until it can create images as convincing as the authentic photos it has studied.

So how does the Generator know it is making progress each time it tries to create a better image than its previous attempt?

The Generator has an infernal boss.

The ruthless opacity of the discriminator

The Discriminator’s job is to tell the Generator that it has not done well enough at creating an image that passes for authentic against the original data, and to do it again. The Discriminator never tells the Generator what went wrong in its last attempt; it simply takes a private look at it, compares the generated image to the source images (again, privately), and gives the image a score.

The score is never quite good enough. The Discriminator keeps saying ‘Do it again’ until the researchers turn it off (when they judge that additional training will not improve performance further).

In this way, in the absence of any constructive criticism, and armed only with a score whose metric is a mystery, the Generator must guess at random which parts or aspects of the image earned a better score than before. This will lead it down many further unsatisfying paths before it changes anything in a positive enough way to earn a higher score.
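The score-only feedback loop described above can be caricatured in a few lines of code. In this toy sketch (an analogy, not actual GAN training), a stand-in "discriminator" returns a single opaque scalar, and the "generator" can only make blind random perturbations and keep whatever scores higher; the `target` vector and all parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
target = rng.random(16)                       # stand-in for the "real" data

def discriminator_score(candidate):
    """Opaque critic: one scalar, no hint about *where* the candidate is wrong."""
    return -np.mean((candidate - target) ** 2)

# "Generator" with score-only feedback: guess blindly, keep what scores higher.
candidate = np.zeros(16)
best = discriminator_score(candidate)
for _ in range(2000):
    trial = candidate + rng.normal(scale=0.05, size=16)   # random tweak everywhere
    score = discriminator_score(trial)
    if score > best:                                      # only the score guides it
        candidate, best = trial, score
```

Progress happens, but wastefully: most perturbations are discarded, because the critic never says which of the 16 components was at fault – the inefficiency the new research targets.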

The discriminator as a tutor and mentor

The innovation of the new research is essentially that the Discriminator now indicates to the Generator which parts of the image were unsatisfactory, so the Generator can focus on those areas in its next iteration without throwing away the highly-rated sections. The nature of the relationship changes from combative to collaborative.

To address the knowledge disparity between discriminator and generator, researchers needed a mechanism that could formulate discriminator information into a visual aid for the generator’s next attempt.

They used GradCAM, a neural network interpretation tool that some of the researchers in the new paper had previously worked on, and which had previously enabled improved generation of GAN-based faces in a 2019 project.

The new “equilibrium” training method is called EqGAN. For maximum reproducibility, the researchers incorporated existing techniques and methods at their default settings, including the use of the StyleGAN2 architecture.

The architecture of EqGAN.  The spatial encoding of the generator is aligned with the spatial awareness of the discriminator, with random samples of spatial heat maps (see previous image) encoded in the generator via the spatial encoding layer (SEL).  GradCAM is the mechanism by which discriminator attention maps are made available to the generator.

GradCAM produces heat maps (see images above) that capture the Discriminator’s criticisms of the last iteration and make them available to the Generator.
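How might a generator actually consume such a heat map? The paper's spatial encoding layer (SEL) is a learned component whose details are beyond this article; the fixed scale-and-bias version below is only a rough, assumed stand-in that shows the core idea of modulating generator features per pixel according to an attention map.

```python
import numpy as np

def spatial_encoding(features, heatmap, scale=0.5, bias=0.1):
    """Illustrative stand-in for a spatial encoding layer (SEL): modulate
    generator features per-pixel by a discriminator attention heat map.
    The real SEL learns its modulation; fixed scale/bias here just shows
    the mechanics. features: (C, H, W); heatmap: (H, W) in [0, 1]."""
    return features * (1.0 + scale * heatmap) + bias * heatmap

# Toy demo: only the flagged pixel (0, 0) is amplified; the rest pass through.
feats = np.ones((2, 4, 4))
heat = np.zeros((4, 4))
heat[0, 0] = 1.0
out = spatial_encoding(feats, heat)
```

Pixels where the heat map is zero pass through unchanged, so well-rated regions are preserved while criticized regions receive extra adjustment – mirroring the collaborative dynamic described above.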

Once the model is trained, the mapping remains as an artifact of this cooperative process, and can also be used to explore the final latent code in the interactive manner demonstrated in the researchers’ project video (see below).

EqGAN

The project used a number of popular datasets, including the LSUN Cat and LSUN Church datasets, as well as the FFHQ dataset. The video below also shows examples of face and feline manipulation using EqGAN.

All images were resized to 256×256 prior to training EqGAN on the official StyleGAN2 implementation. The model was trained at a batch size of 64 across 8 GPUs until the discriminator had been exposed to more than 25 million images.
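For a sense of scale, the quoted training budget translates into a large number of optimization steps. The arithmetic below uses only the figures stated above (25 million images seen, batch size 64); per-GPU sharding and any exact "kimg" accounting in the official implementation are not modeled.

```python
# Rough arithmetic for the training budget quoted above.
images_shown = 25_000_000   # total images the discriminator is exposed to
batch_size = 64             # global batch size across the 8 GPUs
iterations = images_shown // batch_size
print(iterations)           # 390625 discriminator update steps, roughly
```

So "25 million images" means on the order of 390 thousand training iterations – the scale at which the generator's blind guessing (or guided correction) plays out.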

Testing the system’s output on selected samples with Fréchet Inception Distance (FID), the authors also established a metric called the disequilibrium indicator (DI) – the degree to which the discriminator retains its knowledge advantage over the generator – with the goal of reducing that gap.
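The article describes DI only as the discriminator's "knowledge advantage" over the generator, so the sketch below is one plausible formulation of such a gap – the difference between the discriminator's average score on real versus generated images – and should not be read as the paper's exact definition.

```python
import numpy as np

def disequilibrium_indicator(d_real_scores, d_fake_scores):
    """Hedged sketch of a disequilibrium-style indicator: the gap between the
    discriminator's mean score on real images and on generated ones. A smaller
    gap suggests a more balanced game between the two networks. (The paper's
    precise DI definition may differ; this only captures the idea in the text.)"""
    return float(np.mean(d_real_scores) - np.mean(d_fake_scores))

# When the discriminator rates real and fake identically, the gap vanishes.
balanced = disequilibrium_indicator(np.full(10, 0.5), np.full(10, 0.5))
# When it confidently separates them, the gap is large.
lopsided = disequilibrium_indicator(np.ones(5), np.zeros(5))
```

Under this reading, a drop in DI after adding spatial awareness would mean the generator has clawed back some of the discriminator's advantage.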

Across the three datasets, the new metric showed a useful drop after spatial awareness was encoded into the generator, with improved equilibrium demonstrated by both FID and DI.

The researchers conclude:

“We hope that this work can inspire further work on revisiting the GAN equilibrium, and the development of more innovative methods for improving the quality of image synthesis by maneuvering the GAN’s equilibrium. We will also conduct a more theoretical investigation of this issue in future work.”

And continue:

“The qualitative results show that our method succeeds in [making the Generator] focus on specific regions. Experiments on various datasets validate that our method alleviates the imbalance in GAN training and significantly improves the overall quality of image synthesis. The resulting model with spatial awareness also enables interactive manipulation of the output image.”

Take a look at the video below for more details on the project and other examples of dynamic and interactive exploration of latent space in a GAN.