class: center, middle, inverse, title-slide

# Deep Learning with R
## Generative Adversarial Networks, Autoencoders
### Mikhail Dozmorov
### Virginia Commonwealth University
### 2020-06-12

---
## Generative adversarial networks (GANs)

> The most important [recent development], in my opinion, is adversarial training (also called GAN for Generative Adversarial Networks). This is an idea that was originally proposed by Ian Goodfellow when he was a student with Yoshua Bengio at the University of Montreal (he since moved to Google Brain and recently to OpenAI).

> This, and the variations that are now being proposed, is the most interesting idea in the last 10 years in ML, in my opinion.

.right[Yann LeCun]

.small[https://danieltakeshi.github.io/2017/03/05/understanding-generative-adversarial-networks/]

---
## Generative adversarial networks (GANs)

- Unsupervised learning models that aim to generate data points that are indistinguishable from the observed ones

- Aim to learn the data-generating process

- GANs were proposed as a radically different approach to generative modeling that involves two neural networks, a discriminator and a generator network

- They are trained jointly, whereby the generator aims to generate realistic data points, and the discriminator classifies whether a given sample is real or generated by the generator

.small[Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio.
“[Generative Adversarial Networks](http://arxiv.org/abs/1406.2661)” ArXiv, 2014 ]

---
## Generative adversarial networks (GANs)

.center[<img src="img/gan.png" height=450>]

.small[https://www.analyticsvidhya.com/blog/2020/01/generative-models-gans-computer-vision/]

---
## Generative adversarial networks (GANs)

.pull-left[<img src="img/gan_generator.png" height=300>]

.pull-right[<img src="img/gan_discriminator.png" height=300>]

We train the model, calculate the loss function at the end of the discriminator network and backpropagate the loss into both discriminator and generator models

.small[https://www.analyticsvidhya.com/blog/2020/01/generative-models-gans-computer-vision/]

---
## Applications of GANs

- GANs for Image Editing

- Using GANs for Security

- Generating Data using GANs (music, text, speech, etc.)

- GANs for Attention Prediction

- GANs for 3D Object Generation

.small[https://www.analyticsvidhya.com/blog/2019/04/top-5-interesting-applications-gans-deep-learning/]

---
## Style transfer

- Style transfer consists of creating a new image that preserves the contents of a target image while also capturing the style of a reference image

- Content can be captured by the high-level activations of a convnet

- Style can be captured by the internal correlations of the activations of different layers of a convnet

.small[Chapter 8.3 http://www.byungsoo.me/project/lnst/index.html ]

---
## CycleGAN: domain transformation

CycleGAN learns transformation across domains with unpaired data

.center[<img src="https://junyanz.github.io/CycleGAN/images/teaser_high_res.jpg" height=400>]

.small[https://junyanz.github.io/CycleGAN/]

---
## CycleGAN: domain transformation

.center[<iframe width="672" height="378" src="https://www.youtube.com/embed/9reHvktowLY" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>]

.small[https://junyanz.github.io/CycleGAN/
https://interestingengineering.com/elon-musks-deepfake-video-of-singing-soviet-space-song-breaks-the-internet]

---
## Autoencoders

- An autoencoder is an unsupervised neural network trained to reconstruct the input. **Auto**matically **encoding** data

- One or more bottleneck layers have lower dimensionality than the input, which leads to compression of data and forces the autoencoder to extract useful features and omit unimportant features in the reconstruction

.center[<img src="img/keras_autoencoders_applications.png" height=200>]

.small[https://www.pyimagesearch.com/2020/02/17/autoencoders-with-keras-tensorflow-and-deep-learning/]

---
## Autoencoders

- Autoencoders learn a **compressed representation** of the input data by reconstructing it on the output of the network

- Goal: capture the structure of the data `\(x\)` (i.e., intrinsic relationships between the data variables) in a low-dimensional latent space `\(z\)`, which allows for more accurate downstream analyses

.center[<img src="img/autoencoder1.png" height=200>]

---
## Autoencoder

- Generally, an autoencoder consists of two networks, an encoder and a decoder, which broadly perform the following tasks:
    - **Encoder**: Maps the high dimensional input data into a latent variable embedding which has lower dimensions than the input.
    - **Decoder**: Attempts to reconstruct the input data from the embedding.

- Areas of application:
    - Dimensionality reduction
    - Data denoising
    - Compression and data generation

---
## Basic autoencoder network

.center[<img src="img/autoencoder2.png" height=200>]

This network is trained in such a way that the features ( `\(z\)` ) can be used to reconstruct the original input data ( `\(x\)` ).
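
A minimal sketch of the encode, decode, and compare steps of such a network, in plain Python (a language chosen for this illustration; the deck itself shows no code). The linear layers, the weight names `W_enc` and `W_dec`, the layer sizes, and the example vector are all assumptions, and the weights are random rather than trained, so the printed loss only shows the quantity that training would try to reduce.

```python
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def encode(W_enc, x):
    """Encoder: map the input x to a lower-dimensional code z."""
    return matvec(W_enc, x)


def decode(W_dec, z):
    """Decoder: map the code z back to the input space."""
    return matvec(W_dec, z)


def mse(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


random.seed(0)
n_in, n_code = 4, 2  # assumed sizes: 4 input features, 2-dimensional bottleneck

# Random, untrained weights (illustrative only).
W_enc = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_code)]
W_dec = [[random.uniform(-0.5, 0.5) for _ in range(n_code)] for _ in range(n_in)]

x = [1.0, 0.0, -1.0, 0.5]   # made-up input
z = encode(W_enc, x)        # features in the bottleneck
x_hat = decode(W_dec, z)    # reconstruction of x
loss = mse(x, x_hat)        # the value training would minimize

print("code size:", len(z))
print("reconstruction size:", len(x_hat))
print("reconstruction loss:", loss)
```

Training would adjust `W_enc` and `W_dec` to lower `loss`; real autoencoders typically also add nonlinear activations and more layers.
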
If the output ( `\(\hat{x}\)` ) is different from the input ( `\(x\)` ), the loss penalizes it and helps to reconstruct the input data.

---
## How an autoencoder learns

- Image denoising problem - removing noise from images

.center[<img src="img/autoencoder3.png" height=400>]

.small[https://www.analyticsvidhya.com/blog/2020/02/what-is-autoencoder-enhance-image-resolution/]

---
## Autoencoder calculations

- The model contains an encoder function `\(f(.)\)` parameterised by `\(\theta\)` and a decoder function `\(g(.)\)` parameterised by `\(\phi\)`. The lower dimensional embedding learned for an input `\(x\)` in the bottleneck layer is `\(h = f_{\theta}(x)\)` and the reconstructed input is `\(x' = g_{\phi}(f_{\theta}(x))\)`.

- The parameters `\(\theta,\phi\)` are learned together to output a reconstructed data sample that is ideally the same as the original input: `\(x \approx g_{\phi}(f_{\theta}(x))\)`

- There are various metrics used to quantify the error between the input and output, such as cross-entropy (CE) or simpler metrics such as mean squared error: `\(L_{AE}(\theta,\phi) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - g_{\phi}(f_{\theta}(x_i))\right)^2\)`

---
## Autoencoder variants

The main challenge when designing an autoencoder is its sensitivity to the input data. While an autoencoder should learn a representation that embeds the key data traits as accurately as possible, it should also be able to encode traits which generalize beyond the original training set and capture similar characteristics in other data sets.

Thus, several variants have been proposed since autoencoders were first introduced. These variants mainly aim to address these shortcomings through improved generalization, disentanglement, and modifications for sequence input models.
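
As one concrete illustration of such a variant, the denoising autoencoder of Vincent et al. (2008) corrupts the input and asks the network to reconstruct the clean version. The sketch below, in plain Python, shows only that corrupt-then-compare step. The names `corrupt` and `denoising_loss`, the masking probability, the example vector, and the identity function standing in for a trained model are all hypothetical and introduced here for illustration; they are not code from the paper.

```python
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: set each feature to 0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def mse(target, output):
    """Mean squared error between a target vector and a model output."""
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)


def denoising_loss(x, autoencoder, drop_prob, rng):
    """Pass the corrupted input through the model, then compare the
    output with the CLEAN input rather than with the corrupted one."""
    x_noisy = corrupt(x, drop_prob, rng)
    return mse(x, autoencoder(x_noisy))


def identity(v):
    """Hypothetical stand-in for a trained encoder plus decoder."""
    return list(v)


rng = random.Random(42)
x = [0.2, 0.9, 0.4, 0.7]  # made-up clean input

# A model that merely copies its input is penalized wherever
# the corruption removed information.
print("loss with masking:", denoising_loss(x, identity, 0.5, rng))
print("loss without masking:", denoising_loss(x, identity, 0.0, rng))
```

Because the target is the uncorrupted input, simply copying the input no longer minimizes the loss, which pushes the network toward features that are robust to the noise.
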
Some significant examples include the **Denoising Autoencoder** (DAE), **Sparse Autoencoder** (SAE), and more recently the **Variational Autoencoder** (VAE)

.small[
[Vincent et al., 2008, Extracting and Composing Robust Features with Denoising Autoencoders](https://www.cs.toronto.edu/~larocheh/publications/icml-2008-denoising-autoencoders.pdf)

[Makhzani and Frey, 2014, k-Sparse Autoencoders](https://arxiv.org/pdf/1312.5663.pdf)

[Kingma and Welling, 2014, Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114)
]

---
## Variational Autoencoder

- VAEs are autoencoders with additional distribution assumptions that enable them to generate new random samples

- A VAE, instead of compressing its input image into a fixed code in the latent space, turns the image into the parameters of a statistical distribution: a mean and a variance

- The assumption is that the input image has been generated by a statistical process, and that the randomness of this process should be taken into account during encoding and decoding

.center[<img src="img/vae.png" height=200>]

.small[https://www.analyticsvidhya.com/blog/2020/01/generative-models-gans-computer-vision/]

---
## Variational Autoencoder

- The VAE then uses the mean and variance parameters to randomly sample one element of the distribution and decodes that element back to the original input

- The parameters of a VAE are trained via two loss functions: a _reconstruction loss_ that forces the decoded samples to match the initial inputs, and a _regularization loss_ that helps learn well-formed latent spaces and reduce overfitting to the training data

---
## Image generation

- The key idea of image generation is to learn latent spaces that capture statistical information about a dataset of images

- The module capable of realizing this mapping, taking as input a latent point and outputting an image (a grid of pixels), is called a _generator_ (in the case of GANs) or a _decoder_ (in the case of VAEs)

- Once such a latent space
has been developed, you can sample points from it, either deliberately or at random, and, by mapping them to image space, generate images that have never been seen before

---
## GAN applications

StyleGAN2 is a state-of-the-art network for generating realistic images. In addition, it was explicitly trained to have disentangled directions in latent space, which allows efficient image manipulation by varying latent factors

.center[<img src="https://github.com/EvgenyKashin/stylegan2-distillation/raw/master/imgs/title.jpg" height=250>]

.small[Viazovetskyi Y. et al., 2020, "[StyleGAN2 Distillation for Feed-forward Image Manipulation](https://arxiv.org/abs/2003.03581)", arXiv:2003.03581, https://github.com/EvgenyKashin/stylegan2-distillation

Fake celebrity faces, https://medium.com/datadriveninvestor/artificial-intelligence-gans-can-create-fake-celebrity-faces-44fe80d419f7]

---
## LSTMs as generative networks

- LSTMs trained on collections of text can be run to generate text - predict the next token(s) given previous tokens

- Language models can be word- or character-based

- Can be done for handwriting generation, music, speech generation

---
## ConvNets as generative networks

- ConvNets trained on collections of images can be run in reverse to generate images based on the representation learned by the network

- Visual representation model, DeepDream

- Can be done for speech, music, and more

.small[https://deepdreamgenerator.com/

https://www.tensorflow.org/tutorials/generative/deepdream]

---
## Deep belief networks

.small[http://www.scholarpedia.org/article/Deep_belief_networks]