Variational Autoencoders (VAE) Explained
Variational autoencoders (VAEs) were introduced in 2013 by two independent research groups: one led by Diederik P. Kingma and Max Welling, and the other by Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Their papers, “Auto-Encoding Variational Bayes” and “Stochastic Backpropagation and Approximate Inference in Deep Generative Models,” were published in 2014.
A variational autoencoder (VAE) is a generative model used for tasks such as unsupervised learning, dimensionality reduction, and representation learning. It combines concepts from deep learning, specifically neural networks, with ideas from probabilistic modeling and Bayesian inference.
Before VAEs, the best practice for generative modeling included approaches such as Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs), which were popular for unsupervised learning tasks. However, these methods had certain limitations, such as difficulties in training, scaling, and sampling from the model.
The introduction of VAEs marked a significant advancement in generative modeling, combining deep learning with probabilistic modeling in a more efficient and scalable manner.
Autoencoder Mechanism
Autoencoders predate and differ from variational autoencoders (VAEs). They are neural networks used primarily for unsupervised learning, dimensionality reduction, and representation learning, and they consist of two main components: an encoder and a decoder.
- Encoder: The encoder is a neural network that takes input data (e.g., images, text) and maps it to a lower-dimensional latent space or representation, effectively compressing the input data.
- Decoder: The decoder is another neural network that takes the encoded representation from the latent space and attempts to reconstruct the original input data. It learns to generate data similar to the input by minimizing the reconstruction error.
The training objective of an autoencoder is to minimize the reconstruction loss, which is the difference between the input data and the data reconstructed by the decoder. Autoencoders learn a compact and efficient representation of the input data in the latent space while maintaining the ability to reconstruct the original data.
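The reconstruction loss described above can be sketched concretely. Mean squared error is one common choice for continuous data (the helper name below is illustrative, not from any particular library):

```python
def mse_reconstruction_loss(x, x_hat):
    """Mean squared error between input x and its reconstruction x_hat."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# A perfect reconstruction gives zero loss; any deviation is penalized.
loss = mse_reconstruction_loss([1.0, 2.0, 3.0], [1.0, 2.5, 3.0])
```

In practice the encoder and decoder are multi-layer networks and this loss is minimized by gradient descent over their weights.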
Variational Autoencoder Mechanism
Variational autoencoders (VAEs) were later introduced as an extension of autoencoders, focusing on generative modeling and incorporating probabilistic modeling techniques. While VAEs also consist of an encoder and a decoder, they are designed to learn a probability distribution over the latent space, rather than a deterministic mapping, as in autoencoders.
Autoencoders focus on learning efficient representations and reconstructing input data, while VAEs aim to learn a probabilistic mapping in the latent space for generative modeling purposes. VAEs add a regularization term (the KL divergence) to the reconstruction loss during training, encouraging a smooth and continuous structure in the latent space.
The mechanism of VAEs consists of two main parts: an encoder and a decoder.
- Encoder: The encoder, also known as the recognition or inference model, is a neural network that takes input data (e.g., images, text) and maps it to a lower-dimensional latent space. Instead of directly outputting a single point in the latent space, the encoder estimates the parameters of a probability distribution (usually Gaussian) that represents the underlying structure of the input data. In the case of a Gaussian distribution, the encoder outputs two sets of values: means (μ) and standard deviations (σ).
- Decoder: The decoder, also known as the generative model, is another neural network that takes a point from the latent space (sampled from the distribution estimated by the encoder) and attempts to reconstruct the original input data. The decoder effectively learns to generate data similar to the input by minimizing the reconstruction error.
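The sampling step between encoder and decoder is typically implemented with the reparameterization trick from the original papers: a latent point is drawn as z = μ + σ·ε with ε ~ N(0, 1), so that gradients can flow back through μ and σ. A minimal pure-Python sketch (the function name is illustrative):

```python
import random

def sample_latent(mu, sigma, rng=random):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).

    mu and sigma are the per-dimension outputs of the encoder;
    the returned z is the point the decoder receives as input.
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

z = sample_latent(mu=[0.0, 1.0], sigma=[1.0, 0.5])
```

Because the randomness is isolated in ε, the sampling step is differentiable with respect to the encoder's outputs, which is what makes end-to-end training by backpropagation possible.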
Training Process
The VAE’s training process involves optimizing two main objectives:
Reconstruction Loss
This is the difference between the input data and the data reconstructed by the decoder. The goal is to minimize this loss, encouraging the model to generate data that closely resembles the input.
Kullback-Leibler Divergence
The Kullback-Leibler (KL) divergence measures the difference between the estimated probability distribution in the latent space and a target distribution (usually a standard Gaussian). The VAE aims to minimize the KL divergence, which acts as a regularization term and ensures the latent space has a smooth and continuous structure.
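For a diagonal Gaussian encoder and a standard Gaussian target, this KL term has a closed form, KL = ½ Σ (μ² + σ² − 1 − ln σ²), summed over latent dimensions. A small sketch:

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

# The KL is zero exactly when the encoder's distribution matches the prior.
kl = kl_to_standard_normal(mu=[0.0, 0.0], sigma=[1.0, 1.0])  # 0.0
```

Minimizing this term pulls each encoded distribution toward the standard Gaussian, which keeps nearby latent points decodable into plausible samples.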
By optimizing these two objectives, the VAE learns a compact and meaningful representation of the input data in the latent space while maintaining the ability to generate realistic data samples.
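The two objectives combine into a single per-example training loss: the reconstruction term plus the KL term. (Some later variants weight the KL with a coefficient β; β = 1 recovers the standard VAE objective. The sketch below uses MSE reconstruction and is illustrative, not a fixed API.)

```python
import math

def vae_loss(x, x_hat, mu, sigma, beta=1.0):
    """Total VAE loss: reconstruction (MSE here) + beta * KL to N(0, I)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    kl = 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s)
                   for m, s in zip(mu, sigma))
    return recon + beta * kl
```

With a perfect reconstruction and an encoder output that matches the prior, both terms vanish and the loss is zero; training trades the two terms off against each other.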
Conclusion
A variational autoencoder is a generative model that uses neural networks to map input data to a probability distribution over a latent space and then reconstructs the data from points sampled from that distribution. The model is trained by optimizing a combination of reconstruction loss and KL divergence, resulting in a smooth and meaningful latent representation of the input data.