- Title: [[Source Notes/Auto-Encoding Variational Bayes]]
- Type: #source/paper
- Author:
- Reference: <+URL+>
- Published at:
- Reviewed at: [[2023-09-15]]
- Links: [[Variational Auto Encoders]] [[Variational Inference]] [[Maximum Likelihood Estimation]] [[Auto Encoders]] [[Bayesian Inference]] [[Independent and Identically Distributed|i.i.d.]] [[Random Variable]] [[Stochastic Process]] [[Variational Inference and Expectation Maximization]]
---
## Setup - A simple stochastic process
The process for $N$ i.i.d. samples $X = \{x^{(i)}\}_{i=1}^N$ is a two-step process ^[Is this the simplest possible stochastic / non-deterministic process?]:
1. Generate the value $z^{(i)}$ from a prior distribution $p_{\theta^*}(z)$, i.e. the unknown variable (a.k.a. the [[Latent Variable]]).
2. Generate the value $x^{(i)}$ from the conditional distribution $p_{\theta^*}(x|z)$.
In this setup, we assume that the prior distribution $p_{\theta^*}(z)$ and the conditional distribution $p_{\theta^*}(x|z)$ come from parametric families of distributions $p_\theta(z)$ and $p_\theta(x|z)$, and that their PDFs are differentiable w.r.t. both $\theta$ and $z$.
The process is hidden from view. The true parameters $\theta^*$ and the values $z^{(i)}$ are unknown.
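The two-step process can be sketched with a toy linear-Gaussian model. All parameter values here (prior, slope, noise level) are illustrative assumptions, not from the paper; they play the role of the hidden $\theta^*$:

```python
import random

random.seed(0)

# Hypothetical "true" parameters theta* -- unknown to the learner.
PRIOR_MEAN, PRIOR_STD = 0.0, 1.0   # p_theta*(z):   z ~ N(0, 1)
SLOPE, NOISE_STD = 2.0, 0.5        # p_theta*(x|z): x ~ N(2z, 0.5^2)

def sample_datapoint():
    """One draw from the two-step process: z from the prior, then x given z."""
    z = random.gauss(PRIOR_MEAN, PRIOR_STD)  # step 1: latent value z^(i)
    x = random.gauss(SLOPE * z, NOISE_STD)   # step 2: observed value x^(i)
    return x, z

# The observer sees only X = {x^(i)}; the z^(i) (and theta*) stay hidden.
X = [sample_datapoint()[0] for _ in range(1000)]
```

Dropping the `z` values before collecting `X` mirrors the setup: the latent variables exist in the process but never reach the dataset.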
A model for this process can be thought of as a **generator** or **generative** model.
## A General Inference Algorithm
**Traditional approaches**
- Make simplifying assumptions
- Derive analytical solutions case by case where possible, depending on the distribution families involved
We want a general algorithm that handles:
- ***Intractability***: the integral of the [[marginal likelihood]] is intractable, so [[Expectation Maximization|EM]] can't be used, and the required mean-field VB expectations are also intractable. This is very common in any reasonably complex process, especially if there's non-linearity, e.g. a neural net with a non-linear hidden layer.
- ***Large datasets***: full-batch optimization is too expensive; we want to update parameters in mini-batches. Traditional Monte Carlo EM would be too slow due to the expensive sampling loop per datapoint.
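To see the per-datapoint sampling cost concretely, here is a naive Monte Carlo estimate of the marginal likelihood $p_\theta(x) = \mathbb{E}_{p_\theta(z)}[\,p_\theta(x|z)\,]$ for a toy linear-Gaussian model (the parameters are illustrative assumptions). The inner sampling loop must be repeated for every datapoint, which is what makes this approach too slow at scale:

```python
import math
import random

random.seed(1)

SLOPE, NOISE_STD = 2.0, 0.5  # toy conditional p(x|z) = N(2z, 0.5^2)

def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def mc_marginal_likelihood(x, num_samples=50_000):
    """Naive Monte Carlo: p(x) ~= mean of p(x|z_s) over z_s ~ p(z).
    This sampling loop is the per-datapoint cost that makes MC-EM slow."""
    total = 0.0
    for _ in range(num_samples):
        z = random.gauss(0.0, 1.0)  # z ~ p(z) = N(0, 1)
        total += normal_pdf(x, SLOPE * z, NOISE_STD)
    return total / num_samples

# For this linear-Gaussian toy the exact marginal is N(0, SLOPE^2 + NOISE_STD^2),
# so the estimate can be sanity-checked.
exact = normal_pdf(1.0, 0.0, math.sqrt(SLOPE**2 + NOISE_STD**2))
approx = mc_marginal_likelihood(1.0)
```

In a real model with a neural-net decoder there is no closed-form `exact` to fall back on, and each `normal_pdf` call becomes a full forward pass.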
### Desired properties
- Efficient approximate maximum likelihood (ML) or maximum a posteriori (MAP) estimation of the model's global parameters $\theta$.
- Efficient approximate posterior inference of the latent variable $z$ given an observed value $x$.
- Efficient approximate marginal inference of variable $x$.
## Variational guide
- $q_\phi(z|x)$ is an approximation of the true posterior $p_\theta(z|x)$.
- $q_\phi(z|x)$ can be thought of as a recognition model, i.e., given some data, it tells us the latent variable values that characterize the data.
- $q_\phi(z|x)$ is an encoder of the data $x$ into the latent variable space. From coding theory, the latent variables can be interpreted as a latent representation, i.e. a code/encoding.
- Specifically, $q_\phi(z|x)$, given a datapoint $x$, gives us a distribution over the possible latent code values $z$ that the datapoint could have been generated from.
- $p_\theta(x|z)$ can be thought of as a probabilistic decoder: given a code $z$, it produces a distribution over the possible corresponding values of $x$.
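A minimal sketch of the encoder/decoder pair, again with a toy linear-Gaussian model (all parameter values are illustrative assumptions). In this toy the exact posterior $p_\theta(z|x)$ is available in closed form, so the "recognition model" can be exact; in general $q_\phi(z|x)$ only approximates it:

```python
import math

# Toy linear-Gaussian model, assumed for illustration:
#   prior   p(z)   = N(0, 1)
#   decoder p(x|z) = N(SLOPE * z, NOISE_STD^2)
SLOPE, NOISE_STD = 2.0, 0.5

def decoder(z):
    """Probabilistic decoder p(x|z): code z -> distribution over x."""
    return SLOPE * z, NOISE_STD  # (mean, std) of a Gaussian over x

def encoder(x):
    """Recognition model q(z|x): datapoint x -> distribution over codes z.
    Here it equals the exact Gaussian posterior of the toy model."""
    precision = 1.0 + (SLOPE / NOISE_STD) ** 2       # prior + likelihood terms
    mean = (SLOPE / NOISE_STD**2) * x / precision
    return mean, math.sqrt(1.0 / precision)          # (mean, std) over z

mu, sigma = encoder(1.0)  # distribution over codes that could have produced x = 1
```

Note the direction of each map: the encoder consumes an observation and emits a distribution over codes, while the decoder consumes a code and emits a distribution over observations.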
## Annotations
>%%
>```annotation-json
>{"created":"2023-09-15T17:07:12.386Z","text":"Come back and deeply understand the backwards path here[]","updated":"2023-09-15T17:07:12.386Z","document":{"title":"1312.6114.pdf","link":[{"href":"urn:x-pdf:8bc661feca2c05e74bf46c493dc5d7a1"},{"href":"vault:/_Source/Papers/1312.6114.pdf"}],"documentFingerprint":"8bc661feca2c05e74bf46c493dc5d7a1"},"uri":"vault:/_Source/Papers/1312.6114.pdf","target":[{"source":"vault:/_Source/Papers/1312.6114.pdf","selector":[{"type":"TextPositionSelector","start":3455,"end":3691},{"type":"TextQuoteSelector","exact":"Solid lines denote the generative model pθ(z)pθ(x|z), dashed lines denote the variational approximation qφ(z|x) to the intractable posterior pθ(z|x). The variational parameters φare learned jointly with the generative model parameters θ.","prefix":"ical model under consideration. ","suffix":"straightforward to extend this s"}]}]}
>```
>%%
>*%%HIGHLIGHT%%Solid lines denote the generative model pθ(z)pθ(x|z), dashed lines denote the variational approximation qφ(z|x) to the intractable posterior pθ(z|x). The variational parameters φ are learned jointly with the generative model parameters θ.*
>%%LINK%%[[#^qbsp2tdz3x|show annotation]]
>%%COMMENT%%
>Come back and deeply understand the backwards path here
>%%TAGS%%
>
^qbsp2tdz3x