- Title: [[Source Notes/Auto-Encoding Variational Bayes]]
- Type: #source/paper
- Author:
- Reference: <+URL+>
- Published at:
- Reviewed at: [[2023-09-15]]
- Links: [[Variational Auto Encoders]] [[Variational Inference]] [[Maximum Likelihood Estimation]] [[Auto Encoders]] [[Bayesian Inference]] [[Independent and Identically Distributed|i.i.d.]] [[Random Variable]] [[Stochastic Process]] [[Variational Inference and Expectation Maximization]]

---

## Setup - A simple stochastic process

The process for $N$ i.i.d. samples $X = \{x^{(i)}\}_{i=1}^N$ is a two-step process ^[is this the simplest possible stochastic / non-deterministic process?]:

1. Generate a value $z^{(i)}$ from a prior distribution $p_{\theta^*}(z)$, i.e. the unknown variable (aka [[Latent Variable]]).
2. Generate a value $x^{(i)}$ from the conditional distribution $p_{\theta^*}(x|z)$.

In this setup, we assume that the prior distribution $p_{\theta^*}(z)$ and the conditional distribution $p_{\theta^*}(x|z)$ come from parametric families of distributions $p_\theta(z)$ and $p_\theta(x|z)$, and that their PDFs are differentiable w.r.t. both $\theta$ and $z$.

The process is hidden from view: both the true parameters $\theta^*$ and the values $z^{(i)}$ are unknown. A model of this process can be thought of as a **generator** or **generative** model.

## A General Inference Algorithm

**Traditional approaches**
- Make simplifying assumptions.
- Derive analytical solutions case by case where possible, depending on the distribution families involved.

We instead want a general algorithm that handles:
- ***Intractability*** - the integral of the [[marginal likelihood]] is intractable, so [[Expectation Maximization|EM]] can't be used, and mean-field VB is also intractable. This is very common in any reasonably complex process, especially one with non-linearity, e.g. a neural net with a non-linear hidden layer.
- ***Large datasets*** - full-batch optimization is too expensive; we want to update parameters in mini-batches.
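The two-step process from the Setup section can be sketched in a few lines. This is purely illustrative: the paper leaves the distribution families abstract, so the standard-normal prior and the linear-Gaussian conditional (with a weight matrix standing in for $\theta^*$) are my own hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(n, theta, latent_dim=2, data_dim=4):
    """Two-step generative process for n i.i.d. samples (illustrative families)."""
    # Step 1: generate z^(i) from the prior p_{theta*}(z); here N(0, I).
    z = rng.normal(size=(n, latent_dim))
    # Step 2: generate x^(i) from the conditional p_{theta*}(x|z); here N(z theta, I).
    x = z @ theta + rng.normal(size=(n, data_dim))
    return z, x

theta_true = rng.normal(size=(2, 4))  # stand-in for the hidden true parameters theta*
z, x = sample_dataset(1000, theta_true)
```

In the actual setting only `x` is observed; `z` and `theta_true` are exactly the quantities the inference algorithm must recover.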
Traditional Monte Carlo EM would also be too slow: it requires an expensive sampling loop per datapoint.

### Desired properties

- Efficient approximate maximum likelihood (ML) or maximum a posteriori (MAP) estimation of the model's global parameters $\theta$.
- Efficient approximate posterior inference of the latent variable $z$ given an observed value $x$.
- Efficient approximate marginal inference of the variable $x$.

## Variational guide

- $q_\phi(z|x)$ serves as an approximation of the true posterior $p_\theta(z|x)$.
- $q_\phi(z|x)$ can be thought of as a recognition model, i.e. given some data, give me the latent variable values that characterize the data.
- $q_\phi(z|x)$ is an encoder of the data $x$ into the latent variable space. From coding theory, the latent variables can be interpreted as a latent representation, i.e. a code / encoding.
- Specifically, given a datapoint $x$, $q_\phi(z|x)$ gives us a distribution over the possible latent code values $z$ that the datapoint could have been generated from.
- $p_\theta(x|z)$ can be thought of as a probabilistic decoder: given a code $z$, it produces a distribution over the possible corresponding values of $x$.

## Annotations

>%%
>```annotation-json
>{"created":"2023-09-15T17:07:12.386Z","text":"Come back and deeply understand the backwards path here","updated":"2023-09-15T17:07:12.386Z","document":{"title":"1312.6114.pdf","link":[{"href":"urn:x-pdf:8bc661feca2c05e74bf46c493dc5d7a1"},{"href":"vault:/_Source/Papers/1312.6114.pdf"}],"documentFingerprint":"8bc661feca2c05e74bf46c493dc5d7a1"},"uri":"vault:/_Source/Papers/1312.6114.pdf","target":[{"source":"vault:/_Source/Papers/1312.6114.pdf","selector":[{"type":"TextPositionSelector","start":3455,"end":3691},{"type":"TextQuoteSelector","exact":"Solid lines denote the generative model pθ(z)pθ(x|z), dashed lines denote the variational approximation qφ(z|x) to the intractable posterior pθ(z|x). The variational parameters φare learned jointly with the generative model parameters θ.","prefix":"ical model under consideration. ","suffix":"straightforward to extend this s"}]}]}
>```
>%%
>*%%HIGHLIGHT%%Solid lines denote the generative model pθ(z)pθ(x|z), dashed lines denote the variational approximation qφ(z|x) to the intractable posterior pθ(z|x). The variational parameters φare learned jointly with the generative model parameters θ.*
>%%LINK%%[[#^qbsp2tdz3x|show annotation]]
>%%COMMENT%%
>Come back and deeply understand the backwards path here
>%%TAGS%%
>
^qbsp2tdz3x
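The recognition model described in the Variational guide section can be sketched as a function that maps a datapoint $x$ to the parameters of a distribution over codes $z$. The diagonal-Gaussian form and the linear maps for $\phi$ below are my own illustrative assumptions; any differentiable family would do.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    """Recognition model q_phi(z|x): map data x to the parameters of a
    distribution over latent codes z (here a diagonal Gaussian)."""
    mu = x @ W_mu            # mean of q_phi(z|x)
    logvar = x @ W_logvar    # log-variance of q_phi(z|x)
    return mu, logvar

def sample_code(mu, logvar):
    """Draw z ~ q_phi(z|x): one plausible latent code for each datapoint."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy dimensions: 4-d data, 2-d latent space; phi = (W_mu, W_logvar).
W_mu = rng.normal(size=(4, 2))
W_logvar = rng.normal(size=(4, 2))
mu, logvar = encode(rng.normal(size=(10, 4)), W_mu, W_logvar)
z = sample_code(mu, logvar)  # one code per datapoint
```

Note that `encode` returns a distribution (its parameters), not a single code: this is the "distribution over the possible latent code values" point above, and the decoder $p_\theta(x|z)$ would map sampled codes back to distributions over $x$.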