Autoencoders

Structure: \(\mathbf{\mathit{x}}\xmapsto{f}\mathbf{\mathit{h}}\xmapsto{g}\mathbf{\mathit{r}}\)

  • Encoder: \(\mathbf{\mathit{h}}=f(x)\)

  • Decoder: \(r=g(h)\)

Types:

  • Undercomplete: code dimension is less than the input

  • Overcomplete: code has dimension greater than the input

Learning: minimize the loss function

\[ \begin{equation} L(x,g(f(x))) \end{equation} \]

Regularized Autoencoders

Sparse Autoencoders

The training criterion involves a sparsity penalty \(\Omega(h)\):

\[ \begin{equation} L(x,g(f(x))) + \Omega(h) \end{equation} \]

Approximating maximum likelihood training of a generative model that has latent variables.

  • Visible variables \(x\)

  • Latent variables \(h\)

Joint distribution:

\[ \begin{equation} p_\text{model}(x,h)=p_\text{model}(h)p_\text{model}(x|h) \end{equation} \]

Denoising Autoencoders

DAE minimizes

\[ \begin{equation} L(x,g(f(\tilde{x}))) \end{equation} \]

where \(\tilde{x}\) is \(x + \text{noise}\).

Contractive Autoencoders

Use a different penalty \(\Omega\):

\[ \begin{equation} L(x,g(f(x))) + \Omega(h,x) \end{equation} \]

and

\[ \begin{equation} \Omega(h,x) = \lambda\sum_i ||\nabla_x h_i||^2 \end{equation} \]

Deep Autoencoders

Stochastic Encoders and Decoders