Autoencoders

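Structure: an encoder \(f\) maps the input \(x\) to a code \(h\), and a decoder \(g\) maps the code to a reconstruction \(r\): \(\mathbf{\mathit{x}}\xmapsto{f}\mathbf{\mathit{h}}\xmapsto{g}\mathbf{\mathit{r}}\)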
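A minimal sketch of this structure follows. It assumes a sigmoid encoder and a linear decoder with small hand-picked weights; the weights and the helper names `f`, `g` and `matvec` are illustrative only, not part of these notes.

```python
import math

# Hypothetical toy weights: 3-dimensional input x, 2-dimensional code h.
W = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]   # encoder weights, shape (2, 3)
b = [0.0, 0.1]           # encoder bias
V = [[1.0, 0.2],
     [-0.4, 0.6],
     [0.3, -0.7]]        # decoder weights, shape (3, 2)
c = [0.0, 0.0, 0.0]      # decoder bias


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def f(x):
    """Encoder (assumed form): h = sigmoid(W x + b)."""
    return [sigmoid(z + b_i) for z, b_i in zip(matvec(W, x), b)]


def g(h):
    """Decoder (assumed form): r = V h + c, a linear map."""
    return [z + c_i for z, c_i in zip(matvec(V, h), c)]


x = [1.0, 0.0, 2.0]
h = f(x)  # code, length 2
r = g(h)  # reconstruction, length 3
```

Because the code here is smaller than the input, \(h\) cannot simply store a copy of \(x\).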

Types:

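Learning: minimize a loss function \(L\) that penalizes the reconstruction \(g(f(x))\) for being dissimilar from \(x\) (for example, the squared error):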

\[ L(x,g(f(x))) \]
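The learning step can be sketched with plain gradient descent. This assumes, purely for illustration, a linear autoencoder with a one-dimensional code (\(h = w^\top x\), \(r = v\,h\)), a squared-error loss averaged over a hypothetical toy dataset lying on a line, and hand-derived gradients; the learning rate and iteration count are arbitrary choices.

```python
# Hypothetical toy data lying on a line in 2-D: x = t * (1, 2).
data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.5, 1.0)]

w = [0.1, 0.1]  # encoder weights: h = w . x (1-D code)
v = [0.1, 0.1]  # decoder weights: r = v * h
lr = 0.02       # learning rate (arbitrary choice)


def mean_loss(w, v):
    """Dataset mean of L(x, g(f(x))) = ||x - v (w . x)||^2."""
    total = 0.0
    for x in data:
        h = w[0] * x[0] + w[1] * x[1]
        total += sum((v[j] * h - x[j]) ** 2 for j in range(2))
    return total / len(data)


initial = mean_loss(w, v)
for step in range(500):
    grad_w = [0.0, 0.0]
    grad_v = [0.0, 0.0]
    for x in data:
        h = w[0] * x[0] + w[1] * x[1]              # encoder f
        e = [v[j] * h - x[j] for j in range(2)]    # error r - x
        dL_dh = 2.0 * (e[0] * v[0] + e[1] * v[1])  # backprop through decoder
        for j in range(2):
            grad_v[j] += 2.0 * e[j] * h / len(data)
            grad_w[j] += dL_dh * x[j] / len(data)
    # Gradient-descent update on both encoder and decoder weights.
    w = [w[j] - lr * grad_w[j] for j in range(2)]
    v = [v[j] - lr * grad_v[j] for j in range(2)]

final = mean_loss(w, v)
```

Since these toy points span only one direction, a one-dimensional code can in principle reconstruct them exactly, so the loss can be driven close to zero.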

Regularized Autoencoders

A sparse autoencoder adds a sparsity penalty \(\Omega(h)\) on the code \(h\) to the training criterion:

\[ L(x,g(f(x))) + \Omega(h) \]
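One common choice of sparsity penalty is an \(L^1\) norm on the code, \(\Omega(h) = \lambda \sum_i |h_i|\). It is assumed here because these notes leave \(\Omega\) unspecified. The sketch below, with hypothetical helper names and values, compares the penalty on two codes that have the same Euclidean norm:

```python
def l1_sparsity_penalty(h, lam):
    """Omega(h) = lam * sum_i |h_i| (an assumed, commonly used choice)."""
    return lam * sum(abs(h_i) for h_i in h)


def sparse_objective(x, r, h, lam):
    """L(x, g(f(x))) + Omega(h), using squared-error reconstruction loss."""
    reconstruction = sum((x_j - r_j) ** 2 for x_j, r_j in zip(x, r))
    return reconstruction + l1_sparsity_penalty(h, lam)


h_dense = [0.5, -0.5, 0.5, -0.5]  # Euclidean norm 1, all units active
h_sparse = [1.0, 0.0, 0.0, 0.0]   # Euclidean norm 1, one unit active

x = [1.0, 2.0]
r = [1.0, 1.5]  # hypothetical reconstruction g(f(x))
total = sparse_objective(x, r, h_sparse, lam=0.1)  # about 0.25 + 0.1
```

The dense code pays twice the penalty of the sparse one (about 0.2 versus 0.1 at \(\lambda = 0.1\)), which is what pushes training toward codes with few active units.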

A sparse autoencoder can be interpreted as approximating maximum likelihood training of a generative model that has latent variables \(h\) and visible variables \(x\).

Joint distribution:

\[ p_\text{model}(x,h)=p_\text{model}(h)\,p_\text{model}(x \mid h) \]
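Taking logs splits the joint distribution into a prior term and a reconstruction term, evaluated at the single code \(h = f(x)\) rather than summed over all \(h\). As a worked example, assuming for illustration a factorized Laplace prior on each code unit \(h_i\), the negative log-prior becomes an \(L^1\) sparsity penalty plus a constant. This is one way to read \(\Omega(h)\) as a prior on the latent variables, with \(-\log p_\text{model}(x \mid h)\) playing the role of the reconstruction loss:

\[
\begin{aligned}
\log p_\text{model}(x,h) &= \log p_\text{model}(h) + \log p_\text{model}(x \mid h), \\
p_\text{model}(h_i) &= \frac{\lambda}{2} e^{-\lambda |h_i|}, \\
-\log p_\text{model}(h) &= \sum_i \left( \lambda |h_i| - \log \frac{\lambda}{2} \right) = \underbrace{\lambda \sum_i |h_i|}_{\Omega(h)} + \text{const}.
\end{aligned}
\]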

A denoising autoencoder (DAE) instead minimizes

\[ L(x,g(f(\tilde{x}))) \]

where \(\tilde{x} = x + \text{noise}\) is a corrupted copy of \(x\). The DAE must reconstruct the clean \(x\) from \(\tilde{x}\).
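A minimal sketch of the DAE objective follows. It assumes additive Gaussian noise with standard deviation `sigma` (the noise distribution is not specified above) and a squared-error loss. The key detail is that the reconstruction of \(\tilde{x}\) is compared with the clean \(x\). The identity encoder and decoder are hypothetical stand-ins for trained \(f\) and \(g\).

```python
import random


def corrupt(x, sigma, rng):
    """x_tilde = x + noise, with isotropic Gaussian noise (assumed choice)."""
    return [x_j + rng.gauss(0.0, sigma) for x_j in x]


def dae_loss(x, encode, decode, sigma, rng):
    """L(x, g(f(x_tilde))): reconstruct the clean x from a corrupted copy."""
    x_tilde = corrupt(x, sigma, rng)
    r = decode(encode(x_tilde))
    # The target is the clean x, not x_tilde.
    return sum((x_j - r_j) ** 2 for x_j, r_j in zip(x, r))


def identity(v):
    """Hypothetical stand-in for f and g that just copies its input."""
    return list(v)


x = [1.0, 0.0, 2.0]
noisy_loss = dae_loss(x, identity, identity, sigma=0.1, rng=random.Random(0))
clean_loss = dae_loss(x, identity, identity, sigma=0.0, rng=random.Random(0))
```

With identity \(f\) and \(g\), the loss equals the squared norm of the sampled noise, so simply copying the input no longer reaches zero loss.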

A contractive autoencoder (CAE) instead uses a penalty \(\Omega\) that depends on both \(h\) and \(x\):

\[ L(x,g(f(x))) + \Omega(h,x) \]

where \(\lambda\) scales the squared Frobenius norm of the Jacobian of \(h\) with respect to \(x\):

\[ \Omega(h,x) = \lambda\sum_i \lVert\nabla_x h_i\rVert^2 \]
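For a sigmoid encoder \(h = \sigma(Wx + b)\), which is an assumption made for this sketch, each gradient has a closed form, \(\partial h_i / \partial x_j = h_i(1-h_i)W_{ij}\). The penalty can therefore be computed without automatic differentiation. The sketch below computes it that way and checks the result against a central finite-difference estimate of the Jacobian; the weights and function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W, b):
    """Hypothetical sigmoid encoder: h_i = sigmoid(sum_j W[i][j] x[j] + b[i])."""
    return [sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


def contractive_penalty(x, W, b, lam):
    """Omega(h, x) = lam * sum_i ||grad_x h_i||^2, in closed form.

    For a sigmoid unit, dh_i/dx_j = h_i (1 - h_i) W[i][j], so
    ||grad_x h_i||^2 = (h_i (1 - h_i))^2 * sum_j W[i][j]^2.
    """
    h = encode(x, W, b)
    return lam * sum((h_i * (1.0 - h_i)) ** 2 * sum(w * w for w in row)
                     for h_i, row in zip(h, W))


def contractive_penalty_fd(x, W, b, lam, eps=1e-6):
    """Same penalty, with the Jacobian estimated by central differences."""
    total = 0.0
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        h_plus = encode(x_plus, W, b)
        h_minus = encode(x_minus, W, b)
        # Add the squared entries of column j of the Jacobian dh/dx.
        total += sum(((hp - hm) / (2.0 * eps)) ** 2
                     for hp, hm in zip(h_plus, h_minus))
    return lam * total


W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b = [0.0, 0.1]
x = [1.0, 0.0, 2.0]
analytic = contractive_penalty(x, W, b, lam=0.1)
numeric = contractive_penalty_fd(x, W, b, lam=0.1)
```

Summing squared Jacobian entries column by column (finite differences) or row by row (closed form) gives the same Frobenius norm, which is why the two values agree.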