Autoencoders
Structure: \(x\xmapsto{f}h\xmapsto{g}r\), where \(x\) is the input, \(h\) the code, and \(r\) the reconstruction
Encoder: \(h=f(x)\)
Decoder: \(r=g(h)\)
Types:
Undercomplete: the code dimension is smaller than the input dimension
Overcomplete: the code dimension is greater than the input dimension
Learning: minimize a loss function that penalizes the reconstruction \(g(f(x))\) for being dissimilar from \(x\):
\[
\begin{equation}
L(x,g(f(x)))
\end{equation}
\]
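The loss above can be sketched numerically. This is a minimal illustration, not a training recipe: the layer sizes, the random weights `W_enc` and `W_dec`, the tanh encoder, the linear decoder, and the choice of mean squared error for \(L\) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Undercomplete setting: code dimension (3) < input dimension (8).
n_input, n_code = 8, 3
W_enc = rng.normal(scale=0.1, size=(n_code, n_input))  # hypothetical encoder weights
W_dec = rng.normal(scale=0.1, size=(n_input, n_code))  # hypothetical decoder weights


def f(x):
    """Encoder h = f(x); a single tanh layer is assumed here."""
    return np.tanh(W_enc @ x)


def g(h):
    """Decoder r = g(h); a linear map is assumed here."""
    return W_dec @ h


def reconstruction_loss(x):
    """L(x, g(f(x))) with L taken to be mean squared error."""
    r = g(f(x))
    return float(np.mean((x - r) ** 2))


x = rng.normal(size=n_input)
loss = reconstruction_loss(x)
print("code shape:", f(x).shape, "loss:", loss)
```

Training would adjust `W_enc` and `W_dec` to reduce this loss over a dataset; that step is omitted here.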
Regularized Autoencoders
Sparse Autoencoders
The training criterion adds a sparsity penalty \(\Omega(h)\) on the code \(h=f(x)\) to the reconstruction error:
\[
\begin{equation}
L(x,g(f(x))) + \Omega(h)
\end{equation}
\]
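As a rough sketch of this criterion, the snippet below uses an L1 penalty \(\Omega(h)=\lambda\sum_i|h_i|\), which is one common choice rather than the only one. The weight `lam`, the tied weights `W`, the ReLU encoder, and the squared-error loss are assumptions for illustration.

```python
import numpy as np


def sparsity_penalty(h, lam=0.1):
    """Omega(h) = lam * sum_i |h_i| (an L1 penalty; assumed for this sketch)."""
    return lam * float(np.sum(np.abs(h)))


def sparse_loss(x, f, g, lam=0.1):
    """L(x, g(f(x))) + Omega(h), with L taken to be mean squared error."""
    h = f(x)
    recon = float(np.mean((x - g(h)) ** 2))
    return recon + sparsity_penalty(h, lam)


rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(12, 4))  # overcomplete: 12 code units, 4 inputs


def f(x):
    """Hypothetical ReLU encoder."""
    return np.maximum(0.0, W @ x)


def g(h):
    """Hypothetical linear decoder with tied weights."""
    return W.T @ h


x = rng.normal(size=4)
total = sparse_loss(x, f, g)
print("total loss:", total)
```

The penalty is computed on the code \(h\), not on the weights, which is what distinguishes it from ordinary weight decay.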
Sparse autoencoder training can be interpreted as approximating maximum likelihood training of a generative model that has latent variables.
Visible variables \(x\)
Latent variables \(h\)
Joint distribution:
\[
\begin{equation}
p_\text{model}(x,h)=p_\text{model}(h)p_\text{model}(x|h)
\end{equation}
\]
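One way to see the link between this factorization and the sparsity penalty is sketched here, assuming a factorial Laplace prior on the code (the prior and the symbol \(\lambda\) are illustrative choices). The log-likelihood marginalizes over \(h\):
\[
\log p_\text{model}(x) = \log \sum_h p_\text{model}(h,x)
\]
The autoencoder can be read as approximating this sum with a single highly likely value of \(h\), so that training maximizes
\[
\log p_\text{model}(h,x) = \log p_\text{model}(h) + \log p_\text{model}(x|h)
\]
With \(p_\text{model}(h_i) = \frac{\lambda}{2}e^{-\lambda|h_i|}\), the prior term becomes
\[
-\log p_\text{model}(h) = \sum_i\left(\lambda|h_i| - \log\frac{\lambda}{2}\right) = \Omega(h) + \text{const}, \qquad \Omega(h) = \lambda\sum_i|h_i|
\]
so \(-\log p_\text{model}(h)\) plays the role of the sparsity penalty and \(-\log p_\text{model}(x|h)\) plays the role of the reconstruction loss.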
Denoising Autoencoders
A denoising autoencoder (DAE) minimizes
\[
\begin{equation}
L(x,g(f(\tilde{x})))
\end{equation}
\]
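where \(\tilde{x}\) is a copy of \(x\) corrupted by some form of noise, for example \(\tilde{x} = x + \text{noise}\). The reconstruction is compared against the clean \(x\), not against \(\tilde{x}\).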
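A small sketch of the denoising objective follows. Additive Gaussian noise with standard deviation `sigma` is assumed as the corruption process; the tied-weight tanh encoder and the squared-error loss are likewise illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def corrupt(x, sigma=0.3):
    """Draw x_tilde from the corruption process (additive Gaussian noise assumed)."""
    return x + rng.normal(scale=sigma, size=x.shape)


def dae_loss(x, f, g, sigma=0.3):
    """L(x, g(f(x_tilde))): encode the corrupted input, compare with the clean x."""
    x_tilde = corrupt(x, sigma)
    return float(np.mean((x - g(f(x_tilde))) ** 2))


W = rng.normal(scale=0.1, size=(3, 8))  # hypothetical tied weights


def f(x):
    """Hypothetical tanh encoder."""
    return np.tanh(W @ x)


def g(h):
    """Hypothetical linear decoder."""
    return W.T @ h


x = rng.normal(size=8)
loss = dae_loss(x, f, g)
print("denoising loss:", loss)
```

Because a fresh corruption is sampled on every call, the loss is stochastic; in practice it would be averaged over many examples and noise draws.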
Contractive Autoencoders
A contractive autoencoder (CAE) uses a penalty \(\Omega\) that depends on both the code and the input:
\[
\begin{equation}
L(x,g(f(x))) + \Omega(h,x)
\end{equation}
\]
where
\[
\begin{equation}
\Omega(h,x) = \lambda\sum_i ||\nabla_x h_i||^2
\end{equation}
\]
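The penalty can be computed in closed form for simple encoders. The sketch below assumes a single tanh layer \(h=\tanh(Wx+b)\), for which \(\partial h_i/\partial x_j = (1-h_i^2)W_{ij}\), and checks the result against a finite-difference Jacobian. The names `W`, `b`, and `lam` are hypothetical and chosen only for this example.

```python
import numpy as np


def encoder(x, W, b):
    """h = tanh(W x + b); a single tanh layer is assumed."""
    return np.tanh(W @ x + b)


def contractive_penalty(x, W, b, lam=0.1):
    """Omega(h, x) = lam * sum_i ||grad_x h_i||^2 for the tanh encoder."""
    h = encoder(x, W, b)
    # dh_i/dx_j = (1 - h_i^2) * W_ij, so
    # ||grad_x h_i||^2 = (1 - h_i^2)^2 * sum_j W_ij^2.
    grad_sq_norms = (1.0 - h ** 2) ** 2 * np.sum(W ** 2, axis=1)
    return lam * float(np.sum(grad_sq_norms))


def numerical_penalty(x, W, b, lam=0.1, eps=1e-6):
    """Same quantity from a central-difference estimate of the Jacobian."""
    J = np.zeros((W.shape[0], x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (encoder(x + step, W, b) - encoder(x - step, W, b)) / (2 * eps)
    return lam * float(np.sum(J ** 2))


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x = rng.normal(size=5)

analytic = contractive_penalty(x, W, b)
numeric = numerical_penalty(x, W, b)
print("analytic:", analytic, "numeric:", numeric)
```

The sum over \(i\) of the squared gradient norms equals the squared Frobenius norm of the encoder Jacobian, which is why the numerical check simply sums `J ** 2`.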