# Autoencoders

Structure: $x\xmapsto{f}h\xmapsto{g}r$

- Encoder: $h=f(x)$ maps the input $x$ to the code $h$
- Decoder: $r=g(h)$ maps the code $h$ to the reconstruction $r$

Types:

- Undercomplete: the code dimension is smaller than the input dimension
- Overcomplete: the code dimension is greater than the input dimension

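Learning: minimize a loss function $L$ that penalizes the reconstruction $g(f(x))$ for being dissimilar from $x$ (for example, squared error):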

$$
\begin{equation}
L(x,g(f(x)))
\end{equation}
$$
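A minimal sketch of the pieces above in plain Python, assuming linear maps for $f$ and $g$, squared error for $L$, and per-example gradient descent. These are illustrative choices, and the helper names (`encode`, `decode`, `train_step`, `train`, `total_loss`) are hypothetical. The toy data lie on a line in $\mathbb{R}^3$, so a 1-dimensional (undercomplete) code is enough to reconstruct them.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def encode(W, x):
    """Encoder f: linear map from the input to a code with len(W) units."""
    return matvec(W, x)

def decode(V, h):
    """Decoder g: linear map from the code back to the input space."""
    return matvec(V, h)

def loss(x, r):
    """Squared-error reconstruction loss L(x, r)."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r))

def train_step(W, V, x, lr):
    """One gradient-descent step on L(x, g(f(x))); updates W and V in place."""
    h = encode(W, x)
    r = decode(V, h)
    dr = [2.0 * (ri - xi) for ri, xi in zip(r, x)]         # dL/dr
    dh = [sum(V[i][k] * dr[i] for i in range(len(V)))      # dL/dh = V^T dL/dr
          for k in range(len(h))]
    for i in range(len(V)):                                 # dL/dV = dL/dr h^T
        for k in range(len(h)):
            V[i][k] -= lr * dr[i] * h[k]
    for k in range(len(W)):                                 # dL/dW = dL/dh x^T
        for j in range(len(x)):
            W[k][j] -= lr * dh[k] * x[j]

def train(data, code_dim, epochs=500, lr=0.01, seed=0):
    """Fit W (code_dim x n) and V (n x code_dim) from a seeded random start."""
    rng = random.Random(seed)
    n = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(code_dim)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(code_dim)] for _ in range(n)]
    for _ in range(epochs):
        for x in data:
            train_step(W, V, x, lr)
    return W, V

def total_loss(W, V, data):
    """Sum of reconstruction losses over the dataset."""
    return sum(loss(x, decode(V, encode(W, x))) for x in data)

# Toy data: 3-D points on the line t * (1, 2, -1), so a 1-D code suffices.
u = [1.0, 2.0, -1.0]
data = [[t * ui for ui in u] for t in (-1.0, -0.5, 0.5, 1.0)]
W, V = train(data, code_dim=1)
print(total_loss(W, V, data))
```

Because the code is narrower than the input, the network cannot simply copy $x$; it has to capture the direction along which the data vary. With a linear decoder and squared error, an undercomplete autoencoder learns to span the same subspace as PCA.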

## Regularized Autoencoders

### Sparse Autoencoders

The training criterion involves a sparsity penalty $\Omega(h)$:

$$
\begin{equation}
L(x,g(f(x))) + \Omega(h)
\end{equation}
$$
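A small sketch of this objective, assuming squared error for $L$ and the common L1 choice $\Omega(h)=\lambda\sum_i\lvert h_i\rvert$. The toy encoder and decoder (`toy_encoder`, `toy_decoder`) are hypothetical; the code is deliberately overcomplete (3 units for a 2-D input), since the sparsity penalty, rather than a narrow code, is what keeps the model from copying its input.

```python
def l1_penalty(h, lam):
    """Sparsity penalty Omega(h) = lam * sum_i |h_i| (assumed L1 choice)."""
    return lam * sum(abs(hi) for hi in h)

def sparse_ae_objective(f, g, x, lam):
    """L(x, g(f(x))) + Omega(h) with a squared-error reconstruction loss."""
    h = f(x)                     # code
    r = g(h)                     # reconstruction
    recon = sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    return recon + l1_penalty(h, lam)

def toy_encoder(x):
    """Hypothetical encoder: 2-D input -> 3-D (overcomplete) code."""
    return [0.5 * x[0], -0.25 * x[0], 0.0]

def toy_decoder(h):
    """Hypothetical decoder: 3-D code -> 2-D reconstruction."""
    return [h[0], 0.0]

print(sparse_ae_objective(toy_encoder, toy_decoder, [1.0, 0.0], lam=0.1))
```

Here the reconstruction term is $(1-0.5)^2=0.25$ and the penalty is $0.1\times(0.5+0.25)=0.075$; increasing `lam` trades reconstruction quality for codes with more zeros.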

A sparse autoencoder can be interpreted as approximating maximum likelihood training of a generative model that has latent variables:

- Visible variables $x$
- Latent variables $h$

Joint distribution:

$$
\begin{equation}
p_\text{model}(x,h)=p_\text{model}(h)\,p_\text{model}(x\mid h)
\end{equation}
$$
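One way to make the link to the sparsity penalty concrete, assuming a factorized Laplace prior on each code unit and treating $h=f(x)$ as a point estimate, is to take logs of the joint:

$$
\begin{equation}
\log p_\text{model}(x,h)=\log p_\text{model}(h)+\log p_\text{model}(x\mid h)
\end{equation}
$$

$$
\begin{equation}
p_\text{model}(h_i)=\frac{\lambda}{2}e^{-\lambda\lvert h_i\rvert}
\;\Rightarrow\;
-\log p_\text{model}(h)=\sum_i\left(\lambda\lvert h_i\rvert-\log\frac{\lambda}{2}\right)=\Omega(h)+\text{const}
\end{equation}
$$

Under these assumptions, $-\log p_\text{model}(x\mid h)$ plays the role of the reconstruction loss $L$, and the prior term yields the L1 penalty $\Omega(h)=\lambda\sum_i\lvert h_i\rvert$ up to a constant that does not depend on $h$.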

### Denoising Autoencoders

A denoising autoencoder (DAE) minimizes

$$
\begin{equation}
L(x,g(f(\tilde{x})))
\end{equation}
$$

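where $\tilde{x}$ is a copy of $x$ corrupted by some form of noise, for example $\tilde{x} = x + \text{noise}$. The reconstruction target is still the clean $x$.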
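A minimal sketch of the DAE loss, assuming additive Gaussian noise as the corruption process and squared error for $L$; `corrupt`, `dae_loss`, and `identity` are hypothetical helper names. The key detail is that the encoder sees $\tilde{x}$ while the loss compares the reconstruction with the uncorrupted $x$.

```python
import random

def corrupt(x, sigma, rng):
    """Corruption process: x_tilde = x + Gaussian noise (assumed noise model)."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]

def dae_loss(f, g, x, sigma, rng):
    """L(x, g(f(x_tilde))): reconstruct from the corrupted input, score against clean x."""
    x_tilde = corrupt(x, sigma, rng)
    r = g(f(x_tilde))
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r))

def identity(v):
    """Trivial encoder/decoder that copies its input."""
    return list(v)

rng = random.Random(0)
x = [1.0, 2.0, 3.0]
print(dae_loss(identity, identity, x, sigma=0.0, rng=rng))  # no noise: identity reconstructs x exactly
print(dae_loss(identity, identity, x, sigma=0.5, rng=rng))  # with noise: identity is penalized
```

The identity map achieves zero loss on the plain objective but not on the denoising one, which is why a DAE has to learn something about the structure of the data instead of copying its input.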

### Contractive Autoencoders

A contractive autoencoder (CAE) uses a different penalty $\Omega$, which depends on both $h$ and $x$:

$$
\begin{equation}
L(x,g(f(x))) + \Omega(h,x)
\end{equation}
$$

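where the penalty is $\lambda$ times the squared Frobenius norm of the encoder's Jacobian $\partial h/\partial x$: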

$$
\begin{equation}
\Omega(h,x) = \lambda\sum_i \lVert\nabla_x h_i\rVert^2
\end{equation}
$$
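A sketch of this penalty for one specific encoder, assuming $h=\sigma(Wx+b)$ with a logistic sigmoid (the encoder form and the helper names are assumptions for illustration). For this encoder $\partial h_i/\partial x_j = h_i(1-h_i)W_{ij}$, so the penalty has a closed form; a finite-difference version is included to check it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(W, b, x):
    """Assumed encoder h = sigmoid(W x + b)."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bk)
            for row, bk in zip(W, b)]

def contractive_penalty(W, b, x, lam):
    """lam * sum_i ||grad_x h_i||^2, using dh_i/dx_j = h_i (1 - h_i) W_ij."""
    h = encode(W, b, x)
    return lam * sum((hi * (1.0 - hi)) ** 2 * sum(w * w for w in row)
                     for hi, row in zip(h, W))

def contractive_penalty_fd(W, b, x, lam, eps=1e-6):
    """Same quantity estimated with central finite differences (for checking)."""
    total = 0.0
    for j in range(len(x)):
        xp = list(x)
        xp[j] += eps
        xm = list(x)
        xm[j] -= eps
        hp, hm = encode(W, b, xp), encode(W, b, xm)
        total += sum(((a - c) / (2 * eps)) ** 2 for a, c in zip(hp, hm))
    return lam * total

W = [[0.5, -1.0], [2.0, 0.3], [-0.7, 0.1]]   # 3 code units, 2-D input
b = [0.1, -0.2, 0.0]
x = [0.4, -0.3]
print(contractive_penalty(W, b, x, lam=0.1), contractive_penalty_fd(W, b, x, lam=0.1))
```

The penalty is small when the code units are saturated ($h_i$ near 0 or 1), which is how a CAE encourages $h$ to change little under small perturbations of $x$.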

## Deep Autoencoders

## Stochastic Encoders and Decoders