Dropout perturbs each element as follows:
$$ \begin{aligned}h' =\begin{cases} 0 & \text{ with probability } p \\ \frac{h}{1-p} & \text{ otherwise}\end{cases}\end{aligned} $$
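The rescaling by $1-p$ is what keeps the expected value of each element unchanged. A short check, assuming $p < 1$ and treating $h$ as fixed:
$$ \begin{aligned}\mathbb{E}[h'] = p \cdot 0 + (1-p) \cdot \frac{h}{1-p} = h\end{aligned} $$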
Dropout is applied to the output of a hidden fully connected layer:
$$ \begin{aligned}\mathbf{h} & =\sigma\left(\mathbf{W}_1 \mathbf{x}+\mathbf{b}_1\right) \\\mathbf{h}^{\prime} & =\operatorname{dropout}(\mathbf{h}) \\\mathbf{o} & =\mathbf{W}_2 \mathbf{h}^{\prime}+\mathbf{b}_2 \\\mathbf{y} & =\operatorname{softmax}(\mathbf{o})\end{aligned} $$
A multilayer perceptron before and after dropout
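The forward pass above can be sketched with PyTorch's built-in `nn.Dropout`. This is only an illustrative sketch: the class name `DropoutMLP`, the layer sizes, and the choice of ReLU for $\sigma$ are assumptions made for the example, not something the notes specify.

```python
import torch
from torch import nn


class DropoutMLP(nn.Module):
    """Hypothetical one-hidden-layer MLP that mirrors the equations above."""

    def __init__(self, num_inputs, num_hiddens, num_outputs, p):
        super().__init__()
        self.lin1 = nn.Linear(num_inputs, num_hiddens)   # W_1, b_1
        self.lin2 = nn.Linear(num_hiddens, num_outputs)  # W_2, b_2
        self.dropout = nn.Dropout(p)

    def forward(self, x):
        h = torch.relu(self.lin1(x))    # h = sigma(W_1 x + b_1); sigma assumed to be ReLU
        h_prime = self.dropout(h)       # h' = dropout(h)
        o = self.lin2(h_prime)          # o = W_2 h' + b_2
        return torch.softmax(o, dim=1)  # y = softmax(o)


torch.manual_seed(0)
net = DropoutMLP(num_inputs=4, num_hiddens=8, num_outputs=3, p=0.5)
x = torch.randn(2, 4)

net.train()        # dropout is active: hidden units are randomly zeroed
y_train = net(x)

net.eval()         # dropout is disabled: the forward pass is deterministic
y_eval_a = net(x)
y_eval_b = net(x)
```

In training mode two calls will generally give different outputs because a fresh mask is sampled each time; in evaluation mode `nn.Dropout` acts as the identity, so repeated calls agree.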
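import torch
from torch import nn
from d2l import torch as d2l


def dropout_layer(X, dropout):
    """Zero each element of X with probability `dropout` and rescale the rest."""
    assert 0 <= dropout <= 1
    # In this case, all elements are dropped
    if dropout == 1:
        return torch.zeros_like(X)
    # In this case, all elements are kept
    if dropout == 0:
        return X
    # Keep an element when its uniform sample exceeds `dropout`;
    # the mask is created on the same device as X
    mask = (torch.rand(X.shape, device=X.device) > dropout).float()
    # Rescale the kept elements so the expected value is unchanged
    return mask * X / (1.0 - dropout)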
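As a point of comparison, PyTorch's built-in `torch.nn.functional.dropout` applies the same zero-or-rescale rule during training. The sketch below is a rough empirical check rather than a proof; the tensor size, seed, and `p = 0.5` are arbitrary choices for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
p = 0.5
x = torch.ones(100_000)

# Training mode: each element is either zeroed or scaled by 1 / (1 - p)
y = F.dropout(x, p=p, training=True)
values = set(y.unique().tolist())  # distinct values that appear in the output
mean = y.mean().item()             # should stay close to the input mean of 1

# Evaluation mode: dropout is a no-op
y_eval = F.dropout(x, p=p, training=False)

print(values, mean)
```

With an all-ones input, the output should contain only `0.0` and `1 / (1 - p)`, and its mean should stay near 1, which is the expectation-preserving behavior described by the formula for $h'$.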