Motivation [1]

Principle [1]

Dropout perturbs each element $h$ independently, with drop probability $p$, as follows:

$$ h' = \begin{cases} 0 & \text{with probability } p \\ \frac{h}{1-p} & \text{otherwise} \end{cases} $$
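
The rescaling by $\frac{1}{1-p}$ is what keeps the expected value of each element unchanged. As a short check, assuming $0 \le p < 1$ and treating $h$ as fixed:

$$ \mathbb{E}[h'] = p \cdot 0 + (1-p) \cdot \frac{h}{1-p} = h $$

So the perturbation adds noise but, on average, leaves the activation where it was.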

Dropout is applied to the output of the hidden fully connected layer:

$$ \begin{aligned}\mathbf{h} & =\sigma\left(\mathbf{W}_1 \mathbf{x}+\mathbf{b}_1\right) \\\mathbf{h}^{\prime} & =\operatorname{dropout}(\mathbf{h}) \\\mathbf{o} & =\mathbf{W}_2 \mathbf{h}^{\prime}+\mathbf{b}_2 \\\mathbf{y} & =\operatorname{softmax}(\mathbf{o})\end{aligned} $$
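
The four equations above can be sketched as a small PyTorch module. This is only an illustrative sketch: the layer sizes, the drop probability, the choice of ReLU for $\sigma$, and the class name `DropoutMLP` are assumptions made for the example and are not taken from the notes. It relies on the built-in `nn.Dropout`, which is active in training mode and disabled in evaluation mode.

```python
import torch
from torch import nn

# Hypothetical sizes and drop probability, chosen only for illustration.
num_inputs, num_hiddens, num_outputs, p = 20, 64, 10, 0.5

class DropoutMLP(nn.Module):
    """One-hidden-layer MLP with dropout on the hidden output (illustrative)."""

    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(num_inputs, num_hiddens)  # W_1, b_1
        self.dropout = nn.Dropout(p)                      # dropout(h)
        self.out = nn.Linear(num_hiddens, num_outputs)    # W_2, b_2

    def forward(self, x):
        h = torch.relu(self.hidden(x))   # h = sigma(W_1 x + b_1), sigma assumed to be ReLU
        h_prime = self.dropout(h)        # h' = dropout(h), a no-op in eval mode
        o = self.out(h_prime)            # o = W_2 h' + b_2
        return torch.softmax(o, dim=1)   # y = softmax(o)

torch.manual_seed(0)
net = DropoutMLP()
x = torch.randn(4, num_inputs)

net.eval()                               # dropout disabled: repeated calls agree
y_eval_1, y_eval_2 = net(x), net(x)

net.train()                              # dropout enabled: hidden units are randomly zeroed
y_train = net(x)
```

In evaluation mode the two forward passes give the same result, while in training mode each pass draws a fresh random mask.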

                    Figure: a multilayer perceptron before and after dropout

Summary [1]

Code [1]

import torch
from torch import nn
from d2l import torch as d2l

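def dropout_layer(X, dropout):
    """Zero each element of X with probability `dropout` and rescale the rest."""
    assert 0 <= dropout <= 1
    # In this case, all elements are dropped
    if dropout == 1:
        return torch.zeros_like(X)
    # In this case, all elements are kept
    if dropout == 0:
        return X
    # Keep an element when its uniform sample exceeds the drop probability;
    # the mask is created on X's device so the function also works on a GPU
    mask = (torch.rand(X.shape, device=X.device) > dropout).float()
    # Rescale the surviving elements so the expected value is unchanged
    return mask * X / (1.0 - dropout)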
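
A quick sanity check of `dropout_layer` might look like the following sketch. The function is repeated so the snippet runs on its own. The input shape, the seed, and the tolerances are arbitrary choices for illustration. With an all-ones input and a drop probability of 0.5, roughly half of the elements should be zero, the survivors should equal 2.0, and the overall mean should stay close to 1.

```python
import torch

def dropout_layer(X, dropout):
    # Copy of the dropout logic above, repeated so this snippet is self-contained.
    assert 0 <= dropout <= 1
    if dropout == 1:
        return torch.zeros_like(X)
    if dropout == 0:
        return X
    mask = (torch.rand(X.shape, device=X.device) > dropout).float()
    return mask * X / (1.0 - dropout)

torch.manual_seed(0)                     # arbitrary seed, for repeatability only
X = torch.ones(1000, 100)                # hypothetical input: 100,000 ones

out_keep = dropout_layer(X, 0.0)         # nothing is dropped
out_half = dropout_layer(X, 0.5)         # about half zeroed, survivors scaled by 1 / (1 - 0.5)
out_drop = dropout_layer(X, 1.0)         # everything is dropped

zero_fraction = (out_half == 0).float().mean().item()
mean_value = out_half.mean().item()
print(f"zero fraction: {zero_fraction:.3f}, mean: {mean_value:.3f}")
```

The printed values vary slightly with the random draw, so they should be read as approximately 0.5 and 1.0 rather than exact.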