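Input vector $x \in \mathbb{R}^m$, output vector $y \in \mathbb{R}^n$.
Linear transformation: $y_i = f(a_i + b_i)$
where $a_i = \sum_{j=1}^{m} w_{ij} x_j$ is the weighted sum of the inputs, $b_i$ is the bias, and $f(\cdot)$ is an element-wise activation function.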
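As a small worked example, the layer can be sketched in plain Python. It assumes that $a_i$ is the weighted sum of the inputs, $a_i = \sum_j w_{ij} x_j$, and uses ReLU for $f$. The helper name `linear_layer` and the values of `W`, `x`, and `b` are hypothetical and chosen only for illustration.

```python
def relu(v):
    # Element-wise activation f; ReLU is an assumption made for this example.
    return max(0.0, v)

def linear_layer(W, x, b, f=relu):
    # a_i = sum_j w_ij * x_j, then y_i = f(a_i + b_i)
    a = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return [f(a_i + b_i) for a_i, b_i in zip(a, b)]

# Hypothetical 2x3 weight matrix (n = 2 outputs, m = 3 inputs).
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
x = [2.0, 4.0, 6.0]
b = [1.0, -1.0]

y = linear_layer(W, x, b)
print(y)  # [0.0, 5.0]
```

Here $a = (-4, 6)$, so $a + b = (-3, 5)$ and ReLU clips the first component to zero.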
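LayerNorm:
$\overline{a}_i = \frac{a_i - \mu}{\sigma} g_i$, $y_i = f(\overline{a}_i + b_i)$
where $\mu = \frac{1}{n}\sum_{i=1}^{n} a_i$ and $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (a_i - \mu)^2}$ are the mean and standard deviation of $a$, and $g_i$ is a learnable gain.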
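Code implementation
import torch
from torch import nn

class LayerNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps  # avoids division by zero
        self.weight = nn.Parameter(torch.ones(dim))  # gain g
    def _norm(self, x):
        mu = x.mean(-1, keepdim=True)
        var = x.var(-1, keepdim=True, unbiased=False)
        return (x - mu) * torch.rsqrt(var + self.eps)
    def forward(self, x):
        return self._norm(x) * self.weight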
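To check the LayerNorm formula by hand, here is a minimal plain-Python sketch that needs no PyTorch. The function name `layer_norm` and the sample vector are assumptions made for this example, and `eps` is placed inside the square root for numerical stability. With all gains set to 1, the output should have mean close to 0 and variance close to 1.

```python
import math

def layer_norm(a, g, eps=1e-6):
    n = len(a)
    mu = sum(a) / n
    # Biased variance (divide by n), i.e. sigma squared.
    var = sum((a_i - mu) ** 2 for a_i in a) / n
    return [(a_i - mu) / math.sqrt(var + eps) * g_i for a_i, g_i in zip(a, g)]

a = [1.0, 2.0, 3.0, 4.0]   # hypothetical pre-activations
g = [1.0] * len(a)         # gain of 1 for every component

out = layer_norm(a, g)
mean_out = sum(out) / len(out)
var_out = sum((o - mean_out) ** 2 for o in out) / len(out)
print(out)
print(mean_out, var_out)
```

The variance comes out slightly below 1 because of the `eps` term in the denominator.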
RMSNorm:
$\overline{a}_i = \frac{a_i}{\mathrm{RMS}(a)} g_i$, $y_i = f(\overline{a}_i + b_i)$
where $\mathrm{RMS}(a) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} a_i^2}$ is the root mean square of $a$.
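Note: RMSNorm omits the mean subtraction used in LayerNorm. When $\mu = 0$, $\mathrm{RMS}(a) = \sigma$ and the two normalizations coincide.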
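Code implementation
import torch
from torch import nn

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # gain g

    def _norm(self, x):
        # x / RMS(x), with eps inside the square root for numerical stability
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x):
        # Normalize in float32, then cast back to the input dtype.
        output = self._norm(x.float()).type_as(x)
        return output * self.weight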
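The same kind of hand check works for RMSNorm. The sketch below is plain Python, with the hypothetical helper `rms_norm` and an illustrative input vector, and it places `eps` inside the square root as the PyTorch code does. It shows two properties: the output has a root mean square close to 1 while its mean is not shifted to 0, and rescaling the input leaves the output essentially unchanged.

```python
import math

def rms_norm(a, g, eps=1e-6):
    # Mean of squares; RMS(a) is its square root.
    ms = sum(a_i ** 2 for a_i in a) / len(a)
    return [a_i / math.sqrt(ms + eps) * g_i for a_i, g_i in zip(a, g)]

a = [1.0, 2.0, 3.0, 4.0]   # hypothetical pre-activations
g = [1.0] * len(a)         # gain of 1 for every component

out = rms_norm(a, g)
rms_out = math.sqrt(sum(o ** 2 for o in out) / len(out))
mean_out = sum(out) / len(out)

# Scaling the input by 10 should give (almost) the same output.
scaled = rms_norm([10.0 * a_i for a_i in a], g)
max_diff = max(abs(s - o) for s, o in zip(scaled, out))

print(out)
print(rms_out, mean_out, max_diff)
```

Unlike LayerNorm, the mean of the output stays well away from 0 here, because no centering is applied.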