We can concatenate the partial derivatives of a multivariate function with respect to all of its variables to obtain the gradient vector of that function. Specifically, let the input of the function $f:\mathbb{R}^{n}\to\mathbb{R}$ be an $n$-dimensional vector $\vec x=\begin{bmatrix} x_1\\x_2\\\vdots\\x_n\end{bmatrix}$, and let its output be a scalar. The gradient of $f(\vec x)$ with respect to $\vec x$ is a vector of $n$ partial derivatives:

$$\nabla_{\vec x} f(\vec x) = \begin{bmatrix}\frac{\partial f(\vec x)}{\partial x_1}\\\frac{\partial f(\vec x)}{\partial x_2}\\\vdots\\ \frac{\partial f(\vec x)}{\partial x_n}\end{bmatrix}$$

where $\nabla_{\vec x} f(\vec x)$ is usually replaced by $\nabla f(\vec x)$ when there is no ambiguity.
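To make the definition concrete, here is a minimal sketch that approximates each partial derivative with a central difference and stacks the results into a gradient vector. The helper `numerical_gradient`, the example function `f`, and the step size `h` are my own illustrative choices, not part of the original text.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h   # nudge only the i-th coordinate up
        backward[i] -= h  # and down
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    # Hypothetical example: f(x) = x_1^2 + 3 * x_1 * x_2
    return x[0] ** 2 + 3 * x[0] * x[1]


point = [1.0, 2.0]
approx = numerical_gradient(f, point)
# Analytic gradient of the example: [2*x_1 + 3*x_2, 3*x_1]
exact = [2 * point[0] + 3 * point[1], 3 * point[0]]
print(approx, exact)
```

The two printed vectors should agree up to floating-point rounding, one entry per input variable.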
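Assume that $\vec x$ is an $n$-dimensional vector. The following rules are often used when differentiating multivariate functions: for any $A\in\mathbb{R}^{m\times n}$, $\nabla_{\vec x}A\vec x=A^\top$; for any $A\in\mathbb{R}^{n\times m}$, $\nabla_{\vec x}\vec x^\top A=A$; for any $A\in\mathbb{R}^{n\times n}$, $\nabla_{\vec x}\vec x^\top A\vec x=(A+A^\top)\vec x$; and $\nabla_{\vec x}\Vert \vec x\Vert^2=\nabla_{\vec x}\vec x^\top\vec x=2\vec x$. Each rule is derived in turn below.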
Proof: Let

$$A_{(m,n)}=\begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} &\cdots&a_{m,n} \end{bmatrix},$$

then

$$A\vec x_{(m,1)}=\begin{bmatrix} a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n \\ a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n \\ \vdots \\ a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n \end{bmatrix},$$

and, writing each partial derivative $\frac{\partial A\vec x}{\partial x_i}$ as a row,

$$\begin{aligned}
\nabla_{\vec x}A\vec x
&=\begin{bmatrix}\frac{\partial A\vec x}{\partial x_1}\\\frac{\partial A\vec x}{\partial x_2}\\\vdots\\ \frac{\partial A\vec x}{\partial x_n}\end{bmatrix}\\
&=\begin{bmatrix}
\frac{\partial (a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n)}{\partial x_1}& \frac{\partial (a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n)}{\partial x_1}&\cdots&\frac{\partial (a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n)}{\partial x_1}\\
\frac{\partial (a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n)}{\partial x_2}& \frac{\partial (a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n)}{\partial x_2}&\cdots&\frac{\partial (a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n)}{\partial x_2}\\
\vdots&\vdots&\ddots&\vdots\\
\frac{\partial (a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n)}{\partial x_n}& \frac{\partial (a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n)}{\partial x_n}&\cdots&\frac{\partial (a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n)}{\partial x_n}
\end{bmatrix}\\
&=\begin{bmatrix} a_{1,1} & a_{2,1} & \cdots & a_{m,1}\\ a_{1,2} & a_{2,2} & \cdots & a_{m,2} \\ \vdots&\vdots&\ddots&\vdots \\ a_{1,n}&a_{2,n}&\cdots&a_{m,n} \end{bmatrix}
=A^\top
\end{aligned}$$
Proof: Let

$$A_{(n,m)}=\begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,m} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,m} \end{bmatrix},$$

then

$$\vec x^\top A=\begin{bmatrix} a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n & a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n & \cdots&a_{1,m}x_1+a_{2,m}x_2+\cdots+a_{n,m}x_n \end{bmatrix},$$

$$\begin{aligned}
\nabla_{\vec x}\vec x^\top A
&=\begin{bmatrix}\frac{\partial \vec x^\top A}{\partial x_1}\\\frac{\partial \vec x^\top A}{\partial x_2}\\\vdots\\ \frac{\partial \vec x^\top A}{\partial x_n}\end{bmatrix}\\
&=\begin{bmatrix}
\frac{\partial (a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n)}{\partial x_1}& \frac{\partial (a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n)}{\partial x_1}&\cdots&\frac{\partial (a_{1,m}x_1+a_{2,m}x_2+\cdots+a_{n,m}x_n)}{\partial x_1}\\
\frac{\partial (a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n)}{\partial x_2}& \frac{\partial (a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n)}{\partial x_2}&\cdots&\frac{\partial (a_{1,m}x_1+a_{2,m}x_2+\cdots+a_{n,m}x_n)}{\partial x_2}\\
\vdots&\vdots&\ddots&\vdots\\
\frac{\partial (a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n)}{\partial x_n}& \frac{\partial (a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n)}{\partial x_n}&\cdots&\frac{\partial (a_{1,m}x_1+a_{2,m}x_2+\cdots+a_{n,m}x_n)}{\partial x_n}
\end{bmatrix}\\
&=\begin{bmatrix} a_{1,1} & a_{1,2}&\cdots&a_{1,m}\\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}&a_{n,2}&\cdots&a_{n,m} \end{bmatrix}
=A
\end{aligned}$$
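The two results derived so far, $\nabla_{\vec x}A\vec x=A^\top$ and $\nabla_{\vec x}\vec x^\top A=A$, can be spot-checked numerically. The sketch below is only a sanity check under my own assumptions: the helper names (`matvec`, `vecmat`, `gradient_of_vector_function`, `close`) and the sample numbers are made up for illustration. It follows the same layout as the derivations, so row $i$ of the result holds the partial derivatives with respect to $x_i$.

```python
def matvec(A, x):
    """Compute A x for a matrix A (list of rows) and a vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def vecmat(x, A):
    """Compute x^T A, where A has len(x) rows."""
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]


def gradient_of_vector_function(g, x, h=1e-6):
    """Row i holds the central-difference partials of every output of g w.r.t. x_i."""
    rows = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += h
        backward[i] -= h
        rows.append([(p - q) / (2 * h) for p, q in zip(g(forward), g(backward))])
    return rows


def transpose(M):
    return [list(col) for col in zip(*M)]


def close(M, N, tol=1e-6):
    """True when M and N have the same shape and agree entrywise within tol."""
    if len(M) != len(N) or any(len(r) != len(s) for r, s in zip(M, N)):
        return False
    return all(abs(p - q) < tol for r, s in zip(M, N) for p, q in zip(r, s))


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # arbitrary 2 x 3 example matrix

# Rule 1: A is m x n with m = 2, n = 3, and x lives in R^3.
x3 = [0.5, -1.0, 2.0]
grad_Ax = gradient_of_vector_function(lambda v: matvec(A, v), x3)
rule1_holds = close(grad_Ax, transpose(A))

# Rule 2: the same A read as n x m with n = 2, m = 3, and x lives in R^2.
x2 = [0.5, -1.0]
grad_xTA = gradient_of_vector_function(lambda v: vecmat(v, A), x2)
rule2_holds = close(grad_xTA, A)

print(rule1_holds, rule2_holds)
```

Because both functions are linear in $\vec x$, the result does not depend on the point where the gradient is evaluated, which matches the derivations: no $x_i$ survives in the final matrices.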
Proof: Let

$$A_{(n,n)}=\begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,n} \end{bmatrix},$$

then

$$\vec x^\top A=\begin{bmatrix} a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n & a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n & \cdots&a_{1,n}x_1+a_{2,n}x_2+\cdots+a_{n,n}x_n \end{bmatrix},$$

$$\vec x^\top A \vec x=\begin{bmatrix} \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} (a_{i,j}x_ix_j) \end{bmatrix}.$$

In this double sum, $x_k$ appears both as $x_i$ (the terms $a_{k,j}x_kx_j$) and as $x_j$ (the terms $a_{i,k}x_ix_k$), so differentiating with respect to $x_k$ gives $\sum_{i=1}^{n}(a_{i,k}+a_{k,i})x_i$. Therefore

$$\begin{aligned}
\nabla_{\vec x}\vec x^\top A \vec x
&=\begin{bmatrix} \frac{\partial \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} (a_{i,j}x_ix_j)}{\partial x_1} \\ \frac{\partial \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} (a_{i,j}x_ix_j)}{\partial x_2} \\ \vdots\\ \frac{\partial \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} (a_{i,j}x_ix_j)}{\partial x_n} \end{bmatrix}
=\begin{bmatrix} \sum\limits_{i=1}^{n}(a_{i,1}+a_{1,i})x_i \\ \sum\limits_{i=1}^{n}(a_{i,2}+a_{2,i})x_i \\ \vdots\\ \sum\limits_{i=1}^{n}(a_{i,n}+a_{n,i})x_i \end{bmatrix}\\
&=\begin{bmatrix} 2a_{1,1} & a_{1,2}+a_{2,1} & \cdots&a_{1,n}+a_{n,1} \\ a_{2,1}+a_{1,2} & 2a_{2,2} & \cdots&a_{2,n}+a_{n,2} \\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}+a_{1,n} & a_{n,2}+a_{2,n} & \cdots&2a_{n,n} \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
=(A+A^\top)\vec x
\end{aligned}$$
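The quadratic-form result $(A+A^\top)\vec x$ is easy to confuse with $2A\vec x$, which only coincides with it when $A$ is symmetric. The sketch below checks this numerically with a deliberately non-symmetric matrix. The helper names and the sample values are my own illustrative assumptions.

```python
def quadratic_form(A, x):
    """Compute x^T A x as the double sum of a_ij * x_i * x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    grad = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


A = [[1.0, 2.0],
     [-3.0, 4.0]]  # non-symmetric on purpose
x = [0.5, -1.5]
n = len(x)

approx = numerical_gradient(lambda v: quadratic_form(A, v), x)

# (A + A^T) x, written entry by entry as in the derivation
exact = [sum((A[k][i] + A[i][k]) * x[i] for i in range(n)) for k in range(n)]

# 2 A x, which would be the answer only if A were symmetric
naive = [2 * sum(A[k][i] * x[i] for i in range(n)) for k in range(n)]

print(approx, exact, naive)
```

The numerical gradient should match `exact` and visibly disagree with `naive`, which is why the transpose term cannot be dropped in general.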
Proof: By the definition of the norm,

$$\nabla_{\vec x}\Vert \vec x \Vert ^2=\nabla_{\vec x}\left(\sqrt{x_1^2+x_2^2+\cdots+x_n^2}\right)^2=\nabla_{\vec x}\left(x_1^2+x_2^2+\cdots+x_n^2\right)=\nabla_{\vec x}\vec x^\top \vec x;$$

and differentiating the same sum term by term gives

$$\nabla_{\vec x}\Vert \vec x \Vert ^2=\nabla_{\vec x}\left(x_1^2+x_2^2+\cdots+x_n^2\right)=\begin{bmatrix} 2x_1\\ 2x_2\\ \vdots\\ 2x_n \end{bmatrix}=2\vec x$$
Similarly, for any matrix $X$, we have $\nabla_X \Vert X \Vert_F^2=2X$. As we will see later, gradients are very useful for designing optimization algorithms in deep learning.
Proof: Let $X$ be an $m\times n$ matrix

$$X = \begin{bmatrix} x_{1,1}& x_{1,2}&\cdots&x_{1,n}\\ x_{2,1}& x_{2,2}&\cdots&x_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ x_{m,1}& x_{m,2}&\cdots&x_{m,n} \end{bmatrix},$$

then

$$\Vert X \Vert_F^2=\left(\sqrt{\sum\limits_{i=1}^{m}\sum\limits_{j=1}^n x_{i,j}^2}\right)^2=\sum\limits_{i=1}^{m}\sum\limits_{j=1}^n x_{i,j}^2,$$

and taking the partial derivative with respect to each entry $x_{i,j}$ gives

$$\nabla_X \Vert X \Vert_F^2=\begin{bmatrix} 2x_{1,1}& 2x_{1,2}&\cdots&2x_{1,n}\\ 2x_{2,1}& 2x_{2,2}&\cdots&2x_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ 2x_{m,1}& 2x_{m,2}&\cdots&2x_{m,n} \end{bmatrix}=2X$$
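As a last sanity check, the sketch below perturbs one matrix entry at a time and compares the resulting numerical gradient of $\Vert X \Vert_F^2$ with $2X$. The helper names and the sample matrix are my own assumptions. A vector can be passed as a matrix with a single column, so the same check also covers $\nabla_{\vec x}\Vert \vec x\Vert^2=2\vec x$.

```python
def squared_frobenius_norm(X):
    """Sum of the squares of all entries of X (a list of rows)."""
    return sum(v * v for row in X for v in row)


def numerical_matrix_gradient(f, X, h=1e-6):
    """Central-difference partial of f with respect to every entry of X."""
    grad = [[0.0] * len(row) for row in X]
    for i in range(len(X)):
        for j in range(len(X[i])):
            forward = [list(row) for row in X]
            backward = [list(row) for row in X]
            forward[i][j] += h
            backward[i][j] -= h
            grad[i][j] = (f(forward) - f(backward)) / (2 * h)
    return grad


X = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.5]]  # arbitrary 2 x 3 example

approx = numerical_matrix_gradient(squared_frobenius_norm, X)
exact = [[2 * v for v in row] for row in X]  # the claimed gradient 2X

max_error = max(
    abs(a - e) for row_a, row_e in zip(approx, exact) for a, e in zip(row_a, row_e)
)
print(max_error)
```

The gradient has the same shape as $X$, and the printed maximum error should be at the level of floating-point rounding.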
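I did not understand these formulas when I first read them, so I worked through the derivations myself to make them stick. The above is that derivation; questions and discussion are welcome.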