[Deep Learning] Dive into Deep Learning (PyTorch Edition) by Mu Li, 2.4.3 Gradient [Formula Derivations]

Published: January 10, 2024

2.4.3. Gradient

We can concatenate the partial derivatives of a multivariate function with respect to all of its variables to obtain the function's *gradient* vector. Specifically, suppose the input of the function $f:\mathbb{R}^{n}\to\mathbb{R}$ is an $n$-dimensional vector $\vec x=\begin{bmatrix} x_1\\x_2\\\vdots\\x_n\end{bmatrix}$ and the output is a scalar. The gradient of $f(\vec x)$ with respect to $\vec x$ is a vector of $n$ partial derivatives:

$$\nabla_{\vec x} f(\vec x) = \begin{bmatrix}\frac{\partial f(\vec x)}{\partial x_1}\\[2pt]\frac{\partial f(\vec x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(\vec x)}{\partial x_n}\end{bmatrix}$$

When there is no ambiguity, $\nabla_{\vec x} f(\vec x)$ is usually abbreviated to $\nabla f(\vec x)$.
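As a quick numerical check (not part of the original text), PyTorch's autograd can compute such a gradient directly. The function $f(x_1, x_2) = x_1^2 + 3x_2$ and the evaluation point below are illustrative choices; its gradient is $[2x_1,\ 3]^\top$:

```python
import torch

# Illustrative function f(x) = x1^2 + 3*x2, whose gradient is [2*x1, 3]
x = torch.tensor([2.0, 5.0], requires_grad=True)
f = x[0] ** 2 + 3 * x[1]
f.backward()      # populates x.grad with the gradient of f at x
print(x.grad)     # tensor([4., 3.])
```

Calling `backward()` on the scalar output accumulates $\nabla f(\vec x)$ into `x.grad`, which matches the hand-computed gradient at $x = (2, 5)$.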


Suppose $\vec x$ is an $n$-dimensional vector. The following rules are often used when differentiating multivariate functions:

1. For all $A \in \mathbb{R}^{m\times n}$, $\nabla_{\vec x}\, A\vec x = A^\top$.

Proof: Let

$$A_{(m,n)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} &\cdots&a_{m,n} \end{bmatrix},\qquad
A\vec x_{(m,1)} = \begin{bmatrix} a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n \\ a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n \\ \vdots \\ a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n \end{bmatrix}.$$

Writing $(A\vec x)_i = \sum_{j=1}^{n} a_{i,j}x_j$ for the $i$-th component and stacking the partial derivatives:

$$\nabla_{\vec x}A\vec x = \begin{bmatrix}\frac{\partial A\vec x}{\partial x_1}\\[2pt]\frac{\partial A\vec x}{\partial x_2}\\ \vdots\\ \frac{\partial A\vec x}{\partial x_n}\end{bmatrix}
= \begin{bmatrix}\frac{\partial (A\vec x)_1}{\partial x_1}&\frac{\partial (A\vec x)_2}{\partial x_1}&\cdots&\frac{\partial (A\vec x)_m}{\partial x_1}\\ \frac{\partial (A\vec x)_1}{\partial x_2}&\frac{\partial (A\vec x)_2}{\partial x_2}&\cdots&\frac{\partial (A\vec x)_m}{\partial x_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial (A\vec x)_1}{\partial x_n}&\frac{\partial (A\vec x)_2}{\partial x_n}&\cdots&\frac{\partial (A\vec x)_m}{\partial x_n}\end{bmatrix}
= \begin{bmatrix} a_{1,1} & a_{2,1} & \cdots & a_{m,1}\\ a_{1,2} & a_{2,2} & \cdots & a_{m,2} \\ \vdots&\vdots&\ddots&\vdots \\ a_{1,n}&a_{2,n}&\cdots&a_{m,n} \end{bmatrix} = A^\top.$$
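A small sketch (my addition, with arbitrary sizes $m=3$, $n=4$) checks this with autograd. Note that `torch.autograd.functional.jacobian` returns the Jacobian of $\vec x \mapsto A\vec x$, which is $A$ itself; the gradient in the layout used above is its transpose, $A^\top$:

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)          # arbitrary sizes m = 3, n = 4
A = torch.randn(3, 4)
x = torch.randn(4)

# Jacobian of x -> A @ x is A; transposing gives the text's gradient A^T
J = jacobian(lambda v: A @ v, x)
assert torch.allclose(J, A)
```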

2. For all $A \in \mathbb{R}^{n\times m}$, $\nabla_{\vec x}\, \vec x^\top A = A$.

Proof: Let

$$A_{(n,m)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,m} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,m} \end{bmatrix},$$

so that $\vec x^\top A$ is the row vector whose $j$-th entry is $(\vec x^\top A)_j = a_{1,j}x_1+a_{2,j}x_2+\cdots+a_{n,j}x_n$:

$$\vec x^\top A = \begin{bmatrix} a_{1,1}x_1+\cdots+a_{n,1}x_n & a_{1,2}x_1+\cdots+a_{n,2}x_n & \cdots & a_{1,m}x_1+\cdots+a_{n,m}x_n \end{bmatrix}.$$

Then

$$\nabla_{\vec x}\vec x^\top A = \begin{bmatrix}\frac{\partial \vec x^\top A}{\partial x_1}\\[2pt]\frac{\partial \vec x^\top A}{\partial x_2}\\ \vdots\\ \frac{\partial \vec x^\top A}{\partial x_n}\end{bmatrix}
= \begin{bmatrix}\frac{\partial (\vec x^\top A)_1}{\partial x_1}&\frac{\partial (\vec x^\top A)_2}{\partial x_1}&\cdots&\frac{\partial (\vec x^\top A)_m}{\partial x_1}\\ \frac{\partial (\vec x^\top A)_1}{\partial x_2}&\frac{\partial (\vec x^\top A)_2}{\partial x_2}&\cdots&\frac{\partial (\vec x^\top A)_m}{\partial x_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial (\vec x^\top A)_1}{\partial x_n}&\frac{\partial (\vec x^\top A)_2}{\partial x_n}&\cdots&\frac{\partial (\vec x^\top A)_m}{\partial x_n}\end{bmatrix}
= \begin{bmatrix} a_{1,1} & a_{1,2}&\cdots&a_{1,m}\\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}&a_{n,2}&\cdots&a_{n,m} \end{bmatrix} = A.$$
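The same autograd sketch applies here (my addition, arbitrary sizes $n=4$, $m=3$): the Jacobian of $\vec x \mapsto \vec x^\top A$ is $A^\top$, and transposing back to the layout used above recovers $A$ itself:

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)          # arbitrary sizes n = 4, m = 3
A = torch.randn(4, 3)
x = torch.randn(4)

# Jacobian of x -> x^T A is A^T; transposing gives the text's gradient A
J = jacobian(lambda v: v @ A, x)
assert torch.allclose(J.T, A)
```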

3. For all $A \in \mathbb{R}^{n\times n}$, $\nabla_{\vec x}\, \vec x^\top A \vec x = (A+A^\top)\vec x$.

Proof: Let

$$A_{(n,n)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,n} \end{bmatrix},\qquad
\vec x^\top A = \begin{bmatrix} a_{1,1}x_1+\cdots+a_{n,1}x_n & a_{1,2}x_1+\cdots+a_{n,2}x_n & \cdots & a_{1,n}x_1+\cdots+a_{n,n}x_n \end{bmatrix},$$

so the quadratic form is the scalar

$$\vec x^\top A \vec x = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{i,j}\,x_i x_j.$$

Differentiating with respect to $x_k$, each term $a_{i,j}x_ix_j$ contributes $a_{k,j}x_j$ when $i=k$ and $a_{i,k}x_i$ when $j=k$ (the diagonal term $a_{k,k}x_k^2$ contributes $2a_{k,k}x_k$), so

$$\nabla_{\vec x}\vec x^\top A \vec x = \begin{bmatrix} \frac{\partial}{\partial x_1} \sum_{i,j} a_{i,j}x_ix_j \\ \frac{\partial}{\partial x_2} \sum_{i,j} a_{i,j}x_ix_j \\ \vdots\\ \frac{\partial}{\partial x_n} \sum_{i,j} a_{i,j}x_ix_j \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n}(a_{i,1}+a_{1,i})x_i \\ \sum_{i=1}^{n}(a_{i,2}+a_{2,i})x_i \\ \vdots\\ \sum_{i=1}^{n}(a_{i,n}+a_{n,i})x_i \end{bmatrix}
= \begin{bmatrix} 2a_{1,1} & a_{1,2}+a_{2,1} & \cdots&a_{1,n}+a_{n,1} \\ a_{2,1}+a_{1,2} & 2a_{2,2} & \cdots&a_{2,n}+a_{n,2} \\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}+a_{1,n} & a_{n,2}+a_{2,n} & \cdots&2a_{n,n} \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} = (A+A^\top)\vec x.$$
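Since $\vec x^\top A\vec x$ is a scalar, `backward()` verifies this rule directly. The sketch below (my addition, arbitrary size $n=4$) deliberately uses a non-symmetric $A$, so the result is $(A+A^\top)\vec x$ rather than $2A\vec x$:

```python
import torch

torch.manual_seed(0)
n = 4
A = torch.randn(n, n)                     # deliberately not symmetric
x = torch.randn(n, requires_grad=True)

f = x @ A @ x                             # the scalar x^T A x
f.backward()
assert torch.allclose(x.grad, (A + A.T) @ x.detach())
```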

4. $\nabla_{\vec x} \Vert \vec x \Vert ^2 = \nabla_{\vec x}\,\vec x^\top\vec x = 2\vec x$.

Proof:

$$\nabla_{\vec x}\Vert \vec x \Vert ^2 = \nabla_{\vec x}\left(\sqrt{x_1^2+x_2^2+\cdots+x_n^2}\right)^{2} = \nabla_{\vec x}\,(x_1^2+x_2^2+\cdots+x_n^2) = \nabla_{\vec x}\,\vec x^\top \vec x = \begin{bmatrix} 2x_1\\ 2x_2\\ \vdots\\ 2x_n \end{bmatrix} = 2\vec x.$$
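Checked with autograd (my addition; the vector is an arbitrary example):

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
f = torch.dot(x, x)        # the squared norm ||x||^2 = x^T x
f.backward()
print(x.grad)              # tensor([ 2., -4.,  6.]), i.e. 2x
```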

Likewise, for any matrix $X$, we have $\nabla_X \Vert X \Vert_F^2=2X$. As we will see later, gradients are very useful for designing optimization algorithms in deep learning.

5. For any matrix $X$, $\nabla_X \Vert X \Vert_F^2=2X$.

Proof: Let $X$ be an $m\times n$ matrix,

$$X = \begin{bmatrix} x_{1,1}& x_{1,2}&\cdots&x_{1,n}\\ x_{2,1}& x_{2,2}&\cdots&x_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ x_{m,1}& x_{m,2}&\cdots&x_{m,n} \end{bmatrix}.$$

Since $\Vert X \Vert_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} x_{i,j}^2}$, the squared norm is simply $\Vert X \Vert_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} x_{i,j}^2$, and therefore

$$\nabla_X \Vert X \Vert_F^2 = \begin{bmatrix} 2x_{1,1}& 2x_{1,2}&\cdots&2x_{1,n}\\ 2x_{2,1}& 2x_{2,2}&\cdots&2x_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ 2x_{m,1}& 2x_{m,2}&\cdots&2x_{m,n} \end{bmatrix} = 2X.$$
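Checked with autograd as well (my addition; the $2\times 3$ matrix is an arbitrary example):

```python
import torch

X = torch.arange(6, dtype=torch.float32).reshape(2, 3).requires_grad_()
f = (X ** 2).sum()         # ||X||_F^2 is the sum of squared entries
f.backward()
assert torch.allclose(X.grad, 2 * X.detach())
```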

I did not fully understand these formulas at first glance, so I derived them myself to reinforce my understanding. The above is the derivation process; questions and discussion are welcome.

Source: https://blog.csdn.net/zhangjiuding/article/details/135500038