设
θ
:
R
n
→
(
?
∞
,
∞
]
\theta:R^n → (-\infty,\infty]
θ:Rn→(?∞,∞] 为闭正常凸函数,
F
:
R
n
→
R
n
F:R^n → R^n
F:Rn→Rn为向量
值和连续映射。广义变分不等式
(
G
V
I
)
(GVI)
(GVI)[9,19]形式如下
x
?
∈
R
n
,
θ
(
x
)
?
θ
(
x
?
)
+
(
x
?
x
?
)
T
F
(
x
?
)
≥
0
,
?
x
∈
R
n
.
x^*\in R^n,\theta(x)-\theta(x^*)+(x-x^*)^TF(x^*)\ge 0,\forall x\in R^n.
x?∈Rn,θ(x)?θ(x?)+(x?x?)TF(x?)≥0,?x∈Rn.
保留
∞
?
∞
=
∞
\infty -\infty=\infty
∞?∞=∞的算术运算。在本文中,我们关注
F
F
F是单调的情况,即
(
F
(
x
)
?
F
(
y
)
)
T
(
x
?
y
)
≥
0
(F(x) -F(y))^T(x -y)\ge 0
(F(x)?F(y))T(x?y)≥0,对于所有
x
,
y
∈
R
n
x, y \in R^n
x,y∈Rn。在本文中,我们对
M
G
V
I
(
θ
,
F
)
MGVI(\theta, F)
MGVI(θ,F)做了以下额外的假设:
(
a
)
(a)
(a)单调算子
F
F
F是
L
i
p
s
c
h
i
t
z
Lipschitz
Lipschitz连续的,其中
L
i
p
s
c
h
i
t
z
Lipschitz
Lipschitz 常数
L
>
0
L > 0
L>0。
(
b
)
(b)
(b)
M
G
V
I
(
θ
,
F
)
MGVI(\theta ,F)
MGVI(θ,F)的解集,表示为
Ω
?
\Omega^*
Ω?,非空。
各种凸优化问题可以表述为
M
G
V
I
s
MGVIs
MGVIs,例如最小绝对收缩和选择算子(LASSO,least absolute shrinkage and selection operator/
l
1
l_1
l1?正则化最小二乘)问题、基追问题[7]、基追去噪问题[5],和
D
a
n
t
z
i
g
Dantzig
Dantzig 选择器 [6],等等。此外,单调广义变分不等式(
M
G
V
I
MGVI
MGVI,monotone generalized variational inequality)包含经典单调变分不等式 (
M
V
I
MVI
MVI,monotone variational inequality)。通过设置
θ
=
δ
C
\theta = \delta _C
θ=δC?(
δ
C
\delta _C
δC?是
C
C
C的指标函数,如果是
x
∈
C
x \in C
x∈C,则等于
0
0
0,否则等于
∞
\infty
∞。),其中
C
?
R
n
C \subseteq R^n
C?Rn是一个非空闭凸集,则
M
G
V
I
MGVI
MGVI 就以下列形式简化为
M
V
I
MVI
MVI
x
?
∈
C
,
(
x
?
x
?
)
T
F
(
x
?
)
≥
0
,
?
x
∈
C
.
x^*\in C,(x-x^*)^TF(x^*)\ge 0,\forall x\in C.
x?∈C,(x?x?)TF(x?)≥0,?x∈C.
设
f
:
R
n
→
(
?
∞
,
∞
]
f: R^n → (?∞, ∞]
f:Rn→(?∞,∞]为闭正常凸函数,
f
f
f的邻近算子定义如下
P
r
o
x
f
:
x
?
a
r
g
m
i
n
{
f
(
y
)
+
1
2
∥
x
?
y
∥
2
:
y
∈
R
n
}
Prox_f:x \mapsto argmin\{f(y)+\frac{1}{2}\|x-y\|^2 : y \in R^n\}
Proxf?:x?argmin{f(y)+21?∥x?y∥2:y∈Rn}
设
f
:
R
n
→
(
?
∞
,
∞
]
f: R^n \rightarrow (-\infty, \infty]
f:Rn→(?∞,∞]为闭正常凸函数。那么接下来的三个条件是等价的:
1.
p
=
P
r
o
x
f
(
x
)
.
1. p = Prox_f (x).
1.p=Proxf?(x).
2.
x
?
p
∈
?
f
(
p
)
.
2. x - p \in \partial f(p).
2.x?p∈?f(p).
3.
3.
3. 对于所有
w
∈
R
n
w\in R^n
w∈Rn,都有
(
x
?
p
)
T
(
p
?
w
)
≥
f
(
p
)
?
f
(
w
)
.
(x-p)^T(p-w) \ge f(p) - f(w).
(x?p)T(p?w)≥f(p)?f(w).
对线性逆问题:
b
=
A
x
+
w
b=Ax+w
b=Ax+w,其中
b
∈
R
m
,
A
∈
R
m
×
n
b\in R^m,A\in R^{m\times n}
b∈Rm,A∈Rm×n已知,
w
w
w为噪音,求解
x
x
x。
让噪声最小,最小二乘法求解:
x
~
=
arg
?
min
?
x
∥
A
x
?
b
∥
2
\widetilde{x}=\mathop{\arg\min}\limits_{x}\|Ax-b\|^2
x
=xargmin?∥Ax?b∥2
若
m
=
n
m=n
m=n且
A
A
A非奇异,
x
=
A
?
1
y
x=A^{-1}y
x=A?1y。但是很多情况下,
A
A
A病态,用最小二乘法求解时,系统微小的扰动都会导致结果差别很大。为了求解病态线性系统的逆问题,可以使用
l
2
l_2
l2?正则化最小二乘(也叫Tikhonov regularization,Ridge regression)
x
~
=
arg
?
min
?
x
{
1
2
∥
A
x
?
b
∥
2
+
λ
∥
x
∥
2
}
\widetilde{x}=\mathop{\arg\min}\limits_{x}\{\frac{1}{2}\|Ax-b\|^2+\lambda\|x\|_2\}
x
=xargmin?{21?∥Ax?b∥2+λ∥x∥2?},
即
m
i
n
x
∈
R
n
f
(
x
)
:
=
{
1
2
∥
A
x
?
b
∥
2
+
λ
∥
x
∥
2
}
\mathop{min}\limits_{x\in R^n}f(x):=\{\frac{1}{2}\|Ax-b\|^2+\lambda\|x\|_2\}
x∈Rnmin?f(x):={21?∥Ax?b∥2+λ∥x∥2?}
而
l
1
l_1
l1?正则化最小二乘(Lasso,least absolute shrinkage and selection operator,最小绝对值收敛和选择算子)问题。
m
i
n
x
∈
R
n
f
(
x
)
:
=
{
1
2
∥
A
x
?
b
∥
2
+
λ
∥
x
∥
1
}
\mathop{min}\limits_{x\in R^n}f(x):=\{\frac{1}{2}\|Ax-b\|^2+\lambda\|x\|_1\}
x∈Rnmin?f(x):={21?∥Ax?b∥2+λ∥x∥1?}
对无约束问题
m
i
n
f
(
x
)
,
x
∈
R
n
minf(x),x\in R^n
minf(x),x∈Rn
梯度下降法
x
k
=
x
k
?
1
?
t
k
?
1
?
f
(
x
k
?
1
)
x_k=x_{k-1}-t_{k-1}\nabla f(x_{k-1})
xk?=xk?1??tk?1??f(xk?1?)
梯度下降法对可微函数可直接求微分,对不可微函数引入了次梯度(subgradient),次微分(subdifferential)概念
投影算子(projection,Pc)
P
c
=
P
r
o
x
t
k
δ
C
Pc=Prox_{t_k\delta _C}
Pc=Proxtk?δC??
x
k
=
P
c
(
x
k
?
1
?
t
k
?
1
?
f
(
x
k
?
1
)
)
x_k=Pc(x_{k-1}-t_{k-1}\nabla f(x_{k-1}))
xk?=Pc(xk?1??tk?1??f(xk?1?))
近端线性
x
k
=
arg
?
min
?
x
{
f
(
x
k
?
1
)
+
<
x
?
x
k
?
1
,
?
f
(
x
k
?
1
)
>
+
1
2
t
k
?
1
∥
x
?
x
k
?
1
∥
2
}
x_k=\mathop{\arg\min}\limits_{x}\{f(x_{k-1})+<x-x_{k-1},\nabla f(x_{k-1})>+\frac{1}{2t_{k-1}}\|x-x_{k-1}\|^2\}
xk?=xargmin?{f(xk?1?)+<x?xk?1?,?f(xk?1?)>+2tk?1?1?∥x?xk?1?∥2}
f
(
x
k
?
1
)
+
<
x
?
x
k
?
1
,
?
f
(
x
k
?
1
)
>
+
1
2
t
k
?
1
∥
x
?
x
k
?
1
∥
2
=
1
2
t
k
?
1
∥
x
?
(
x
k
?
1
?
t
k
?
1
?
f
(
x
k
?
1
)
)
∥
2
+
D
f(x_{k-1})+<x-x_{k-1},\nabla f(x_{k-1})>+\frac{1}{2t_{k-1}}\|x-x_{k-1}\|^2=\frac{1}{2t_{k-1}}\|x-(x_{k-1}-t_{k-1}\nabla f(x_{k-1}))\|^2+D
f(xk?1?)+<x?xk?1?,?f(xk?1?)>+2tk?1?1?∥x?xk?1?∥2=2tk?1?1?∥x?(xk?1??tk?1??f(xk?1?))∥2+D
D为常数
x
k
=
arg
?
min
?
x
{
1
2
t
k
∥
x
?
(
x
k
?
1
?
t
k
?
1
?
f
(
x
k
?
1
)
)
∥
2
}
x_k=\mathop{\arg\min}\limits_{x}\{\frac{1}{2t_k}\|x-(x_{k-1}-t_{k-1}\nabla f(x_{k-1}))\|^2\}
xk?=xargmin?{2tk?1?∥x?(xk?1??tk?1??f(xk?1?))∥2}
对无约束问题
m
i
n
f
(
x
)
+
g
(
x
)
,
x
∈
R
n
minf(x)+g(x),x\in R^n
minf(x)+g(x),x∈Rn
x
k
+
1
=
arg
?
min
?
x
{
f
(
x
k
)
+
<
x
?
x
k
,
?
f
(
x
k
)
>
+
1
2
t
k
∥
x
?
x
k
∥
2
+
g
(
x
)
}
x_{k+1}=\mathop{\arg\min}\limits_{x}\{f(x_k)+<x-x_k,\nabla f(x_k)>+\frac{1}{2t_k}\|x-x_k\|^2+g(x)\}
xk+1?=xargmin?{f(xk?)+<x?xk?,?f(xk?)>+2tk?1?∥x?xk?∥2+g(x)}
x
k
+
1
=
arg
?
min
?
x
{
1
2
∥
x
?
(
x
k
?
t
k
?
f
(
x
k
)
)
∥
2
+
t
k
g
(
x
)
}
x_{k+1}=\mathop{\arg\min}\limits_{x}\{\frac{1}{2}\|x-(x_k-t_k\nabla f(x_k))\|^2+t_kg(x)\}
xk+1?=xargmin?{21?∥x?(xk??tk??f(xk?))∥2+tk?g(x)}
即
x
k
+
1
=
P
r
o
x
t
k
g
(
x
k
?
t
k
?
f
(
x
k
)
)
x_{k+1}=Prox_{t_kg}(x_k-t_k\nabla f(x_k))
xk+1?=Proxtk?g?(xk??tk??f(xk?))
m
i
n
∥
x
∥
1
s
.
t
.
A
x
=
b
\mathop{min}\quad \|x\|_1 \\s.t. \quad Ax=b
min∥x∥1?s.t.Ax=b
其中
A
∈
R
m
×
n
,
b
∈
R
n
A\in R^{m\times n},b\in R^{n}
A∈Rm×n,b∈Rn
m
i
n
θ
1
(
x
)
+
θ
2
(
x
)
s
.
t
.
A
x
+
B
y
=
c
\mathop{min}\quad\theta_1(x)+\theta_2(x)\\ s.t. \quad Ax+By=c
minθ1?(x)+θ2?(x)s.t.Ax+By=c
其中
A
∈
R
m
×
n
,
B
∈
R
m
×
q
,
c
∈
R
m
A\in R^{m\times n},B\in R^{m\times q},c\in R^m
A∈Rm×n,B∈Rm×q,c∈Rm,
θ
1
:
R
n
→
(
?
∞
,
∞
]
\theta_1:R^n\rightarrow(-\infty,\infty]
θ1?:Rn→(?∞,∞]和
θ
2
:
R
q
→
(
?
∞
,
∞
]
\theta_2:R^q\rightarrow(-\infty,\infty]
θ2?:Rq→(?∞,∞]为两个正常闭和凸函数
取
f
(
x
)
=
1
2
∥
A
x
?
b
∥
2
,
g
(
x
)
=
λ
∥
x
∥
1
f(x)=\frac{1}{2}\|Ax-b\|^2,g(x)=\lambda \|x\|_1
f(x)=21?∥Ax?b∥2,g(x)=λ∥x∥1?
?
f
(
x
k
)
=
A
T
(
A
x
?
b
)
,
t
k
=
1
L
k
\nabla f(x_k)=A^T(Ax-b),t_k=\frac{1}{L_k}
?f(xk?)=AT(Ax?b),tk?=Lk?1?
得
x
k
+
1
=
P
r
o
x
1
L
k
λ
∥
∥
1
(
x
k
?
1
L
k
A
T
(
A
x
k
?
b
)
)
x_{k+1}=Prox_{\frac{1}{L_k}\lambda \|\|_1}(x_k-\frac{1}{L_k}A^T(Ax_k-b))
xk+1?=ProxLk?1?λ∥∥1??(xk??Lk?1?AT(Axk??b))
L
L
L是
?
f
(
x
)
\nabla f(x)
?f(x)的
L
i
p
s
c
h
i
t
z
Lipschitz
Lipschitz常数,
L
>
λ
m
a
x
(
A
T
A
)
L>\lambda_{max}(A^TA)
L>λmax?(ATA)
由
T
λ
t
k
=
P
r
o
x
t
k
λ
∥
∥
1
=
P
r
o
x
λ
t
k
\Tau _{\lambda t_k}=Prox_{t_k\lambda\|\|_1}=Prox_{\lambda t_k}
Tλtk??=Proxtk?λ∥∥1??=Proxλtk??得
x
k
+
1
=
T
λ
L
k
(
x
k
?
1
L
k
A
T
(
A
x
k
?
b
)
)
x_{k+1}=T_{\frac{\lambda}{L_k}}(x_k-\frac{1}{L_k}A^T(Ax_k-b))
xk+1?=TLk?λ??(xk??Lk?1?AT(Axk??b))
收缩阈值算子(Shrinkage-Thresholding operator )
T
λ
(
x
)
=
s
g
n
(
x
)
m
a
x
(
∣
x
∣
?
λ
,
0
)
=
m
a
x
(
0
,
x
?
λ
)
+
m
i
n
(
0
,
x
+
λ
)
\Tau _{\lambda}(x)=sgn(x)max(|x|-\lambda,0)=max(0, x -\lambda) + min(0, x +\lambda)
Tλ?(x)=sgn(x)max(∣x∣?λ,0)=max(0,x?λ)+min(0,x+λ)
x
k
x_k
xk?收敛速度
O
(
1
t
)
O(\frac{1}{t})
O(t1?)
matlab核心代码
T = @(tau, x) max(0, x - tau) + min(0, x + tau);% 收缩阈值函数
x_new = T(lambda/L0, x_old - (1/L0)*A'*(A*x_old-b));
FISTA改进:
s
t
e
p
0
:
step\quad 0:
step0:
y
1
=
x
0
,
t
1
=
1
y_1=x_0,t_1=1
y1?=x0?,t1?=1
s
t
e
p
k
(
k
≥
1
)
:
step\quad k(k\ge1):
stepk(k≥1):
x
k
+
1
=
T
λ
L
k
(
y
k
?
1
L
k
A
T
(
A
x
k
?
b
)
)
x_{k+1}=T_{\frac{\lambda}{L_k}}(y_k-\frac{1}{L_k}A^T(Ax_k-b))
xk+1?=TLk?λ??(yk??Lk?1?AT(Axk??b))
t
k
+
1
=
1
+
1
+
4
t
k
2
2
t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}
tk+1?=21+1+4tk2???
y
k
+
1
=
x
k
+
(
t
k
?
1
t
k
+
1
)
(
x
k
?
x
k
?
1
)
y_{k+1}=x_k+(\frac{t_k-1}{t_{k+1}})(x_k-x_{k-1})
yk+1?=xk?+(tk+1?tk??1?)(xk??xk?1?)
x
k
x_k
xk?收敛速度
O
(
1
t
2
)
O(\frac{1}{t^2})
O(t21?)
广义外梯度法(Generalized Extragradient Method)
由外梯度法(EM,Extragradient Method)推广得到,其中EM采用投影算子(
P
c
Pc
Pc,projection operator),适用于求解
M
V
I
(
θ
,
F
,
C
)
MVI(\theta,F,C)
MVI(θ,F,C);GEM采用邻近算子(
P
r
o
x
Prox
Prox,proximity operator),适用于求解
M
G
V
I
(
θ
,
F
)
MGVI(\theta,F)
MGVI(θ,F),其他步骤基本相同,都是预估校正算法
l
1
l_1
l1?正则化最小二乘问题,由KKT条件,易知
θ
(
x
)
=
λ
∥
x
∥
1
,
F
(
x
)
=
A
T
(
A
x
?
b
)
\theta(x)=\lambda\|x\|_1,F(x)=A^T(Ax-b)
θ(x)=λ∥x∥1?,F(x)=AT(Ax?b)
β
k
≤
ν
L
,
ν
∈
(
0
,
1
)
,
L
\beta^k\le\frac{\nu}{L},\nu\in(0,1),L
βk≤Lν?,ν∈(0,1),L为
F
(
x
)
F(x)
F(x)的
L
i
p
s
c
h
i
t
z
Lipschitz
Lipschitz常数
GEM是预估校正算法
预测:
x
~
k
=
P
r
o
x
β
k
θ
(
x
k
?
β
k
F
(
x
k
)
)
\widetilde{x}_k=Prox_{\beta^k\theta}(x_k-\beta^kF(x_k))
x
k?=Proxβkθ?(xk??βkF(xk?))
校正:
x
k
+
1
=
P
r
o
x
β
k
θ
(
x
k
?
β
k
F
(
x
~
k
)
)
x_{k+1}=Prox_{\beta^k\theta}(x_k-\beta^kF(\widetilde{x}_k))
xk+1?=Proxβkθ?(xk??βkF(x
k?))
收缩阈值算子(
T
\Tau
T),投影算子(
P
C
P_C
PC?)是特殊的邻近算子(
P
r
o
x
Prox
Prox)
邻近算子具有收缩性质,也就是生成的序列
{
x
k
}
\{x_k\}
{xk?}会收敛
使用邻近算子(Prox)作为近似梯度
变分不等式
θ
(
x
~
k
)
?
θ
(
x
?
)
+
(
x
~
k
?
x
?
)
T
F
(
x
?
)
≥
0
\theta(\tilde{x}^k) -\theta(x^*) +(\tilde{x}^k - x^*)^T F(x^*) \ge 0
θ(x~k)?θ(x?)+(x~k?x?)TF(x?)≥0
单调
(
x
~
k
?
x
?
)
T
(
F
(
x
~
k
)
?
F
(
x
?
)
)
≥
0
(\tilde{x}^k - x^*)^T(F(\tilde{x}^k) -F(x^*))\ge 0
(x~k?x?)T(F(x~k)?F(x?))≥0
F
(
x
)
=
M
x
+
q
F(x)=Mx+q
F(x)=Mx+q
M半正定,不需要对称
x
0
∈
R
n
x^0 \in R^n
x0∈Rn,并且
γ
∈
(
0
,
2
)
,
α
k
?
≥
1
∥
I
+
β
k
M
T
∥
2
2
\gamma \in (0,2),\alpha ^*_k\ge\frac{1}{\|I+\beta ^kM^T\|^2_2}
γ∈(0,2),αk??≥∥I+βkMT∥22?1?
预测器:
\textbf{预测器:}
预测器:选择
β
k
>
0
\beta ^k>0
βk>0,且
x
~
k
=
P
r
o
x
β
k
θ
(
x
k
?
β
k
F
(
x
k
)
)
\widetilde{x}^k=Prox_{\beta ^k\theta}(x^k-\beta ^kF(x^k))
x
k=Proxβkθ?(xk?βkF(xk));
校正器:
\textbf{校正器:}
校正器:设
d
(
x
k
,
x
~
k
,
β
k
)
=
(
I
+
β
k
M
T
)
(
x
k
?
x
~
k
)
d(x^k,\widetilde{x}^k,\beta ^k)=(I+\beta ^kM^T)(x^k-\widetilde{x}^k)
d(xk,x
k,βk)=(I+βkMT)(xk?x
k),并且计算
α
k
?
=
∥
x
k
?
x
~
k
∥
2
∥
d
(
x
k
,
x
~
k
,
β
k
)
∥
2
\alpha ^*_k=\frac{\|x^k-\widetilde{x}^k\|^2}{\|d(x^k,\widetilde{x}^k,\beta ^k)\|^2}
αk??=∥d(xk,x
k,βk)∥2∥xk?x
k∥2?。设置
x
k
+
1
=
x
k
?
γ
α
k
?
d
(
x
k
,
x
~
k
,
β
k
)
x^{k+1}=x^k-\gamma\alpha ^*_kd(x^k,\widetilde{x}^k,\beta ^k)
xk+1=xk?γαk??d(xk,x
k,βk)。
F
(
x
)
=
M
x
+
q
F(x)=Mx+q
F(x)=Mx+q
M对称半正定,
G
=
I
+
β
M
G=I+\beta M
G=I+βM
x
0
∈
R
n
x^0 \in R^n
x0∈Rn,
β
>
0
\beta >0
β>0,并且
γ
∈
(
0
,
2
)
\gamma \in (0,2)
γ∈(0,2),
α
k
?
≥
1
λ
m
a
x
(
G
)
\alpha ^*_k \ge \frac{1}{\lambda_{max}(G)}
αk??≥λmax?(G)1?
预测器:
x
~
k
=
P
r
o
x
β
θ
(
x
k
?
β
F
(
x
k
)
)
\textbf{预测器:}\widetilde{x}^k=Prox_{\beta\theta}(x^k-\beta F(x^k))
预测器:x
k=Proxβθ?(xk?βF(xk));
校正器:
\textbf{校正器:}
校正器:设
d
(
x
k
,
x
~
k
)
=
x
k
?
x
~
k
d(x^k,\widetilde{x}^k)=x^k-\widetilde{x}^k
d(xk,x
k)=xk?x
k,并且计算
α
k
?
=
∥
x
k
?
x
~
k
∥
2
∥
d
(
x
k
,
x
~
k
)
∥
G
2
\alpha ^*_k=\frac{\|x^k-\widetilde{x}^k\|^2}{\|d(x^k,\widetilde{x}^k)\|^2_G}
αk??=∥d(xk,x
k)∥G2?∥xk?x
k∥2?,设置
x
k
+
1
=
x
k
?
γ
α
k
?
d
(
x
k
,
x
~
k
)
x^{k+1}=x^k-\gamma\alpha ^*_kd(x^k,\widetilde{x}^k)
xk+1=xk?γαk??d(xk,x
k)。
F
(
x
)
F(x)
F(x)单调
x
0
∈
R
n
x^0 \in R^n
x0∈Rn,并且
γ
∈
(
0
,
2
)
\gamma \in (0,2)
γ∈(0,2) ,
α
k
?
≥
1
2
\alpha ^*_k \ge\frac{1}{2}
αk??≥21?
预测器:
\textbf{预测器:}
预测器:选择
β
k
>
0
\beta ^k>0
βk>0,使得
β
k
∥
F
(
x
k
)
?
F
(
x
~
k
)
∥
≤
ν
∥
x
k
?
x
~
k
∥
\beta ^k\|F(x^k)-F(\widetilde{x}^k)\|\le\nu\|x^k-\widetilde{x}^k\|
βk∥F(xk)?F(x
k)∥≤ν∥xk?x
k∥,其中
x
~
k
=
P
r
o
x
β
k
θ
(
x
k
?
β
k
F
(
x
k
)
)
\widetilde{x}^k=Prox_{\beta ^k\theta}(x^k-\beta ^kF(x^k))
x
k=Proxβkθ?(xk?βkF(xk));
校正器:
\textbf{校正器:}
校正器:设置
d
(
x
k
,
x
~
k
,
β
k
)
=
(
x
k
?
x
~
k
)
?
β
k
(
F
(
x
k
)
?
F
(
x
~
k
)
)
d(x^k,\widetilde{x}^k,\beta ^k)=(x^k-\widetilde{x}^k)-\beta ^k(F(x^k)-F(\widetilde{x}^k))
d(xk,x
k,βk)=(xk?x
k)?βk(F(xk)?F(x
k)),并且计算
α
k
?
=
(
x
k
?
x
~
k
)
T
d
(
x
k
,
x
~
k
,
β
k
)
∥
d
(
x
k
,
x
~
k
,
β
k
)
∥
2
\alpha^*_k=\frac{(x^k-\widetilde{x}^k)^Td(x^k,\widetilde{x}^k,\beta ^k)}{\|d(x^k,\widetilde{x}^k,\beta ^k)\|^2}
αk??=∥d(xk,x
k,βk)∥2(xk?x
k)Td(xk,x
k,βk)?,设置
x
k
+
1
=
x
k
?
γ
α
k
?
d
(
x
k
,
x
~
k
,
β
k
)
x^{k+1}=x^k-\gamma\alpha^*_kd(x^k,\widetilde{x}^k,\beta ^k)
xk+1=xk?γαk??d(xk,x
k,βk)。
F
(
x
)
=
A
x
+
c
F(x)=Ax+c
F(x)=Ax+c
A对称半正定,
G
=
I
?
β
M
G=I-\beta M
G=I?βM,G对称正定
x
0
∈
R
n
x^0 \in R^n
x0∈Rn,
0
<
β
<
1
λ
m
a
x
(
A
)
0<\beta <\frac{1}{\lambda_{max}(A)}
0<β<λmax?(A)1?,并且
γ
∈
(
0
,
2
)
\gamma \in (0,2)
γ∈(0,2)
预测器:
x
~
k
=
P
r
o
x
β
θ
(
x
k
?
β
F
(
x
k
)
)
\textbf{预测器:}\widetilde{x}^k=Prox_{\beta\theta}(x^k-\beta F(x^k))
预测器:x
k=Proxβθ?(xk?βF(xk));
校正器:
\textbf{校正器:}
校正器:设
d
(
x
k
,
x
~
k
)
=
x
k
?
x
~
k
d(x^k,\widetilde{x}^k)=x^k-\widetilde{x}^k
d(xk,x
k)=xk?x
k,设置
x
k
+
1
=
x
k
?
γ
d
(
x
k
,
x
~
k
)
x^{k+1}=x^k-\gamma d(x^k,\widetilde{x}^k)
xk+1=xk?γd(xk,x
k)。
m
i
n
x
,
y
,
z
θ
1
(
x
)
+
θ
2
(
x
)
+
θ
3
(
x
)
min_{x,y,z}\theta_1(x)+\theta_2(x)+\theta_3(x)
minx,y,z?θ1?(x)+θ2?(x)+θ3?(x)
s
.
t
.
f
(
x
,
y
,
z
)
=
0
s.t. f(x,y,z)=0
s.t.f(x,y,z)=0
拉格朗日函数
L
(
x
,
y
,
z
,
λ
)
=
θ
1
(
x
)
+
θ
2
(
x
)
+
θ
3
(
x
)
+
λ
f
(
x
,
y
,
z
)
L(x,y,z,\lambda)=\theta_1(x)+\theta_2(x)+\theta_3(x)+\lambda f(x,y,z)
L(x,y,z,λ)=θ1?(x)+θ2?(x)+θ3?(x)+λf(x,y,z)
λ
\lambda
λ为拉格朗日乘子
增广拉格朗日函数
L
(
x
,
y
,
z
,
λ
)
=
θ
1
(
x
)
+
θ
2
(
x
)
+
θ
3
(
x
)
+
λ
f
(
x
,
y
,
z
)
+
ρ
2
∥
f
(
x
,
y
,
z
)
∥
2
2
L(x,y,z,\lambda)=\theta_1(x)+\theta_2(x)+\theta_3(x)+\lambda f(x,y,z)+\frac{\rho}{2} \|f(x,y,z)\|_2^2
L(x,y,z,λ)=θ1?(x)+θ2?(x)+θ3?(x)+λf(x,y,z)+2ρ?∥f(x,y,z)∥22?
λ
\lambda
λ为拉格朗日乘子,
ρ
\rho
ρ罚因子
m
i
n
x
,
y
,
z
θ
1
(
x
)
+
θ
2
(
x
)
min_{x,y,z}\theta_1(x)+\theta_2(x)
minx,y,z?θ1?(x)+θ2?(x)
s
.
t
.
A
x
+
B
y
=
c
s.t. Ax+By=c
s.t.Ax+By=c
初始化:
x
0
∈
R
n
\textbf{初始化:}x^0 \in R^n
初始化:x0∈Rn,
y
0
∈
R
q
y^0 \in R^q
y0∈Rq,
λ
0
∈
R
m
\lambda^0 \in R^m
λ0∈Rm,并且
ρ
>
0.
\rho>0.
ρ>0.
一般步骤:
\textbf{一般步骤:}
一般步骤:对
k
=
0
,
1
,
k=0,1,
k=0,1,…执行以下步骤:
(
a
)
x
k
+
1
∈
arg
?
min
?
{
θ
1
(
x
)
+
ρ
2
∥
A
x
+
B
y
k
?
c
+
1
ρ
λ
k
∥
2
:
x
∈
R
n
}
;
(a)x^{k+1}\in \mathop{\arg\min}\{\theta_1(x)+\frac{\rho}{2}\|Ax+By^k-c+\frac{1}{\rho}\lambda^k\|^2:x\in R^n\};
(a)xk+1∈argmin{θ1?(x)+2ρ?∥Ax+Byk?c+ρ1?λk∥2:x∈Rn};
(
b
)
y
k
+
1
∈
arg
?
min
?
{
θ
2
(
y
)
+
ρ
2
∥
A
x
k
+
1
+
B
y
?
c
+
1
ρ
λ
k
∥
2
:
x
∈
R
q
}
;
(b)y^{k+1}\in \mathop{\arg\min}\{\theta_2(y)+\frac{\rho}{2}\|Ax^{k+1}+By-c+\frac{1}{\rho}\lambda^k\|^2:x\in R^q\};
(b)yk+1∈argmin{θ2?(y)+2ρ?∥Axk+1+By?c+ρ1?λk∥2:x∈Rq};
(
c
)
λ
k
+
1
=
λ
k
+
ρ
(
A
x
k
+
1
+
B
y
k
+
1
?
c
)
.
(c)\lambda^{k+1}=\lambda^k+\rho(Ax^{k+1}+By^{k+1}-c).
(c)λk+1=λk+ρ(Axk+1+Byk+1?c).
交替方向近似乘子法(AD-PMM,proximal)的一个特例
初始化:
x
0
∈
R
n
\textbf{初始化:}x^0 \in R^n
初始化:x0∈Rn,
y
0
∈
R
q
y^0 \in R^q
y0∈Rq,
λ
0
∈
R
m
\lambda^0 \in R^m
λ0∈Rm,
ρ
>
0
\rho>0
ρ>0,
α
≥
ρ
λ
m
a
x
(
A
T
A
)
\alpha\ge\rho\lambda_{max}(A^TA)
α≥ρλmax?(ATA),
β
≥
ρ
λ
m
a
x
(
B
T
B
)
.
\beta\ge\rho\lambda_{max}(B^TB).
β≥ρλmax?(BTB).
一般步骤:
\textbf{一般步骤:}
一般步骤:对
k
=
0
,
1
,
k=0,1,
k=0,1,…执行以下步骤:
(
a
)
x
k
+
1
=
P
r
o
x
1
α
θ
1
[
x
k
+
ρ
α
A
T
(
A
x
k
+
B
y
k
?
c
+
1
ρ
λ
k
)
]
;
(a)x^{k+1}=Prox_{\frac{1}{\alpha}\theta_1}[x^k+\frac{\rho}{\alpha}A^T(Ax^k+By^k-c+\frac{1}{\rho}\lambda^k)];
(a)xk+1=Proxα1?θ1??[xk+αρ?AT(Axk+Byk?c+ρ1?λk)];
(
b
)
y
k
+
1
=
P
r
o
x
1
β
θ
2
[
y
k
+
ρ
β
B
T
(
A
x
k
+
1
+
B
y
k
?
c
+
1
ρ
λ
k
)
]
;
(b)y^{k+1}=Prox_{\frac{1}{\beta}\theta_2}[y^k+\frac{\rho}{\beta}B^T(Ax^{k+1}+By^k-c+\frac{1}{\rho}\lambda^k)];
(b)yk+1=Proxβ1?θ2??[yk+βρ?BT(Axk+1+Byk?c+ρ1?λk)];
(
c
)
λ
k
+
1
=
λ
k
+
ρ
(
A
x
k
+
1
+
B
y
k
+
1
?
c
)
.
(c)\lambda^{k+1}=\lambda^k+\rho(Ax^{k+1}+By^{k+1}-c).
(c)λk+1=λk+ρ(Axk+1+Byk+1?c).