假设有k个簇,每一个簇服从高斯分布,以概率
π
k
\pi_k
πk?随机选择一个簇k ,从其分布中采样出一个样本点,如此得到观测数据
p
(
x
)
=
∑
k
=
1
K
π
k
N
(
x
∣
μ
k
,
Σ
k
)
?其中
∑
k
=
1
K
π
k
=
1
,
0
≤
π
k
≤
1
p\left(\boldsymbol{x}\right)=\sum_{k=1}^{K}\pi_{k}N(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\text{ 其中}\sum_{k=1}^{K}\pi_{k}=1,\quad0\leq\pi_{k}\leq1
p(x)=k=1∑K?πk?N(x∣μk?,Σk?)?其中k=1∑K?πk?=1,0≤πk?≤1
其中模型参数为:
θ
=
(
π
1
,
…
,
π
K
,
μ
1
,
…
,
μ
K
,
Σ
1
,
…
,
Σ
K
)
T
\theta=(\pi_1,\ldots,\pi_K,\mu_1,\ldots,\mu_K,\Sigma_1,\ldots,\Sigma_K)^\mathrm{T}
θ=(π1?,…,πK?,μ1?,…,μK?,Σ1?,…,ΣK?)T
若样本
x
x
x关联K维的隐含变量为
z
=
(
z
1
,
z
2
,
…
,
z
K
)
z=(z_1,z_2,\dots,z_K)
z=(z1?,z2?,…,zK?),其对应的随机向量用大写字母Z表示
P
(
Z
k
=
1
)
=
π
k
P(Z_k=1)=\pi_k
P(Zk?=1)=πk?
若
x
x
x属于第
k
k
k簇,则
p
(
x
∣
Z
k
=
1
)
=
N
(
x
∣
μ
k
,
Σ
k
)
p(x|Z_k=1)=N(x|\mu_k,\Sigma_k)
p(x∣Zk?=1)=N(x∣μk?,Σk?)
p ( x ) = ∑ z p ( x , z ) = ∑ z p ( x ∣ z ) p ( z ) = ∑ k = 1 K p ( x ∣ Z k = 1 ) P ( Z k = 1 ) = ∑ k = 1 K π k N ( x ∣ μ k , Σ k ) \begin{gathered} p(x)=\sum_{\mathbf{z}}p(x,\mathbf{z})=\sum_{\mathbf{z}}p(x|\mathbf{z})p(\mathbf{z})=\sum_{k=1}^{K}p(x|Z_{k}=1)P(Z_{k}=1) \\ =\sum_{k=1}^K\pi_kN(x|\mu_k,\Sigma_k) \end{gathered} p(x)=z∑?p(x,z)=z∑?p(x∣z)p(z)=k=1∑K?p(x∣Zk?=1)P(Zk?=1)=k=1∑K?πk?N(x∣μk?,Σk?)?
采用EM算法求解
Е步:基于当前参数值
θ
o
l
d
\theta^{old}
θold,推断隐含变量
z
i
z_i
zi?的信息(后验概率/期望)
r
i
,
k
=
E
(
Z
i
,
k
)
=
P
(
Z
i
,
k
=
1
∣
x
i
,
θ
o
l
d
)
=
P
(
Z
i
,
k
=
1
)
p
(
x
i
∣
Z
i
,
k
=
1
)
∑
k
′
=
1
K
P
(
Z
i
,
k
′
=
1
)
p
(
x
i
∣
Z
i
,
k
′
=
1
)
=
π
k
o
l
d
N
(
x
i
∣
μ
k
o
l
d
,
Σ
k
o
l
d
)
∑
k
′
=
1
K
π
k
′
o
l
d
N
(
x
i
∣
μ
k
′
o
l
d
,
Σ
k
′
o
l
d
)
\begin{gathered} r_{i,k}=\mathbb{E}\big(Z_{i,k}\big)=P\big(Z_{i,k}=1\big|x^{i},\boldsymbol{\theta^{old}}\big)=\frac{P\big(Z_{i,k}=1\big)p(\boldsymbol{x^{i}}|Z_{i,k}=1)}{\sum_{k^{\prime}=1}^{K}P\big(Z_{i,k^{\prime}}=1\big)p(\boldsymbol{x^{i}}|Z_{i,k^{\prime}}=1\big)} \\ =\frac{\pi_k^{old}\mathcal{N}(x^i|\boldsymbol{\mu}_k^{old},\boldsymbol{\Sigma}_k^{old})}{\sum_{k^{\prime}=1}^K\pi_{k^{\prime}}^{old}\mathcal{N}(\boldsymbol{x}^i|\boldsymbol{\mu}_{k^{\prime}}^{old},\boldsymbol{\Sigma}_{k^{\prime}}^{old})} \end{gathered}
ri,k?=E(Zi,k?)=P(Zi,k?=1
?xi,θold)=∑k′=1K?P(Zi,k′?=1)p(xi∣Zi,k′?=1)P(Zi,k?=1)p(xi∣Zi,k?=1)?=∑k′=1K?πk′old?N(xi∣μk′old?,Σk′old?)πkold?N(xi∣μkold?,Σkold?)??
r
i
,
k
r_{i,k}
ri,k?可以看做是对
x
i
x^i
xi从属于第
k
k
k个簇的一种估计
M步:基于当前的期望
r
i
,
k
r_{i,k}
ri,k?重新估计参数的值
π
k
\pi_k
πk?、
μ
k
\mu_k
μk?、
Σ
k
\Sigma_k
Σk?
π k n e w = ∑ i r i , k N , μ k n e w = ∑ i r i , k x i ∑ i r i , k , Σ k n e w = ∑ i r i , k ( x i ? μ k ) ( x i ? μ k ) T ∑ i r i , k \pi_k^{new}=\frac{\sum_ir_{i,k}}N,\quad\mu_k^{new}=\frac{\sum_ir_{i,k}\boldsymbol{x}^i}{\sum_ir_{i,k}},\quad\Sigma_k^{new}=\frac{\sum_ir_{i,k}(\boldsymbol{x}^i-\boldsymbol{\mu}_k)(\boldsymbol{x}^i-\boldsymbol{\mu}_k)^\mathrm{T}}{\sum_ir_{i,k}} πknew?=N∑i?ri,k??,μknew?=∑i?ri,k?∑i?ri,k?xi?,Σknew?=∑i?ri,k?∑i?ri,k?(xi?μk?)(xi?μk?)T?