Notations
Def. Max Entropy (ME)
$$
\begin{aligned}
&\max_{P\in \mathcal{P}} H(X)\\
&\text{s.t. } E_P(f)=E_{\tilde{P}}(f) \qquad (\star)
\end{aligned}
$$
where $f(x)$ are features.
Fact. $P_w(x)\propto e^{\sum_j w_j f_j(x)}$ is the solution to $\inf_P L(P, w)$, where $L$ is the Lagrangian of ($\star$) and $w$ is the Lagrange multiplier.
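A short sketch of why the exponential form appears, assuming a finite sample space and the sign convention $L=-H+\dots$ (the notes do not spell out $L$, so this convention is an assumption). The multiplier $\lambda$ for the normalization constraint is introduced here only for illustration.

$$
\begin{aligned}
L(P,w) &= \sum_x P(x)\ln P(x) + \sum_j w_j\big(E_{\tilde{P}}(f_j)-E_P(f_j)\big) + \lambda\Big(\sum_x P(x)-1\Big)\\
\frac{\partial L}{\partial P(x)} &= \ln P(x) + 1 - \sum_j w_j f_j(x) + \lambda = 0\\
\Rightarrow\quad P_w(x) &= \frac{e^{\sum_j w_j f_j(x)}}{Z(w)}, \qquad Z(w)=\sum_x e^{\sum_j w_j f_j(x)}
\end{aligned}
$$

Since $L$ is convex in $P$ under this convention, the stationary point is the minimizer, and $\lambda$ is absorbed into the normalizing constant $Z(w)$.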
Lagrange dual function: the log-likelihood of the sample, averaged over the $N$ sample points $x_i$,
$$
\begin{aligned}
\Psi(w) &:= L(P_w,w)\\
&= -\ln Z(w)+E_{\tilde{P}}(f)\cdot w\\
&= \frac{1}{N}\sum_i \ln P_w(x_i)
\end{aligned}
$$
where $Z(w)=\sum_x e^{\sum_j w_j f_j(x)}$ is the normalizing constant of $P_w$.
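The identity between $-\ln Z(w)+E_{\tilde{P}}(f)\cdot w$ and the log-likelihood averaged over the sample (the sum divided by $N$) can be checked numerically. The following is a minimal sketch on a hypothetical toy problem: the state space, the features `features(x) = (x, x^2)`, the sample, and the weights are all made up for illustration.

```python
import math

# Hypothetical toy setup: X takes values in {0, 1, 2, 3}.
STATES = [0, 1, 2, 3]

def features(x):
    """Feature vector f(x) = (x, x^2); chosen only for illustration."""
    return [float(x), float(x * x)]

def score(w, x):
    """Inner product w . f(x)."""
    return sum(wj * fj for wj, fj in zip(w, features(x)))

def log_partition(w):
    """ln Z(w) = ln sum_x exp(w . f(x)), computed stably."""
    scores = [score(w, x) for x in STATES]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def dual_value(w, sample):
    """-ln Z(w) + E_{P~}(f) . w, with E_{P~} the empirical mean of f."""
    n = len(sample)
    emp = [sum(features(x)[j] for x in sample) / n for j in range(len(w))]
    return -log_partition(w) + sum(wj * ej for wj, ej in zip(w, emp))

def mean_log_likelihood(w, sample):
    """(1/N) sum_i ln P_w(x_i), where ln P_w(x) = w . f(x) - ln Z(w)."""
    return sum(score(w, x) - log_partition(w) for x in sample) / len(sample)

sample = [0, 1, 1, 2, 3, 1, 0, 2]   # made-up observations
w = [0.3, -0.2]                     # arbitrary weights
gap = abs(dual_value(w, sample) - mean_log_likelihood(w, sample))
print(gap)
```

The two quantities agree up to floating-point rounding for any choice of `w`, because $\ln P_w(x)=w\cdot f(x)-\ln Z(w)$ is linear in $f(x)$.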
Dual problem: maximum likelihood estimation (MLE)
$$
\max_w \Psi(w) \qquad (\star\star)
$$
where $\Psi(w)$ is, up to a positive constant factor that does not change the maximizer, the log-likelihood $\sum_{i} \ln P_w(x_i)$.
Fact. The dual of ME ($\star$) is MLE ($\star\star$).
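One way to see the duality in practice: maximizing the log-likelihood over $w$ drives the gradient $E_{\tilde{P}}(f)-E_{P_w}(f)$ to zero, which is exactly the ME constraint. The sketch below uses plain gradient ascent on a hypothetical toy problem; the states, the two indicator features, the sample, the step size, and the iteration count are all assumptions made for illustration, not part of the notes.

```python
import math

# Hypothetical toy setup: X takes values in {0, 1, 2, 3}.
STATES = [0, 1, 2, 3]

def features(x):
    """Two indicator features (illustrative): x >= 2, and x odd."""
    return [1.0 if x >= 2 else 0.0, 1.0 if x % 2 == 1 else 0.0]

def model_probs(w):
    """P_w(x) = exp(w . f(x)) / Z(w) over all states."""
    scores = [sum(wj * fj for wj, fj in zip(w, features(x))) for x in STATES]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [u / z for u in weights]

def model_moments(w):
    """E_{P_w}(f): feature expectations under the model."""
    probs = model_probs(w)
    return [sum(p * features(x)[j] for p, x in zip(probs, STATES))
            for j in range(len(w))]

def empirical_moments(sample, dim):
    """E_{P~}(f): feature averages over the sample."""
    n = len(sample)
    return [sum(features(x)[j] for x in sample) / n for j in range(dim)]

def fit(sample, steps=500, lr=1.0):
    """Gradient ascent on the averaged log-likelihood Psi(w)."""
    target = empirical_moments(sample, 2)
    w = [0.0, 0.0]
    for _ in range(steps):
        # Gradient of Psi is E_{P~}(f) - E_{P_w}(f).
        grad = [t - m for t, m in zip(target, model_moments(w))]
        w = [wj + lr * g for wj, g in zip(w, grad)]
    return w

sample = [0, 1, 1, 2, 3, 1, 0, 1]   # made-up observations
w_hat = fit(sample)
target = empirical_moments(sample, 2)
fitted = model_moments(w_hat)
print(w_hat, target, fitted)
```

At the fitted `w_hat`, the model moments `fitted` coincide with the empirical moments `target`, so the MLE solution satisfies the constraint in ($\star$).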
Assume that $P(Y|X)$ is a discriminative model.
Def. Max Entropy for $P(Y|X)$
$$
\begin{aligned}
&\max_{P\in \mathcal{P}} H(Y|X)\\
&\text{s.t. } E_P(f)=E_{\tilde{P}}(f)
\end{aligned}
$$
where $f(x,y)$ are features, and the expectation $E_P$ is taken with respect to the joint distribution
$$
P(x,y)=\tilde{P}(x)P(y|x)
$$
Fact. $P_w(y|x)\propto e^{\sum_j w_j f_j(x,y)}$ is the solution to $\inf_P L(P, w)$, where $L$ is the Lagrangian of the conditional problem.
Lagrange dual function: $\Psi(w):=L(P_w,w)=\frac{1}{N}\sum_i \ln P_w(y_i|x_i)$, the conditional log-likelihood averaged over the $N$ sample pairs $(x_i,y_i)$.
Dual problem (conditional MLE):
$$
\max_w \Psi(w)
$$
where, up to a positive constant factor that does not change the maximizer, $\Psi(w):= \sum_{i} \ln P_w(y_i|x_i)$.
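The conditional case can be sketched the same way: $P_w(y|x)$ is a softmax over labels, and gradient ascent on the conditional log-likelihood matches the feature expectations under $\tilde{P}(x)P_w(y|x)$ to the empirical ones. Everything concrete below (binary `x` and `y`, the two features, the data, the step size and iteration count) is a hypothetical choice for illustration.

```python
import math

LABELS = [0, 1]  # hypothetical binary label set

def features(x, y):
    """Illustrative features f(x, y): [y == 1], [x == 1 and y == 1]."""
    return [1.0 if y == 1 else 0.0,
            1.0 if (x == 1 and y == 1) else 0.0]

def cond_probs(w, x):
    """P_w(y | x) = exp(w . f(x, y)) / Z(w, x) for each label y."""
    scores = [sum(wj * fj for wj, fj in zip(w, features(x, y)))
              for y in LABELS]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [u / z for u in weights]

def gradient(w, data):
    """Gradient of (1/N) sum_i ln P_w(y_i | x_i):
    observed features minus their expectation under P_w(. | x_i)."""
    n = len(data)
    grad = [0.0] * len(w)
    for x, y in data:
        probs = cond_probs(w, x)
        observed = features(x, y)
        for j in range(len(w)):
            expected_j = sum(p * features(x, yy)[j]
                             for p, yy in zip(probs, LABELS))
            grad[j] += (observed[j] - expected_j) / n
    return grad

def fit(data, steps=3000, lr=1.0):
    """Plain gradient ascent on the averaged conditional log-likelihood."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = gradient(w, data)
        w = [wj + lr * gj for wj, gj in zip(w, g)]
    return w

# Made-up (x, y) pairs: y = 1 in 1 of 4 cases for x = 0, 3 of 4 for x = 1.
data = [(0, 0), (0, 1), (0, 0), (0, 0), (1, 1), (1, 1), (1, 0), (1, 1)]
w_hat = fit(data)
p_y1_given_x0 = cond_probs(w_hat, 0)[1]
p_y1_given_x1 = cond_probs(w_hat, 1)[1]
print(p_y1_given_x0, p_y1_given_x1)
```

With these two features the model is saturated, so the fitted conditional probabilities reproduce the empirical frequencies of `y = 1` for each value of `x`. This is the same computation as logistic (softmax) regression.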
Exercise
Consider ME for the generative model $P(X,Y)$.