My Note of Maximum Entropy

Published: December 25, 2023

Note for Maximum Entropy

Notations

  • $P$: model distribution
  • $\tilde{P}$: empirical/sample distribution (Dirac distribution)
  • $\{x_i\}$: sample
  • $f_j$: features
  • $E_P f$: expectation of $f$ under the distribution $P$

Maximum Entropy

Def. Maximum Entropy (ME)
$$
\begin{aligned}
&\max_{P\in \mathcal{P}} H(X)\\
&\text{s.t. } E_P(f)=E_{\tilde{P}}(f)
\end{aligned}
\qquad (\star)
$$
where $f(x)$ are features.

Fact. $P_w(x)\propto e^{\sum_j w_j f_j(x)}$ is the solution to $\inf_P L(P, w)$, where $L$ is the Lagrangian of ($\star$) and $w$ is the Lagrange multiplier.
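A short sketch of why the exponential form appears. It assumes the sign convention $L(P,w) = -H(P) + w_0(1-\sum_x P(x)) + \sum_j w_j(E_{\tilde{P}}f_j - E_P f_j)$ on a finite sample space; this convention and the extra multiplier $w_0$ for normalization are assumptions of the sketch, not spelled out in the note.

```latex
\begin{aligned}
L(P,w) &= \sum_x P(x)\ln P(x) + w_0\Bigl(1-\sum_x P(x)\Bigr)
          + \sum_j w_j\Bigl(E_{\tilde{P}}f_j - \sum_x P(x) f_j(x)\Bigr),\\
\frac{\partial L}{\partial P(x)} &= \ln P(x) + 1 - w_0 - \sum_j w_j f_j(x) = 0\\
\Longrightarrow\quad P_w(x) &= \frac{1}{Z(w)}\, e^{\sum_j w_j f_j(x)},
\qquad Z(w)=\sum_x e^{\sum_j w_j f_j(x)}.
\end{aligned}
```

Since $L$ is convex in $P$ under this convention, the stationary point is the infimum, and normalization fixes $e^{w_0-1}=1/Z(w)$.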

Lagrangian dual function: the log-likelihood of the sample,
$$
\Psi(w):=L(P_w,w)\\
=-\ln Z(w)+E_{\tilde{P}}(f)w\\
=\sum_i \ln P_w(x_i)
$$
where $Z(w)=\sum_x e^{\sum_j w_j f_j(x)}$ is the normalizing constant of $P_w$. The last equality holds up to the constant factor $1/N$ ($N$ being the sample size), which does not affect the maximizer.
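The equalities can be checked by substituting $P_w$ directly. A brief sketch, assuming the convention $L(P,w) = -H(P) + \sum_j w_j(E_{\tilde{P}}f_j - E_P f_j)$ on normalized $P$, with $N$ denoting the sample size (both are assumptions of this sketch):

```latex
\begin{aligned}
-H(P_w) &= \sum_x P_w(x)\ln P_w(x) = w\cdot E_{P_w}(f) - \ln Z(w),\\
L(P_w,w) &= -H(P_w) + w\cdot\bigl(E_{\tilde{P}}(f) - E_{P_w}(f)\bigr)
          = -\ln Z(w) + w\cdot E_{\tilde{P}}(f),\\
\frac{1}{N}\sum_{i=1}^{N} \ln P_w(x_i)
         &= \frac{1}{N}\sum_{i=1}^{N}\bigl(w\cdot f(x_i) - \ln Z(w)\bigr)
          = w\cdot E_{\tilde{P}}(f) - \ln Z(w).
\end{aligned}
```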

Dual problem: maximum likelihood estimation (MLE)
$$
\max_w \Psi(w) \qquad (\star\star)
$$
where $\Psi(w):= \sum_{i} \ln P_w(x_i)$.

Fact. The dual of ME ($\star$) is MLE ($\star\star$).
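The duality can be illustrated numerically. The following is a minimal sketch on a toy finite sample space: it runs gradient ascent on $\Psi(w)$, whose gradient is $E_{\tilde{P}}f - E_{P_w}f$, so the maximizer satisfies the moment constraint of ME. The sample space, features, sample, step size, and function names (`gibbs`, `fit_maxent`, and so on) are all hypothetical choices for illustration.

```python
import math

def dot(w, f):
    return sum(wj * fj for wj, fj in zip(w, f))

def gibbs(w, feats):
    """P_w(x) proportional to exp(sum_j w_j f_j(x)) on a finite space."""
    scores = [dot(w, f) for f in feats]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def expectation(p, feats):
    """E_p f as a vector, one entry per feature."""
    return [sum(pi * f[j] for pi, f in zip(p, feats))
            for j in range(len(feats[0]))]

def log_z(w, feats):
    """ln Z(w), the log normalizing constant of P_w."""
    return math.log(sum(math.exp(dot(w, f)) for f in feats))

def fit_maxent(sample, feats, lr=0.5, steps=5000):
    """Gradient ascent on Psi(w) = w . E_emp(f) - ln Z(w)."""
    n = len(sample)
    emp = [sum(feats[x][j] for x in sample) / n
           for j in range(len(feats[0]))]
    w = [0.0] * len(emp)
    for _ in range(steps):
        model = expectation(gibbs(w, feats), feats)
        # grad Psi = E_emp f - E_{P_w} f
        w = [wj + lr * (e - m) for wj, e, m in zip(w, emp, model)]
    return w, emp

# Toy setup (assumed): X = {0, 1, 2}, f_1(x) = x, f_2(x) = [x == 0]
feats = [(0.0, 1.0), (1.0, 0.0), (2.0, 0.0)]
sample = [0, 0, 1, 2, 2, 2, 1, 0]

w, emp = fit_maxent(sample, feats)
p = gibbs(w, feats)
model = expectation(p, feats)

# Psi(w) computed two ways: average log-likelihood vs. the dual formula
avg_loglik = sum(math.log(p[x]) for x in sample) / len(sample)
psi = dot(w, emp) - log_z(w, feats)
```

At the fitted `w`, `model` should agree with `emp` (the ME constraint), and `avg_loglik` should agree with `psi` for any `w`, since both are the same function written in two forms.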

ME for Machine learning/conditional likelihood

Assume that $P(Y|X)$ is a discriminative model.

Def. Maximum Entropy for $P(Y|X)$
$$
\begin{aligned}
&\max_{P\in \mathcal{P}} H(Y|X)\\
&\text{s.t. } E_P(f)=E_{\tilde{P}}(f)
\end{aligned}
$$
where $f(x,y)$ are features, and the joint distribution is $P(x,y)=\tilde{P}(x)P(y|x)$.

Fact. $P_w(y|x)\propto e^{\sum_j w_j f_j(x,y)}$ is the solution to $\inf_P L(P, w)$.

Lagrangian dual function: $\Psi(w):=L(P_w,w)=\sum_i \ln P_w(y_i|x_i)$, the conditional log-likelihood.

Dual problem (conditional MLE):
$$
\max_w \Psi(w)
$$
where $\Psi(w):= \sum_{i} \ln P_w(y_i|x_i)$.
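The conditional case can be sketched the same way. The gradient of the (averaged) conditional log-likelihood is $E_{\tilde{P}}f - E_{\tilde{P}(x)P_w(y|x)}f$, so gradient ascent again drives the model moments toward the empirical ones. The data, features, labels, step size, and function names below are hypothetical, chosen so that the answer can be checked by hand.

```python
import math

def cond_probs(w, feat, x, labels):
    """P_w(y | x) proportional to exp(sum_j w_j f_j(x, y)) over the labels."""
    scores = [sum(wj * fj for wj, fj in zip(w, feat(x, y))) for y in labels]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fit_conditional(data, feat, labels, dim, lr=0.5, steps=2000):
    """Gradient ascent on the averaged conditional log-likelihood."""
    n = len(data)
    # Empirical feature expectation E_emp f
    emp = [sum(feat(x, y)[j] for x, y in data) / n for j in range(dim)]
    w = [0.0] * dim
    for _ in range(steps):
        # Model expectation under P_emp(x) * P_w(y | x)
        model = [0.0] * dim
        for x, _y in data:
            probs = cond_probs(w, feat, x, labels)
            for p, y in zip(probs, labels):
                f = feat(x, y)
                for j in range(dim):
                    model[j] += p * f[j] / n
        w = [wj + lr * (e - m) for wj, e, m in zip(w, emp, model)]
    return w, emp

# Toy setup (assumed): binary x and y, one indicator feature per value of x
labels = [0, 1]

def feat(x, y):
    return (1.0 if (x == 0 and y == 1) else 0.0,
            1.0 if (x == 1 and y == 1) else 0.0)

data = [(0, 1), (0, 0), (0, 0), (0, 0),
        (1, 1), (1, 1), (1, 1), (1, 0)]

w, emp = fit_conditional(data, feat, labels, dim=2)
p1_given_0 = cond_probs(w, feat, 0, labels)[1]
p1_given_1 = cond_probs(w, feat, 1, labels)[1]
```

With these features the model reduces to one logistic parameter per value of `x`, so the fitted conditionals should match the empirical frequencies of `y = 1` within each group (1/4 for `x = 0` and 3/4 for `x = 1`).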


Exercise
Please consider ME for the generative model $P(X,Y)$.

Source: https://blog.csdn.net/nbu2004/article/details/135209717