Mehryar Mohri
Courant Institute of Mathematical Sciences
mohri@cims.nyu.edu
This Lecture
Relative entropy
Maximum entropy models
Training algorithm
Duality theorem
• Properties:
\[ \Pr[X_m - E[X_m] \geq \epsilon] \leq e^{-2m\epsilon^2}. \]
\[ \Pr[X_m - E[X_m] \leq -\epsilon] \leq e^{-2m\epsilon^2}. \]
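As a rough illustration of the two tail bounds above, the sketch below evaluates exp(-2 m eps^2) and inverts it to get a sample size. It assumes X_m is the average of m independent variables taking values in [0, 1] (the setting in which this form of the bound holds); the function names `hoeffding_bound` and `sample_size_for` are hypothetical.

```python
import math

def hoeffding_bound(m, eps):
    """One-sided tail bound exp(-2 m eps^2) for an average of m
    independent variables in [0, 1] (assumed setting)."""
    return math.exp(-2.0 * m * eps ** 2)

def sample_size_for(eps, delta):
    """Smallest m for which the one-sided bound is at most delta."""
    return math.ceil(math.log(1.0 / delta) / (2.0 * eps ** 2))
```

For example, `sample_size_for(0.1, 0.05)` gives the number of samples after which a deviation of 0.1 in one direction has probability at most 0.05 under these assumptions.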
• dialog-level information.
Mehryar Mohri - Speech Recognition page 14 Courant Institute, NYU
Maximum Entropy Principle
(E. T. Jaynes, 1957, 1983)
For large m, we can give a fairly good estimate of
the expected values of the features:
\[ E_D[f_j] \approx \frac{1}{m} \sum_{i=1}^{m} f_j(x_i), \quad j = 1, \ldots, n. \]
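A minimal sketch of this empirical estimate, assuming the sample is a Python list and each feature f_j is an ordinary function returning a number. The name `empirical_expectations` and the toy features in the usage note are illustrative, not from the lecture.

```python
def empirical_expectations(sample, features):
    """Average each feature over the sample: (1/m) * sum_i f_j(x_i)."""
    m = len(sample)
    return [sum(f(x) for x in sample) / m for f in features]
```

With the toy sample `[0, 1, 1, 2]` and the indicator features `lambda x: float(x == 1)` and `lambda x: float(x >= 1)`, each entry of the result is simply the fraction of sample points on which that feature fires.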
Find distribution that is closest to the uniform
distribution u and that preserves the expected
values of features.
Closeness is measured using relative entropy (or
Kullback-Leibler divergence).
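The sketch below computes the relative entropy D(p || q) on a finite set and can be used to check numerically that, when q is the uniform distribution u on N outcomes, D(p || u) = log N - H(p). Minimizing the distance to u is therefore the same as maximizing the entropy H(p). Distributions are assumed to be plain lists of probabilities; the function names are hypothetical.

```python
import math

def relative_entropy(p, q):
    """D(p || q) = sum_x p(x) log(p(x) / q(x)), with 0 log 0 = 0."""
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0.0:
            if qx == 0.0:
                return math.inf  # p puts mass where q has none
            total += px * math.log(px / qx)
    return total

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(px * math.log(px) for px in p if px > 0.0)
```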
and p ∈ P.
Thus, p' ∈ P ∩ Q̄. By the duality theorem, p_* = p'.
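Parameter update:
\[ \lambda_j^{(0)} \leftarrow \text{arbitrary} \]
\[ \lambda_j^{(t+1)} \leftarrow \lambda_j^{(t)} + \delta_j^{(t)}, \quad \forall j \in [1, n+1], \]
with
\[ \sum_{x \in X} p^{(t)}(x) f_j(x) e^{\delta_j f_\#(x)} + \frac{\lambda_j + \delta_j}{\sigma_j^2} = E_{\hat{p}}[f_j]. \]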
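One way to read this update is as a one-dimensional root-finding problem per feature: with the current model p^(t) fixed, find the delta_j that makes the left-hand side equal the empirical expectation. The sketch below solves it with Newton's method, which is a choice made here and not something the slide specifies. It assumes a finite outcome set, nonnegative features, f_#(x) taken to be the sum of the features of x, and a log-linear model p(x) proportional to exp(lambda . f(x)). All names and the toy data are hypothetical.

```python
import math

def model_probs(lam, feats):
    """Log-linear model p(x) proportional to exp(sum_j lam_j * f_j(x))."""
    scores = [sum(l * f for l, f in zip(lam, fx)) for fx in feats]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def solve_delta(p, feats, j, lam_j, target_j, sigma2_j, iters=50):
    """Newton's method on
    g(d) = sum_x p(x) f_j(x) exp(d * f_sharp(x)) + (lam_j + d) / sigma2_j - target_j.
    g is increasing and convex when the features are nonnegative."""
    f_sharp = [sum(fx) for fx in feats]  # assumed definition of f_#
    delta = 0.0
    for _ in range(iters):
        g = (lam_j + delta) / sigma2_j - target_j
        dg = 1.0 / sigma2_j
        for px, fx, fs in zip(p, feats, f_sharp):
            term = px * fx[j] * math.exp(delta * fs)
            g += term
            dg += term * fs
        step = g / dg
        delta -= step
        if abs(step) < 1e-12:
            break
    return delta

def update_step(lam, feats, target, sigma2):
    """One parallel update: lam_j <- lam_j + delta_j for every j."""
    p = model_probs(lam, feats)
    return [l + solve_delta(p, feats, j, l, target[j], sigma2[j])
            for j, l in enumerate(lam)]

# Toy problem: four outcomes, two binary features (illustrative values).
feats = [[1, 0], [0, 1], [1, 1], [0, 0]]
target = [0.6, 0.3]    # empirical feature expectations
sigma2 = [10.0, 10.0]  # variances sigma_j^2 of the regularization term
lam = update_step([0.0, 0.0], feats, target, sigma2)
```

Calling `update_step` repeatedly plays the role of the iteration over t. A practical implementation would also add a stopping criterion on the size of the updates.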
implies D(p_* ∥ p') = 0.
Examples:
• Euclidean distance.
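[Figure: a convex function F and its tangent at x, F(x) + (y − x) · ∇_x F(x), with the points x and y marked on the horizontal axis.]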
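Assuming the tangent expression above belongs to the definition of a Bregman divergence, B_F(y || x) = F(y) - F(x) - (y - x) · ∇F(x), the sketch below checks the Euclidean example numerically: for F(x) = ||x||^2 the divergence reduces to the squared Euclidean distance ||y - x||^2. Vectors are plain lists and all function names are hypothetical.

```python
def bregman(F, grad_F, y, x):
    """Gap at y between F and its tangent at x:
    F(y) - F(x) - (y - x) . grad F(x)."""
    linear = sum((yi - xi) * gi for yi, xi, gi in zip(y, x, grad_F(x)))
    return F(y) - F(x) - linear

def sq_norm(v):
    """F(v) = ||v||^2."""
    return sum(vi * vi for vi in v)

def grad_sq_norm(v):
    """Gradient of ||v||^2, namely 2 v."""
    return [2.0 * vi for vi in v]

def sq_distance(y, x):
    """Squared Euclidean distance, for comparison."""
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x))
```

For any pair of vectors, `bregman(sq_norm, grad_sq_norm, y, x)` and `sq_distance(y, x)` should agree up to rounding.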
References
• Adam Berger, Stephen Della Pietra, and Vincent Della Pietra. A maximum entropy
approach to natural language processing. Computational Linguistics, 22(1), March 1996.
• J. Berkson. Application of the logistic function to bio-assay. Journal of the American
Statistical Association, 39, pp. 357-365, 1944.
• Imre Csiszár and G. Tusnády. Information geometry and alternating minimization procedures.
Statistics and Decisions, Supplement Issue 1, pp. 205-237, 1984.
• J. Darroch and D. Ratcliff. Generalized iterative scaling for log-linear models. The Annals
of Mathematical Statistics, 43(5), pp. 1470-1480, 1972.
• Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. Inducing features of random
fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4), pp. 380-393,
April 1997.