Skip to main content
Browse by:

Effective March 10, 2020, all Duke-sponsored events over 50 people have been cancelled, rescheduled, postponed or virtualized.
Please check with the event contact regarding event status. For more information, please see

Minimax risk for missing mass estimation

Event Image
Icon calendar
Tuesday, June 26, 2018
Icon time
11:45 am - 1:15 pm
Icon speaker
Andrew Thangaraj, IIT Madras
Icon series
Information Theory

Suppose you observe a sequence of words. What is the chance that the
next word is unseen? This is essentially the problem of missing mass
estimation. It has a wide variety of applications including language
modelling and species estimation. The most popular estimator - called
the Good-Turing estimator - was discovered while Good and Turing were
trying to crack the enigma code in World War II.

We consider squared-error risk for the estimation of missing mass from N samples drawn independently from an unknown distribution. For the Good-Turing estimator, we show that the worst-case risk (over all distributions) is between 0.608/N and 0.618/N. Further, the worst-case risk for any estimator, i.e. the minimax risk, is bounded between 0.25/N and 0.618/N. We will allude to recent improvements that establish the exact worst-case risk for the Good-Turing estimator and a significant sharpening of the constant for the minimax risk.
(Joint work with Nikhilesh Rajaraman and Ananda Theertha Suresh)

Andrew Thangaraj is currently a Professor in the Department of Electrical Engineering, IIT Madras. His research interests are in the broad areas
of information theory, error-control coding and information-theoretic
aspects of cryptography.

Contact: Henry Pfister