ABSTRACT
Beginning at least as early as the 1950s, the long and still growing literature on no-regret learning establishes the following type of result: on any sequence of T trials in which the predictions of K experts are observed, it is possible to maintain a dynamically weighted prediction whose cumulative regret to the best single expert in *hindsight* (that is, after the full sequence has been revealed) is only O(\sqrt{T}), with absolutely no statistical assumptions on the data. The algorithms achieving such remarkable guarantees generally reward winning experts and punish losing experts aggressively, often via multiplicative updating. Is there any price paid for such aggressive updating?

In this work we simultaneously analyze an algorithm's regret to the best expert and its regret to a more modest benchmark: the *average* performance of the experts, which can be matched with no learning at all by simply leaving the weights uniform. It would be natural to hope for algorithms that simultaneously achieve regret O(\sqrt{T}) to the best and little or no regret to the average, thus providing a kind of "safety net" on top of the traditional bounds. We describe the following striking sequence of results:

* Any algorithm that achieves only O(\sqrt{T}) cumulative regret to the best expert must, in the worst case, suffer regret \Omega(\sqrt{T}) to the average.
* For a wide class of algorithms, including standard multiplicative update rules, the product of the cumulative regret to the best and the cumulative regret to the average is \Omega(T), and this bound can be achieved by multiplicative updates.
* We give a new algorithm that, for any small constant c > 0, achieves regret O(T^{1/2 + c}) to the best expert while suffering only *constant* regret to the average (that is, with no dependence on T). This algorithm works in phases that begin with rather conservative update rules but gradually become more aggressive as the performance of the best expert pulls away from the crowd.
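To make the two benchmarks concrete, here is a minimal sketch (not the paper's algorithm) of the standard multiplicative-weights rule, often called Hedge, run on a hypothetical loss matrix. The experiment setup (T, K, the random losses, and the learning-rate tuning eta = \sqrt{8 ln K / T}) is illustrative, not taken from the talk; it simply reports the algorithm's cumulative loss against the best expert and against the unweighted average of the experts.

```python
import math
import random

def hedge(losses, eta):
    """Run the multiplicative-weights (Hedge) rule on a T x K loss matrix.

    Each round the prediction is the weighted average of the experts; each
    expert's weight is then multiplied by exp(-eta * its loss), so winners
    are rewarded aggressively. Returns the algorithm's cumulative loss.
    """
    K = len(losses[0])
    w = [1.0] * K
    total = 0.0
    for round_losses in losses:
        s = sum(w)
        p = [wi / s for wi in w]                       # normalized weights
        total += sum(pi * li for pi, li in zip(p, round_losses))
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, round_losses)]
    return total

# Hypothetical experiment: K experts, T rounds of random losses in [0, 1].
random.seed(0)
T, K = 1000, 2
losses = [[random.random() for _ in range(K)] for _ in range(T)]

eta = math.sqrt(8 * math.log(K) / T)                   # standard tuning
alg = hedge(losses, eta)
per_expert = [sum(row[k] for row in losses) for k in range(K)]
best, avg = min(per_expert), sum(per_expert) / K

print(f"regret to best:    {alg - best:.2f}")          # O(sqrt(T)) guaranteed
print(f"regret to average: {alg - avg:.2f}")
```

With this tuning the classical guarantee bounds the regret to the best by \sqrt{(T/2) ln K} on *any* loss sequence; the abstract's lower bounds say that no such aggressively tuned rule can simultaneously keep the regret to the average near zero in the worst case.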
This talk describes joint work with Eyal Even-Dar, Yishay Mansour, and Jenn Wortman.