
to substitute the expected payoff under C0040-08.gif(t), C0043-07.gif, for C0043-11.gif). (If C0021-03.gif is countable, C0043-07.gif is simply given by C0043-08.gif where C0043-12.gif is the probability of selecting C0043-13.gif when the distribution over C0021-03.gif is C0043-14.gif.) Thus, for stochastic adaptive plans,
C0043-01.gif
Following this line, a useful performance target can be formulated in terms of the greatest possible cumulative payoff in the first T time-steps,
C0043-02.gif
An important criterion, appearing frequently in the literature of control theory and mathematical economics (see chapter 3, "Illustrations"), can be concisely formulated in terms of C0043-10.gif: τ accumulates payoff at an asymptotic optimal rate if
C0043-03.gif
In other words, the rate at which τ accumulates payoff is, in the limit, the same as the best possible rate. Often it is desirable to have a much stronger criterion setting standards on interim behavior. That is, even though the payoff rate approaches the optimum, it may take an intolerably long time before it is reasonably close. Thus, the stronger criterion sets a lower bound on the rate of approach to the optimum. For example, the criterion would designate a sequence C0043-06.gif approaching 0 (such as C0043-15.gif, for 0 < j < ∞) and then require for all T
C0043-04.gif
Clearly the plan τ satisfies the asymptotic optimal rate criterion whenever it satisfies this stronger criterion and, in addition, τ can approach that rate no more slowly than C_T approaches 0.
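The two criteria can be illustrated numerically with a minimal sketch. It rests on assumptions that are not taken from the text: the shortfall of a plan after T steps is measured as 1 minus the ratio of its cumulative payoff to the best possible cumulative payoff, the bounding sequence has the hypothetical form k / T**j, and the payoff sequences are made-up examples. A finite horizon can only suggest, never prove, behavior in the limit.

```python
from itertools import accumulate


def shortfall(plan_payoffs, best_payoffs):
    """Return 1 - (plan total / best total) after T = 1, 2, ... steps.

    Assumes every best-possible running total is positive.
    """
    plan_totals = accumulate(plan_payoffs)
    best_totals = accumulate(best_payoffs)
    return [1.0 - p / b for p, b in zip(plan_totals, best_totals)]


def meets_rate_bound(plan_payoffs, best_payoffs, k, j):
    """True if the shortfall stays below k / T**j at every step T checked.

    The form k / T**j is an assumed example of a sequence approaching 0.
    """
    gaps = shortfall(plan_payoffs, best_payoffs)
    return all(gap < k / T**j for T, gap in enumerate(gaps, start=1))


# Hypothetical data: the best plan earns 1.0 per step; the plan under test
# earns 1 - 1/t**2 at step t, so its per-step loss shrinks quickly.
horizon = 1000
best = [1.0] * horizon
plan = [1.0 - 1.0 / t**2 for t in range(1, horizon + 1)]

print(shortfall(plan, best)[-1])                     # small at T = 1000
print(meets_rate_bound(plan, best, k=2.0, j=1.0))    # improving plan
print(meets_rate_bound([0.5] * horizon, best, k=2.0, j=0.5))  # stalled plan
```

Under these assumptions the improving plan passes the bound at every step checked, while a plan stuck at half the best payoff fails once the bound drops to 0.5.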
The simplest way to extend these criteria to ε is to require that a plan C0043-16.gif meet the given criterion in each C0021-01.gif.
C0043-05.gif
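The "meet the criterion in each case" requirement amounts to a universal check, which the following sketch makes concrete. Everything specific here is an assumption for illustration: the set is taken to be a finite collection of environments keyed by hypothetical names, the limit is approximated by the payoff ratio at a fixed horizon, and `tolerance` is an arbitrary cutoff.

```python
def payoff_ratio(plan_payoffs, best_payoffs):
    """Plan's cumulative payoff divided by the best possible cumulative payoff."""
    return sum(plan_payoffs) / sum(best_payoffs)


def meets_in_every_environment(plan_by_env, best_by_env, tolerance):
    """True only if the ratio is within `tolerance` of 1 in each environment."""
    return all(
        1.0 - payoff_ratio(plan_by_env[name], best_by_env[name]) < tolerance
        for name in best_by_env
    )


# Hypothetical payoffs for one plan in two environments.
horizon = 1000
best_by_env = {"env_a": [1.0] * horizon, "env_b": [2.0] * horizon}
plan_by_env = {
    "env_a": [1.0 - 1.0 / t**2 for t in range(1, horizon + 1)],
    "env_b": [1.0] * horizon,  # stuck at half the best payoff in env_b
}

print(meets_in_every_environment(plan_by_env, best_by_env, tolerance=0.01))
```

A single failing environment is enough to reject the plan, which is why the check uses `all` rather than an average over environments.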

 