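to substitute the expected payoff under $\mathcal{P}(t)$, $\bar{\mu}_E(\mathcal{P}(t))$, for $\mu_E(\mathcal{A}(t))$. (If $\mathcal{A}$ is countable, $\bar{\mu}_E(\mathcal{P})$ is simply given by $\sum_j P(j)\,\mu_E(A_j)$, where $P(j)$ is the probability of selecting $A_j \in \mathcal{A}$ when the distribution over $\mathcal{A}$ is $\mathcal{P}$.) Thus, for stochastic adaptive plans,

$$U_{\tau,E}(T) = \sum_{t=1}^{T} \bar{\mu}_E(\mathcal{P}(t)).$$
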
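The expected-payoff substitution for a countable set of structures can be sketched numerically. The following is a minimal illustration only, assuming a finite set of three structures with made-up payoffs and a made-up sequence of distributions, one per time-step; the function names `expected_payoff` and `cumulative_expected_payoff` are hypothetical and do not come from the text.

```python
from typing import Sequence


def expected_payoff(probs: Sequence[float], payoffs: Sequence[float]) -> float:
    """Expected payoff sum_j P(j) * mu(A_j) over a finite set of structures."""
    if len(probs) != len(payoffs):
        raise ValueError("one probability is needed per structure")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * m for p, m in zip(probs, payoffs))


def cumulative_expected_payoff(
    distributions: Sequence[Sequence[float]], payoffs: Sequence[float]
) -> float:
    """Sum of the expected payoffs over the time-steps t = 1..T."""
    return sum(expected_payoff(d, payoffs) for d in distributions)


# Hypothetical payoffs mu(A_1), mu(A_2), mu(A_3) in one fixed environment.
payoffs = [1.0, 2.0, 4.0]

# Hypothetical distributions over the structures at t = 1, 2, 3.
distributions = [
    [0.5, 0.25, 0.25],
    [0.25, 0.25, 0.5],
    [0.0, 0.0, 1.0],
]

first_step = expected_payoff(distributions[0], payoffs)
total = cumulative_expected_payoff(distributions, payoffs)
print(first_step, total)
```

Here the distributions shift probability toward the highest-payoff structure, so the per-step expected payoff rises from one step to the next and the cumulative total reflects that improvement.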
Following this line, a useful performance target can be formulated in terms of the greatest possible cumulative payoff in the first $T$ time-steps,

$$U^*_E(T) = \sup_{\tau \in \mathcal{T}} U_{\tau,E}(T).$$

An important criterion, appearing frequently in the literature of control theory and mathematical economics (see chapter 3, "Illustrations"), can be concisely formulated in terms of $U^*_E(T)$: $\tau$ accumulates payoff at an asymptotic optimal rate if

$$\lim_{T \to \infty} \frac{U_{\tau,E}(T)}{U^*_E(T)} = 1.$$

In other words, the rate at which $\tau$ accumulates payoff is, in the limit, the same as the best possible rate. Often it is desirable to have a much stronger criterion setting standards on interim behavior. That is, even though the payoff rate approaches the optimum, it may take an intolerably long time before it is reasonably close. Thus, the stronger criterion sets a lower bound on the rate of approach to the optimum. For example, the criterion would designate a sequence $\langle c_T \rangle$ approaching 0 (such as $\langle k/T^j \rangle$, for $0 < j < \infty$) and then require for all $T$

$$\left| 1 - \frac{U_{\tau,E}(T)}{U^*_E(T)} \right| < c_T.$$

Clearly a plan $\tau$ that satisfies this criterion also satisfies the asymptotic optimal rate criterion; in addition, $\tau$ can approach that rate no more slowly than $c_T$ approaches 0.

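The difference between the asymptotic criterion and the stronger interim criterion can be illustrated with a small numerical sketch. Everything concrete here is an assumption made for illustration: the best possible cumulative payoff is taken to be $T$, the plan's cumulative payoff is taken to be $T - \sqrt{T}$, the bounding sequence is taken to be $k/T^j$ with a hypothetical constant $k$, and the helper names are invented. A check over a finite horizon can only suggest, not prove, a statement about all $T$.

```python
import math
from typing import Callable

Cumulative = Callable[[int], float]  # maps a horizon T to a cumulative payoff


def shortfall(u_plan: float, u_best: float) -> float:
    """Relative gap |1 - U(T)/U*(T)| between a plan and the best possible."""
    return abs(1.0 - u_plan / u_best)


def meets_rate_bound(
    u_plan: Cumulative, u_best: Cumulative, k: float, j: float, horizon: int
) -> bool:
    """Check the gap against c_T = k / T**j for T = 1..horizon (finite check only)."""
    return all(
        shortfall(u_plan(T), u_best(T)) < k / T**j for T in range(1, horizon + 1)
    )


def u_best(T: int) -> float:
    # Assumed best possible cumulative payoff: one unit per time-step.
    return float(T)


def u_plan(T: int) -> float:
    # Assumed plan: loses sqrt(T) in total relative to the best possible.
    return T - math.sqrt(T)


# The ratio tends to 1, so the asymptotic criterion looks plausible.
ratio_at_large_T = u_plan(10**6) / u_best(10**6)

# The gap is 1/sqrt(T): it fits under 2/T**0.5 but not under 2/T.
slow_bound_ok = meets_rate_bound(u_plan, u_best, k=2.0, j=0.5, horizon=10_000)
fast_bound_ok = meets_rate_bound(u_plan, u_best, k=2.0, j=1.0, horizon=10_000)
print(ratio_at_large_T, slow_bound_ok, fast_bound_ok)
```

Under these assumptions the same plan passes the asymptotic test and the looser interim bound, yet fails the tighter one at $T = 4$, which is the sense in which the interim criterion is stronger.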
The simplest way to extend these criteria to $\mathcal{E}$ is to require that a plan meet the given criterion in each $E \in \mathcal{E}$.

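Requiring the criterion in each environment amounts to a conjunction over the set of environments. The sketch below assumes a finite set of two hypothetical environments, labelled `"E1"` and `"E2"`, in which the plan's relative gap to the best possible cumulative payoff is taken to be $1/T$ and $1/\sqrt{T}$ respectively; these gaps, the bound $k/T^j$ with a hypothetical constant $k$, and the function names are illustrative assumptions rather than anything stated in the text.

```python
from typing import Callable, Dict

Gap = Callable[[int], float]  # maps T to the relative gap |1 - U(T)/U*(T)|


def meets_bound(gap: Gap, k: float, j: float, horizon: int) -> bool:
    """Finite-horizon check that gap(T) < k / T**j for T = 1..horizon."""
    return all(gap(T) < k / T**j for T in range(1, horizon + 1))


def meets_bound_in_each(
    gaps: Dict[str, Gap], k: float, j: float, horizon: int
) -> Dict[str, bool]:
    """Apply the same criterion separately in every environment."""
    return {name: meets_bound(gap, k, j, horizon) for name, gap in gaps.items()}


# Assumed gaps of one plan in two hypothetical environments.
gaps = {
    "E1": lambda T: 1.0 / T,
    "E2": lambda T: 1.0 / T**0.5,
}

per_environment = meets_bound_in_each(gaps, k=2.0, j=1.0, horizon=1000)
meets_everywhere = all(per_environment.values())
print(per_environment, meets_everywhere)
```

With these assumed gaps the plan meets the bound in the first environment but not in the second, so it fails the extended criterion even though it would pass if only the first environment were considered.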