< previous page page_42 next page >

Page 42
interest centers on the discovery of plans which enable a player to do well while learning to do better. If plans are compared in terms of accumulated payoff, a criterion emerges analogous to the classical "gambler's ruin" of elementary probability. Let UE(t, t) be the payoff accumulated to time t by plan C0058-02.gif confronting the (unknown) pure strategy C0021-01.gif, and require that
C0058-01.gif
That is, the payoff accumulated by t1 never falls to less than c of that accumulated by any other admissible plan C0058-02.gif, no matter what strategy the opponent chooses (even if that other plan by happenstance hits upon a good opposing strategy in its first trial). The smaller c is, the less stringent the criterion and, in general, the larger the number of plans satisfying the criterion. The usefulness of this criterion and the kinds of plans satisfying it will be discussed at length later (see especially the discussion of Samuel's algorithm in section 7.3); for now it is sufficient to notice that: (i) the criterion depends upon the accumulation function Ut,E(t), (ii) for a given opposing strategy E,the lower the efficiency of a plan in accumulating payoff in relation to other plans C0058-02.gif, the smaller c becomes, and (iii) the rating of a plan will be determined by its performance against the opposing strategy which gives it the most difficulty.
Even when it is known that the opponent has selected a single pure strategy, there is a wide range of sophistication of adaptive plans. One class of simpler plans, the payoff-only plans, proves to be quite instructive because it sets a nontrivial lower bound on the performance of more sophisticated plans and it can be analyzed in some detail. In this context, a payoff-only plan ranks strategies it has tried according to the payoff obtained, and it generates new trial strategies on the basis of (selected parts of) this information alone (see section 7.3). More sophisticated plans use the large amounts of information generated during plays of the game, information concerning configurations encountered and the sequence in which they occur (see chapter 8). Obviously a plan which makes proper use of this additional information should do no worse than a payoff-only plan (since the sophisticated plan can reduce its operation to that of a payoff-only plan by ignoring the additional information), and there are certainly situations in which the information will enable the plan to accumulate payoff at a greater rate than a payoff-only plan.
The other extreme from a fixed opposing pure strategy occurs when any sequential mix of strategies is presumed possible on the part of the opponent. The object then (following von Neumann 1947) is usually to minimize the maximum

 
< previous page page_42 next page >

If you like this book, buy it!