|
|
|
|
|
|
|
trials of x' that all of the trials have had outcomes exceeding µx. A fortiori their average will exceed µx with probability at least pN, even though µx' < µx.) Here the tradeoff between gathering information and exploiting it appears in its simplest terms. To see it in exact form let x(1)(N) name the random variable with the highest observed payoff rate (average per trial) after N trials and let x(2)(N) name the other random variable. For any number of trials n, 0 £ n £ N, allocated to x(2)(N) (and assuming overlapping distributions) there is a positive probability, q(N - n, n), that x(2)(N) is actually the random variable with the highest mean, max {µx, µx'}. The two possible sources of loss are: (1) The observed best x(1)(N) is really second best, whence the N- n trials given x(1)(N) incur an (expected) cumulative loss (N - n) |mx-mx'|. this occurs with probability q(N - n, n). (2) The observed best is in fact the best, whence the n trials given x(2)(N) incur a loss n ·|mx-mx'| this occurs with probability (1 - q(N - n, n)). The total expected loss for any allocation of n trials to x(2) and N - n trials to x(1) is thus |
|
|
|
|
|
|
|
|
 |
|
|
|
|
|
|
|
|
We shall soon see that, for n not too large, the first source of loss decreases as n increases because both N - n and q(N - n, n) decrease. At the same time the second source of loss increases. By making a tradeoff between the first and second sources of loss, then, it is possible to find for each N a value n*(N) for which the losses are minimized; i.e., |
|
|
|
|
|
|
|
|
 |
|
|
|
|
|
|
|
|
For the determination of n* let us assume that initially one random variable is as likely as the other to be best. (This would be the case for example if the two unbalanced coins referred to earlier have no identifying external characteristics and are positioned initially at random. More generally, the result is the same if the labels of the random variables are assigned at random. The proof of the theorem will indicate the modifications necessary for cases where one random variable is initially more likely than the other to be the best.) For convenience let us adopt the convention that x1 is the random variable with the highest mean and let µ1 be that mean; accordingly x2 is the other random variable with mean µ2£ µ1. (The observer, of course, does not know this.) Using these conventions we can now establish |
|
|
|
|
|
|
|
|
THEOREM 5.1: Given N trials to be allocated to two random variables, with means µ1 > µ2 and variances , respectively, the minimum expected loss results when the number of trials allocated x(2)(N) is |
|
|
|
|
|