
Page 177
correct actions. Then the rules directly involved are simply strengthened. Credit assignment becomes difficult when credit must be assigned to early-acting rules that set the stage, making possible later useful actions. Stage-setting moves are the key to success in complex situations, such as playing chess or investing resources. The problem is to credit an early action, which may look poor (such as the sacrifice of a piece in chess) but which sets the stage for later positive actions (such as the capture of a major piece in chess). When many rules are active simultaneously, the problem is exacerbated. It may be that only a few of the early-acting rules contribute to a favorable outcome, while others, active at the same time, are ineffective or even obstructive. Somehow the credit assignment algorithm must sort this out, modifying rule strengths appropriately.
Credit assignment in classifier systems is based on competition. The bidding process mentioned earlier is treated as an exchange of "capital" (strength). That is, when a rule wins the competition, it actually "pays" its bid to the rule(s) that sent the message(s) satisfying its conditions. The rule acts as a kind of go-between or broker in a chain that leads from the stage-setting situation to the favorable outcome.
In a bit more detail, when a rule competes, its suppliers are those rules that have sent messages satisfying its conditions and its consumers are those rules that have conditions satisfied by its message. Under this regime, we treat the strength of a rule as capital and the bid as payment to its suppliers. When a rule wins, its bid is apportioned to its suppliers, increasing their strengths. At the same time, because the bid is treated as a payment for the right to post a message, the strength of the winning rule is reduced by the amount of its bid. Should a rule bid but not win, its strength is unchanged and its suppliers receive no payment. The resulting credit assignment procedure is called a bucket brigade algorithm (see Figure 16).
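The single transaction just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: the function name `pay_bid`, the `bid_fraction` parameter (a bid taken to be a fixed fraction of the rule's strength), and the equal split of the bid among suppliers are all hypothetical choices made for this example.

```python
from typing import Dict, List


def pay_bid(strengths: Dict[str, float], winner: str, suppliers: List[str],
            bid_fraction: float = 0.1) -> Dict[str, float]:
    """Apply one bucket-brigade payment for a rule that won the competition.

    Hypothetical assumptions: the bid is bid_fraction * strength, and it is
    split equally among the supplier rules. Rules that bid but lost are
    simply not passed in, so their strengths stay unchanged.
    """
    new = dict(strengths)
    bid = bid_fraction * strengths[winner]
    # The winner pays its bid for the right to post its message.
    new[winner] -= bid
    # The bid is apportioned to the suppliers whose messages satisfied the
    # winner's conditions. With no supplier rules (assumed: messages came
    # from the environment), the bid simply leaves the rule population.
    if suppliers:
        share = bid / len(suppliers)
        for rule in suppliers:
            new[rule] += share
    return new


if __name__ == "__main__":
    before = {"A": 100.0, "B": 50.0, "C": 80.0}
    print(pay_bid(before, winner="C", suppliers=["A", "B"]))
```

For example, with strengths A = 100, B = 50 and C = 80, a winning rule C whose suppliers are A and B bids 8 (10% of 80): C drops to 72 while A and B each gain 4. Note that the total strength is conserved by this transaction; strength only enters the system through payoff from the environment.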
Winning rules can recoup their payments in two ways: (1) They, in turn, have winning consumers that make payments to them, or (2) they are active at a time when the system receives payoff from the environment. Case (2) is the sole way in which payoff from the environment affects the system. When payoff occurs, it is divided among the rules active at that instant, their strengths being increased accordingly. Rules not active at the time the payoff occurs do not share directly in that payoff. The system must rely on the bucket brigade algorithm to distribute the increased strength to the stage-setting rules, under repeated activations in similar situations.
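The way environmental payoff enters the system (case 2) can be sketched the same way. Again this is a hedged illustration: the name `distribute_payoff` and the equal division of the payoff among the active rules are assumptions made for the example; the text says only that the payoff is divided among the rules active at that instant.

```python
from typing import Dict, Iterable


def distribute_payoff(strengths: Dict[str, float], active: Iterable[str],
                      payoff: float) -> Dict[str, float]:
    """Share an environmental payoff among the rules active at this instant.

    Hypothetical assumption: the payoff is divided equally. Rules that are
    not active receive nothing directly; they can only benefit later,
    through bucket-brigade payments from their consumers.
    """
    active = list(active)
    new = dict(strengths)
    if not active:
        return new  # no active rule, so no rule is credited
    share = payoff / len(active)
    for rule in active:
        new[rule] += share
    return new


if __name__ == "__main__":
    print(distribute_payoff({"C": 72.0, "D": 10.0, "E": 5.0}, ["C", "D"], 20.0))
```

Here rules C and D, active when a payoff of 20 arrives, each gain 10, while the inactive rule E is left unchanged.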
The bucket brigade works because rules become strong only when they belong to sequences leading to payoff. To see this, first note that rules consistently active at times of payoff tend to become strong because of the payoff they receive from the environment. As these rules grow stronger, they make larger bids. A rule that "supplies" one of the payoff rules then benefits from these larger bids in future transactions.
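This argument can be checked with a small simulation. The scenario is hypothetical and chosen only for illustration: a stage-setting rule (`setter`) posts a message that satisfies a second rule (`finisher`), which is active when the environment pays off; a third rule (`dead_end`) wins at the same early step, but no rule consumes its message. Bids are assumed to be a fixed fraction `k` of strength, and early-step bids are assumed to be paid to environmental message sources, so they leave the rule population.

```python
from typing import Dict


def run_episodes(n_episodes: int, payoff: float = 10.0,
                 k: float = 0.1) -> Dict[str, float]:
    """Repeat one hypothetical three-rule episode n_episodes times."""
    s = {"setter": 10.0, "finisher": 10.0, "dead_end": 10.0}
    for _ in range(n_episodes):
        # Step 1: setter and dead_end both win and post messages. Their
        # conditions were matched by environmental messages, so (by
        # assumption) their bids leave the system.
        s["setter"] -= k * s["setter"]
        s["dead_end"] -= k * s["dead_end"]
        # Step 2: finisher's condition is satisfied by setter's message,
        # so finisher pays its bid to setter (its supplier).
        bid = k * s["finisher"]
        s["finisher"] -= bid
        s["setter"] += bid
        # Step 3: payoff arrives while finisher is the active rule.
        s["finisher"] += payoff
    return s


if __name__ == "__main__":
    for rule, strength in run_episodes(200).items():
        print(f"{rule:>8}: {strength:.4f}")
```

Under these assumptions the finisher settles where its bid balances its payoff (strength = payoff / k = 100), the stage-setting rule is pulled up to the same level by the finisher's ever-larger bids, and the dead-end rule, which keeps paying but never has a consumer, decays toward zero.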

 