bit-strings, there are no consistency problems in the internal processing. Consistency problems do arise at the effectors: when different, simultaneous messages urge an effector to take mutually exclusive actions, the conflict is resolved by competition.
Competition plays a central role in determining just which rules are active at any given time. To provide a computational basis for the competition, each rule is assigned a quantity, called its strength, that summarizes its average past usefulness to the system. We will see shortly that the strength is automatically adjusted by a credit assignment algorithm as part of the learning process. Competition allows rules to be treated as hypotheses, more or less confirmed, rather than as incontrovertible facts. The strength of a rule corresponds to its level of confirmation; stronger rules are more likely to win the competition when their conditions are satisfied. Stated another way, the classifier system's reliance upon a rule is based upon the rule's average usefulness in the contexts in which it has been tried previously. Competition also provides a means of resolving conflicts when effectors receive contradictory messages.
A rule, then, enters a competition to post its message any time its conditions are satisfied. The actual competition is based on a bidding process: each satisfied rule makes a bid based upon its strength and its specificity. In its simplest form, the bid for a rule r of strength s(r) would be
where c is a constant < 1, say 1/10. A rule that both has been useful to the system in the past (high strength) and uses more information about the current situation (high specificity) thus makes a higher bid. Rules making higher bids are favored in the competition. Various criteria for winning can be employed. For example, the probability of winning can be based on the size of the bid, or all rules making bids at least equal to the average bid can be declared winners. Usually there are several winners, so that parallelism is exploited.
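The bidding process and the two winning criteria just mentioned can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: it assumes the simplest product form, bid(r) = c · s(r) · specificity(r), and it assumes that specificity is the fraction of condition positions that are not the don't-care symbol #. The names Rule, specificity, bid, winners_at_least_average, and winner_by_probability are hypothetical and introduced only for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Rule:
    condition: str   # string over {'0', '1', '#'}; '#' means "don't care" (assumed encoding)
    strength: float  # s(r), the rule's current strength


def specificity(rule: Rule) -> float:
    """Assumed measure: fraction of condition positions that are not '#'."""
    return sum(ch != "#" for ch in rule.condition) / len(rule.condition)


def bid(rule: Rule, c: float = 0.1) -> float:
    """Assumed simplest form of the bid: c * s(r) * specificity(r)."""
    return c * rule.strength * specificity(rule)


def winners_at_least_average(rules, c=0.1):
    """Declare every satisfied rule whose bid is at least the average bid a winner."""
    bids = [bid(r, c) for r in rules]
    average = sum(bids) / len(bids)
    return [r for r, b in zip(rules, bids) if b >= average]


def winner_by_probability(rules, c=0.1, rng=random):
    """Draw one winner with probability proportional to the size of its bid.

    Raises ValueError if every bid is zero.
    """
    bids = [bid(r, c) for r in rules]
    return rng.choices(rules, weights=bids, k=1)[0]


# Three rules whose conditions are assumed to be satisfied by the current messages.
satisfied = [
    Rule("1#0#", 40.0),    # half of the positions are specified
    Rule("110#", 40.0),    # same strength, more specific, so a higher bid
    Rule("####", 100.0),   # strong but fully general: zero bid under this assumption
]
multiple_winners = winners_at_least_average(satisfied)
single_winner = winner_by_probability(satisfied, rng=random.Random(0))
```

Under these assumptions the first criterion admits several winners at once, in keeping with the parallelism noted above, whereas the probabilistic draw shown here selects a single rule per call and would have to be repeated to obtain several.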
This completes the description of the performance part of the system; we are now ready to discuss the system's learning procedures. There are two basic problems: credit assignment, already mentioned, and rule discovery. Credit assignment rates the rules the system already has. Rule discovery replaces rules of low strength and provides new rules when environmental situations are ill-handled.
Let us begin with the credit assignment problem. Credit assignment is not particularly difficult where the situation provides immediate reward or precise information about