neous cost rate. Typically, the cost function is derived from an explicit control objective, such as attainment of a target state or a target region in minimal time, or minimization of cumulative error. (Error is defined in terms of a measure of distance imposed on the phase space; the distance of the current state from the target region is the current error.) Control is thus a continuing search in phase space for the (usually moving) target or goal; as such, the considerations of the preceding illustration are directly relevant. In the formulation of the pursuit problem stated above, a natural measure of the cost of pursuit over some interval T would be the change in distance between target and pursuer divided by the fuel expenditure (with suitable conventions for trajectories where the distance does not decrease).
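The pursuit measure just described can be sketched in a few lines. This is only an illustration: the function name `pursuit_measure`, the planar coordinates, and the convention of returning 0.0 when the distance does not decrease (or when no fuel is spent) are assumptions made here, not conventions fixed by the text.

```python
import math


def pursuit_measure(dist_start, dist_end, fuel_used):
    """Distance closed between pursuer and target per unit of fuel.

    Assumed convention (not from the text): return 0.0 when the
    distance did not decrease or when no fuel was spent.
    """
    closed = dist_start - dist_end
    if closed <= 0 or fuel_used <= 0:
        return 0.0
    return closed / fuel_used


# Hypothetical positions at the start and end of one interval T.
pursuer_start, pursuer_end = (0.0, 0.0), (3.0, 4.0)
target_start, target_end = (6.0, 8.0), (6.0, 8.0)

d_start = math.dist(pursuer_start, target_start)
d_end = math.dist(pursuer_end, target_end)
print(pursuit_measure(d_start, d_end, fuel_used=2.0))
```

Any other convention for non-decreasing distance (for example, a fixed penalty) could be substituted in the guard clause without changing the rest of the sketch.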
Although the controlled process is defined above in terms of continuous functions, discrete finite-state versions closely approximating the continuous version almost always exist. Indeed, if the problem is to be solved with the help of a digital computer, it must be put in finite-state form. Because the framework we are using is discrete, we will reformulate the problem in discrete form. The law of motion is given by
and the cumulative cost for a given trajectory over T units of time is given by
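As a sketch of the general shape of these two discrete expressions, in assumed notation: X(t) stands for the state at step t, C(t) for the control parameters applied at step t, f for the law of motion, and Q for the cost incurred at each step. The symbols X and C, and the indexing of the sum, are illustrative assumptions rather than notation confirmed by the text.

```latex
% Assumed notation: X(t) = state, C(t) = control parameters, Q = per-step cost.
\[
  X(t+1) = f\bigl(X(t),\, C(t)\bigr),
  \qquad
  \sum_{t=1}^{T} Q\bigl(X(t),\, C(t)\bigr).
\]
```

Read this way, a trajectory is fixed by the initial state together with the sequence of control parameters, and its cumulative cost is the sum of the per-step costs along it.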
If we look at the controlled process in the framework, we see that the law of motion f determines the environment of the adaptive system. A problem in control becomes a problem of adaptation when there is significant uncertainty about the law of motion f; that is, it is only known that f is one of a set of possible laws of motion. Such problems are generally unsolvable by contemporary methods of optimal control theory (cf., for example, the comments of Tsypkin [1971, p. 178]). Clearly, under such circumstances the adaptive plan will have to try out various policies in an attempt to determine a good one. To fix ideas, let us assume that each policy 1A can be assigned an average or expected performance for each possible f. Moreover, let us assume that this average can be estimated as closely as desired by simply trying 1A long enough from any arbitrary time t onward. The object then is to search for the policy with the best average performance, exploiting the best among known possibilities at each step along the way.
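The search described here, trying policies, estimating their average cost from repeated trials, and mostly exploiting the best one known so far, can be illustrated with a small simulation. Everything specific in it is an assumption made for illustration: the scalar law of motion with drift and noise, the proportional policies, the squared-error cost, and the epsilon-greedy rule used to balance trying and exploiting. It is not the adaptive plan developed in the text, only a minimal stand-in for the idea.

```python
import random


def make_law_of_motion(drift, noise, rng):
    """Hypothetical law of motion f(x, c): the plan never sees drift or noise."""
    def f(x, c):
        return x + drift + c + rng.gauss(0.0, noise)
    return f


def make_policy(gain):
    """Hypothetical proportional policy: the control opposes the current error."""
    return lambda x: -gain * x


def trial_cost(f, policy, x0, steps):
    """Cumulative squared error of one trial of a policy (target state is 0)."""
    x, total = x0, 0.0
    for _ in range(steps):
        x = f(x, policy(x))
        total += x * x
    return total


def adaptive_search(f, policies, trials, epsilon, rng, x0=5.0, steps=20):
    """Epsilon-greedy search: exploit the best-known policy, sometimes explore."""
    counts = [0] * len(policies)
    means = [0.0] * len(policies)  # running average cost per policy
    for _ in range(trials):
        untried = [i for i, n in enumerate(counts) if n == 0]
        if untried:
            i = untried[0]                      # try every policy once
        elif rng.random() < epsilon:
            i = rng.randrange(len(policies))    # occasional exploration
        else:
            i = min(range(len(policies)), key=lambda j: means[j])
        cost = trial_cost(f, policies[i], x0, steps)
        counts[i] += 1
        means[i] += (cost - means[i]) / counts[i]  # incremental mean
    best = min(range(len(policies)), key=lambda j: means[j])
    return best, means, counts


rng = random.Random(0)
f = make_law_of_motion(drift=0.5, noise=0.2, rng=rng)
gains = [0.1, 0.5, 1.0, 1.9]
policies = [make_policy(g) for g in gains]
best, means, counts = adaptive_search(f, policies, trials=200, epsilon=0.1, rng=rng)
print("best gain:", gains[best])
print("trials per policy:", counts)
```

Because each trial of the same policy meets different noise, its cost varies from trial to trial; the running mean is what stands in for the policy's average performance, and it sharpens as the policy is tried more often.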
A control policy generates a sequence of control parameters. Different trials of the policy 1A, say at times t1, t2, ..., tk, will in general elicit different costs Q(t1), Q(t2), ..., Q(tk). However, the framework requires