neous cost rate C0071-03.gif. Typically, the cost function is derived from an explicit control objective, such as attainment of a target state or target region in minimal time, or minimization of cumulative error. (Error is defined in terms of a measure of distance imposed on the phase space; the distance of the current state from the target region is the current error.) Control is thus a continuing search in phase space for the (usually moving) target or goal; as such, the considerations of the preceding illustration are directly relevant. In the formulation of the pursuit problem stated above, a natural measure of the cost of pursuit over some interval T would be the change in distance between target and pursuer divided by the fuel expenditure (with suitable conventions for trajectories where the distance does not decrease).
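As a small illustration of the pursuit measure just described, the following Python sketch computes the reduction in distance per unit of fuel over an interval. The function name `pursuit_measure` and the particular convention chosen for trajectories where the distance does not decrease (returning zero) are assumptions made for this example; the text leaves that convention open.

```python
def pursuit_measure(distance_start: float, distance_end: float, fuel: float) -> float:
    """Reduction in target-pursuer distance per unit of fuel spent.

    Assumed convention (not from the text): if the distance does not
    decrease over the interval, the measure is taken to be 0.0.
    """
    if fuel <= 0:
        raise ValueError("fuel expenditure must be positive")
    reduction = distance_start - distance_end
    if reduction <= 0:
        return 0.0  # assumed convention for non-decreasing distance
    return reduction / fuel


# Distance shrinks from 10 to 4 while 3 units of fuel are spent.
print(pursuit_measure(10.0, 4.0, 3.0))
```

Other conventions (for example, a negative value when the distance grows) would serve equally well, provided they are applied consistently across trajectories.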
Although the controlled process is defined above in terms of continuous functions, discrete finite-state versions closely approximating the continuous version almost always exist. Indeed, if the problem is to be solved with the help of a digital computer, it must be put in finite-state form. Because the framework we are using is discrete, we will reformulate the problem in discrete form. The law of motion is given by
C0071-01.gif
and the cumulative cost for a given trajectory over T units of time is given by
C0071-02.gif
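To make the discrete formulation concrete, here is a minimal Python sketch that iterates a law of motion and accumulates a per-step cost over T steps. It assumes the usual discrete form, in which the next state is a function of the current state and the current control parameter and the cumulative cost is the sum of the cost rate along the trajectory. The names `cumulative_cost`, `f_line`, and `q_distance`, and the ten-state example itself, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence


def cumulative_cost(
    f: Callable[[int, int], int],    # law of motion: (state, control) -> next state
    q: Callable[[int, int], float],  # cost rate: (state, control) -> cost at this step
    x0: int,
    controls: Sequence[int],
) -> float:
    """Sum the cost rate along the trajectory generated by `controls`."""
    x = x0
    total = 0.0
    for c in controls:      # one iteration per unit of time, T = len(controls)
        total += q(x, c)    # cost incurred in the current state
        x = f(x, c)         # move to the next state
    return total


# Hypothetical finite-state process: states 0..9 on a line, target state 7.
def f_line(x: int, c: int) -> int:
    return max(0, min(9, x + c))


def q_distance(x: int, c: int) -> float:
    return float(abs(x - 7))  # current error: distance from the target


print(cumulative_cost(f_line, q_distance, 4, [1, 1, 1, 0]))  # 3 + 2 + 1 + 0 = 6.0
```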
If we look at the controlled process in the C0113-01.gif framework, we see that the law of motion f determines the environment of the adaptive system. A problem in control becomes a problem of adaptation when there is significant uncertainty about the law of motion f; that is, it is only known that C0071-08.gif. Such problems are generally unsolvable by contemporary methods of optimal control theory (cf., for example, the comments of Tsypkin [1971, p. 178]). Clearly, under such circumstances, the adaptive plan will have to try out various policies in an attempt to determine a good one. To fix ideas, let us assume that each policy 1A ∈ C0039-11.gif can be assigned an average or expected performance C0071-04.gif for each possible f. Moreover, let us assume that this average can be estimated as closely as desired by simply trying 1A long enough from any arbitrary time t onward. The object then is to search for the policy in C0039-11.gif with the best average performance C0071-05.gif, exploiting the best among known possibilities at each step along the way.
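The kind of search described here can be sketched in Python as follows. This is not the adaptive plan developed in the text; it is a simple stand-in that tries every policy once, then usually exploits the policy with the best estimated average while occasionally trying another at random. Performance is treated as a cost to be minimized. The names `search_policies`, `noisy_cost`, and `TRUE_MEANS`, the exploration probability, and the noise model are all assumptions made for the example.

```python
import random
from typing import Callable, List, Tuple


def search_policies(
    trial_cost: Callable[[int, random.Random], float],
    n_policies: int,
    n_steps: int,
    explore_prob: float = 0.1,
    seed: int = 0,
) -> Tuple[int, List[float], List[int]]:
    """Estimate each policy's average cost by repeated trials.

    Returns (index of best estimated policy, estimated means, trial counts).
    """
    if n_steps < n_policies:
        raise ValueError("need at least one trial per policy")
    rng = random.Random(seed)
    totals = [0.0] * n_policies
    counts = [0] * n_policies
    for step in range(n_steps):
        if step < n_policies:
            k = step                          # initial pass: try every policy once
        elif rng.random() < explore_prob:
            k = rng.randrange(n_policies)     # occasional exploratory trial
        else:
            # exploit the best among known possibilities (lowest average cost)
            k = min(range(n_policies), key=lambda i: totals[i] / counts[i])
        totals[k] += trial_cost(k, rng)
        counts[k] += 1
    means = [totals[i] / counts[i] for i in range(n_policies)]
    best = min(range(n_policies), key=lambda i: means[i])
    return best, means, counts


# Hypothetical environment: the plan does not see these true averages,
# only the noisy cost returned by each trial.
TRUE_MEANS = [5.0, 2.0, 8.0]


def noisy_cost(k: int, rng: random.Random) -> float:
    return TRUE_MEANS[k] + rng.uniform(-0.5, 0.5)


best, means, counts = search_policies(noisy_cost, n_policies=3, n_steps=200)
print(best, counts)
```

Because the noise in this toy environment is small relative to the gaps between the true averages, a single trial of each policy already ranks them correctly; with noisier trials, the balance between exploratory and exploiting trials matters much more.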
A control policy C0071-06.gif generates a sequence of control parameters C0071-07.gif. Different trials of the policy 1A, say at times t1, t2, . . ., tk, will in general elicit different costs Q(t1), Q(t2), . . ., Q(tk). However, the C0113-01.gif framework requires

 