of f for relatively few x will enable f_ξ to be estimated for a great many ξ ∈ Ξ. Even a sequence of four observations, say x(1) = .0100010 . . . 0, x(2) = .110100 . . . 0, x(3) = .100010 . . . 0, x(4) = .1111010 . . . 0, enables one to calculate three-point estimates for many schemata (assuming all points are equally likely or equally weighted), e.g. [equations missing in source], and two-point estimates for even more schemata, e.g. [equations missing in source].
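The way a handful of observations yields estimates for many schemata can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four points are truncated to their first six bits (the source elides the remaining bits), the values of f are made up, `*` stands for a "don't care" position, and the helper names `matches` and `schema_estimates` are hypothetical.

```python
from itertools import product

# Hypothetical 6-bit truncations of the four observations in the text
# (the source elides the trailing bits), paired with made-up values of f.
observations = {
    "010001": 1.0,  # x(1)
    "110100": 2.0,  # x(2)
    "100010": 4.0,  # x(3)
    "111101": 6.0,  # x(4)
}

def matches(schema: str, point: str) -> bool:
    """True if point is an instance of schema ('*' means "don't care")."""
    return all(s == "*" or s == p for s, p in zip(schema, point))

def schema_estimates(obs: dict) -> dict:
    """Map each schema having at least one observed instance to
    (number of observed instances, equally weighted mean of f)."""
    length = len(next(iter(obs)))
    estimates = {}
    # Enumerate all 3**length schemata over the alphabet {0, 1, *}.
    for symbols in product("01*", repeat=length):
        schema = "".join(symbols)
        values = [f for x, f in obs.items() if matches(schema, x)]
        if values:
            estimates[schema] = (len(values), sum(values) / len(values))
    return estimates

estimates = schema_estimates(observations)
three_point = sorted(s for s, (n, _) in estimates.items() if n == 3)
two_point = sorted(s for s, (n, _) in estimates.items() if n == 2)
```

With these assumed values, the schema `*1****` receives a three-point estimate from x(1), x(2), and x(4), while `11****` receives a two-point estimate from x(2) and x(4). Comparing `len(two_point)` with `len(three_point)` shows that two-point estimates are available for more schemata than three-point estimates in this small case.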
The picture is not much changed if f is a function of many variables x_1, . . . , x_d. Using binary representations again, we now have 20d detectors (assuming the same accuracy as before), 3^(20d) schemata, and each point is an instance of 2^(20d) schemata. In the one-dimensional case the representation transformed the problem to one of sampling in a 20-dimensional space, already a space of high dimensionality, so the increase to a 20d-dimensional space really involves no significant conceptual changes. Interestingly, each point (x_1, . . . , x_d) is now an instance of 2^(20d) schemata rather than 2^20 schemata, an exponential (dth power) increase. Thus, for a given number of points tried, we can expect an exponential (dth power) increase in the number of schemata for which f_ξ can be estimated with a given confidence. As a consequence, if the information about the schemata can be stored and used to generate relevant new trials, high dimensionality of the argument space {0 ≤ x_j < 1, j = 1, . . . , d} imposes no particular barrier.
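The dth-power growth described above can be checked with exact integer arithmetic. The sketch below assumes 20 binary detectors per variable, as in the text; the function name `schema_counts` and the dictionary keys are hypothetical labels chosen for illustration.

```python
BITS_PER_VARIABLE = 20  # accuracy assumed in the text: 20 binary detectors per variable

def schema_counts(d: int, bits: int = BITS_PER_VARIABLE) -> dict:
    """Counts for a binary representation of d variables."""
    detectors = bits * d
    return {
        "detectors": detectors,
        "schemata": 3 ** detectors,            # each position is 0, 1, or "don't care"
        "schemata_per_point": 2 ** detectors,  # each position is either kept or "don't care"
    }

one_dim = schema_counts(1)
three_dim = schema_counts(3)
# The number of schemata per point for d variables is the dth power
# of the one-dimensional figure.
is_dth_power = three_dim["schemata_per_point"] == one_dim["schemata_per_point"] ** 3
```

Python integers are arbitrary precision, so the counts stay exact even for large d.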
It is also interesting in this context to compare two different representations for the same underlying space. Six detectors with a range of 10 values can yield approximately the same number of distinct representations as 20 detectors with a range of 2 values, since 10^6 ≅ 2^20 = 1.05 × 10^6 (cf. decimal encoding vs. binary encoding). However, the numbers of schemata in the two cases are vastly different: 11^6 = 1.77 × 10^6 vs. 3^20 = 3.48 × 10^9. Moreover, in the first case each representation is an instance of only 2^6 = 64 schemata, whereas in the second case each is an instance of 2^20 = 1.05 × 10^6 schemata. This suggests that, for adaptive plans which can use the increased information flow (such as the reproductive plans), many detectors deciding among few attributes are preferable to few detectors with a range of many attributes. In genetics this would correspond to chromosomes with many loci and few alleles per locus (the usual case) rather than few loci and many alleles per locus.
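The figures in this comparison follow from three simple counts: with l detectors of k values each there are k^l representations, (k + 1)^l schemata (each position takes one of the k values or "don't care"), and each representation is an instance of 2^l schemata. A minimal sketch, with the hypothetical helper name `representation_stats`:

```python
def representation_stats(detectors: int, values: int) -> dict:
    """Counts for a representation with the given number of detectors,
    each ranging over the given number of values."""
    return {
        "representations": values ** detectors,
        "schemata": (values + 1) ** detectors,  # one extra symbol for "don't care"
        "schemata_per_point": 2 ** detectors,   # each position is kept or "don't care"
    }

decimal = representation_stats(detectors=6, values=10)
binary = representation_stats(detectors=20, values=2)
```

Here `decimal` and `binary` describe spaces of nearly equal size (1,000,000 vs. 1,048,576 representations), yet the binary encoding has 3,486,784,401 schemata against 1,771,561 for the decimal one.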
Returning to the view of schemata as random variables, it is instructive to determine how many schemata receive at least some given number n < N of trials when N elements of the representation space are selected at random. This will give us a better idea of the intrinsic parallelism wherein a sequence of trials drawn from that space is at the same time a (usually shorter) sequence of trials for each of a large number of schemata.
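One way to get a feel for this count is to compute its expected value directly. The sketch below is not necessarily the derivation used in the text; it assumes binary strings of a fixed length, trials drawn independently and uniformly (with replacement), and hypothetical function names. A schema with k fixed positions is matched by a random point with probability 2^(-k), so its number of trials is binomially distributed.

```python
from math import comb

def prob_at_least(n: int, N: int, p: float) -> float:
    """P(Binomial(N, p) >= n), computed from the complementary sum."""
    return 1.0 - sum(comb(N, j) * p**j * (1 - p) ** (N - j) for j in range(n))

def expected_schemata_with_trials(n: int, N: int, length: int) -> float:
    """Expected number of schemata over binary strings of the given length
    that receive at least n of N independent, uniformly random trials."""
    total = 0.0
    for k in range(length + 1):         # k = number of fixed (non-"don't care") positions
        count = comb(length, k) * 2**k  # schemata with exactly k fixed positions
        total += count * prob_at_least(n, N, 2.0**-k)
    return total

single_trial = expected_schemata_with_trials(n=1, N=1, length=20)
```

As a sanity check, `single_trial` recovers the earlier observation that one point is an instance of 2^20 schemata. Calling the function with larger N and n (for example N = 32, n = 8) gives the expected number of schemata that accumulate several trials from the same short sequence of points.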