Gravina, D., Liapis, A., & Yannakakis, G. N. (2016). Surprise search: Beyond objectives and novelty. In Proceedings of the Genetic and Evolutionary Computation Conference. ACM.

The problem with using only a fitness function (i.e. an explicit objective) for evolutionary algorithms: local optima.

In the context of evolutionary algorithms, deception describes when the combination of highly-fit components result in a solution further from the global optimum (towards a local optimum). Deception, in combination with sampling error and the ruggedness of a fitness landscape, are the main contributors to the difficulty of an evolutionary computation problem.

This problem can be mitigated by incorporating a novelty criteria - the dissimilarity of a new solution against existing solutions. This novelty search typically results in better solutions significantly more quickly than the conventional fitness-based approach.

The general approach involves keeping track of previous highly-novel solutions (a "novel archive"). The novelty of a solution is the average distance from either neighbors in the novel archive or in the current population. The particular distance measure is problem-dependent.

The paper proposes an additional criteria of surprise. Whereas novelty just requires that a solution be sufficiently different from existing ones, surprise also requires deviation from expectations.

One way of understanding the distinction is by viewing surprise as the time derivative of novelty; i.e. novelty is position, surprise is velocity.

To quantify the surprise of a solution, a surprise archive is maintained of the past $h$ generations and a predictive model $m$ is learned to generate expectations. A new solution is compared against the members $k$ (the level of prediction locality) groups (if $k=1$, compared to the entire population $P$, if $k=P$, compared to each individual of the population). $k$-mean sis used to form the population groups; each generation (except the first) uses the $k$ centroids from the previous generation. The expectations $p$ is a function of these, i.e. $p = m(h,k)$. Each population group is used to generate a expectation.

The surprise value $s$ of a solution $i$ is computed as the average distance to the $n$-nearest expectations:

$$ s(i) = \frac{1}{n} \sum_{j=0}^n d_s (i, p_{i,j}) $$

where $d_s$ is the domain-dependent measure of behavioral difference between an individual and its expectation and $p_{i,j}$ is the $j$-closest prediction point (expectation) to individual $i$.

see also: Liapis, A., Yannakakis, G. N., & Togelius, J. (2015). [Constrained novelty search: A study on game content generation]( Evolutionary computation, 23(1), 101-129.