Honing, H., & Romeijn, J.W. (2011). Surprise! Assessing the value of risky predictions. Proceedings of the thirteenth International Conference on Informatics and Semiotics in Organisations - Problems and Possibilities of Computational Humanities (Fryske Academy, series FA 1053), (p. 185). Leeuwarden: Fryske Academy.


While for most scientists and scholars the limitations of evaluating a theory by showing a good fit with the empirical data are clear-cut, a recent discussion (cf. Honing, 2006) shows that this method is still (or again) at the centre of scientific debate. In this paper we discuss the role of theory appraisal in the computational humanities, focussing on a ‘measure of surprise’.

The philosophical literature on theory appraisal and the statistical literature on model selection have long since nuanced the decisiveness of good fit. A fit between model and data is considered one aspect of a good model, but other aspects come into play as well. Within philosophy, many additions to a goodness-of-fit measure have been suggested, including proximity to the hypothesized truth, specificity, and explanatory power. In statistics, most model selection tools weigh the simplicity of the model next to its fit. An alternative approach is to prefer theories that predict unexpected and hence surprising phenomena, based on the common intuition that a theory gains more credibility when it correctly predicts an unlikely event than when it correctly predicts something that was expected anyway. The present paper aims to make these intuitions on the role of surprise in theory appraisal precise. It deals with two problems in particular: first, how can we measure the amount of surprise inherent in a theory’s predictions? And second, how does surprise bear on the selection of one model over another? We approach these problems with a philosophical run-up, detailing how Bayesian confirmation theory deals with part of these problems. The core of the paper concerns the use of recently developed model selection tools that focus on an aspect of models that arguably captures surprise (cf. Romeijn, 2011). Finally, we bring the results to bear on actual problems of model selection in the domain of cognitive and computational musicology.
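The Bayesian intuition behind risky predictions can be made concrete in a small numerical sketch (an illustration of standard Bayesian confirmation, not a method from the paper): by Bayes’ theorem, the posterior-to-prior ratio P(H|E)/P(H) equals P(E|H)/P(E), so when a hypothesis entails the evidence, the confirmation boost grows as the prior probability of the evidence shrinks.

```python
def confirmation_ratio(p_e_given_h: float, p_e: float) -> float:
    """Posterior-to-prior ratio P(H|E)/P(H) = P(E|H)/P(E) (Bayes' theorem)."""
    return p_e_given_h / p_e

# A hypothesis that correctly predicts an expected event, P(E) = 0.9:
safe = confirmation_ratio(1.0, 0.9)   # ~1.11: barely confirmed

# The same hypothesis correctly predicting a surprising event, P(E) = 0.1:
risky = confirmation_ratio(1.0, 0.1)  # 10.0: strongly confirmed

print(safe, risky)
```

The numbers 0.9 and 0.1 are arbitrary illustrations; the point is only that, with P(E|H) held fixed, a smaller P(E) (a more surprising prediction) yields a larger confirmation ratio.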

The eventual objective is to analyse and quantify the role of surprise in the confirmation of theories in computational form, taking into account both the surprise following unexpected empirical findings and the surprise stemming from unforeseen empirical consequences of the models. We will argue that a ‘measure of surprise’ may be a fruitful way to compare and evaluate theories that have only partial empirical support, or none that is extensive.

Honing, H. (2006). Computational modeling of music cognition: a case study on model selection. Music Perception, 23, 365–376.
Romeijn, J.W. (2011). One size does not fit all: derivation of a prior-adapted BIC. In D. Dieks, W. Gonzales, S. Hartmann, F. Stadler, T. Uebel, & M. Weber (Eds.), Probabilities, Laws, and Structures. Berlin: Springer.