Peter Desain and Henkjan Honing: Music, Mind and Machine: Studies in Computer Music, Music Cognition and Artificial Intelligence. Thesis Publishers, Amsterdam. 330 pages, ISBN 90-5170-149-7 softcover.
Reviewed by Roger B. Dannenberg, Carnegie Mellon University, Pittsburgh, PA for Music Perception 12(3), pp. 365-367.
Although the title describes the topics addressed by the book, no attempt is made to provide unified or comprehensive coverage. Instead, the topics are wide-ranging, and comprise at least several independent areas of investigation. Part of the enjoyment of reading the book is thinking about the common threads that weave their way through the various chapters. These have more to do with research style than research topics.
A key element of this research style is the desire to capture the essence of phenomena in elegant computational models. This approach helps to distill and clarify problems and their solutions. A related thread is the use of numerical models, such as an "interaction function" intended to capture the tendency to quantize musical durations into simple ratios, and the use of partial autocorrelation to study expressive timing. A further thread, found in many of the chapters, is the care with which proposed models are analyzed and studied, often using graphical visualizations of input/output relations.
Because many topics are covered in the book's 14 chapters, I will not list them all. Instead, I will list the major areas of study and describe some of the interesting results. The book has three major divisions: Perception, Representation, and Methodology. I will describe some themes that cut across these divisions.
Several chapters address the topic of expressive timing in music. "Tempo Curves Considered Harmful" introduces the idea that subtle performance timing variations relate to music structure. Simply stretching durations to achieve tempo transformations without regard for structure yields unmusical results. For example, grace notes should not change much in duration when the tempo is changed. At a slower tempo, more rubato within the span of a beat or two may be possible. Various solutions to this problem are surveyed.
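The grace-note example can be made concrete with a minimal sketch. It is an illustration in Python, not code from the book: the Note class, its is_grace flag, and both function names are hypothetical, and attaching each grace note to the next main note is an assumption made here for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Note:
    onset: float            # seconds from the start of the passage
    duration: float         # seconds
    is_grace: bool = False  # hypothetical structural annotation


def naive_stretch(notes, factor):
    """Scale every onset and duration alike, ignoring structure."""
    return [Note(n.onset * factor, n.duration * factor, n.is_grace) for n in notes]


def structure_aware_stretch(notes, factor):
    """Scale main notes, but keep each grace note's duration and its lead
    time before the following main note unchanged (an assumed rule)."""
    ordered = sorted(notes, key=lambda n: n.onset)
    result = []
    for i, n in enumerate(ordered):
        if not n.is_grace:
            result.append(Note(n.onset * factor, n.duration * factor, False))
            continue
        # Find the main note this grace note leads into, if any.
        target = next((m for m in ordered[i + 1:] if not m.is_grace), None)
        if target is None:
            result.append(Note(n.onset * factor, n.duration, True))
        else:
            lead = target.onset - n.onset
            result.append(Note(target.onset * factor - lead, n.duration, True))
    return result


notes = [Note(0.0, 0.5), Note(0.45, 0.05, is_grace=True), Note(0.5, 0.5)]
for n in structure_aware_stretch(notes, 2.0):
    print(n)
```

At half tempo the naive version doubles the grace note to a tenth of a second, while the structure-aware version leaves it at a twentieth of a second, still sitting just ahead of the note it decorates.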
Other chapters examine the problem of removing expressive timing, leaving notated durations as a residue. In some sense, this is the opposite problem to working with expressive timing, but an understanding of expressive timing could inform research on quantization. Similarly, quantizers can help to extract expressive timing information from a score. "Towards a Calculus for Expressive Timing in Music" attempts to perform timing transformations taking structure into account. Transformations are introduced that more or less independently manipulate tempo, onset asynchrony, overlap, duration, and proportion-articulation. For example, a performance could be slowed down without changing the duration of grace notes or the spread of rolled chords. Ironically, the mathematics of these transformations is based on interpolation and is subject to many of the criticisms voiced in "Tempo Curves Considered Harmful." The results are intriguing nonetheless, and they set the stage for further research.
Another theme is that of computer languages and music. "LOCO: A Composition Microworld in Logo" describes a language with temporal semantics allowing elements with time and duration to be assembled in parallel and sequential structures. An emphasis is placed on making programmed choices through a wide range of techniques such as random selection, sequential selection, probability distributions, and arithmetic progressions. "LISP as a Second Language: Functional Aspects" embeds similar ideas in a tutorial on LISP, although in this chapter the emphasis is on functional composition and recursion. Some elegant programs result, but I suspect beginners would be happier to learn fewer concepts at first.
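The idea of assembling timed elements in sequential and parallel structures can be sketched briefly. The following is an illustration in Python rather than Logo or LISP, and it is not the LOCO language itself: the names note, seq, and par, and the representation of an element as a list of (onset, duration, label) tuples, are assumptions chosen for this sketch.

```python
def note(label, duration):
    """A single element starting at time 0."""
    return [(0.0, duration, label)]


def total_duration(events):
    """Time at which the last event in a structure ends."""
    return max((onset + dur for onset, dur, _ in events), default=0.0)


def shift(events, offset):
    """Move a whole structure later in time."""
    return [(onset + offset, dur, label) for onset, dur, label in events]


def seq(*parts):
    """Sequential structure: each part starts when the previous one ends."""
    out, t = [], 0.0
    for part in parts:
        out.extend(shift(part, t))
        t += total_duration(part)
    return out


def par(*parts):
    """Parallel structure: all parts start together."""
    out = []
    for part in parts:
        out.extend(part)
    return sorted(out, key=lambda e: e[0])


phrase = seq(note("C", 1.0), par(note("E", 2.0), note("G", 1.0)), note("c", 1.0))
for event in phrase:
    print(event)
```

Because seq and par both return the same kind of structure they accept, they nest freely, which is the property that makes functional composition and recursion natural in this setting.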
The chapter "Time Functions Best as Functions of Multiple Times" extends these notions of time to support what I have termed behavioral abstraction, where a behavior such as "vibrato" "knows" how to behave when stretched or transformed in other ways. A functional approach is taken in which continuous control functions represent parameters such as vibrato and amplitude. These functions are attached to notes or to compound objects consisting of structured collections of notes. The attached functions have three parameters (the "multiple times" in the title) named start, duration, and progress, and return a real number. The additional parameters allow a range of behaviors to be represented by one function. If a musical object is stretched, then its attached control functions receive a larger duration parameter, and the resulting behavior can change accordingly. This can be seen as another approach to the expressive timing problem. Here, behavior can change in a non-linear way as a function of tempo or absolute starting time.
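A small sketch may help show why the extra parameters matter. It is written in Python and is not the authors' implementation: it assumes that progress is a fraction between 0 and 1 of the way through the object, and the functions vibrato, glissando, and sample, along with their default values, are hypothetical.

```python
import math


def vibrato(start, duration, progress, rate_hz=6.0, depth=0.5):
    """Vibrato with a fixed rate in cycles per second. Stretching the note
    adds more cycles instead of slowing the existing ones."""
    elapsed = progress * duration  # seconds since the object began
    return depth * math.sin(2 * math.pi * rate_hz * elapsed)


def glissando(start, duration, progress, span=2.0):
    """A glide that depends only on progress, so it stretches with the note."""
    return span * progress


def sample(control, start, duration, steps):
    """Evaluate a control function at evenly spaced points of an object."""
    return [control(start, duration, i / steps) for i in range(steps + 1)]


short = sample(vibrato, start=0.0, duration=1.0, steps=8)
long = sample(vibrato, start=0.0, duration=2.0, steps=16)
print(short)
print(long[:9])  # the first second of the stretched note
```

Both functions share one signature, yet the vibrato keeps its rate in absolute time when the duration doubles, whereas the glissando simply takes twice as long. The start parameter is unused here, but a function could consult it to vary behavior with absolute starting time.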
A third theme is the quantization problem: how can we extract beats and tempo from performance information? In "The Quantization of Musical Time: A Connectionist Approach," a new approach inspired by connectionism is described. My sense is that this approach is only connectionist in the broadest sense of using a large number of simple elements and using iterative techniques to converge to a minimum energy state. There is no learning involved and no attempt to model any real or imagined neural system. The basic idea is to adjust note inter-onset times (the system input) until they converge to exact multiples of some underlying pulse. This work is extended to form an on-line tempo tracker that fits a tempo curve (there is that "harmful" concept popping up again!) to incoming performance information.
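The flavor of such an iterative scheme can be suggested with a much simplified sketch. This is not the authors' network or their interaction function: it is a plain relaxation in Python, under the assumption that each pair of adjacent inter-onset intervals is nudged toward its nearest integer ratio while the pair's combined length is kept fixed. The names relax_pair and quantize, the rate of 0.5, and the number of sweeps are all hypothetical choices, and intervals are assumed to be positive.

```python
def relax_pair(a, b, rate):
    """Nudge two adjacent intervals toward the nearest integer ratio,
    keeping their sum a + b unchanged."""
    total = a + b
    big, small = max(a, b), min(a, b)
    k = max(1, round(big / small))   # nearest integer ratio, at least 1:1
    ideal_small = total / (k + 1)
    ideal_big = total - ideal_small
    if a >= b:
        ideal_a, ideal_b = ideal_big, ideal_small
    else:
        ideal_a, ideal_b = ideal_small, ideal_big
    return a + rate * (ideal_a - a), b + rate * (ideal_b - b)


def quantize(intervals, rate=0.5, sweeps=200):
    """Repeatedly relax every adjacent pair until the ratios settle."""
    q = list(intervals)
    for _ in range(sweeps):
        for i in range(len(q) - 1):
            q[i], q[i + 1] = relax_pair(q[i], q[i + 1], rate)
    return q


performed = [1.05, 0.48, 0.51, 0.96]  # inter-onset intervals in seconds
print([round(x, 3) for x in quantize(performed)])
```

For this input the intervals settle near 1.0, 0.5, 0.5, and 1.0 seconds, that is, multiples of a half-second pulse, with the total length of the passage preserved. As in the approach described above, nothing is learned: the result emerges only from many small local adjustments.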
In addition to presenting their own work, the authors give a careful description and analysis of a quantizer by Longuet-Higgins. A fascinating observation in this work is the idea that quantizers have implicit but measurable expectations of when note onsets will occur. This is one way that quantizers can be compared, and perhaps measurements of human expectations may lead to future improvements in machine quantizers. A remaining problem with this entire area of work is the difficulty of measuring performance. There are no standard benchmarks, so it is difficult to judge whether a system really works or to calibrate claims of success.
Overall, this book contains a wealth of diverse material. While few may read it cover-to-cover, there are important chapters on music representation, good methodologies for music perception research, and interesting models of perception suggesting further research.