Peter Desain & Henkjan Honing
[Published as: Desain, P., & Honing, H. (1992). Musical machines: can there be? are we? Some observations on- and a possible approach to- the computational modelling of music cognition. In C. Auxiette, C. Drake, & C. Gerard (eds.), Proceedings of the Fourth International Workshop on Rhythm Perception and Production. 129-140. Bourges.]
A response is given to questions that arose in the general discussions of the fourth Rhythm Perception and Production Workshop (held in Bourges, France, in June 1992). The answers are presented in the form of some general observations and a description of the authors' approach to the study and modelling of music cognition.
In the fourth workshop on rhythm perception and production, held in Bourges in June 1992, many interesting questions came up in the discussions. These questions often addressed methodological issues. To give some examples: Gert ten Hoopen criticised "his fellow Dutchmen" (the authors of this paper) for their preoccupation with theories and elegant models, instead of taking a more empirical approach. In contrast, Piet van Wieringen remarked that the kind of models most workshop participants seem to be pursuing are of a 'data-driven' kind - fitting lines through experimental results - and of a descriptive nature instead of an explanatory one. Another issue that was discussed is the definition of expression. Wolfgang Prinz posed the problem that the notion of 'expression' in music research often is not about expression at all, in the sense in which the word is used in other branches of psychology. In Neil Todd's opinion, musical (e)motion or expression is closely linked to - and can be modelled by - the laws of physics.
Trying to answer all these questions at once, we decided to submit our 'Credo' for the proceedings of the workshop, as a description of our 'belief system' with regard to modelling music cognition. We will take the liberty to explore some intuitive insights and to air some speculative ideas. We will also sketch the research paradigm that underlies this work - and deal with some controversies within that paradigm. The material is a slight adaptation of the introduction of Desain & Honing (1992), which in turn is a combination of the introductions and conclusions of both Peter Desain's PhD thesis "Structure and Expressive Timing in Music Performance" and Henkjan Honing's PhD thesis "Music and the Representation of Structure: From Issues to Microworlds." These thoughts, arising sometimes in discussions with colleagues, motivated us in the present research and form the inspiration for future work. They can be read as our answers to the questions mentioned. We will try to communicate the advantages of neat and elegant computational models. The different kinds of models and systems used in AI will be posed as an alternative to a more 'data-driven' approach. A workable definition of the notion of expression in music will be elaborated. And our opposition to the use of the physical metaphor in these issues will be stated.
The research we are engaged in is essentially multi-disciplinary. It is based on the study of music, the study of mind and the study of machine. Nowadays each of these topics is linked to the others in various research disciplines. In computer music, ways to design machines to make music are explored. In music cognition, mental processes that perceive and apprehend music are investigated. In artificial intelligence the mind is approached as a machine - and machines are built to learn more about the mind. Although the work often focuses on a narrow topic, the research in these broad domains forms the groundwork on which our contributions are based.
The modern cognitive and computational approach to the musical mind is quite different from the older psychology of music in that it develops formalised, testable models of aspects of the musical mind instead of intuitive, metaphorical concepts. Nevertheless, there is an old metaphorical approach to the musical mind that has reappeared recently and that is, in our opinion, dangerous and misleading. Before presenting an alternative we will explain this theory. It is based on an apparent similarity between musical and physical motion. Helmholtz was already quite explicit in his appreciation of the similarities, and even attributed to them a central role in the evocation of emotion: "[...] it becomes possible for motion in music to imitate the peculiar characteristics of motive forces in space, that is, to form an image of the various impulses and forces which lie at the root of motion. And on this, as I believe, essentially depends the power of music to picture emotion." (Helmholtz, quoted in Todd, 1992)
But he still pictures music as choosing to resemble physical motion - not as being forced to do so in any magical way. Todd starts to blur the distinctions involved. For example, it seems intuitively appealing that by increasing the energy the maximum velocity or tempo increases (Todd, 1992). Every musician knows that the faster the tempo, the more work they have to do. It may be that the 'energy' is also salient to the listener, which could make a contribution to affect (Todd, 1989; p. 156). In our view, equating the physical notion of energy with the concept of experienced musical energy or "perceived musical motion" - taking the laws of physics not merely as a description but as an explanation - has to be rejected.
However appealing these similarity-based theories might be to laymen, there is no evidence whatsoever that a walnut is good for the brain because it looks like one (to name a theory built on the same foundations). We do not object to testing the possible use of square-root functions to model musical ritards, or the use of constant acceleration of musical velocity in modelling expressive timing - just because both ideas happen to describe the physics of falling objects under constant gravity as well. However, we do object to the idea that physical motion is more than a mere metaphor in these matters. Some authors move from the simplicity of falling physical bodies to the moving bodies of human performers and explain the similarity as an embodiment of musical thought (Clarke, 1992; Davidson, 1991), and thus propose a healthy alternative to the wholly mentalistic approach AI researchers tend to take. Furthermore, this approach is again open to scientific inquiry, and our criticism is not aimed in that direction.
To understand our objections to the use of metaphor it is important to note that in the search for simple similarities, alternative explanations of the phenomena are easily overlooked, as is shown by some studies of the final ritard. This large deceleration at the end of a piece is often observed to have a certain form (a square root curve) and it can indeed be modelled as the speed of a mass under a constant deceleration, a constant braking force (Kronman & Sundberg, 1987).
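The square-root form follows from elementary kinematics. In the notation below (ours, for illustration only), v is the tempo, treated as the velocity dx/dt along score position x, v_0 is the initial tempo, and a is the constant deceleration:

    \[
      \frac{dv}{dt} = -a
      \quad\Longrightarrow\quad
      v\,\frac{dv}{dx} = -a
      \quad\Longrightarrow\quad
      v(x) = \sqrt{v_0^{2} - 2\,a\,x}
    \]

Plotted against score position, tempo thus traces a square-root curve - the shape reported for measured final ritards.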
However, another explanation is possible, based on the structure of the music and the architecture of temporal perception itself. For a large tempo change such as a final ritard to still be perceivable as a slowing down, it should not slow down too fast; otherwise the rhythmic categories will not be communicated intact and tempo cannot be perceived. Any quantizing and tempo-tracking model, like the models proposed by Longuet-Higgins (1976) and Desain & Honing (1991), will predict a maximum deceleration rate that can still be followed. It might be that a good model can indeed predict - by its limitations - the limits of acceptable rubato and final ritard. Because of the nature of these models they will also predict that a) these limits are different for various rhythmic structures and b) the slowing down might well be required to work in a stepwise fashion - because the models propose separate tempo-tracking mechanisms on different levels of the metrical hierarchy. These predictions are consistent with findings that music from different composers or style periods requires different final ritards to work well musically, and with the evidence that in final ritards there is indeed a tendency to decrease tempo in a stepwise manner (Clynes, 1987). Such observations immediately show the importance of investigating this possible explanation further. If it can be shown to hold, it will be a much more attractive explanation than the physical motion theory, because it explains properties of good music performance directly from the musical material and from the perceptual processes themselves. We are convinced that music is based on, plays on, and makes use of the architecture of our perceptual systems far more than it imitates our physical surroundings.
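To make the quantization argument concrete, here is a toy tracker of our own construction (far simpler than the cited models; all names and values are hypothetical). It quantizes each performed inter-onset interval against a running tempo estimate: a gradual ritard is followed as a tempo change, but a too-sudden one is absorbed into the rhythmic category instead:

    CATEGORIES = [0.25, 0.5, 1.0, 2.0]  # possible score durations, in beats

    def track(iois, beat=0.5, alpha=0.5):
        """Follow performed IOIs (seconds); beat is the current beat period estimate."""
        result = []
        for ioi in iois:
            # quantize: which score duration best explains this interval?
            duration = min(CATEGORIES, key=lambda c: abs(ioi / beat - c))
            # adapt the tempo estimate towards the implied beat period
            beat += alpha * (ioi / duration - beat)
            result.append((duration, round(60.0 / beat)))  # (category, tempo in bpm)
        return result

    # A gentle final ritard is followed intact, as one-beat notes at a
    # gradually decreasing tempo:
    print(track([0.50, 0.52, 0.56, 0.62, 0.70]))

    # But halving the tempo within a single note is heard as a two-beat
    # note at an unchanged tempo - the category changes, not the tempo:
    print(track([0.50, 1.00]))

In the published models, the same kind of categorical ambiguity is what bounds the deceleration rate that can still be followed.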
Music never functions in a vacuum: it is carried by pressure waves in air. It can be studied as a sound signal that is emitted by a performer, travels through the air and is picked up by the listener's ears. This is the domain of acoustics (the behaviour of the sound in space) and of psycho-acoustics (the conversion of pressure waves into musical percepts in the ear and in parts of the brain), which form complex fields of study in their own right. Music also never functions in a social and cultural vacuum. The perception of music can be changed in subtle ways by the visual impression of the performer, sociological factors, fashion, the listener's associations and a multitude of other factors studied in sociology, anthropology and ethnomusicology. All these presuppose the human ability to remember and recognise musical fragments and even a specific style, composer or performer - high-level tasks investigated in psychology. Musical styles and periods can also be explored in their own right, independent of cognition, as is done in musicology. In our work all these issues, however interesting, are ignored.
But what is this research about, if so much is excluded? A large portion of it deals with early rhythm perception mechanisms that process the perceived time intervals roughly corresponding to notes, and infer rhythmic categories, a sense of tempo and local deviations thereof, and expectations for the events to come. All these caveats are necessary because even the simplest questions about the form, role and function of expressive timing and tempo are quite difficult to tackle. That is also one of the reasons why we approach the subject in a rather technical way - based on simple elapsed time intervals between performed note onsets.
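For concreteness: the raw material of these early rhythm perception models is nothing more than a list of elapsed intervals, as in this minimal illustration (the onset times are hypothetical):

    # Performed note onset times in seconds (hypothetical values).
    onsets = [0.00, 0.51, 0.98, 1.52, 2.03]

    # The elapsed time intervals between onsets (inter-onset intervals)
    # are all that these models get to work with.
    iois = [round(b - a, 2) for a, b in zip(onsets, onsets[1:])]
    print(iois)  # [0.51, 0.47, 0.54, 0.51]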
One can question whether, with such restrictions to simple measurable quantities, one can still study meaningful matters. Or does the scientific and technological approach kill the magic of music? In a sense it does, and in a sense it does not. It does kill magic by refusing to assume any direct communication of human emotion from performer to listener by wizardry. However, by assuming that, if such things are communicated at all, they must be communicated via the music signal itself, it takes the unravelling of this signal as its primary goal. Technology then becomes very helpful, and the discovery of subtle and intricate patterns in musical performance, the almost unbelievable consistency in the fraction-of-second timing of human performers, and the delicate ways in which musical structure is communicated expose a great wealth of wonder and magic. Besides, music has always been filled with techniques and technology aiming towards its mastery, be it in instrument building, composition or control over the instrument in performance.
This reliance on objective measurements does not give researchers the right to dismiss other realities. When a music teacher teaches a student how to play, say, 'sadly', certainly something is conveyed. And the fact that it might be difficult to discover this sadness in the measured musical signal does not mean it is not there. More often than not performers know very well what happens in the music, even if they state it in unobservable terms; it is the researcher's task to make sense of it in objective ways.
Once some progress is made in that direction, what happens in music can still only be described - not prescribed. More than once researchers have made serious mistakes in this respect - to the point of ridiculous circularity, like Manfred Clynes (1987), who claims that a performer who does not play according to his theory of the 'composer's pulse' does not play well.
Artificial Intelligence research (AI for short) has always had two faces, a technological and a cognitive one. The former strives solely to design technical systems (machines) that behave intelligently; the latter seeks to make testable, formal models of intelligent human behaviour. Our own orientation towards Artificial Intelligence is mainly motivated by the possible understanding of human cognition that it might bring. Curiosity about the musical mind is the main driving force, and practical applications, e.g. for music production, that may result from the research are considered a by-product. The occurrence of the word system or model in publications gives one a fair guess about the approach a researcher takes. A problem arises when the terms (and the orientations) are confused: a successful system only has to behave up to input-output specification, but the internal mechanisms of a successful model are supposed to tell us something about reality. In the initial stages of the research this issue is sometimes necessarily unclear, but finally, if one is to learn something about human intelligence, it must be made explicit how the model relates to the phenomenon modelled.

This testing of artificial intelligence models, or even stating the models in ways that generate testable predictions, is a field that is barely developed. There is a huge gap between experimental psychology, with its sophisticated tools for testing simple processes, and cognitive science, which has hardly any tools for testing its more complex models. Even the way in which computational models can be described, in order to clarify what it is that is modelled and what is mere implementation detail, is problematic, especially since programming languages do not support the specification of those issues. However, an adequate programming style, a clear description of these issues and the publication of the program in the form of a micro-version help in determining the value of an algorithm as a model of a cognitive function. Artificial Intelligence at its worst can be seen in articles that make anthropomorphic claims about huge unpublished programs. The laborious so-called rational reconstruction of those programs by others to check the claims is then the only remaining route to scientific progress (Ritchie & Hanna, 1990). The microworld approach is in our opinion a far better methodology. A great deal can be said about the advantages, disadvantages and implications of building microworlds and micro-version programs as a methodology. Later we will explain some of the explicit and implied characteristics of this methodology.
Perhaps contrary to common usage, the word expression as we employ it does not denote what music expresses to the individual listener. All the links to musical affect, to emotion and even to aesthetics are considered too complex to tackle before more mundane issues are understood. Expression is assumed to be a syntactical concept - dealing only with the form of the music. In the first stages of the research, expression was defined as the pattern of deviations of attributes of performed notes from their values notated in a score. Everything added by the performer to the score, all deviations from a strict mechanical performance, was termed expression. This definition, however useful in the initial study, soon lost its attractiveness. In general, listeners can appreciate expression in music performance without knowing the score, and a full reconstruction of the score in the form of a mental representation is impossible. Take for instance the notion of the loudness of notes. Should a listener be required to fully reconstruct the dynamic markings in the score before it is possible to appreciate the deviations from this norm as expressive information added by the performer? Such a nonsensical conjecture indeed follows from a rigid definition of expression as deviation from the score. Seashore was a bit more careful (albeit somewhat vaguer) when he defined expression, independent of a score, as:
"Artistic deviation from the fixed and regular: from rigid pitch, uniform intensity, fixed rhythm, pure tone." (Seashore, 1938)
It is possible to find more elaborate ways of defining expression on the basis of performance information only. In later stages of the research this was achieved by basing expression on the notion of structural units, using this working definition: expression within a unit is the pattern of deviations of its parts with respect to the norm set by the unit itself. Take e.g. a metrical hierarchy of bars and beats. The expressive tempo within a bar can be defined as the pattern of deviations of the tempo of each beat from the tempo of the bar. Or take the loudness of the individual notes of a chord. The dynamic expression within a chord can be defined as the set of deviations of the loudness of the individual notes from the mean loudness of the chord (see the sketch below). Using this definition, expression can be extracted from the performance data itself, taking more global measurements as a reference for local ones, based on the concept of known units. Thus the structural description of the piece becomes central, both to establish the units which will act as a reference and to determine the sub-units that will act as atomic parts whose internal details will be ignored. A similar definition works well for the expression carried by the difference between two voices, or formed by the difference between e.g. a theme and a variation.

Accepting this intimate link between expression and structure - or rather the foundation of the concept of expression on structural units - the nature of the structural description becomes a crucial concern. Structure in music is not a simple concept, because of the multitude of structural descriptions in use. Let us start with hierarchical structures like metre, rhythmic grouping and phrasing, in which the structural links are part-of relations. These overlaid structural analyses, concerned with different aspects of the piece, may violate each other's boundaries - like a phrase ending in the middle of a measure. There can be ambiguity: multiple mutually exclusive analyses or interpretations of the same aspects of a piece. There may be a local violation of an otherwise hierarchical structure, like two overlapping phrases (a situation seldom encountered in linguistics). The need for local structural relations like grace notes and other ornaments is obvious too. These can be described by a part hierarchy, but there are also structural relations that cannot be treated likewise, like symmetrical associations between recurrent motives. Besides these collections of musical events, and the simple relations between them, we need a formalisation of the various rhythmic, melodic and harmonic roles that can be ascribed to such collections. We think that the complexity sketched mirrors the complexity found in the expressive signal itself, since the various structures are the source of expression and are conveyed to the listener by that means.
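The working definition above translates directly into a small sketch (function names and values are our own, for illustration only): expression within a unit is the deviation of each part from the norm the unit itself sets, here simply taken as the mean:

    def expression(parts):
        """Deviations of the parts from the norm set by the unit itself (here: the mean)."""
        norm = sum(parts) / len(parts)
        return [round(part - norm, 2) for part in parts]

    # Dynamic expression within a chord: loudness of each note (hypothetical
    # values) relative to the mean loudness of the chord.
    print(expression([60, 72, 66]))          # [-6.0, 6.0, 0.0]

    # Expressive tempo within a bar: tempo of each beat relative to the
    # tempo of the bar as a whole.
    print(expression([118, 122, 124, 116]))  # [-2.0, 2.0, 4.0, -4.0]

Note that the same function applies unchanged to beats within a bar and to notes within a chord; only the structural description decides what counts as the unit and what as its parts.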
The recent work of Longuet-Higgins & Lisle (1989), Todd (1989) and Drake & Palmer (in preparation) indicates that there might be ways in which the communication of structure by expressive timing can be formalised. However, no clear picture that deals in a unified way with all kinds of structure (metrical, rhythmic, phrase, local surface etc.) has emerged yet. It may be that this is due to our lack of understanding of the general knowledge representation issues involved.
The task of constructing a general representation of music is hard to imagine and to plan, especially since projects of comparable complexity have not been very successful. We still lack a general theory of representation: "a sobering fact since our systems rest on it so fundamentally" (Smith, 1991). General representation languages are still under development, and there are, besides lots of technical difficulties, still theoretical and philosophical problems of enormous proportions. We nevertheless think that it is very important to look for generalisations and abstractions in the representation of music in all its aspects. An alternative position is summarised in the statement "a representation depends on its use" (Roads, 1984; Pope, 1988; Huron, 1990b), a viewpoint described by Christopher Longuet-Higgins (1990) in the following quote:
"My only comment is to remark that the quality of a representation depends on how well it fulfils the purposes for which it is intended, and to underline the need to specify exactly what these purposes are, and how the representation is to be used in achieving them. A blindingly obvious, but by no means trivial, example is the remarkable efficiency of stave notation for the purpose of sight-reading - a form of representation from which we still have a great deal to learn"
Although this might be a valid approach in several application domains of music representation - whether in music notation, printing, archiving, or the construction of sound and sequencer file formats - the aim of constructing a general representation is to bring out the generalisations and abstractions that are not primarily influenced or guided by their use. We prefer this path of 'generalisation and abstraction' to that of 'dedication and specialisation' (i.e. designing a new and therefore "efficient" representation for every new task or problem), since the former forces one to describe what is shared among all these representations.
The idea that in music "everything has to do with everything", and the impossibility of describing aspects of it in isolation, finds a lot of support in ethnomusicological research. Yet we think that the perceptual aspects of music as a whole can be effectively understood by describing them in a formalised way, ignoring the larger context. Huron (1990a) argues that the way in which the study of music should be approached, and how the possible approaches (e.g. social, perceptual, historical) should be compared, is essential, and that a universal representation of music is impossible and its pursuit should be rejected: such a universal representation would have "worldly proportions, [...] will change music in unpredictable ways, and there is no neutral point of view from which to begin." And, indeed, a universal representation of music seems impossible. Therefore we prefer to use the term 'general'. The definition of general is important here, since it makes significant restrictions. With 'general' we mean, firstly, a representation that describes the measurable and perceptual aspects of music (i.e. a sound signal) and, secondly, the cognitive aspects that are directly involved with this perception.

The latter is a bit of a problem. The term 'cognitive' refers to models or systems that contain and process knowledge. But are there any limits on the knowledge we need for our 'general' representational system? We have to be able to restrict the required knowledge. This brings us to the "frame problem" (McCarthy & Hayes, 1981): a problem that arises when knowledge has to be encapsulated, separated from the rest of the world knowledge. It is difficult, and most of the time even impossible, to determine what knowledge is affected and what knowledge is unaffected by a certain change or the addition of new facts to a knowledge base. If we think of a microworld as a small knowledge base, the possibility of extending and combining microworlds can be questioned.

Jerry Fodor takes an important stand in this (Fodor, 1983). He doubts the possibility of formalising cognitive processes, as they are part of one central system that is global, non-modular, and therefore cannot - with our current theoretical tools and methods - be comprehended, and can therefore not be formalised. He considers this lack of understanding the basis of a failure in formalising cognitive processes: "cognitive science has not even started". According to him the cognitive sciences can be, and are, successful in formalising the modular parts of the mind: the input systems that are "cognitively impenetrable" (like the five senses and language). These are a successful domain for AI and psychological research. The problem now becomes whether music can be considered part of this central system, or whether it is a module of its own. It clearly is part of the former if one takes into account all the social and cultural aspects of music; music can be a cognitive faculty among many other things. Restricting a representation of music to, firstly, all the information measurable in the sound signal itself and, secondly, the cognitive processes that directly interact with it, seems limited enough to gain some level of success (following Fodor's argument). Within this definition we think it is possible to work towards generalisations that can form a basis for cognitive models of important aspects of music.
A positive consequence of such a 'decontextualised' representation is its effectiveness in a carefully restricted domain where almost all the knowledge is specific to that domain (i.e. little or no common-sense knowledge is required). The question whether music cognition can be described as such a consistent and restricted domain is still open.
Our experience with microworlds started as an approach to the design of composition systems, influenced by the work of the Logo community (Desain & Honing, 1988). It developed over the years into a methodology that accompanied us in different areas of music and AI research. In this approach one concentrates, firstly, on the construction of a small and closed set of procedures and data structures. In this exploratory microworld it is easy to experiment with ideas, vague as they are, to gain more insight into the problem to be understood and modelled. Secondly, we constructed theories in computational form. These new microworlds made the theory explicit and allowed for tests of completeness and internal consistency. Thirdly, we found much profit in (re)constructing 'micro-versions' of larger programs (or models), particularly when they were made to share the same data abstraction. Trimming these computational theories down to a "bare minimum" allowed for better and easier comparison, bringing a real understanding of the theory and, more than once, the emergence of more abstract or general notions. The concept of a microworld is closely linked to an era:
"What characterizes the period of the early seventies is the concept of a microworld - a domain which can be analysed in isolation." (Dreyfus, 1981)
Many of the microworld ideas stem from the Logo project (Papert, 1980) and from other people working at MIT in the seventies (e.g. Abelson, Minsky, Winograd, Sussman). The notion of a 'microworld' has been described by Marvin Minsky and Seymour Papert as follows:
"Each model - or 'micro-world' as we shall call it - is very schematic; it talks about a fairyland in which things are so simplified that almost every statement about them would be literally false if asserted about in the real world. [...] Nevertheless, we feel that they [the micro-worlds] are so important that we are assigning a large portion of our effort toward developing a collection of these micro-worlds and finding how to use the suggestive and predictive powers of the models without being overcome by their incompatibility with literal truth." (internal MIT memo Minsky & Papert, 1970; quoted in Dreyfus, 1981)
Papert and his colleagues developed several microworlds for use in an educational context, inspired by the cognitive development theory of Jean Piaget (Papert, 1980). These microworlds were designed to facilitate learning, and were based on a new programming language called Logo (based on Lisp) embodying the educational philosophy of "learning without being taught." The most prominent example of one of these microworlds is the 'turtle-world' which models a world of turtle-geometry (Abelson & diSessa, 1980). Children learned about this world by giving commands to a turtle robot, or a turtle image on a computer screen, and building procedures from them. They gained knowledge and understanding of (turtle) geometry by just exploring the possibilities of this object. These ideas had a major influence on the development of educational research and formed the basis of a widespread curriculum in computer science in primary and secondary schools.
Another often-cited example of the microworld notion is Winograd's blocks world for natural language understanding (Winograd, 1972). Here, by contrast, the domain is not central; the microworld just serves as a toy problem to test the possibilities of a certain approach to natural language processing. This kind of microworld approach, and the optimism that these microworlds could simply be combined and extended into a general knowledge representation, prompted heavy critique (e.g. Dreyfus, 1981) that gave the notion of microworlds a bad press (and caused Winograd to take the side of his critics, see Winograd & Flores, 1987). This critique, though, should be placed in the perspective of using microworlds to model human knowledge, instead of seeing them as part of a methodology that brings out the isolated problem under study and makes it explicit (in the case of Winograd's microworld, the representation and processing of natural language). There are still strong arguments for the design and use of microworlds. Besides the ideas normally associated with microworlds (i.e. modelling a toy problem, or facilitating "learning without being taught"), there are much broader implications that make it a valid and important methodology in computational modelling and artificial intelligence research. It is based on the observation that:
"The best way of finding out the difficulties of doing something is to try to do it." (David Marr, 1985; p. 108)
This near-cliché has, as every cliché, the reality of the obvious. But the quote is a good illustration of an important characteristic of AI research: trying out ideas in the form of programs. Vague formalisms, parts of theories, and "poorly understood and sloppily formulated ideas" (as Marvin Minsky calls them) come up against a tough discipline in programming. Minsky promoted "exploratory programming" to avoid having to start with a complete and detailed specification: "an excessive preoccupation with formalism is impeding our development" (Minsky, 1987). This exploratory programming (using microworlds) was one of the key concepts at the beginning of AI in the early seventies, a newly emerging methodology, and an alternative to empirical research.
In our own work we have frequently found that actually programming a certain idea can provide new insights. It brings out other aspects of a possible solution, because the program forces you to answer questions you didn't think of, or suggests programming it in another way (e.g. choosing a different data abstraction or control mechanism). A microworld, because of its relatively small dimensions, invites you to do things "completely differently", because not all the work (as in a larger system) is dependent on the abstractions chosen. Experimenting with the resulting ad hoc formalisation or program may bring out further insights, providing a real understanding that, in turn, possibly provides for a new formalisation, and a new theory. In making problems concrete, deciding what is essential and what isn't, and moving knowledge and understanding from being implicit (e.g. in the control structure) to being explicit (e.g. as data structures), problems become objects - objects of thought - that facilitate thinking about them, just as the turtle gave children "an object to think with" (Papert, 1980), helping them to understand more about geometry. After this first exploratory phase of constructing and using a microworld, the understanding of the problem domain has hopefully improved. The next stage can then be to construct a theory. Is a microworld a theory? A computational version of a theory in the form of a microworld has a number of advantages. After this formalisation we can recapture the implications of the theory, and in the process better understand how to achieve abstractions and true generalisations. Building and using such a microworld formalisation brings out aspects never foreseen during the design of a theory. It makes the theory concrete and verifiable. The construction process itself may even influence its design by revealing flaws and missing aspects. As such, a microworld is more than a theory.
But there are also some dangerous aspects associated with the construction of programs or microworlds. One frequently sees, in computational approaches to music, that a class of problems is described, followed by a description of a program and of the results obtained on sample problems. Often these samples are just a small set of problems with an unclear relation to the class of problems the program or its methods are claimed to embody. This is the well-known "the bulky program is in fact the theory" mistake. It is unclear what the limitations are, which aspects are generalisations, which aspects are specific to a particular problem, and which can be attributed to a whole class. If these limitations are not stated along with the program, the program is more or less a 'black box': it works in a particular case, but we don't know precisely why, and, even more important, we have no idea when it wouldn't work. There is a danger of starting to "live" in the self-created microworld, rigorously explaining all other problems in terms of this world, instead of retaining flexibility and an awareness of a certain set of untreated problems. As such, a microworld is far from a theory.
The microworld methodology might come out even more clearly when we compare it to the expert system approach, which seems to be its antithesis. Expert systems or rule-based systems accumulate knowledge in the form of a relatively large collection of rules. They describe explicitly what to do in a large collection of specific cases. Extra rules are added to model certain interactions or to take care of unwanted ones (an approach often negatively called 'patchwork rationalism'). Rule-based systems can be effective when applied to a very restricted domain containing a relatively small collection of rules. As such, they are the opposite of microworlds, because they are supposed to embody near-completeness of knowledge with respect to a certain domain (obtained by an over-specification of rules). A rule-based system is capable of reasoning in cases where human beings cannot oversee the consequences any more (e.g. medical, law or nuclear power advice or expert systems). Microworlds, on the contrary, are designed such that human beings can oversee the consequences. Furthermore, where the exploratory microworld serves to find all possibilities and consequences of a certain micro-theory, a rule-based system is supposed to describe all these possibilities and consequences, though contained implicitly in a large collection of rules and their interactions.
It is an illusion that rules can be added endlessly to a rule-based system. Adding knowledge is difficult, if not impossible, because the dependencies of the rules and their interactions are often ill-defined, if not plain unknown, and adding a rule will have unforeseen effects on the behaviour of the system. The rules are difficult to separate with regard to the part of the domain knowledge they represent. Pragmatics seems to prevail in the design of these systems: if they work reasonably well, that is good enough. Because of these characteristics an expert system is again the inverse of a microworld, which has isolation and modularity as its intrinsic qualities.
It is therefore quite peculiar that there is still a body of research that has full faith in this expert system methodology (Friberg, Frydén, Bodin & Sundberg, 1991), even up to the level of being able to represent the sum total of human knowledge (Lenat & Feigenbaum, 1991; see also the critique by Smith, 1991).
Artificial Intelligence is not a very homogeneous domain. At present there is within AI a clash of two competing paradigms: connectionism and the symbolic paradigm. Since in our own research the use of connectionist and/or symbolic representations is a recurrent theme, it is good to dwell on both a bit more. So-called Good Old-Fashioned Artificial Intelligence (i.e. the symbolic approach) has been firmly established as a research methodology in the past decades. The methods and tools it uses are symbolic, highly structured representations of domain knowledge and transformations of these representations by means of formally stated rules. At the heart of this methodology is the use of symbols that have no content in themselves: information processing is of a syntactic nature. It is easy to misinterpret such a system's behaviour, since the symbols often carry suggestive name tags that may seduce one into attributing more sense, more intelligence, to the program than is actually implemented in the rules themselves. One has to realise that some of these clever programs actually do not achieve very much, being based on a very smartly devised knowledge representation that solved, or evaded, the problem beforehand. Any extension of these systems, however small, or any attempt to generalise the results, is doomed to fail because the knowledge representation is designed just to give ad hoc solutions to a small set of problems. We feel that such work carries the symbolic approach to its ridiculous extreme.
However, other kinds of symbolic AI have contributed more or less generalisable theories to the field, and have proposed models of human information processing. These rule-based theories can function as abstract formal descriptions of aspects of cognition. Some authors even go beyond that and claim that mental processes are symbolic operations governed by mentally represented rules.
Until the connectionist paradigm emerged there was no real alternative to this view. In the new paradigm the departure from a reliance on explicit mental representation of rules is central, and its approach to cognition is fundamentally different. Connectionism offers the possibility of defining models which have characteristics that are hard to achieve in traditional AI, in particular robustness, flexibility and the possibility of learning. The connectionist boom has produced lots of interesting work, although many researchers have lost their critical attitude, impressed as they were by the good performance of some prototypical models. This has resulted in thousands of papers presenting more and more examples of problems that could be learned by a neural net, the proof being a simulation of just that. Levelt, bothered by this waste of effort, concluded that connectionist models are, mutatis mutandis, as handy as a city map on a 1:1 scale (Levelt, 1989). Indeed, more study is needed of the limitations of these models. A connectionist model that 'works' well constitutes in itself no scientific progress if questions like the scalability to larger problems and the dependency of the model on a specific input representation cannot be answered.

The theoretical observation that a connectionist system can simulate any symbolic computation machine (is 'Turing machine equivalent') and vice versa tends to dismiss the relation between the paradigms as a non-issue. We think that the language in which problems are stated and the level at which research is conducted are of major importance - each language obscures some matters while clarifying others. A related idea about the relation between the paradigms is the presentation of connectionism as an implementation-level theory, which can coexist with a more abstract symbolic theory on a higher level. This view is often associated with the claim that connectionism is superior to the symbolic approach because the computational units resemble cells found in the brain (the term 'neural networks' stems from that postulated isomorphism). This has to be rejected firmly. The simple computational method used in connectionism is miles away from true biological modelling, and the chosen computational level of abstraction can never be a ground for superiority.
Against the background of this debate within AI and cognitive science on the role of connectionist models, some researchers have concentrated on a technical examination of the weak and strong points of both symbolic and neuro-computing, in order to combine them in so-called hybrid systems. They claim that because symbolic computing is best suited for higher-level functions such as reasoning and planning, and neuro-computing is more applicable to low-level, perceptual and classification tasks, systems containing modules from both paradigms should be devised. This approach is not free of problems, to put it mildly. At its worst it can be described as: "we do not understand how neural nets work, we do not know how rule-based systems work, let's combine them and see what happens." Such a pragmatic approach can only obscure the real issues.
There is, however, another way to deal with the challenges of connectionist work. In comparing both paradigms one quickly discovers that the formalisms used are often of such an idiosyncratic nature that it is impossible to make comparable claims about the behaviour of models from the two paradigms. Concentrating on general abstract descriptions of behaviour then becomes a very fruitful activity. It yields new ways to look at the connectionist and the symbolic models and to characterise them further - a positive contribution in itself. For example, consider the benefits of describing the input, state and solution spaces, a trivial exercise for connectionist systems. In symbolic systems these constructs often remain hidden in the program code and are not made explicit in the articles, but they can help enormously in characterising such systems. These analyses also yield ways to describe connectionist systems on a more general level than simulation runs can. One such point that is often neglected is the representation:
"[For most aspects of connectionist modelling] there exists considerable formal literature analyzing the problem and offering solutions. There is one glaring exception: the representation problem. This is a crucial component, for a poor representation will often doom the model to failure, and an excessively generous representation may essentially solve the problem in advance. Representation is particularly critical to understanding the relation between connectionist and symbolic computation, for the representation often embodies most of the relation between a symbolically characterized problem (e.g. a linguistic task) and a connectionist solution." (Smolensky, 1990)
In representation issues the symbolic paradigm has, because of its very nature, much to offer to connectionism. We think a combined study of both paradigms might overcome the controversy. In the end the differences may turn out not to be that essential. One example supporting this view is the research that showed that a certain kind of network can still support modularity and recursive (de)composition of constructs (Pollack, 1990) - a central issue in symbolic AI. However, at the moment we are still confronted with a new and hardly understood paradigm.
We expect further progress from the elaboration of continuous knowledge representations, the most eye-catching feature of connectionism in comparison to the symbolic paradigm, which uses discrete concepts (be it memory locations, categories, inference operations or production rules). Continuous learning curves are a sine qua non of multi-layer learning algorithms. And the behaviour of neural nets has been described, with great benefit, as continuous over time, making the whole apparatus of partial differential equations applicable. Simulation of such networks on computers is done by applying time-sampling as an approximation to the time-continuous change of state in a network. It might prove beneficial to carry this idea to its extreme. It is strange indeed that the discreteness of the individual network cells has not yet been considered a space-sampling of a basically space-continuous computing model. The input, output and state spaces of the system then become function spaces instead of vector spaces. It might even be possible to consider the cell layers of a network as space-sampled, continuous, two-dimensional computations. Instead of a network we can then metaphorically talk about a lump of "computing material." It might well be that the analytical methods available for systems of differential equations over continuous functions, and sampling theory, can thus again be applied to connectionist systems, and produce results for unanswered questions like the number of hidden layers, or the number of cells in those layers, needed for a certain task.
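The time-sampling mentioned above is easy to picture in a minimal sketch (the dynamics, weights and step size below are hypothetical, chosen only to show the approximation): the continuous change of state dy/dt = -y + tanh(Wy) of a two-cell network is approximated by discrete Euler steps:

    import math

    # Hypothetical two-cell weight matrix W.
    W = [[0.0, 1.2],
         [-1.2, 0.0]]

    def step(y, dt=0.01):
        """One Euler step of the continuous dynamics: y <- y + dt * (-y + tanh(W y))."""
        drive = [math.tanh(sum(w * yj for w, yj in zip(row, y))) for row in W]
        return [yi + dt * (-yi + di) for yi, di in zip(y, drive)]

    y = [0.5, -0.3]
    for _ in range(1000):  # simulate 10 time units at dt = 0.01
        y = step(y)
    print(y)

In the space-continuous reading suggested above, the state vector would in turn become a sampled function y(u) and the weight matrix a sampled kernel w(u, v), so that both time and space discreteness are mere approximations.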
In general it appears that representations of a continuous nature can improve the flexibility of representational systems considerably. They sometimes yield a level of performance that is not obtained by their discrete counterparts. Continuity has been underrated for too long now, both from a technical viewpoint - in many cases considering a discrete representation a harmless simplification - and from musicological and psychological perspectives, which have more or less overstressed the importance of discrete categories.
The title of this article is a paraphrase of the title of Terry Winograd's article "Thinking machines: Can there be? Are we?" (Winograd, 1990), describing the status and value of building 'thinking machines'. In our case, i.e. the modelling of music cognition, the answers to the questions in the title with regard to 'musical machines' could be a simple yes and no, respectively. But the relation between these two questions is far more interesting: what can be gained by describing and building aspects of music cognition as musical machines? This approach enables one to evaluate existing theories and compare them at a precise and reproducible level, to make the first steps towards a unification of some hypotheses and theories of rhythm perception, and it gives a powerful alternative to descriptive ad hoc theories based directly on data. Of course, there are major reductions in approaching the study of music with machines, but we hope to have shown what the advantages are.
Abelson, H. & A. diSessa (1980) Turtle Geometry: Computation as Medium for Exploring Mathematics. Cambridge, Mass.: MIT Press.
Clarke, E.F. (1992) Generativity, Mimesis and the Human Body in Music Performance. In Proceedings of the 1990 Music and the Cognitive Sciences Conference, edited by I. Cross and I. Deliège. Contemporary Music Review. London: Harwood Press. (forthcoming)
Clynes, M. (1987) What can a musician learn about music performance from newly discovered microstructure principles (PM and PAS)? In A. Gabrielsson (Ed.) Action and Perception in Rhythm and Music, Royal Swedish Academy of Music, No. 55.
Davidson, J. (1991) The Perception of Expressive Movement in Music Performance. Ph.D. thesis, City University, London.
Desain, P. & H. Honing (1988) LOCO: A Composition Microworld in Logo. Computer Music Journal 12(3). Cambridge, Mass.: MIT Press.
Desain, P. & H. Honing (1991) Quantization of Musical Time: A Connectionist Approach. In Music and Connectionism, edited by P.M. Todd and D. G. Loy. Cambridge, Mass.: MIT Press.
Desain, P. & H. Honing (1992) Music, Mind and Machine, Studies in Computer Music, Music Cognition and Artificial Intelligence. Amsterdam: Thesis Publishers.
Drake, C. & C. Palmer (in preparation) Recovering Structure from Expression in Music Performance. Manuscript.
Dreyfus, H. (1981) From Micro-Worlds to Knowledge Representation: AI at an Impasse. In Mind Design, edited by J. Haugeland. Cambridge, Mass.: MIT Press: 161-204.
Fodor, J. (1983) The Modularity of the Mind: An Essay on Faculty Psychology. Cambridge, Mass.: Bradford Books, MIT Press.
Friberg, A., L. Frydén, L. Bodin & J. Sundberg (1991) Performance Rules for Computer-Controlled Contemporary Keyboard Music. Computer Music Journal 15(2).
Honing, H. (1992) Issues in the Representation of Time and Structure in Music. In Proceedings of the 1990 Music and the Cognitive Sciences Conference, edited by I. Cross and I. Deliège. Contemporary Music Review. London: Harwood Press. (forthcoming)
Huron, D. (1990a) Personal Communication. A letter commenting on Honing (1992), and in response to a letter by Honing commenting on Huron, 1990b.
Huron, D. (1990b) Design principles in computer-based music representation. In: Computer Representations and Models in Music, edited by A. Marsden and A. Pople. London: Academic Press.
Kronman, U. & J. Sundberg (1987) Is the Musical Retard an Allusion to Physical Motion? In A. Gabrielsson (Ed.) Action and Perception in Rhythm and Music, Royal Swedish Academy of Music, No. 55.
Lenat, D. B. & E. A. Feigenbaum (1991) On the thresholds of knowledge. Artificial Intelligence 47.
Levelt, W.J.M. (1989) De Connectionistische Mode, Symbolische en Subsymbolische Modellen van Menselijk Gedrag. In C. Brown, P. Hagoort & T. Meijering (Eds.) Vensters op de Geest, Cognitie op het Snijvlak van Filosofie en Psychologie. Utrecht: Stichting Grafiet.
Longuet-Higgins, H.C. & E.R. Lisle (1989) Modeling Music Cognition. Contemporary Music Review 3(1). London: Harwood Press.
Longuet-Higgins, H.C. (1976) The Perception of Melodies. Nature 263.
Longuet-Higgins, H.C. (1990) Personal Communication. A letter commenting on Honing (1992).
Marr, D. (1985) Vision: the philosophy and the approach. In: Issues in Cognitive Modeling, edited by A. M. Aitkenhead and J. M. Slack. London: Lawrence Erlbaum Associates.
McCarthy, J. M. & P. J. Hayes (1981) Some philosophical problems from the standpoint of artificial intelligence. In: Readings in Artificial Intelligence. Palo Alto: Tioga Publishing.
Minsky, M. (1987) Form and Content in Computer Science. In: ACM Turing Award Lectures, edited by R. L. Ashenhurst & S. Graham. Reading, Mass. Addison-Wesley.
Papert, S. (1980) Mindstorms. New York: Basic Books.
Pollack, J.B. (1990) Recursive Distributed Representations. Artificial Intelligence 46.
Pope, S. T. (1988) Music notations and the representation of musical structure and knowledge. Perspectives of New Music 24.
Ritchie, G.D. & F.K. Hanna (1990) AM: A Case Study in AI Methodology. In D. Partridge and Y. Wilks (Eds.), The Foundations of Artificial Intelligence: A Sourcebook. Cambridge: Cambridge University Press.
Roads, C. (1984) An overview of music representation. In: Musical Grammars and Computer Analysis, edited by M. Baroni and L. Callegari. Firenze: Olschki.
Seashore, C. E. (1938) Psychology of Music. New York: McGraw-Hill.
Smith, B. C. (1991) The owl and the electric encyclopedia. Artificial Intelligence 47.
Smolensky, P. (1990) Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems. Artificial Intelligence 46.
Todd, N.P. (1989) Computational Theory and Implementation of an Abstract Expression System: a Contribution to Computational Psychomusicology. PhD thesis, University of Exeter.
Todd, N.P. (1992) The dynamics of dynamics: A model of musical expression. Journal of the Acoustical Society of America.
Winograd, T. & F. Flores (1987) Understanding Computers and Cognition. A New Foundation for Design. Reading, Mass.: Addison-Wesley.
Winograd, T. (1972) Understanding Natural Language. New York: Academic Press.
Winograd, T. (1990) Thinking machines: Can there be? Are we? In: The Foundations of Artificial Intelligence. A Source Book, edited by D. Partridge and Y. Wilks. Cambridge: Cambridge University Press.