Report on the International Workshop on Models and Representations of Musical Signals, 5-7 October 1992, Capri, Italy

Henkjan Honing

At the three-day International Workshop on Models and Representations of Musical Signals, papers were presented on physical modeling, non-linear dynamic systems for modeling and composition, sound editors using graphical representations, and higher-level representations of musical signals and their structure, as well as on pitch/tone recognition and perception, and even the optical reading of scores.

The two main issues that, in my opinion, emerged from this wide scope of research interests were the problem of control and the choice of proper representations, each motivated from both musical and perceptual standpoints. With respect to the first, several papers showed that a powerful model of sound is not enough; controlling it is at least as important. Some models, as in physical modeling, get part of this control for free but still lack high-level musical control; others, like some of the non-linear models, lack perceptual control but are often preferred for their compositional attractiveness. In the papers concentrating on the compositional aspects of sound modeling, such as those by Di Scipio, Manzolli, and M. Serra, the aim is freedom and flexibility while still having compositional control: a balance that is often difficult to find. Although these and other papers made the distinction between what was called the micro and macro level of musical signals, the intermediate levels were ignored. These are the levels where much musical and perceptual information resides: for example, the surface level at which one listens to music, the accents or changes in articulation, the things that happen between notes, how transitions are made, the small modulations of pitch and amplitude that group or differentiate sounds, and so on. Some papers mentioned this level. For instance, X. Serra described his recent work and extensions to his Spectral Modeling Synthesis system, a musically powerful combination of modeling time-varying spectra with a deterministic part (additive synthesis) and a stochastic part (filtered noise); he is now mainly concerned with exactly this musical control.

The instrument level is also often ignored. What makes a group of sounds be identified as one class or family, originating from a single sound source? A violin always sounds like a violin, independent of what and how you play on it (just as an FM module always sounds like an FM module). This question is partly answered by the research in physical modeling, on which Paladin & Rocchesso and Smith presented papers. The first two authors described their recent work on generalizations and optimizations in the field of physical modeling, proposing an excitator (a non-linear dynamic system) and a one-dimensional resonator (a linear dynamic system) as their basic building blocks for instrument building; a toy sketch of this excitator/resonator split follows below. But again, the control of the instrument parameters was done by hand and was not part of the model. In that respect I was a bit surprised by Smith's statement at the end of his invited lecture, saying that he could, with just a few years' work, successfully pass the acid test of having an audience judge the difference between his artificial physical model and a real acoustic instrument ("Just give me a grant, and I'll be able to do it"). I'm happy to believe him, but I missed some explanation of what problems are left and how he thinks he can solve them. There still seems to be so much work left to be done in terms of the perceptual and musical aspects of control and the tools needed to represent and model them. Afterwards I was pleased to read in the proceedings that he does acknowledge the control problem, saying that "musical control of digital musical instruments is still in its infancy."
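To make the excitator/resonator division concrete, here is a minimal sketch, not Paladin & Rocchesso's actual model: a one-dimensional resonator implemented as a recirculating delay line with a simple lowpass loss, driven once by a non-linear excitator (a soft-clipped noise burst standing in for a pluck). All parameter values are invented for the example.

    import numpy as np

    def excitator(length, drive=2.0):
        # Non-linear element: a noise burst pushed through a soft clipper (tanh).
        burst = np.random.uniform(-1.0, 1.0, length)
        return np.tanh(drive * burst)

    def resonator(excitation, delay, feedback=0.995, n_samples=44100):
        # Linear, one-dimensional resonator: a circular delay line whose
        # contents recirculate through an averaging (lowpass) loss filter.
        line = np.zeros(delay)
        line[:min(len(excitation), delay)] = excitation[:delay]
        out = np.zeros(n_samples)
        for n in range(n_samples):
            i = n % delay
            out[n] = line[i]
            line[i] = feedback * 0.5 * (line[i] + line[(i + 1) % delay])
        return out

    # A decaying, plucked-string-like tone at roughly 44100/100 = 441 Hz.
    sound = resonator(excitator(100), delay=100)

Even in this toy version the roles are cleanly separated: all the non-linearity lives in the excitator, while the resonator is a purely linear system whose delay length sets the pitch, which is the kind of decomposition the authors argue for.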
This running theme of control also reigned on the third day, devoted to Representations of Musical Signals. Garnett gave a presentation that started out as a studio report of work done at CNMAT, but things became more alive when he talked about his own work on controllers for conducting. The audience immediately opened up with questions, and the talk ended in a lively discussion of what kinds of control were and were not possible (using a trained neural network to categorize, and in performance recognize, a set of gestures). In general the twenty-minute lectures (forty-five minutes for invited speakers) allowed for little improvisation or communication of general insights beyond the paper itself. A workshop atmosphere, though, should take away the fear of showing preliminary results and detail (instead of polished demos); in a workshop one can afford to talk about problems not (yet) solved, instead of solutions. Dannenberg did exactly that, looking back on his own and related work by other researchers on composition systems, taking a step back to analyse some of the practical solutions and describe their pros and cons. Learning from these observations, he is now working towards an integration of discrete event-based representations of music (e.g., as found in the Max programming language) and a continuous signal-based approach (e.g., in Music V-type languages), focusing on the control problems in those data-flow languages. Roads gave an overview of his "organized sound" ideas and the wide range of hardly compatible (to put it mildly) software and hardware he uses in his compositional work. The braveness of some composers in diving into all these different packages is enormous. But most of all it showed someone who likes sound very much: he was one of the few presenters who didn't talk while he played sound examples.

The other issue was which representation to choose. What representations are musically useful, and which are perceptually or psycho-physically relevant? Hermes gave a talk concentrating on the differences between speech and music when looking at intonation. Supported by a number of experiments, he stressed the importance of choosing the proper scales and representation when examining data in speech or music signals: an ERB scale for pitch in speech (somewhere between a linear and a logarithmic scale), and a logarithmic scale for pitch in music; a small numerical sketch of these two scales is given below. This general point returned in the talks by Mont-Reynaud, Slaney, and, in a sense, Eckel. All three of them use related auditory representations, be it a spectrogram, a correlogram, or a cochleagram, with different scales on the vertical and horizontal axes. Each representation was motivated by their different interests (e.g., sound separation in Slaney's case, or music and sound editors in Mont-Reynaud's and Eckel's case), but it showed nicely that by choosing the right representation some looked-for characteristics fall out automatically. Mont-Reynaud gave a beautiful example of sound morphology, choosing a spectral representation with log frequency on the vertical axis and periodicity on the horizontal axis, causing, for example, a vibrato in the spectrum to stick out, shifting back and forth along a fixed diagonal. Representation is an extremely useful tool in understanding the complex processes that go on in auditory perception. Next to auditory representations, music representation was covered in papers by Camurri & Leman, Dannenberg, Desain & Honing, Pope, and others.
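To make the difference between these scales concrete, here is a minimal numerical sketch. The ERB-rate formula used is the commonly cited Glasberg & Moore approximation; whether this is the exact variant Hermes used is an assumption on my part, and the reference frequency for the semitone scale is an arbitrary choice.

    import math

    def erb_rate(f_hz):
        # ERB-rate scale: compressive, between linear and logarithmic
        # (Glasberg & Moore approximation; assumed, not taken from the paper).
        return 21.4 * math.log10(0.00437 * f_hz + 1.0)

    def semitones(f_hz, f_ref=261.63):
        # Logarithmic (musical) scale, relative to middle C as an arbitrary reference.
        return 12.0 * math.log2(f_hz / f_ref)

    for f in (100, 200, 400, 800):
        print(f, round(erb_rate(f), 2), round(semitones(f), 2))

On the semitone scale every doubling of frequency adds exactly 12 units, whereas on the ERB-rate scale an octave spans a different number of units depending on where it lies (about 2.5 units around 100 Hz, approaching roughly 6.4 units only well above 1 kHz), which illustrates why the choice of scale matters for the kind of data being examined.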

Readers who can't wait for the book that will assemble some of the papers presented at this workshop are referred to the proceedings published by the University of Naples, Dipartimento di Scienze Fisiche, or to the book that was published by MIT Press as a result of the previous workshop: "Representations of Musical Signals", edited by De Poli, Piccialli, and Roads.