The stimulating papers by Vijay Iyer and Aniruddh Patel seek to bridge the (ever-shrinking) gap between the humanities and cognitive science, in order to tackle fundamental enigmas that have puzzled composers, performers, theorists, and listeners for centuries. To select just a few: What is the nature of the relationship between speech and melody? What is musical performance and how does it compare to the other sorts of actions we perform in our daily lives? What is the relationship between musical sound, our cognition of it, and the way that our bodies respond to it? Iyer and Patel propose solutions to these questions, supported by recent findings of controlled experiments and systematic statistical analyses. It would be redundant for me to summarize their findings here; instead, I will point to a few moments in the history of Western music theory when earlier writers asked these same questions and suggested similar solutions, albeit through very different methods of inquiry.
Patel remarks that his study exploring the connection between instrumental music and the “rhythms and melodic patterns of speech” is based on “an intuition voiced by several musicologists and linguists over the last 50 years, namely that purely instrumental music can reflect the sound of a composer’s native language” (Patel 3). This intuition stretches back much further than fifty years, however; it has a source in the earliest Greek music theory. Aristoxenus begins his treatise Elementa Harmonica with reflections on the relationship between speech and song, lamenting that earlier writers paid insufficient attention to this vital topic:
First of all, the prospective student of melody must analyze the movement of the voice, its movement, that is, with respect to place [topos], for there is not just one variety of this movement. The voice moves in the kind of movement I have mentioned both when we speak and when we sing[…] but the two movements are not of the same form. Up to now no one has ever carefully defined what the distinguishing feature of each of them is[…] To understand them, it is necessary to discuss relaxation, tension, depth, height, and pitch, so as to say how they differ from one another[…] No one has said anything about these: people have grasped some of them not at all, others only confusedly. (Barker 127)
Although Aristoxenus recognizes that speech and song “differ” from one another – and that these differences are what deserve the theorist’s attention – both are essentially and inextricably linked as divergent forms of that parent medium of human expression: the voice. Aristoxenus stresses the important role that empirical observation should play in any music-theoretical investigation, examining differences between discrete categories of “relaxation, tension, depth, height, and pitch” with frequent appeals to perception and the capabilities of the average human listener. The first question Aristoxenus asks, about the relationship between speech and melody, is the same one Patel asks in his own research.
Building on the ideas of the ancient Greek music theorists, classical and medieval writers argued that the building blocks of speech (letters) were combined into words in a process analogous to the way building blocks of music (notes) were combined into melodies (see Sullivan 2011). This analogy has been remarkably persistent: in the twentieth century, observations on the analogy between speech and melody have stressed the fundamental similarities between them by exploring the similar roles memory plays in generating both. In Discourse Networks 1800/1900, Friedrich Kittler compares the mnemonic procedures of cognitive psychologist and philosopher Hermann Ebbinghaus to the serial compositions of Arnold Schoenberg. Ebbinghaus showed that meaningless syllables could be memorized without being understood, for “neither understanding nor the previously fundamental capacity of ‘inwardizing’ or recollection has any significant effect on the mechanics of memory” (Kittler 1990, 209). Ebbinghaus divided 2,299 possible linguistic syllables produced from random combinations of vowels and consonants into groups of seven to twenty-six which, “like Schoenberg’s twelve tones, are called series” (Kittler 1990, 210). He proceeded to combine these series in various permutations that Kittler recognized as analogous to Schoenberg’s tone rows: for both Schoenberg and Ebbinghaus, Kittler writes, “a combinatorics presented in the original material is subjected to a further combinatorics of the series and column” (Kittler 1990, 211). Language in the modernist age, including the flow of nonsense syllables in Ebbinghaus or the flow of tones in Schoenberg’s music, is a “serial, that is, temporally transposed, data flow” (Kittler 1993, 3) deriving from a matrix allowing both horizontal and vertical combinations.
These observations suggest that the connections between speech and music may be deeper than any mere sonic resemblance; deeper, perhaps, than what Patel proposes: that “the cadences of spoken language are (perhaps subconsciously) suffused into the fabric of music” (Patel 3), or that “instrumental music somehow reflects the general prosodic patterns of everyday speech” (Patel 4). Rather, the similarity between speech and melody may consist in the ways the brain combines, shuffles, and organizes the basic units of both – letters in the case of speech, notes in the case of music – in order to create complex sonic chains which, in turn, can suggest for the listener a variety of semantic meanings. Studies in music cognition can test these hypotheses by focusing on the way the brain operates during the act of assembling these basic units – that is, during the acts of speaking and performing – as well as on the way the listener can construct meaning from these two types of sonic material.
Focusing primarily on the cognitive processes that affect the listener’s perception of music, Iyer argues that musical perception is shaped by the bodily motions and gestures that create the sounds we perceive: “music is born of our actions” (Iyer 13), and “to listen to music is to perceive the actions of those bodies” (Iyer 4). Listening to a musical performance – which, by the very nature of its unfolding-in-time is always, regardless of its genre or style, to some extent an improvisatory act – involves “the perception of another body or bodies engaged in embodied, situated, real-time experience” (Iyer 7) and “a sense of mutual embodiment, of the shared space, time, and bodily presence of performers and observers, would seem to open the door to specific kinds of empathy” (Iyer 7). A key part of this concept of empathy consists in the “awareness of the performer’s coincident physical and mental exertion, of his or her ‘in-the-moment’ processes of creative activity and interactivity, the risks taken in the face of unbounded possibilities, the inherent constraints of the mind deciding and the body acting in time” (Iyer 7). All music is suffused with the “trace of the body” (Iyer 3), whether or not the performers’ bodies are actually present before the listener.
For Iyer, “the claim that music perception and cognition are embodied activities also means that they are actively constructed by the listener, rather than passively transferred from performer to listener” (Iyer 4). This suggests an alternative to conventional paradigms that construe the listener – and the viewer, the reader, and so forth – as passive recipients. But, again, these ideas have long antecedents: as Mary Carruthers explains in The Book of Memory, medieval European scholars recognized that reading was not a passive process. Texts were understood to “recall through the windows of our eyes the voices of those who are not present to us” (Carruthers 211), engaging the minds of the writer and the reader in a sort of “hermeneutical dialogue” (Carruthers 231): “So long as the reader, in meditation (which is best performed in a murmur or low voice), reads attentively [my emphasis], that other member of the dialogue is in no danger of being lost, the other voice will sound through the written letters” (Carruthers 212). For such readers, imagining the writer’s voice would also invoke the embodied presence of the writer: in the words of the thirteenth-century trouvère Richard de Fournival, “[W]hen I am not present, this writing… will make me present to your memory” (Carruthers 278). The medieval book itself could thus be understood as the transcription of a compositional act, an in-the-moment process that was perhaps not so different from Iyer’s description of musical improvisation: for as Carruthers explains, “composition is not an act of writing,” but a “rumination, cogitation, dictation, a listening and a dialogue” (Carruthers 244) which summons the traces of bodies.
For Iyer, understanding music cognition has a variety of political implications. He explains:
The idea of embodiment can also bring the field of music perception and cognition into a healthier dialogue with music humanities, which has in recent decades seen robust critical engagement with ‘the body’ in terms of race, gender, and sexuality. When we hear bodies but do not see them, we instead fantasize about them; listening to music (especially in the disembodied way that it circulates today) is deeply informed by that same process of racialized, sexualized fantasy-formation about the virtual bodies that made those sounds. (Iyer 4)
This intriguing observation offers a useful framework for understanding how research in music cognition and perception can inform our understanding of the various social and cultural contexts in which music is created, performed, heard, and exchanged today. But as I have tried to show with these historical examples, the way we listen to music today is remarkably consistent with the ways people listened to music, read texts, and looked at visual art in the past. By situating Iyer’s and Patel’s observations in a larger historical context, we discover that earlier voices may also have something useful to contribute to this discussion.