Fourth ASA/ASJ Joint Meeting, Honolulu, Hawaii
Voice as Artistic Expression in Noh
Hideki Kawahara
(kawahara@sys.wakayama-u.ac.jp)
Popular version of paper 1pMU1.
A technique for remaking speech[1] was developed at Bell Labs seven decades ago by a research physicist, Homer W. Dudley. It was originally demonstrated by the Voder speech synthesizer at the New York World's Fair of 1939 and, shortly afterwards, by the Vocoder speech analyzer and resynthesizer. This technology has now been revived, modified, and improved in the form of STRAIGHT[2,3], which applies modern computational power to high-quality speech manipulation. The aim is to discover how emotion is communicated through the control of voice quality, and to learn how to reproduce these characteristics computationally by manipulating acoustic parameters. The subtle, stylized, artistically expressive voice quality of Noh[4] poses a great challenge for analysis and synthesis. Noh, the quintessential Japanese theatre, has a continuous performance tradition of over seven centuries, handed down from father to son. It provides a contemporary, living link to ancient theatrical traditions. The voice quality of the 'shite' (master player) conveys the emotional state of the character and transcends the limitations of language. This difficult vocal genre is a real test for STRAIGHT, which uses advanced signal processing for analysis, synthesis, and morphing[5] of the acoustic signal. The complex vocal techniques of Noh have pinpointed where the analysis/synthesis algorithms are currently inadequate to cope with human voice characteristics. This is particularly true of characteristics associated with aperiodic laryngeal voice production, a key element in emotional speech.
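STRAIGHT describes each sound by three sets of numbers: fundamental frequency (F0), spectral envelope, and aperiodicity spectrum. Morphing creates intermediate voices by interpolating between the sets measured for two recordings, for instance a 'wagin' and a 'gougin' rendition of the same line. The Python code below is a minimal sketch of that interpolation step only. It is not STRAIGHT's own morphing code. The names `VoiceParams` and `morph` are hypothetical. The sketch assumes the two recordings were already aligned frame by frame and share the same frequency bins. It also assumes, as a design choice, that F0 and the envelope are interpolated on a log scale while aperiodicity is interpolated linearly.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceParams:
    """Hypothetical container for STRAIGHT-style parameters of one utterance.

    Assumes both utterances were already time-aligned frame by frame and share
    the same frequency bins; real morphing needs an explicit alignment step,
    which is omitted here.
    """
    f0: List[float]                  # Hz per frame; 0.0 marks an unvoiced frame
    envelope: List[List[float]]      # spectral envelope power per frame and bin (> 0)
    aperiodicity: List[List[float]]  # aperiodic energy ratio per frame and bin, in [0, 1]


def morph(a: VoiceParams, b: VoiceParams, r: float) -> VoiceParams:
    """Return parameters between a (r = 0) and b (r = 1)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("morphing rate r must lie in [0, 1]")
    if not (len(a.f0) == len(b.f0) == len(a.envelope) == len(b.envelope)
            == len(a.aperiodicity) == len(b.aperiodicity)):
        raise ValueError("parameters must be time-aligned to the same frame count")

    def geometric(x: float, y: float) -> float:
        # Linear interpolation on a log scale (assumed choice for F0 and envelope).
        return math.exp((1.0 - r) * math.log(x) + r * math.log(y))

    def linear(x: float, y: float) -> float:
        # Plain linear interpolation (assumed choice for aperiodicity ratios).
        return (1.0 - r) * x + r * y

    f0 = []
    for x, y in zip(a.f0, b.f0):
        if x > 0.0 and y > 0.0:
            f0.append(geometric(x, y))
        else:
            # Voicing mismatch: keep the voicing decision of the nearer endpoint.
            f0.append(x if r < 0.5 else y)

    envelope = [[geometric(x, y) for x, y in zip(fa, fb)]
                for fa, fb in zip(a.envelope, b.envelope)]
    aperiodicity = [[linear(x, y) for x, y in zip(fa, fb)]
                    for fa, fb in zip(a.aperiodicity, b.aperiodicity)]
    return VoiceParams(f0, envelope, aperiodicity)


# Toy numbers (two frames, two frequency bins), not real Noh data.
a = VoiceParams(f0=[100.0, 0.0],
                envelope=[[1.0, 4.0], [1.0, 1.0]],
                aperiodicity=[[0.1, 0.2], [1.0, 1.0]])
b = VoiceParams(f0=[400.0, 0.0],
                envelope=[[4.0, 16.0], [1.0, 1.0]],
                aperiodicity=[[0.3, 0.4], [1.0, 1.0]])
half = morph(a, b, 0.5)
print(half.f0)  # first frame is about 200 Hz, the geometric mean of 100 and 400
```

Stepping `r` from 0 to 1 yields the stimulus continuum between the two exemplars. Log-scale interpolation of F0 places the midpoint halfway in musical interval rather than in hertz, which suits pitch perception. Each interpolated parameter set would then be passed to a resynthesis stage, which this sketch does not include.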
A Noh performance is highly stylized, lyrical, beautiful, and profound, with the artistic goal of 'yugen', a refined elegance that encompasses a sense of universal mystery and the poignant pathos of human suffering. The play takes place on a simple stage, the 'butai', which resembles the Shinto shrines where Noh was originally performed. Noh theatre proffers spare dramas of the soul: evocative, symbolic, and elusive. They are built from distilled modes of voice and movement and accompanied by chorus, flute, and drums. The 'shite' portrays climactic emotional scenes with subtle, stylized vocal forms, characterized by a special voice quality for suppressed but strongly emotional expression. The performance affects the audience deeply because its members share a common response: identification with the suffering of the main character. The 'shite' evokes this intense cathartic experience through extraordinary vocal control. Meanwhile, beautiful iconic wooden masks and costumes of exceptional textiles make the stage appearance uniquely artistic. Together these elements have sustained the appreciation of Noh in Japan for well over half a millennium. Noh vocalization seems mystical, and its effects have been difficult to characterize objectively. That does not mean, however, that it cannot be investigated quantitatively. A recording of Noh voice is an objective representation, yet it retains strong emotional content, largely independent of its linguistic message. This study attempts to reveal the secrets of Noh vocalization, an exquisite means of human communication, using STRAIGHT, a revived speech-remaking technology.
Voice Data Collection
The experimental acoustic data consist of a poem chanted by the Mother in Sumidagawa[6] (Sumida River), performed by Yasuyuki Konparu, who is both the subject and a co-author of this research.
He is also Master of the Komparu Noh Company, founded in the 15th century, and in 2001 he was officially designated an Important Intangible Cultural Asset of Japan. Variations of artistic expression in the traditional 'utai' (chanting) mode were used to portray three emotions of the Mother: 1) reflection, 2) derangement, and 3) deep sadness. Each emotional state was also rendered in two contrasting traditional stylized voices. These are 'wagin' (harmonic, weak chanting, more "feminine") and 'gougin' (dynamic, hard chanting, more "masculine"). The result is a total of six (3 × 2) expressive renditions of the same Noh text.
Sumidagawa and the Mother's Speech
The tragedy Sumidagawa belongs to the fourth thematic category of Noh plays, 'kyoujo mono' (deranged woman), and is a 'monogurui noh' play. It was written by Kanze Motomasa, Zeami's son, who died as a young man in 1432. In this play, the grief-stricken Mother has spent a year searching for her only child, who was taken by kidnappers, and she is driven to the brink of madness. The excerpt comes from her dialogue with the Sumida ferryman, in which she recites a poem by Fujiwara-no-Kanesuke (877-922 CE) collected in the Gosenshu[7]. Her ability to quote this waka (Japanese poem) suggests that the anguished mother is an educated person:
Voice Analysis Techniques
STRAIGHT inherits its essential framework from its Voder and Vocoder predecessors, invented in the 1930s. It extracts three physical parameters to represent speech sounds: fundamental frequency, spectral envelope, and the aperiodicity spectrum. Roughly speaking, the fundamental frequency (F0) represents perceived pitch and related attributes such as vibrato, roughness, and jitter. The spectral envelope conveys linguistic information, and the aperiodicity spectrum correlates with voice quality. STRAIGHT uses sophisticated algorithms to extract these parameters, taking advantage of the enormous advances in computational power since the 1930s. As a result, researchers can remake speech sounds that are sometimes indistinguishable from the originals using only these three types of parameters, in other words, three sets of numbers.
Morphing[5] speech samples is an interesting strategy for investigating the physical correlates of perceptual attributes. It provides a stimulus continuum between two or more exemplar stimuli by interpolating their STRAIGHT parameters. For example, sets of numbers intermediate between those derived from two separate speech samples yield intermediate speech sounds. STRAIGHT-based morphing therefore makes it possible to manipulate delicate and subtle distinctions between different artistic expressions of emotion without knowing the precise correspondence between the physical parameters and the perceived impressions. It also enables researchers to catalogue the precise acoustic differences that contribute to perceptual differences.
Results
Preliminary analyses revealed that Noh vocalization uses complex modes of vocal fold vibration. They also showed that the physical measurements can contradict the perceived expressive style.
For example, 'wagin' (soft vocalization) shows more energy in its power spectrum, yet it gives a softer perceptual impression than 'gougin' (strong vocalization). Subtle vocal representations of emotion also demand higher standards of analysis. A new technique for aperiodicity analysis[8] was designed to meet these more stringent requirements, but even highly sophisticated fundamental frequency analysis procedures[9,10] need further refinement. The results of new analyses and morphing demonstrations will be presented at the meeting.
References:
[1] Dudley, H. (1939). Remaking speech. Journal of the Acoustical Society of America, 11(2), pp. 169-177.