Fourth ASA/ASJ Joint Meeting, Honolulu, Hawaii



Voice as Artistic Expression in Noh

Hideki Kawahara (kawahara@sys.wakayama-u.ac.jp)
Department of Design Information Sciences, Wakayama University
930 Sakaedani, Wakayama, 640-8510 Japan

Osamu Fujimura
Professor emeritus, Ohio State University, Columbus OH

Yasuyuki Konparu
Head, Konparu School of Noh, Nara, Japan

Popular version of paper 1pMU1.
Presented Tuesday Afternoon, November 28, 2006
4th ASA/ASJ Joint Meeting, Honolulu, HI

 

A technique for remaking speech [1] was developed at Bell Labs seven decades ago by a research physicist, Homer W. Dudley. It was originally demonstrated by the Voder speech synthesizer at the New York World's Fair of 1939 and shortly afterwards by the Vocoder speech analyzer and resynthesizer. This technology has now been revived, modified, and improved in the form of STRAIGHT [2,3], which uses modern computational technology for high-quality speech manipulation. Our aim is to discover how emotion is communicated by voice quality control and to learn how to reproduce these characteristics computationally by manipulating acoustic parameters.

The artistically expressive modes of subtle, stylized voice quality in Noh [4] are a great challenge for analysis and synthesis. Noh, the quintessential Japanese theatre, with a continuous performance tradition of over seven centuries, provides us with a contemporary, living link to ancient theatrical traditions, handed down from father to son. The voice quality of the 'shite' (master player) conveys the emotional state of the character and transcends the limitations of language. This difficult vocal genre is a real test for the techniques of STRAIGHT, which uses advanced signal processing for analysis, synthesis, and morphing [5] of the acoustic signal. The complex vocal techniques of Noh have pinpointed where the analysis/synthesis algorithms are currently inadequate to cope with human voice characteristics, particularly those associated with aperiodic laryngeal voice production, a key element in emotional speech.

[Photograph: The master player, "Shite" (Yasuyuki Konparu), is playing "Sumidagawa."
Photographed by Yoshiharu Ikegami. (c) 2006 Yasuyuki Konparu]

A Noh performance is highly stylized, lyrical, beautiful, and profound, with the artistic goal of 'yugen', a refined elegance that encompasses a sense of universal mystery and the poignant pathos of human suffering. The play takes place on a simple stage, the 'butai', which resembles a Shinto shrine, where Noh was originally performed. Noh theatre proffers spare dramas of the soul, evocative, symbolic and elusive, comprising distilled modes of voice and movement, accompanied by chorus, flute, and drums. The 'shite' (master player) in Noh portrays climactic emotional scenes with subtle, stylized vocal forms, characterized by a special voice quality for suppressed, but strongly emotional, expression. The stage performance affects the audience deeply because of their shared response, brought about by identification with the suffering of the main character. The 'shite' evokes this intense cathartic experience through his extraordinary vocal control, while the beautiful iconic wooden masks and the costumes of exceptional textiles make the stage appearance uniquely artistic. Together, these elements have sustained the traditional appreciation of Noh in Japan for well over half a millennium.

Noh vocalization seems mystical and it has been difficult to objectively characterize its effects. However, that does not mean it cannot be investigated quantitatively. Recorded Noh voice, an objective representation, still maintains its strong emotional content, largely apart from its linguistic message. This study attempts to reveal the secrets of Noh vocalization, an exquisite means of human communication, by means of a revived speech remaking technology in the form of STRAIGHT.

Voice Data Collection

The sample dialogue that provides the experimental acoustic data is a poem chanted by the Mother in Sumidagawa [6] (Sumida River), performed by Yasuyuki Konparu, who is both the subject and a co-author of this research. He is also Master of the Konparu Noh Company, founded in the 15th century, and was officially designated an Important Intangible Cultural Asset of Japan in 2001. Variations of artistic expression in the traditional 'utai' (chanting) mode were used to portray the Mother's emotions of 1) reflection, 2) derangement, and 3) deep sadness. Additionally, each emotional state was portrayed in the two contrasting, traditional stylized voices of 'wagin' (harmonic, weak chanting, more "feminine") and 'gougin' (dynamic, hard chanting, more "masculine"), for a total of six different expressive renditions of the same Noh text.

Sumidagawa and the Mother's Speech

The tragedy, Sumidagawa, belongs to the fourth thematic category of Noh plays, 'kyoujo mono' (deranged woman), and is a 'monogurui noh' play. The author is Zeami's son, Kanze Motomasa, who died as a young man in 1432. In this play, after a year of searching for her only child, taken by kidnappers, the grief-stricken Mother is driven to the brink of madness. The excerpt is from the dialogue between the Mother and the Sumida ferryman, where she recites a poem by Fujiwara-no-Kanesuke (877-922 CE), collected in the Gosenshu [7]. This waka (Japanese poem) suggests that the anguished Mother is an educated person:

Although a mother's mind
May be unclouded,
She well may lose her way
Through love of her child.
(Translation by JSPS [6])

[Image: Excerpted text from Sumidagawa, represented in Japanese characters]

Voice Analysis Techniques

STRAIGHT inherits its essential framework from its Voder and Vocoder predecessors, invented in the 1930s. It extracts three physical parameters to represent speech sounds: the fundamental frequency, the spectral envelope, and the aperiodicity spectrum. Roughly speaking, the fundamental frequency (F0) represents perceived pitch and related attributes such as vibrato, roughness, and jitter; the spectral envelope conveys linguistic information; and the aperiodicity spectrum correlates with voice quality. STRAIGHT uses sophisticated algorithms to extract these parameters, taking advantage of the enormous advances in computational power since the 1930s. Together, these advances enable researchers to remake speech sounds that are sometimes indistinguishable from the original ones, using only these three types of parameters, in other words, three sets of numbers.
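As a rough illustration of how three sets of numbers can stand in for a speech sound, the following Python sketch rebuilds a short stretch of signal from a fundamental frequency, a spectral envelope, and an aperiodicity value. It is a toy sum-of-harmonics model and not the STRAIGHT algorithm: the names synthesize_frame and example_envelope, the shape of the example envelope, and the reduction of the aperiodicity spectrum to a single number between 0 and 1 are all assumptions made for brevity.

```python
import math
import random


def synthesize_frame(f0_hz, envelope, aperiodicity, n_samples,
                     sample_rate=16000, seed=0):
    """Toy resynthesis from three parameters (hypothetical helper, not STRAIGHT).

    f0_hz        -- fundamental frequency in Hz (sets the pitch)
    envelope     -- function mapping a frequency in Hz to a linear amplitude
    aperiodicity -- assumed here to be one number in [0, 1]: the noise share
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    if not 0.0 <= aperiodicity <= 1.0:
        raise ValueError("aperiodicity must lie between 0 and 1")

    rng = random.Random(seed)
    # Use every harmonic of F0 up to half the sampling rate.
    n_harmonics = int((sample_rate / 2) // f0_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        # Periodic part: harmonics of F0, each weighted by the envelope.
        periodic = sum(
            envelope(k * f0_hz) * math.sin(2 * math.pi * k * f0_hz * t)
            for k in range(1, n_harmonics + 1)
        )
        # Aperiodic part: plain Gaussian noise (a simplification).
        noise = rng.gauss(0.0, 1.0)
        samples.append((1.0 - aperiodicity) * periodic + aperiodicity * noise)
    return samples


def example_envelope(freq_hz):
    """Hypothetical smooth envelope that falls off above about 1 kHz."""
    return 1.0 / (1.0 + (freq_hz / 1000.0) ** 2)


# Same pitch and envelope, with and without an aperiodic component.
clean = synthesize_frame(200.0, example_envelope, 0.0, 320)
noisy = synthesize_frame(200.0, example_envelope, 0.3, 320)
```

With the aperiodicity set to 0, the output repeats exactly every sample_rate / f0_hz samples (80 samples here); raising it mixes in noise that breaks this regularity. Changing f0_hz alone shifts the pitch without altering the envelope, which is the kind of independent control the three-parameter representation is meant to offer.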

Morphing [5] speech samples is an interesting strategy for investigating the physical correlates of perceptual attributes. It enables us to provide a stimulus continuum between two or more exemplar stimuli by interpolating the STRAIGHT parameters. For example, intermediate sets of numbers, derived from two separate speech samples, yield intermediate speech sounds by using this framework. This morphing of speech sounds, based on STRAIGHT, provides the means to manipulate delicate and subtle distinctions between different artistic expressions of emotion without knowing the precise correspondence between the physical parameters and the perceived impressions. It also enables researchers to catalogue the precise physical acoustic differences that contribute to perceptual differences.
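The idea of deriving intermediate sets of numbers from two samples can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the published morphing procedure: the dictionary layout, the names morph_frames and continuum, and the choice to blend F0 and the envelope on a logarithmic scale are all hypothetical. It also assumes the two frames are already aligned in time and frequency, a step that real morphing has to handle separately.

```python
import math


def morph_frames(frame_a, frame_b, rate):
    """Blend two aligned parameter frames (hypothetical layout).

    rate = 0.0 reproduces frame_a and rate = 1.0 reproduces frame_b.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie between 0 and 1")

    def log_blend(a, b):
        # Assumed choice: interpolate positive quantities on a log scale.
        return math.exp((1.0 - rate) * math.log(a) + rate * math.log(b))

    def linear_blend(a, b):
        return (1.0 - rate) * a + rate * b

    return {
        "f0_hz": log_blend(frame_a["f0_hz"], frame_b["f0_hz"]),
        "envelope": [log_blend(a, b) for a, b
                     in zip(frame_a["envelope"], frame_b["envelope"])],
        "aperiodicity": [linear_blend(a, b) for a, b
                         in zip(frame_a["aperiodicity"],
                                frame_b["aperiodicity"])],
    }


def continuum(frame_a, frame_b, steps):
    """Return `steps` frames evenly spaced from frame_a to frame_b inclusive."""
    if steps < 2:
        raise ValueError("a continuum needs at least two steps")
    return [morph_frames(frame_a, frame_b, i / (steps - 1))
            for i in range(steps)]


# Made-up numbers for illustration only, not measurements from the recordings.
sample_a = {"f0_hz": 100.0, "envelope": [1.0, 4.0], "aperiodicity": [0.0, 0.2]}
sample_b = {"f0_hz": 400.0, "envelope": [4.0, 1.0], "aperiodicity": [1.0, 0.4]}
stimuli = continuum(sample_a, sample_b, 5)
```

Each element of stimuli is a complete parameter set that could be passed to a synthesizer, so a listener can be presented with a graded series between the two exemplars without the experimenter specifying in advance which parameter carries the perceived difference.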

Results

Preliminary analyses revealed that Noh vocalization uses complex modes of vocal fold vibration and that there is a contradictory physical correspondence with the expressive style. For example, 'wagin' (soft vocalization) physically displays more energy in power spectra, but perceptually gives a softer impression than 'gougin' (strong vocalization). Subtle vocal representations of emotion also require higher standards of analysis. A new technique for aperiodicity analysis [8] was designed to meet these more stringent requirements, but even highly sophisticated fundamental frequency analysis procedures [9,10] need further refinement. The results of new analyses and morphing demonstrations will be presented at the meeting.

References:

1 Dudley, H. (1939) Remaking Speech. Journal of the Acoustical Society of America, 11(2), pp.169-177.
2 Kawahara, H. (2006) STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds. Acoustical Science and Technology, 27(6), pp.349-353.
3 Kawahara, H., Masuda-Katsuse, I. and de Cheveigné, A. (1999) Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction. Speech Communication, 27(3-4), pp.187-207.
4 Japan Arts Council, Government of Japan. An introduction to Noh and Kyogen
5 Kawahara, H. and Matsui, H. (2003) Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation, Proc. ICASSP 2003, I, pp.256-259.
6 Special Noh Committee, Japanese Classics Translation Committee, Nippon Gakujutsu Shinkoukai. (1960) Noh Drama. Ten Plays from the Japanese. UNESCO Collection of Representative Works: Japanese Series. Tokyo and Rutland VT: Charles E. Tuttle Co.
7 Emperor Murakami, compiler. (951 CE) Gosenshu ['Later Collection', anthology of 1,426 Japanese waka].
8 Kawahara, H., Morise, M., Takahashi, T., Banno, H., and Fujimura, O. (Nov. 2006) Source signal extraction and aperiodicity evaluation based on STRAIGHT spectrum. Technical Report of Speech Committee, IEICE, SP2006-83, 106(333), pp.43-48. [in Japanese]
9 de Cheveigné, A. and Kawahara, H. (2002) YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, 111(4), pp.1917-1930.
10 Kawahara, H., de Cheveigné, A., Banno, H., Takahashi, T. and Irino, T. (2005) Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT. Proc. Interspeech2005, Lisboa, pp.537-540.

