146th ASA Meeting, Austin, TX

To Rise or to Fall: That is the Question

Ann K. Syrdal - syrdal@research.att.com
Matthias Jilka*
AT&T Labs - Research
Florham Park, NJ 07932-0971
* currently: Institute of English Linguistics
University of Stuttgart
Stuttgart, Germany

Popular version of paper 3aSC29
Presented Wednesday morning, November 12, 2003

There are two common ways of asking questions in English. One is the Yes/No question, which can be answered with either yes or no (e.g., "Do you want to play a game?"). The other is the Wh-question, which asks for what/where/when/why/how information (e.g., "What game do you want to play?"). The two question types have distinctly different intonation. Intonation is the tune or melodic pattern of the human voice as its pitch rises and falls while speaking. Intonation doesn't describe what is said, but rather how it is spoken. It conveys information about the speaker's attitude or intent. Although the intonational features of questions are well known, they haven't been the focus of much empirical study. We became interested in studying question intonation as part of our efforts to improve the quality of our text-to-speech system, AT&T Natural Voices™.

American English intonation can be described using only two basic types of tones: High (H) and Low (L). When used to label prominent words, these tones are called pitch accents and are starred (e.g., H*, L*). A pitch accent in speech intonation is like underlining to emphasize a word in written text. High and Low tones are also used to describe phrase boundaries. Ends of sentences contain both a phrase accent (labeled H- or L-) and a boundary tone (labeled H% or L%). Characteristically, Yes/No questions end in high-pitched phrasal accents, H-H%, while Wh-questions end in low-pitched L-L% phrasal accents. Standard declarative sentences (e.g., "I like to play games.") also end with L-L%.
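
The labeling scheme described above can be sketched in a few lines of Python. This is only an illustration: the names `EXPECTED_ENDING`, `split_ending`, and `is_expected` are assumptions made for this example and do not come from any labeling tool, and only the three sentence types mentioned above are covered.

```python
# Characteristic sentence-final phrasal tones, as described in the text.
# All names here are illustrative assumptions, not a standard API.
EXPECTED_ENDING = {
    "yes/no question": "H-H%",
    "wh-question": "L-L%",
    "declarative": "L-L%",
}


def split_ending(ending):
    """Split a label such as 'H-H%' into (phrase accent, boundary tone)."""
    phrase_accent, boundary_tone = ending[:2], ending[2:]
    if phrase_accent not in ("H-", "L-") or boundary_tone not in ("H%", "L%"):
        raise ValueError(f"not a phrase accent + boundary tone: {ending!r}")
    return phrase_accent, boundary_tone


def is_expected(sentence_type, ending):
    """True if the ending is the characteristic one for this sentence type."""
    split_ending(ending)  # reject malformed labels first
    return EXPECTED_ENDING[sentence_type] == ending
```

For example, `is_expected("yes/no question", "L-L%")` is false: a Yes/No question spoken with a final fall departs from the characteristic pattern.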

For our experiment, we collected and analyzed speech data from five female and three male professional voice talents, all speakers of American English, who were recorded reading actual dialogues between customers and a travel agent. Then we tried to answer the following questions:

Q1. How often do Yes/No questions really end in the expected H-H% phrasal tones?

A. As shown in Table 1, over two-thirds of the Yes/No questions from the experiment ended in the expected high phrasal accent H-H% [example:F2:y3], although the results varied by gender and by sentence. Female speakers ended Yes/No questions with H-H% 80% of the time, but males did so only 50% of the time. Some Yes/No questions (e.g., y4) were less likely to end in H-H% than others (e.g., y2). The other final phrasal accents used most often were L-L% (19%), which sounded like an imperative equivalent to "Please say your name again." [example:M1:y4], and L-H% (9%), which conveyed some uncertainty [example:M2:y4]. The variety in y4 arises because, despite the question's wording, the speaker wasn't really expecting a Yes/No answer but was requesting that the name be repeated.

Table 1. Percentage of Yes/No Questions Ending in H-H% Phrasal Accents.

ID       Question                                          Female  Male  All
y1       Did you want to fly to London?                    80%     33%   63%
y2       Are you interested in an earlier/later flight?    100%    67%   88%
y3       Are you leaving from Newark?                      80%     67%   75%
y4       Could you say your name again?                    60%     33%   50%
OVERALL                                                    80%     50%   69%
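
The arithmetic behind Table 1 can be checked with a short sketch. It assumes that the three percentage columns are female, male, and all speakers (consistent with the 80% and 50% figures quoted in the answer to Q1) and that each of the eight speakers read each sentence once. The per-sentence counts below are inferred from the published percentages under that assumption; they are not reported in the text.

```python
import math

# Number of speakers in the study (from the text): five female, three male.
N_FEMALE, N_MALE = 5, 3

# ASSUMPTION: H-H% counts per sentence, inferred from the table's percentages
# on the premise that every speaker read every sentence exactly once.
HH_COUNTS = {  # sentence: (female count, male count)
    "y1": (4, 1),
    "y2": (5, 2),
    "y3": (4, 2),
    "y4": (3, 1),
}


def pct(count, total):
    """Percentage rounded half up to a whole number (62.5 becomes 63)."""
    return math.floor(100 * count / total + 0.5)


def row(female, male):
    """Female, male, and pooled percentages for one sentence."""
    return (pct(female, N_FEMALE), pct(male, N_MALE),
            pct(female + male, N_FEMALE + N_MALE))


def overall(counts):
    """Female, male, and pooled percentages over all sentences."""
    n = len(counts)
    female = sum(f for f, _ in counts.values())
    male = sum(m for _, m in counts.values())
    return (pct(female, n * N_FEMALE), pct(male, n * N_MALE),
            pct(female + male, n * (N_FEMALE + N_MALE)))
```

Under these assumptions, `row(*HH_COUNTS["y1"])` gives `(80, 33, 63)` and `overall(HH_COUNTS)` gives `(80, 50, 69)`, matching the first and last rows of the table. The pooled figure is weighted by group size, so it is not the simple mean of the female and male percentages.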

Q2. Are there any other characteristic intonational features of Yes/No questions?

A. Yes. The combination of nuclear and phrasal accents, which together make up the sentence-final tune, is important in communicating meaning. A nuclear accent is the pitch accent that speakers place on the last intonationally prominent word in a sentence. Of the Yes/No questions that ended in the expected H-H% phrasal accent, 73% (100% for the males and 63% for the females) contained an L* (low-pitched) nuclear accent. The combination of a low L* nuclear accent followed by a high rising H-H% phrasal accent results in a very large sweep from low to high voice pitch at the end of the Yes/No question. The average rise in pitch from L* to H-H% was 164 Hz for female speakers and 104 Hz for males. This is about an octave rise made during the final word or two of the sentence! For the remaining 37% of H-H% Yes/No questions spoken by females, the nuclear accent was a low-to-high rising pitch accent followed by H-H%, which also yielded a large pitch rise (147 Hz on average).
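
To see why a rise of this size amounts to roughly an octave, recall that an octave is a doubling of frequency. The sketch below converts a rise in Hz into octaves and semitones. The starting pitch of the L* accent is not given in the text, so any starting value passed to these functions is a hypothetical chosen for illustration; the function names are also assumptions made for this example.

```python
import math


def rise_in_octaves(start_hz, rise_hz):
    """Size of a pitch rise in octaves (one octave = doubling of frequency)."""
    if start_hz <= 0 or rise_hz < 0:
        raise ValueError("start_hz must be positive and rise_hz non-negative")
    return math.log2((start_hz + rise_hz) / start_hz)


def rise_in_semitones(start_hz, rise_hz):
    """Same rise expressed in semitones (12 semitones per octave)."""
    return 12 * rise_in_octaves(start_hz, rise_hz)
```

A 164 Hz rise is exactly one octave only when the low point is itself at 164 Hz (hypothetical value): `rise_in_octaves(164, 164)` is 1.0. From a lower starting pitch the same rise in Hz spans more than an octave, and from a higher one it spans less.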

Q3. How often do Wh-questions really end in the expected L-L% phrasal accents?

A. Table 2 shows that three of the four Wh-questions from the experiment typically ended as expected with falling phrasal accents [example:M3:w1 or F5:w3]. Sentence w4, in contrast, was never spoken with a final falling intonation pattern. Instead, w4 was spoken with intonation that conveyed either some incredulity on the part of the speaker, as though (s)he didn't believe what (s)he'd just heard [H-H% example:F2:w4], or impatience with the conversation [H-L% example:F1:w4].

Table 2. Percentage of Wh-Questions Ending in L-L% Phrasal Accents.

ID       Question                                            Female  Male  All
w1       When did/do you need to be in London?               100%    100%  100%
w2       What's the month and date for your return flight?   100%    67%   88%
w3       What day did you want to leave Newark?              100%    100%  100%
w4       What was that again?                                0%      0%    0%
OVERALL                                                      75%     67%   72%

Q4. Just how similar are the intonational contours and features of declarative sentences and Wh-questions with phrase-final falls (L-L%)?

A. Although there were no consistent differences between declaratives and Wh-questions in sentence-final tune, we found consistent intonation differences near the beginning of the two types of sentences. The first accented (intonationally emphasized) word in the Wh-questions (which was always one of the interrogative pronouns "when" or "what") was spoken more emphatically than the first accented word in the declarative sentences (which was the first word in the sentence for 75% of the declaratives). Prominence was indicated by pitch measures: the higher the pitch, or the bigger the pitch rise, the more intonationally prominent the word. The effect was relatively large and consistent among all eight speakers. Three pitch measures were compared, as shown in Table 3: (1) the maximum voice pitch for H* (high) pitch accents; (2) the maximum pitch of L+H* (low to high rising) pitch accents; and (3) the extent of the pitch rise during L+H* pitch accents.

Table 3. Wh-Question minus Declarative Pitch Differences (Hz)

Measure    Female  Male  All  Example
H* max     43      23    36
L+H* max   36      37    36   F3:d1 vs. F3:w1
L+H* rise  49      20    41

Q5. Does a speaker's pitch range play a role in question intonation?

A. Probably yes. We don't think this has been studied previously, but in our experiment, the lower the speaker's pitch range, the less likely the speaker was to use a final high rising H-H% phrasal accent for Yes/No questions. The correlation between these two measures was 0.69, which is surprisingly high for data from only eight speakers. The relationship was evident within both the female and the male speaker groups, so we suspect that it is a pitch-related phenomenon rather than a gender difference. To be certain, however, this needs to be verified with a larger group of speakers.
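
For readers who want to see how such a correlation is computed, here is a minimal sketch. The text does not say which correlation coefficient was used; the sketch assumes the common Pearson coefficient, and the function name `pearson_r` is chosen for this example. The per-speaker pitch ranges and H-H% rates are not reported here, so no study data appear in the code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In this setting, `xs` would hold one pitch-range value per speaker and `ys` the proportion of that speaker's Yes/No questions ending in H-H%. A result near +1 means the two measures rise together, and a result near 0 means no linear relationship.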

Our experiment has confirmed and expanded upon some expectations about question intonation, and has also revealed some unexpected findings. The results are both theoretically interesting and practically useful for improving the question intonation of synthetic speech, because listeners readily notice when intonational features aren't quite right.

