2.2 Acoustic investigations into stød

To analyse a complex acoustic signal, the signal must be decomposed. Several decomposition methods exist. In ASR, the discrete Fourier transform is used, while other disciplines such as speech synthesis may use the continuous wavelet transform. In essence, both methods decompose a complex signal into component signals at different frequencies.
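The idea of decomposing a signal into frequency components can be illustrated with a minimal sketch. It uses a naive discrete Fourier transform written with the Python standard library; the toy signal, sample rate and function names are assumptions made for illustration and are not taken from the study.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform of a real-valued sequence."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def dominant_frequencies(samples, sample_rate, count):
    """Return the `count` strongest component frequencies in Hz, in ascending order."""
    n = len(samples)
    spectrum = dft(samples)
    # For a real signal only the bins below the Nyquist frequency are unique.
    magnitudes = [(abs(spectrum[k]), k) for k in range(1, n // 2)]
    strongest = sorted(magnitudes, reverse=True)[:count]
    return sorted(k * sample_rate / n for _, k in strongest)


# Toy signal (assumed for illustration): a 100 Hz sine plus a weaker 300 Hz sine.
SAMPLE_RATE = 1000  # samples per second
N = 200             # number of samples, giving a 5 Hz bin spacing
signal = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 300 * t / SAMPLE_RATE)
    for t in range(N)
]

peaks = dominant_frequencies(signal, SAMPLE_RATE, 2)
print(peaks)
```

Both component frequencies fall exactly on a DFT bin here, so the two strongest bins recover them directly. Real speech would normally be analysed frame by frame with a fast Fourier transform and a window function rather than with this quadratic-time version.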

A different account of stød stems from Hansen (2015). His description is based on Ladefoged’s phonation types.

The stødbasis and phonation-based accounts of stød are outlined below.

2.2.1.1 Ballistic model

A syllable has the potential for stød if it has stødbasis. There can only be one stød per syllable, but polysyllabic words can have more than one stød. Grønnum (2005) describes the realisation of stød in terms of two acoustic events:

• glottal stop

• creaky voice

A glottal stop is a moment of glottal closure in which the vocal folds are closed and prevent airflow through the vocal tract. In colloquial English, a glottal stop can replace a [t] in words like mountain or metal. In Danish, it can also signify the realisation of stød in extreme cases according to Grønnum & Basbøll (2007).

Creaky voice describes a type of phonation. The vocal folds of a human can be open and allow for maximum airflow or be closed and prevent airflow. In either state, the vocal folds do not vibrate. In between maximum and zero airflow are degrees of openness that determine the vibration of the vocal folds when uttering sonorants. When the vocal folds constrict airflow and are relaxed, vocal fold vibration is not completely harmonic. The slight disharmony on an otherwise harmonic acoustic signal sounds like a ‘creak’ and gives rise to the name creaky voice.
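One common way to put a number on this kind of irregular vibration is local jitter: the mean absolute difference between consecutive glottal periods, divided by the mean period. The sketch below is an illustration only. The accounts discussed in this section are not stated to use this measure, and the period values are invented.

```python
def local_jitter(periods):
    """Mean absolute difference between consecutive periods, relative to the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two glottal periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


# Hypothetical glottal period durations in milliseconds (invented for illustration).
modal_periods = [8.0, 8.1, 8.0, 7.9, 8.0, 8.1]     # nearly regular vibration
creaky_periods = [8.0, 12.0, 8.0, 12.0, 8.0, 12.0]  # alternating long and short cycles

print(local_jitter(modal_periods))
print(local_jitter(creaky_periods))
```

Under these assumed values the irregular sequence yields a much larger jitter than the nearly regular one, which is the contrast the description of creaky voice points to.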

Grønnum & Basbøll (2007) describe stød phonetically as a ballistic gesture which minimally generates a slightly compressed voice and maximally, under emphasis, a distinct creaky voice and a glottal stop. The gesture is aligned with syllable onset and is a property of the syllable rhyme. The ballistic gesture is a muscular response to a neural command that, once executed, the speaker can no longer control.

2.2.1.2 Phonation-based model

One way to describe voice quality is to use phonation types. The degree of openness of the vocal folds forms a continuum that spans from closed, as is the case with the glottal stop, to most open, where the vocal folds do not vibrate and airflow passes unhindered. The degrees of openness of the vocal folds are binned into different phonation types:

[Figure: phonation types ordered from most open to most closed: breathy, modal, creaky]

Figure 2.3: Voice quality scale after Gordon & Ladefoged (2001).

Breathy voice is a type of phonation where the vocal folds are far apart, do not constrict airflow and vibrate very little.

Modal voice is often described as the optimal degree of openness and vibration for sonorants.

This model describes stød in terms of voice quality on the scale between modal and creaky voice. Realisation of stød as a glottal stop is one extreme, and modal voice, i.e. the absence of stød, is the other. Hansen formulates his hypothesis as:

“The hypothesis is that stød is expressed as a relatively short change in voice quality towards a more compressed e.g. creaky voice quality and subsequently returns to less pressed voice quality. Hence, stød is treated as a dynamic voice quality gesture. A well-formed stød is a suitably large fluctuation in voice quality over a suitably (short) time frame. Whether creak occurs in connection with stød or not depends on where on the voice quality scale the stød fluctuation starts.”⁶

Creak is a term for a phonation type between creaky and modal voice, but also denotes a suprasegmental feature. Unfortunately, Hansen rejects his hypothesis after rigorous evaluation, so the phonetic description of stød remains unclear.
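The shape of Hansen's hypothesis, a suitably large excursion in voice quality that returns within a suitably short time, can be made concrete with a small sketch. Everything numeric here is an assumption: voice quality is represented as one value per analysis frame on a hypothetical scale from 0 (modal) to 1 (most compressed), and the thresholds are invented. It does not reproduce Hansen's actual measurements or evaluation.

```python
def has_stod_like_fluctuation(track, min_excursion, max_frames):
    """Return True if, within some window of at most `max_frames` frames,
    `track` rises by at least `min_excursion` and then falls by at least as much."""
    for start in range(len(track)):
        window = track[start:start + max_frames]
        peak_index = max(range(len(window)), key=window.__getitem__)
        peak = window[peak_index]
        rise = peak - min(window[:peak_index + 1])  # climb towards compressed voice
        fall = peak - min(window[peak_index:])      # return towards less pressed voice
        if rise >= min_excursion and fall >= min_excursion:
            return True
    return False


# Hypothetical voice quality tracks (0 = modal, 1 = most compressed).
steady = [0.2] * 10
short_fluctuation = [0.2, 0.2, 0.5, 0.8, 0.5, 0.2, 0.2]
slow_drift = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

for name, track in [("steady", steady),
                    ("short fluctuation", short_fluctuation),
                    ("slow drift", slow_drift)]:
    print(name, has_stod_like_fluctuation(track, min_excursion=0.5, max_frames=5))
```

The check looks only at relative change, not at absolute position on the scale. This mirrors the point in the quotation that whether creak occurs depends on where the fluctuation starts, not on whether a fluctuation is present.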

2.2.1.3 Ballistic vs. phonation model

The ballistic model describes how stød is produced and how stød manifestation can vary according to the strength of a neural command. The phonation-based model treats stød as a dynamic voice quality gesture, which accounts for the manifestation of stød and explains why stød manifestations that are acoustically dissimilar are perceived similarly. The ballistic model thus accounts for articulation and production, while the phonation-based model also accounts for perception. The two models are mutually exclusive only in their account of production, i.e. a ballistic gesture vs. a voice quality gesture.

⁶ The author’s translation of the hypothesis in Hansen (2015) from Danish to English.

The two descriptions are relevant to this study because we will be using data sets in which stød has been annotated manually, mainly by students of Nina Grønnum.

We assume they have applied, or at least been influenced by, her theories. This may be beneficial if annotators use the same method to annotate, but the annotation conventions that a theory or method applies may be a source of error. As described above, stød manifestation coincides with the segment boundary if stødbasis is a short vowel followed by a sonorant consonant. The convention is to annotate stød on the sonorant consonant, but if the stød manifestation is not prototypical and stød is manifested on the vowel, we do not know whether the annotator followed the convention or their own aural perception.

While Hansen uses a very small data set from a single speaker, his work is the most thorough acoustic-phonetic research available. Hansen seeks a characterisation of stød, and though he rejects his hypothesis, his observations have guided the methodology chosen in Chapter 4.