What is the rationale for considering di-, tri- &co. -phthongs separate entities? Why aren't these sounds interpreted as sequences of a vowel and a glide? How would linguistics be deficient if aliens came to Earth and erased the term from everyone's memory?
(I am interested in both the historical reasons (Latin or Greek grammarians, I suppose?) and the modern rationale for continuing to use the term.)
It is useful to define a linguistic unit when there is evidence that some linguistic process is sensitive to that unit. For example, phonologists and phoneticians have long recognized that the concept of a syllable is a useful one due to evidence from scansion in poetry, syllable-final devoicing, processes related to stress and tone, phrase- and utterance-final syllable lengthening processes, etc.
Similarly, there are many processes that are easiest to describe when we are able to make reference to a diphthong as a unit. In "English Phonology", John Tillotson Jensen raises three sources of evidence for the diphthong as a unit in English:
- Backwards speech games - People who are asked to say the word "choice" backwards generally say [sojʧ] (and not [sjoʧ]), indicating that they are treating the vocalic portion of the word as a single unit.
- Syllabification - Glides can behave as syllable onsets, but words like Toyota are syllabified as toy.o.ta, suggesting that /oj/ is treated as a single unit (cf. the Japanese syllabification to.yo.ta).
- Vowel shifts - Historically, many diphthongs are the result of vowel shifts that affected single vowels (e.g. the /aj/ diphthong in divine; cf. divinity).
In addition, it is common to observe dialectal differences that revolve around diphthongs. For example, in many dialects of English spoken in the south of the U.S., the /aj/ diphthong is "monophthongized" and produced as [aː] (where [ː] is the symbol for "long"). It makes sense to think of this phenomenon as monophthongization as opposed to "glide dropping" because those same speakers still produce the /j/ glide as [j] elsewhere, including after /a/ (i.e. a speaker who says [aː.ɛs] for I.S. will still say [aː.jɛs] for ah yes).