A new paper in Science suggests that all human languages carry about the same amount of information per unit time. In languages with fewer possible syllables, people speak faster. In languages with more syllables, people speak slower.
Researchers quantified the information content per syllable in 17 different languages by calculating Shannon entropy. When you multiply the information per syllable by the number of syllables per second, you get around 39 bits per second across a wide variety of languages.
If a language has N possible syllables, and the probability of the i-th syllable occurring in speech is p_i, then the average information content of a syllable, as measured by Shannon entropy, is

H = −Σ p_i log2(p_i),

where the sum runs over all N syllables.
For example, if a language had only eight possible syllables, all equally likely, then each syllable would carry 3 bits of information. And in general, if there were 2^n syllables, all equally likely, then the information content per syllable would be n bits, the same information as n zeros and ones, hence the term bits.
Of course not all syllables are equally likely to occur, and so it’s not enough to know the number of syllables; you also need to know their relative frequency. For a fixed number of syllables, the more evenly the frequencies are distributed, the more information is carried per syllable.
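As a quick illustration (not code from the paper), here is a minimal Python sketch of the entropy calculation above. The helper syllable_entropy is hypothetical; it takes raw syllable counts or probabilities and normalizes them.

```python
from math import log2

def syllable_entropy(freqs):
    """Shannon entropy, in bits per syllable, of a syllable distribution.

    freqs may be raw counts or probabilities; they are normalized here.
    """
    total = sum(freqs)
    probs = [f / total for f in freqs]
    return -sum(p * log2(p) for p in probs if p > 0)

# Eight equally likely syllables: 3 bits each, as in the example above.
print(syllable_entropy([1] * 8))                      # 3.0

# A skewed distribution over the same eight syllables carries less per syllable.
print(syllable_entropy([50, 20, 10, 8, 6, 3, 2, 1]))  # roughly 2.2
```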
If ancient languages conveyed information at 39 bits per second, as a variety of modern languages do, one could calculate the entropy of the language’s syllables and divide 39 by the entropy to estimate how many syllables the speakers spoke per second.
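In the same back-of-the-envelope spirit, and assuming the 39 bits-per-second figure carries over (which is exactly the speculation here), the speaking-rate estimate is just a division:

```python
INFO_RATE_BPS = 39  # rough cross-language rate reported in the paper

def estimated_syllables_per_second(bits_per_syllable):
    """Speaking rate implied by a 39 bits/second information rate."""
    return INFO_RATE_BPS / bits_per_syllable

# A language carrying 6 bits per syllable would need to be spoken at
# about 6.5 syllables per second to hit 39 bits per second.
print(estimated_syllables_per_second(6.0))  # 6.5
```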
According to this overview of the research,
Japanese, which has only 643 syllables, had an information density of about 5 bits per syllable, whereas English, with its 6949 syllables, had a density of just over 7 bits per syllable. Vietnamese, with its complex system of six tones (each of which can further differentiate a syllable), topped the charts at 8 bits per syllable.
One could do the same calculations for Latin, or ancient Greek, or Anglo-Saxon that the researchers did for Japanese, English, and Vietnamese.
If all 643 syllables of Japanese were equally likely, the language would convey -log2(1/643) ≈ 9.3 bits of information per syllable. The overview says Japanese carries 5 bits per syllable, and so the efficiency of the language is 5/9.3, or about 54%.
If all 6949 syllables of English were equally likely, a syllable would carry about 12.8 bits of information. Since English carries around 7 bits of information per syllable, the efficiency is 7/12.8, or about 55%.
Taking a wild guess by extrapolating from only two data points, maybe around 55% efficiency is common. If so, you could estimate the entropy per syllable of a language just from counting syllables.
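Here is that guess as a short sketch, using only the figures quoted above; both helpers are hypothetical, and the 55% figure is the extrapolation just described, not a result from the paper.

```python
from math import log2

def efficiency(bits_per_syllable, num_syllables):
    """Observed entropy as a fraction of the maximum (all syllables equally likely)."""
    return bits_per_syllable / log2(num_syllables)

print(efficiency(5, 643))    # about 0.54 (Japanese figures from the overview)
print(efficiency(7, 6949))   # about 0.55 (English)

def guess_bits_per_syllable(num_syllables, assumed_efficiency=0.55):
    """Estimate entropy per syllable from a syllable count alone."""
    return assumed_efficiency * log2(num_syllables)

print(guess_bits_per_syllable(643))   # about 5.1, close to the reported 5 bits
```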
To make such a sweeping claim, the data really needs to be universal, so I checked to see how broadly the languages they used are dispersed around the world. It's a pretty decent variety, but it's a bit concerning and disappointing that more than half of the languages are from Europe (10 or 11, depending on how you count). While several of them are from different language families, it would be more convincing if they had used widely unrelated languages (say, from New Guinea, Native American languages, creoles, etc.) and come to the same conclusion. It's an interesting study, but a much wider variety of languages and families was possible and available for analysis.
You’ve got to start somewhere. I imagine the authors started with languages they had some familiarity with, or languages where they could build on existing research, where they could find a text corpus that made computing syllable frequencies feasible, etc.
The paper suggests a lot of projects for follow up research, including my suggestion of ancient languages and your suggestion of aboriginal languages.
I look at it from the perspective of my profession. Every so often I get asked to broadcast 39 bits per second to a room of people. Seems like quite the responsibility to make those 39 bps as valuable as possible. I'll put some thoughts out on this later.
Excellent
In practice, nearly all languages are overloaded to some extent by using the same sounds to refer to different things. This encodes more information per sound, but imposes computational burden on the listener to disambiguate using context. This is familiar at the word level (i.e. homonyms), but it also happens at larger groupings (e.g. “Pulitzer Prize” vs. “pullet surprise”).
Which sounds are homophones is dialect dependent. A famous example in English is 'cot' vs. 'caught', which are homophones in many dialects and clearly distinct in others. I was once stumped by a British word puzzle that depended on the reader's dialect pronouncing 'what' and 'wart' identically…
A quick look at the article suggests that they (deliberately) ignored dialect variation within languages, and (more importantly?) that their results apply to people reading text aloud, as opposed to people speaking. Reading aloud is a very stylized activity to begin with; I’d be reluctant to assume that you can learn anything about actual speech that way.
(I’m also now curious about whether the ~39 bits per second holds in English for both native Manhattanites and Mississippians… Does the greater variety of vowels in Southern speech make up for the slower delivery?)