Vai is a language spoken by 150,000 people in western Africa, specifically in Liberia and Sierra Leone. The language is noteworthy because it uses a remarkable system of sounds. Speakers must be able to pronounce seven oral vowels, five nasal vowels and 31 consonants, all of which come in various combinations. In its written form, Vai has 229 characters.
So perhaps it wouldn’t be surprising if Vai had some interesting statistical characteristics not shared by other languages. If so, that might give some insight into the language’s unique history and evolution. This week, Charles Riley at Yale University and a few pals make exactly that claim.
Their analysis focuses on the written form of Vai and the complexity of the characters in its alphabet. The complexity of a character is a measure of how difficult it is to draw. For example, the letter ‘O’ consists of two arches connected by two line sections which, using the strange arithmetic of character complexity, gives it a complexity of 8. The letter ‘X’, which is two straight lines that cross, has a complexity of 7.
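To make that arithmetic concrete, here is a minimal sketch of how such a score might be tallied. The weights are an assumption on my part, not something spelled out above: they follow the scheme usually attributed to Altmann's composition method, in which a point scores 1, a straight line 2 and an arch 3, while the places where strokes meet score 1 (a smooth join), 2 (a sharp corner) or 3 (a crossing). Under those assumed weights, ‘O’ is modelled as two arches meeting at two smooth joins, and the function and dictionary names are purely illustrative.

```python
# Assumed weights (not given in the article): strokes and the contacts
# between them each add to a character's complexity score.
COMPONENT_WEIGHTS = {"point": 1, "line": 2, "arch": 3}
CONTACT_WEIGHTS = {"continuous": 1, "crisp": 2, "crossing": 3}


def complexity(components, contacts):
    """Sum the assumed weights of a character's strokes and contacts."""
    stroke_score = sum(COMPONENT_WEIGHTS[c] for c in components)
    contact_score = sum(CONTACT_WEIGHTS[c] for c in contacts)
    return stroke_score + contact_score


# 'X': two straight lines (2 + 2) and one crossing (3).
letter_x = complexity(["line", "line"], ["crossing"])

# 'O': two arches (3 + 3) meeting at two smooth joins (1 + 1).
letter_o = complexity(["arch", "arch"], ["continuous", "continuous"])

print(letter_x, letter_o)
```

With these assumed weights the sketch reproduces the 7 and 8 quoted for ‘X’ and ‘O’, which is a decent sanity check but no guarantee that it matches the scoring Riley and co actually used.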
By contrast, most characters in Vai have a complexity of more than 20 and one letter has a complexity of 48.
In all scripts analysed to date, the complexity of characters is governed by an overarching rule, which is that it is uniformly distributed. That means that there should be roughly equal numbers of characters at each level of complexity. That’s true whether the script is Latin, Cyrillic or Runic.
But Vai turns out to be different, say Riley and co. The complexity of the Vai alphabet fits a Poisson distribution better than it fits a uniform one.
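For a feel of what "a better fit" means here, the following is a rough sketch that compares the two candidate distributions by log-likelihood. It is not the authors' procedure, which isn't described above. The complexity values are made up for illustration, the uniform range is simply taken from the smallest and largest observed value, and the Poisson mean is the sample mean, so treat the comparison as indicative only.

```python
import math


def uniform_log_likelihood(values):
    """Log-likelihood under a discrete uniform over [min(values), max(values)]."""
    n_levels = max(values) - min(values) + 1
    return -len(values) * math.log(n_levels)


def poisson_log_likelihood(values):
    """Log-likelihood under a Poisson whose mean is the sample mean."""
    lam = sum(values) / len(values)
    return sum(v * math.log(lam) - lam - math.lgamma(v + 1) for v in values)


def better_fit(values):
    """Return the name of the distribution with the higher log-likelihood."""
    if poisson_log_likelihood(values) > uniform_log_likelihood(values):
        return "poisson"
    return "uniform"


# Hypothetical data: complexities bunched around 20 with a few outliers.
peaked = ([10, 14] + [16] * 2 + [18] * 4 + [19] * 5 + [20] * 6
          + [21] * 5 + [22] * 4 + [24] * 2 + [26, 32])

# Hypothetical data: one character at every complexity from 5 to 44.
flat = list(range(5, 45))

print(better_fit(peaked))
print(better_fit(flat))
```

The bunched sample comes out in favour of the Poisson and the flat one in favour of the uniform, which is the kind of contrast being claimed between Vai and other scripts. A proper test would also account for the fact that both models have parameters estimated from the same data.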
So does that mean there is something special about Vai that sets it apart from other languages?
Maybe. The authors say the non-uniform complexity is probably the result of the way the language was first written down in the mid-19th century. Riley and co suggest that this may have been influenced by a Cherokee Native American who lived in an American mission in the area at the time.
Cherokee was famously first written down by a tribesman named Sequoyah, who had seen western script without knowing what it meant. He then wrote out a similar-looking script in which each sign represented a Cherokee syllable.
The clear, if improbable, implication by Riley and pals is that Vai was written down in the same way.
There are two problems with this analysis. First, as far as I know, Cherokee has not been subjected to this kind of analysis. If it has a uniform distribution, this idea is scuppered.
Second, what the authors fail to take into account is that although the alphabet has 229 characters, there is a large amount of redundancy and only 100 or so are in common use.
When the analysis is redone using only these common characters, I wouldn’t mind betting that a uniform distribution of complexity emerges.
Which means that Riley and co have some work to do before they take their analysis of Vai any further down this little backwater of linguistics.
Ref: arxiv.org/abs/0810.0200: Distribution of Complexities in the Vai script