There are, as is well known, two distinct texts in the Voynich ms, dubbed A and B, sometimes Currier A and Currier B.
I prefer to refer to them just as Text A and Text B.
Confusingly, Currier called them 'languages', even though he made it clear they were not distinct languages.
Rather, they are two versions of the same text, sharing the same glyph-system and much of the same vocabulary.
They might be better called 'dialects' if we are using linguistic categories. A more neutral term would simply be 'versions'. There is one system, but two recognizable versions or variants of it.
Pages, and whole sections, are written in one or the other.
Recently, Rene Zandbergen has posted new statistical studies of this aspect of the text, proposing some finer distinctions. The purpose of the present post is to clarify the matter of textual versions in the light of those and previous investigations and in view of my own approach to the text.
My approach is conceptual (or phenomenological) rather than rigorously statistical, but of course statistics describe the phenomena (in one manner). We have quite a lot of data. What does it amount to?
* * *
The most outstanding and tell-tale, and simplest, distinction between Text A and Text B is in the frequency of the bigram [ed]. This has long been observed.
It is frequent or very frequent in all pages of Text B but almost entirely absent from pages of Text A.
It is a hard-and-fast and invariable distinction. If a page features the bigram [ed] it is Text B. If not, Text A.
There are other distinctions, but this is the crux of it. It signals a meaningful and consistent division between two versions of the text.
Pages can be labelled Text A or Text B accordingly.
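As a rough sketch of how this labelling could be automated, the following Python measures the share of words on a page that contain [ed] and then applies the hard rule. It assumes the page has already been transcribed in EVA and split into words. The function names and both word lists are invented for illustration, and counting by words (rather than by glyph pairs) is an assumption, not a method taken from any published study.

```python
def bigram_share(words, bigram="ed"):
    """Return the fraction of words that contain the given EVA bigram."""
    if not words:
        return 0.0
    return sum(bigram in word for word in words) / len(words)


def label_page(words, bigram="ed"):
    """Apply the hard rule: any [ed] on the page means Text B, none means Text A."""
    return "B" if bigram_share(words, bigram) > 0 else "A"


# Invented word lists for illustration only; not transcribed from any folio.
page_x = "daiin chol chor shol dain cthy".split()
page_y = "qokeedy chedy shedy daiin qokedy ol".split()

print(label_page(page_x), bigram_share(page_x))
print(label_page(page_y), bigram_share(page_y))
```

In practice a small tolerance above zero might be wanted, since [ed] is "almost entirely" rather than strictly absent from Text A pages.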
* * *
It has long been understood, though, that the distinction is broad and that some pages are more obviously Text B than others.
Again, it is a matter of the bigram [ed].
There are pages where [ed] is very frequent, and other pages where it is common, but not as frequent - and pages where it is non-existent.
Again: it is absent from Text A pages. Among Text B pages, it is either common in some cases or very common, prolific, in others.
Roughly speaking, [ed] accounts for zero percent of Text A; on some Text B pages it reaches up to 20%, but on others only just over 5%.
The hard distinction between A and B prevails, but within the B Text (text with [ed]) there is a further distinction.
These distinctions follow sections and the content of illustrations.
Zandbergen has mapped these carefully and divides the text up as follows:
Text A - [ed] is virtually absent.
We find this in the first section of botanical (herbal) pages, the Pharmacological text, the Pharmacological labels, and also - anomalously - on pages f58r and v.
Text B1 - [ed] is common but not prolific.
We find this type of text in the cosmological section, the Zodiac text and labels, and in Labels in general.
Text B2 - [ed] is very frequent. (In some cases up to 20% of the text.)
We find this concentration of [ed] in the second botanical (herbal) section, the Nymph section and the Star Catalogue in Quire 20 (known, ridiculously, as the "recipe" section.)
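The three-way division above amounts to sorting pages by their [ed] share into bands. A minimal sketch follows; the two cut-off values are hypothetical placeholders chosen only to sit between the rough figures quoted above (zero, just over 5%, up to 20%), and are not Zandbergen's boundaries.

```python
def ed_tier(ed_share, absent_below=0.01, prolific_from=0.10):
    """Map an [ed] share (0.0 to 1.0) to a label: 'A', 'B1' or 'B2'.

    Both cut-offs are hypothetical placeholders, not published values.
    """
    if ed_share < absent_below:
        return "A"   # [ed] virtually absent
    if ed_share < prolific_from:
        return "B1"  # [ed] common but not prolific
    return "B2"      # [ed] very frequent


for share in (0.0, 0.06, 0.20):
    print(share, ed_tier(share))
```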
* * *
On these distributions, Zandbergen proposes introducing 'Language C', but that designation obscures the hard distinction between A and B.
We might say there is only a soft - albeit clear and consistent - distinction between B and what Zandbergen is calling C.
For this reason I think it is more in order to refer to Text B1 and Text B2 and it is regrettable that talk of various "languages" continues.
Now we have three 'languages' (which everyone admits are not distinct 'languages' but, if anything, three versions or formulations of the same 'language'.) The terminology was confusing from the outset, and is now compounded.
In any case, the distinctions are clear from the statistics, and even apparent to the naked eye. Cast your eye over pages of text: in some pages there is no [ed], in some pages it is frequent, but in other pages it is very frequent.
We should not lose sight of the fact, though, that there is a single system, with the same glyph system, with the same basic rules, throughout the entire work - 'Voynichese' - but there are two versions of it, A and B, and then there are two variants of B.
There is no basis whatsoever for supposing that A, B, or C, are all different plaintext languages (which is what the adopted terminology suggests.) It is not like A is Spanish, B is German and C is French.
A better analogy might be something like the distinctions between Metropolitan (Standard) French, Levantine French (Lebanese) and Maghreb French (Moroccan).
Or in English we might distinguish between the King's English and two closely related dialects of English, say, Mancunian spoken in Manchester and Scouse spoken in Merseyside.
Let us imagine the King's English is clean and free of some inelegant linguistic habit that is common in Mancunian, while Scouse is utterly thick with it.
But this is to apply a linguistic metaphor to the phenomenon. Voynichese may not be a 'language' at all. It is only for convenience that we speak of 'languages' and 'dialects' and 'words' and 'letters' and 'prefixes' and 'suffixes' and all the other linguistic terms we use.
A more neutral way of talking about it is to speak of Systems and Sub-Systems.
In my own studies, I have (perforce) abandoned linguistic models and now regard Voynichese as some sort of (astrological) notation that is primarily numerical and only quasi-linguistic, linguistic in appearance.
Nor is there any basis for loose talk about the text "evolving". The variants seem settled rather than fluid, although we must also admit that the text is not homogeneous and there are anomalous pages. (Several pages completely lack the glyph [q], for example.)
The [ed] test is consistent, though. It is a reliable guide. Text A lacks this bigram. It is common in all parts of Text B and prolific in some parts of Text B.
* * *
Another, coinciding, and revealing measure of these textual variants is to look at the word [daiin].
This is the most common word in the entire text, but it is especially prevalent in Text A, less so in Text B, with a distinction that can be made between B1 and B2.
Zandbergen summarises it as follows:
"One way of looking at the A and B languages may be as a mixture of 'daiin-language' and something else. Language A is full 'daiin-language', Herbal-B is only half, and the other B dialects one third."
When we look back at Zandbergen's statistics for the frequency of the tell-tale bigram [ed], we find that they suggest these rough proportions.
If the A Text is the default, some of the B Text deviates from it by a third and some of it by a half.
* * *
As an extension of this we find another hard distinction: the most common word in Text B is [chedy], but it does not appear at all in Text A.
An invariable rule: if the text includes the word [chedy], it is Text B and cannot be Text A.
So [daiin] is the most common word in the manuscript over-all and is prolific throughout, but especially so in the A Text. (The A Text is "pure daiin-language".)
But [chedy] is the most frequent word in the B Text, yet only appears in the B Text. We could properly call the B Text the Chedy Text.
To requote Zandbergen:
"One way of looking at the A and B languages may be as a mixture of 'daiin-language' and something else."
The "something else" is marked by the appearance of the word [chedy].
This reveals an important rule:
Words occur in both Text A and B, or they only occur in Text B.
There are very few words that appear frequently in Text A that are not also found in Text B, but the opposite is not true.
Once again, we see that the correct way to characterize Voynichese is as a single 'language', a single system, a single phenomenon, (the daiin-language of Text A), and the B Text has another system (the Chedy System) superimposed upon and intermixed with it.
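The vocabulary rule (words occur in both texts, or only in Text B) can be checked with simple set operations. The sketch below is illustrative only: the function name, the `min_count` threshold for what counts as "frequent", and the two word lists are all assumptions made for the example.

```python
from collections import Counter


def vocabulary_split(words_a, words_b, min_count=2):
    """Split vocabulary into frequent A-only words, frequent B-only words, and shared words."""
    count_a, count_b = Counter(words_a), Counter(words_b)
    frequent_a = {w for w, n in count_a.items() if n >= min_count}
    frequent_b = {w for w, n in count_b.items() if n >= min_count}
    return {
        # frequent in A and never seen in B (the rule predicts very few of these)
        "a_only": sorted(frequent_a - set(count_b)),
        # frequent in B and never seen in A
        "b_only": sorted(frequent_b - set(count_a)),
        # seen at least once in both texts
        "shared": sorted(set(count_a) & set(count_b)),
    }


# Invented word lists for illustration only.
text_a = "daiin chol daiin chor chol dain".split()
text_b = "chedy daiin qokeedy chedy chol qokeedy shedy".split()

print(vocabulary_split(text_a, text_b))
```

If the rule holds on a real transcription, the `a_only` list should stay nearly empty while `b_only` contains words such as [chedy].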
Looking at some other indicators observed over the years:
There is a high frequency of the suffix [-dy] in the B Text, and usually as [-edy]. There is far less of this in Text A, and the suffix [-dy] tends to be as [-ody] rather than [-edy].
There is a tendency towards [o] rather than [e] in the A Text, so we find [eo] rather than [ee] which is more typical of the B Text.
Words in Text A tend to be shorter than those in Text B. The most common words in Text A tend to be short three-glyph or two-glyph words with an [o] or an [a]. Eg. [chol].
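These indicators are easy to tabulate. The following sketch compares [-edy] with [-ody] among words ending in [-dy], and computes a mean word length. It assumes EVA transcription, so length is counted in EVA characters and [ch] counts as two; the function names and sample words are invented for the example.

```python
def dy_profile(words):
    """Among words ending in 'dy', return the shares ending in 'edy' and in 'ody'."""
    dy_words = [w for w in words if w.endswith("dy")]
    if not dy_words:
        return {"edy": 0.0, "ody": 0.0}
    total = len(dy_words)
    return {
        "edy": sum(w.endswith("edy") for w in dy_words) / total,
        "ody": sum(w.endswith("ody") for w in dy_words) / total,
    }


def mean_length(words):
    """Mean word length in EVA characters (so 'ch' counts as two)."""
    return sum(len(w) for w in words) / len(words) if words else 0.0


# Invented sample for illustration only.
sample = "chody okody chedy otody daiin".split()
print(dy_profile(sample), mean_length(sample))
```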
* * *
Zandbergen explores statistical differences between pages and sections and on this basis makes his finer distinctions. His studies reveal some interesting and curious phenomena, all of which require explanations.
Again: the text is not homogeneous. It is all 'daiin-language' (Voynichese), sure enough - no text is a radical departure from the rest - and there are two styles, Text A and B, but there are distinct differences here and there.
The example already mentioned is the glyph [q]. Inexplicably, it is completely missing from the first page of the text.
Moreover, the common prefix [qo-] is almost entirely lacking in the Zodiac pages and is absent from the map (the foldout map known, ridiculously, as the 'Rosettes Page'.)
Similarly, the common configuration [cho] is also missing from the map and from the Nymph section (known, ridiculously, as the 'Biological' section) but appears in other B Text.
The combination [eo] is noticeably more common in the Pharmacology text, making it different to the rest of the A Text.
On the basis of such divides, Zandbergen has mapped various 'dialects' of 'Languages' A, B and C, as he calls them, in order to provide a more nuanced and complete view of textual variants.
It is a purely quantitative study. In many cases he is observing fine statistical distinctions. If more than 1% of words in a text end in [d], or if the words [ol] and [or] constitute more than 1% of a text, then it indicates 'Language C'. And so on.
He establishes three main criteria, however. The frequency of [ed], the frequency of [eo] and the frequency of [q].
It is important work and advances studies usefully, revealing important patterns in the text - perhaps the proper word is textures?
* * *
Let us revisit the groupings made earlier, and now observe what Zandbergen calls the 'dialects':
Text A - [ed] is virtually absent.
We find this in the first section of botanical (herbal) pages, the Pharmacological text, the Pharmacological labels, and also - anomalously - on pages f58r and v.
We can, however, identify a variant in the Pharmacology text (but not Pharmacology labels.) Zandbergen calls this dialect Ae.
The first section of botanical (herbal) pages, the Pharmacological labels, and also - anomalously - pages f58r and v are in standard A Text, but the Pharmacology text features a remarkable amount of the bigram [eo].
Only about 5% of the text in the first botanical pages is [eo], but in the Pharmacology pages it is about 25%.
This is the biggest concentration of [eo] in the whole text. For whatever reason, the configuration [eo] is concentrated in the Pharmacology text. Notably so.
Text B1 - [ed] is common but not prolific.
We find this type of text in the cosmological section, the Zodiac text and labels, and in Labels in general.
Zandbergen observes that the prefix [qo-] is absent from the Zodiac text and labels and so regards them as a distinct 'dialect' of Text B. He calls this dialect Ce. It could be called the Zodiac dialect.
Text B2 - [ed] is very frequent. (In some cases up to 20% of the text.)
We find this concentration of [ed] in the second botanical (herbal) section, the Nymph section and the Star Catalogue in Quire 20 (known, ridiculously, as the "recipe" section.)
Zandbergen's distinctions here are more variegated. He sees three 'dialects' he calls B, Bs and Bb.
The second botanical section is in the plain B dialect, the Stars Catalogue (Quire 20) is in Bs (s for stars), while the Nymph section is in Bb.
The difference is in the frequency of [q]. In the Nymph section it is nearly 25% of glyphs. In the Stars Catalogue it is just over 15%, while in the Botanical B pages it is 10%.
Moreover, the form [qol] is unusually common in the Nymph pages. Just as [eo] marks the Pharmacology section, [qol] marks the Nymph section.
We can summarize these distinctions as follows:
Text A
*Pharmacology Variant (Text A but with lots of [eo])
Text B1
*Zodiac Variant (Text B but with no [qo])
Text B2
*Star Variant (Text B but with more [q].)
*Nymph Variant (Text B but with more [q] and lots of [qol])
We could portray this in many ways. One way, which emphasises the ubiquitous nature of Text A vocabulary, and retains the binary distinction A and B, is as follows:
* * *
As a general procedure, pages of text can be assessed in terms of the frequency of [q].
Zandbergen uses this as a sub-classification, establishing some statistical boundaries.
In short: there is text with very little [q], text with lots of [q] and text with a moderate amount of [q].
He applies the code: 0, +, - but it might be clearer to mark it q0, q+ and q- , or q(0), q(+), q(-) or such. It is a measure of [q].
However we indicate it, it is a notable feature of the text as a whole. The distribution of the glyph [q] is not homogeneous and is especially indicative of certain structures (or textures) of the text.
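A sub-classification of this kind could be sketched as below. Several things here are assumptions rather than Zandbergen's definitions: [q] is measured as the share of words beginning with [q], the two boundaries are hypothetical, and the reading of the codes (0 for very little, - for moderate, + for lots) is a guess from the order in which they are listed above.

```python
def q_share(words):
    """Fraction of words that begin with the EVA glyph 'q' (one possible measure of [q])."""
    if not words:
        return 0.0
    return sum(w.startswith("q") for w in words) / len(words)


def q_code(share, little_below=0.05, lots_from=0.20):
    """Return 'q0', 'q-' or 'q+' for a [q] share; both boundaries are hypothetical."""
    if share < little_below:
        return "q0"  # very little [q]
    if share < lots_from:
        return "q-"  # a moderate amount of [q]
    return "q+"      # lots of [q]


# Invented word list for illustration only.
page = "qokeedy qokedy chedy qol".split()
print(q_code(q_share(page)))
```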
* * *
We should remember at this point that the glyph [q] is the numeral 4.
The entire text displays glyphs that are familiar letters from Latinate alphabets, and numerals. The whole text is alphanumeric, including both letters and numbers, in this sense - (a property obscured in EVA.)
The glyphs [q], [d] and [y] are conspicuously numbers.
It must follow from the above, therefore, that some parts of the text - and some versions of the text - will contain more of these numerical glyphs than other parts of the text.
Zandbergen has not considered this, but we can tell - without compiling the stats - that the B Text will be more 'numerical' in terms of these glyphs.
* * *
A peculiar difference between the textual variants, not mentioned in Zandbergen's recent study, is that the glyph [m] appears in both Text A and Text B, but in Text B it is always line-final.
In Text A it is usually word-final, and often line-final, but it also occurs in other contexts.
For some reason I suspect this might be especially important, but as yet I have no idea what it might be telling us.
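The claim about [m] is straightforward to test on a transcription that preserves line breaks. The sketch below tallies every [m] by position: at the end of a line, at the end of a word within a line, or elsewhere. The function name and the sample lines are invented for illustration; a real check would run this separately over Text A and Text B pages.

```python
def glyph_positions(lines, glyph="m"):
    """Count occurrences of a glyph by position: line-final, word-final only, or other."""
    counts = {"line_final": 0, "word_final": 0, "other": 0}
    for line in lines:
        words = line.split()
        for word_index, word in enumerate(words):
            for glyph_index, char in enumerate(word):
                if char != glyph:
                    continue
                at_word_end = glyph_index == len(word) - 1
                at_line_end = at_word_end and word_index == len(words) - 1
                if at_line_end:
                    counts["line_final"] += 1
                elif at_word_end:
                    counts["word_final"] += 1
                else:
                    counts["other"] += 1
    return counts


# Invented lines for illustration only; each string is one line of text.
sample_lines = ["daiin chol dam", "otam chol daiin", "chol omol dam"]
print(glyph_positions(sample_lines))
```

If the observation holds, Text B pages should show counts only under `line_final`.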
* * *
I have outlined my approach to these matters over many posts.
The text, I propose, is based on two paradigms, or templates, or verba potentiae, namely QOKEEDY and CHOLDAIIN.
(Importantly, I conceive of these as cycles.)
This model, I argue, has enormous explanatory power and enables us to explore the intimate workings of the text.
The textures of the text are especially amenable to a template approach.
Assuredly, it is more complex than this, because the two paradigmatic words operate at the high level of the text and are themselves products of underlying cycles, and the CHOLDAIIN template allows variant glyphs such as [r] instead of [n] - but the text is a combination, an overlapping, intermixing, of the two templates.
The two 'languages', Text A and B, I have previously explained as two different combinations of the two paradigmatic words.
In Text A the CHOLDAIIN paradigm dominates. In Text B the QOKEEDY paradigm dominates. We see these paradigms merging in different concentrations in different parts of the text.
I am inclined to refer to Text A, 'Language A', 'Currier A' as 'Choldaiin' and Text B, 'Language B', 'Currier B', as 'Qokeedy'.
As a general proposition, this remains a good explanation and on the whole it conforms to the statistical evidence.
To illustrate its explanatory power:
The bigram [ed] does not occur in Text A. This is the main mark of the two texts.
Explanation: Text A is dominated by CHOLDAIIN with little QOKEEDY. The bigram [ed] belongs to the QOKEEDY paradigm and so does not intrude into Text A. It is typical of all forms of Text B.
The word [chedy] does not occur in Text A but is the most common word in Text B.
Explanation: Again, Text A is dominated by CHOLDAIIN with little QOKEEDY. The form [-edy] belongs to the QOKEEDY paradigm. It is typical of all forms of Text B.
The [ch-] in [chedy], in my account of the paradigms, is a play on the [ee] in QOKEEDY. The bigram [ch] is [ee] with a ligature.
[che], I suggest, is actually a compaction of [eee] - which we find in the B Text - and so is alien to the A Text even though CHOLDAIIN features the glyph [ch].
Thus [chedy] features in the text that is based on QOKEEDY but is not found in the text based on CHOLDAIIN.
Text A features common short words with [o] or [a].
Explanation: These are all constructed from the constituent elements of CHOLDAIIN and are usually compactions or variants of [chol].
I have argued that the CHOLDAIIN paradigm is inherently given to bifurcation and dissolution into smaller units. The QOKEEDY paradigm is triune and adheres.
In Text A we see this tendency of the CHOL+DAIIN paradigm to disintegrate into smaller units.
There is more [eo] in the Pharmacological text than in the A Text generally.
Explanation: The QOKEEDY paradigm intrudes in the Pharma. text to a greater extent than in the plain A Text. (For reasons unknown.)
There is a concentration of [qol] in the Nymph section.
Explanation: the Nymph Text is strongly based on QOKEEDY but the form [chol] from CHOLDAIIN overlaps with QOK. The trigrams [chol] and [qok] are superimposed. (For reasons unknown.)
In general, we can explain the textual variants by way of different combinations of the two paradigms. In the B Text, especially, 'dialects' are nearer to or further from the template QOKEEDY.
I am confident that all the textures of the text can be explained, or be better understood, in these terms.
Why different mixtures of the paradigms are employed in different sections of the text is another question. What does the combination [eo] have to do with Pharmacology, for example?
Nevertheless, that is how it appears. There are two templates, CHOLDAIIN and QOKEEDY, and they are mixed together in different concentrations and in different ways in different parts of the text. (For reasons unknown.)
* * *
Finally, let us rehearse some things that are NOT subject to the 'dialect' phenomenon.
For a start, both texts are found in the same manuscript and nowhere else. The entire text is unfamiliar and unique to the manuscript.
There is no sign in marginalia or elsewhere that any owner of the manuscript was able to penetrate any part of the text. It is uniformly cryptic.
The same glyph set is used throughout. There are no glyphs or variants unique to one text or the other, or to one page or section rather than another, only different concentrations and combinations of the same glyph-set.
The gallows glyphs - surely the hallmark of the script - are found throughout. The benched gallows are more subject to uneven distribution by 'dialect', but are also found throughout.
All of the glyphs used are known in the Latin manuscript tradition: some are rare but none are exotic.
All text displays aspects of the Curve-and-Line phenomenon described by Brian Cham. It is inherent to Voynichese as a whole.
The same formats are used throughout. Lines, paragraphs, labels.
The entire text is strongly positional in nature. In both texts certain glyphs tend to favor particular positions, either in words, lines or paragraphs.
There is a large shared vocabulary. There are common words - notably [daiin] - that run throughout all sections of the text.
The entire text is characterized by repetition and repetitive similarity. We might characterize this as: combinatorial. In all versions of the text we encounter families of similar vocabulary: daiin, kaiin, okaiin, dain, odain, dan, taiin, chaiin, shaiin, keedy, qokeedy, qokedy, qokchedy, okchdy... and so on as if possible combinations of certain glyphs are being exhausted.
We could go on, all of it pointing to the fact that Voynichese is a single phenomenon, albeit one with internal variants. Nothing says we have two (or more) languages or an exotic intrusion. And nothing 'evolves'. We have a single system that is applied differently in different parts of the text.
R.B.