Monday, June 21, 2010

The Mystery of the Voynich Manuscript

Scientific American Magazine - June 21, 2004

By Gordon Rugg 

In 1912 Wilfrid Voynich, an American rare-book dealer, made the find of a lifetime in the library of a Jesuit college near Rome: a manuscript some 230 pages long, written in an unusual script and richly illustrated with bizarre images of plants, heavenly spheres and bathing women. Voynich immediately recognized the importance of his new acquisition. Although it superficially resembled the handbook of a medieval alchemist or herbalist, the manuscript appeared to be written entirely in code. Features in the illustrations, such as hairstyles, suggested that the book was produced sometime between 1470 and 1500, and a 17th-century letter accompanying the manuscript stated that it had been purchased by Rudolph II, the Holy Roman Emperor, in 1586. During the 1600s, at least two scholars apparently tried to decipher the manuscript, and then it disappeared for nearly 250 years until Voynich unearthed it.
Voynich asked the leading cryptographers of his day to decode the odd script, which did not match that of any known language. But despite 90 years of effort by some of the world's best code breakers, no one has been able to decipher Voynichese, as the script has become known. The nature and origin of the manuscript remain a mystery. The failure of the code-breaking attempts has raised the suspicion that there may not be any cipher to crack. Voynichese may contain no message at all, and the manuscript may simply be an elaborate hoax.
Critics of this hypothesis have argued that Voynichese is too complex to be nonsense. How could a medieval hoaxer produce 230 pages of script with so many subtle regularities in the structure and distribution of the words? But I have recently discovered that one can replicate many of the remarkable features of Voynichese using a simple coding tool that was available in the 16th century. The text generated by this technique looks much like Voynichese, but it is merely gibberish, with no hidden message. This finding does not prove that the Voynich manuscript is a hoax, but it does bolster the long-held theory that an English adventurer named Edward Kelley may have concocted the document to defraud Rudolph II. (The emperor reportedly paid a sum of 600 ducats--equivalent to about $50,000 today--for the manuscript.)
Perhaps more important, I believe that the methods used in this analysis of the Voynich mystery can be applied to difficult questions in other areas. Tackling this hoary puzzle requires expertise in several fields, including cryptography, linguistics and medieval history. As a researcher into expert reasoning--the study of the processes used to solve complex problems--I saw my work on the Voynich manuscript as an informal test of an approach that could be used to identify new ways of tackling long-standing scientific questions. The key step is determining the strengths and weaknesses of the expertise in the relevant fields.
Baby God's Eye?
The first purported decryption of the Voynich manuscript came in 1921. William R. Newbold, a professor of philosophy at the University of Pennsylvania, claimed that each character in the Voynich script contained tiny pen strokes that could be seen only under magnification and that these strokes formed an ancient Greek shorthand. Based on his reading of the code, Newbold declared that the Voynich manuscript had been written by 13th-century philosopher-scientist Roger Bacon and described discoveries such as the invention of the microscope. Within a decade, however, critics debunked Newbold's solution by showing that the alleged microscopic features of the letters were actually natural cracks in the ink.
Newbold's attempt was just the start of a string of failures. In the 1940s amateur code breakers Joseph M. Feely and Leonell C. Strong used substitution ciphers that assigned Roman letters to the characters in Voynichese, but the purported translations made little sense. At the end of World War II the U.S. military cryptographers who cracked the Japanese Imperial Navy's codes passed some spare time tackling ciphertexts--encrypted texts--from antiquity. The team deciphered every one except the Voynich manuscript.
In 1978 amateur philologist John Stojko claimed that the text was written in Ukrainian with the vowels removed, but his translation--which included sentences such as "Emptiness is that what Baby God's Eye is fighting for"--did not jibe with the manuscript's illustrations or with Ukrainian history. In 1987 a physician named Leo Levitov asserted that the document had been produced by the Cathars, a heretical sect that flourished in medieval France, and was written in a pidgin composed of words from various languages. Levitov's translation, though, was at odds with the Cathars' well-documented theology.
Furthermore, all these schemes used mechanisms that allowed the same Voynichese word to be translated one way in one part of the manuscript and a different way in another part. For example, one step in Newbold's solution involved the deciphering of anagrams, which is notoriously imprecise: the anagram ADER, for instance, can be interpreted as READ, DARE or DEAR. Most scholars agree that all the attempted decodings of the Voynich manuscript are tainted by an unacceptable degree of ambiguity. Moreover, none of these methods could encode plaintext--that is, a readable message--into a ciphertext with the striking properties of Voynichese.
If the manuscript is not a code, could it be an unidentified language? Even though we cannot decipher the text, we know that it shows an extraordinary amount of regularity. For instance, the most common words often occur two or more times in a row. To represent the words, I will use the European Voynich Alphabet (EVA), a convention for transliterating the characters of Voynichese into Roman letters. An example from folio 78R of the manuscript reads: qokedy qokedy dal qokedy qokedy. This degree of repetition is not found in any known language. Conversely, Voynichese contains very few phrases where two or three different words regularly occur together. These characteristics make it unlikely that Voynichese is a human language--it is simply too different from all other languages.
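The repetition described here can be counted mechanically. The following is a minimal sketch in Python, assuming the text has already been transliterated into EVA and split on whitespace; the function name count_adjacent_repeats is illustrative and not part of any established Voynich toolkit.

```python
def count_adjacent_repeats(text: str) -> int:
    """Count positions where a word is immediately followed by an identical word."""
    words = text.split()
    return sum(1 for prev, cur in zip(words, words[1:]) if prev == cur)


# The sample from folio 78R quoted in the paragraph above.
sample = "qokedy qokedy dal qokedy qokedy"
print(count_adjacent_repeats(sample))  # 2
```

Running the same count over a transliteration of ordinary prose would give a baseline for comparison, although the choice of comparison texts is left open here.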
The third possibility is that the manuscript was a hoax devised for monetary gain or that it is some mad alchemist's meaningless ramblings. The linguistic complexity of the manuscript seems to argue against this theory. In addition to the repetition of words, there are numerous regularities in the internal structure of the words. The common syllable qo, for instance, occurs only at the start of words. The syllable chek may appear at the start of a word, but if it occurs in the same word as qo, then qo always comes before chek. The common syllable dy usually appears at the end of a word and occasionally at the start but never in the middle.
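The positional regularities listed above can be written as simple checks. This is a sketch only: it covers the qo and dy rules as stated in the paragraph, treats them as absolute (the real text has exceptions), and the function names are hypothetical. The rule that qo precedes chek follows automatically once qo is restricted to the start of a word.

```python
def find_all(word: str, syllable: str) -> list[int]:
    """Return every index at which `syllable` starts inside `word`."""
    return [i for i in range(len(word)) if word.startswith(syllable, i)]


def rule_violations(word: str) -> list[str]:
    """List which of the two positional rules a transliterated word breaks."""
    problems = []
    # Rule 1: "qo" occurs only at the start of a word.
    if any(i != 0 for i in find_all(word, "qo")):
        problems.append("qo away from word start")
    # Rule 2: "dy" occurs at the end (or occasionally the start), never mid-word.
    last = len(word) - 2
    if any(i not in (0, last) for i in find_all(word, "dy")):
        problems.append("dy in mid-word")
    return problems


for w in ["qokedy", "chekqo", "kedyal"]:
    print(w, rule_violations(w))
```

Of the three test words, only qokedy comes from the manuscript; chekqo and kedyal are made-up counterexamples chosen to trip one rule each.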
A simple "pick and mix" hoax that combines the syllables at random could not produce a text with so many regularities. Voynichese is also much more complex than anything found in pathological speech caused by brain damage or psychological disorders. Even if a mad alchemist did construct a grammar for an invented language and then spent years writing a script that employed this grammar, the resulting text would not share the various statistical features of the Voynich manuscript. For example, the word lengths of Voynichese form a binomial distribution--that is, the most common words have five or six characters, and the occurrence of words with greater or fewer characters falls off steeply from that peak in a symmetric bell curve. This kind of distribution is extremely unusual in a human language. In almost all human languages, the distribution of word lengths is broader and asymmetric, with a higher occurrence of relatively long words. It is very unlikely that the binomial distribution of Voynichese could have been a deliberate part of a hoax, because this statistical concept was not invented until centuries after the manuscript was written.
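To make the shape of such a distribution concrete, the sketch below tabulates a binomial distribution whose peak falls on five and six characters. The parameters (nine trials, probability 0.5, lengths shifted to start at one) are assumptions chosen only to place the peak there; they are not fitted to the manuscript.

```python
from math import comb

N_TRIALS = 9  # assumption: chosen so the peak lands on lengths 5 and 6
P = 0.5       # assumption: a symmetric distribution


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def length_distribution() -> dict[int, float]:
    """Map word length (1 to N_TRIALS + 1) to its probability under the model."""
    return {k + 1: binomial_pmf(k, N_TRIALS, P) for k in range(N_TRIALS + 1)}


# Print a small text histogram of the modelled word lengths.
for length, prob in length_distribution().items():
    print(f"{length:2d}  {prob:.3f}  {'#' * round(prob * 100)}")
```

The histogram is symmetric about the peak and falls off quickly on both sides, which is the property the paragraph contrasts with the broader, asymmetric distributions of natural languages.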
Expert Reasoning
In summary, the Voynich manuscript appeared to be either an extremely unusual code, a strange unknown language or a sophisticated hoax, and there was no obvious way to resolve the impasse. It so happened that my colleague Joanne Hyde and I were looking for just such a puzzle a few years ago. We had been developing a method for critically reevaluating the expertise and reasoning used in the investigation of difficult research problems. As a preliminary test, I applied this method to the research on the Voynich manuscript. I started by determining the types of expertise that had previously been applied to the problem.
The assessment that the features of Voynichese are inconsistent with any human language was based on substantial relevant expertise from linguistics. This conclusion appeared sound, so I proceeded to the hoax hypothesis. Most people who have studied the Voynich manuscript agreed that Voynichese was too complex to be a hoax. I found, however, that this assessment was based on opinion rather than firm evidence. There is no body of expertise on how to mimic a long medieval ciphertext, because there are hardly any examples of such texts, let alone hoaxes of this genre.
Several researchers, such as Jorge Stolfi of the University of Campinas in Brazil, had wondered whether the Voynich manuscript was produced using random text-generation tables. These tables have cells that contain characters or syllables; the user selects a sequence of cells--perhaps by throwing dice--and combines them to form a word. This technique could generate some of the regularities within Voynichese words. Under Stolfi's method, the table's first column could contain prefix syllables, such as qo, that occur only at the start of words; the second column could contain midfixes (syllables appearing in the middle of words) such as chek, and the third column could contain suffix syllables such as y. Choosing a syllable from each column in sequence would produce words with the characteristic structure of Voynichese. Some of the cells might be empty, so that one could create words lacking a prefix, midfix or suffix.
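A table of this kind is easy to simulate. The sketch below picks one cell from each of three columns at random, with empty cells allowed. The syllables qo, chek, y and dy come from the text; ked is a placeholder added for variety, and the column contents as a whole are assumptions rather than a reconstruction of any real table.

```python
import random

# Column contents are illustrative; "" stands for an empty cell.
PREFIXES = ["qo", ""]
MIDFIXES = ["chek", "ked", ""]  # "ked" is a placeholder, not from the article
SUFFIXES = ["y", "dy", ""]


def random_word(rng: random.Random) -> str:
    """Combine one randomly chosen cell from each column, skipping empty results."""
    while True:
        word = rng.choice(PREFIXES) + rng.choice(MIDFIXES) + rng.choice(SUFFIXES)
        if word:
            return word


rng = random.Random(0)  # fixed seed so a run can be repeated
print(" ".join(random_word(rng) for _ in range(8)))
```

Because every word is assembled in prefix, midfix, suffix order, qo can only ever appear at the start of a word, which is one of the regularities this technique reproduces.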
Other features of Voynichese, however, are not so easily reproduced. For instance, some characters are individually common but rarely occur next to each other. The characters transcribed as a, e and l are common, as is the combination al, but the combination el is very rare. This effect cannot be produced by randomly mixing characters from a table, so Stolfi and others rejected this approach. The key term here, though, is "randomly." To modern researchers, randomness is an invaluable concept. Yet it is a concept developed long after the manuscript was created. A medieval hoaxer probably would have used a different way of combining syllables that might not have been random in the strict statistical sense. I began to wonder whether some of the features of Voynichese might be side effects of a long-obsolete device.
The Cardan Grille
It looked as if the hoax hypothesis deserved further investigation. My next step was to attempt to produce a hoax document to see what side effects emerged. The first question was, Which techniques to use? The answer depended on the date when the manuscript was produced. Having worked in archaeology, a field in which dating artifacts is an important concern, I was wary of the general consensus among Voynich researchers that the manuscript was created before 1500. It was illustrated in the style of the late 1400s, but this attribute did not conclusively pin down the date of its origin; artistic works are often produced in the style of an earlier period, either innocently or to make the document look older. I therefore searched for a coding technique that was available during the widest possible range of origin dates--between 1470 and 1608.
A promising possibility was the Cardan grille, which was introduced by Italian mathematician Girolamo Cardano in 1550. It consists of a card with slots cut in it. When the grille is laid over an apparently innocuous text produced with another copy of the same card, the slots reveal the words of the hidden message. I realized that a Cardan grille with three slots could be used to select permutations of prefixes, midfixes and suffixes from a table to generate Voynichese-style words.
A typical page of the Voynich manuscript contains about 10 to 40 lines, each consisting of about eight to 12 words. Using the three-syllable model of Voynichese, a single table of 36 columns and 40 rows would contain enough syllables to produce an entire manuscript page with a single grille. The first column would list prefixes, the second midfixes and the third suffixes; the following columns would repeat that pattern. You can align the grille to the upper left corner of the table to create the first word of Voynichese and then move it three columns to the right to make the next word. Or you can move the grille to a column farther to the right or to a lower row. By successively positioning the grille over different parts of the table, you can create hundreds of Voynichese words. And the same table could then be used with a different grille to make the words of the next page.
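The procedure in this paragraph can be sketched in a few lines of Python. Several details are assumptions made for illustration: the syllable pools, the random filling of the table, the representation of a grille as three row offsets (one per slot), and wrapping around to the top of the table when a slot would fall below the last row.

```python
import random

ROWS, COLS = 40, 36  # table size given in the paragraph above

# Assumed syllable pools for prefix, midfix and suffix columns ("" = empty cell).
POOLS = (["qo", "o", ""], ["chek", "ked", "k", ""], ["y", "dy", "al", ""])


def build_table(rng: random.Random) -> list[list[str]]:
    """Fill the table so that columns cycle prefix, midfix, suffix."""
    return [[rng.choice(POOLS[c % 3]) for c in range(COLS)] for _ in range(ROWS)]


def read_word(table, grille, row: int, col: int) -> str:
    """Read the three cells exposed by the grille placed at (row, col).

    `grille` holds one row offset per slot; slot i sits over column col + i.
    Rows wrap around at the bottom of the table (an assumption).
    """
    return "".join(table[(row + off) % ROWS][col + i] for i, off in enumerate(grille))


def generate_line(table, grille, row: int) -> str:
    """Slide the grille three columns at a time across one row of positions."""
    words = (read_word(table, grille, row, col) for col in range(0, COLS, 3))
    return " ".join(w for w in words if w)


rng = random.Random(1)
table = build_table(rng)
grille = (0, 2, 1)  # hypothetical grille: slots one column apart, on different rows
for row in range(3):
    print(generate_line(table, grille, row))
```

With 36 columns, one pass across the table yields at most 12 words, which matches the line length quoted above. Swapping in a different tuple for grille corresponds to cutting a new card for the next page.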
I drew up three tables by hand, which took two or three hours per table. Each grille took two or three minutes to cut out. (I made about 10.) After that, I could generate text as fast as I could transcribe it. In all, I produced between 1,000 and 2,000 words this way.
I found that this method could easily reproduce most of the features of Voynichese. For example, you can ensure that some characters never occur together by carefully designing the tables and grilles. If successive grille slots are always on different rows, then the syllables in horizontally adjacent cells in the table will never occur together, even though they may be very common individually. The binomial distribution of word lengths can be generated by mixing short, medium-length and long syllables in the table. Another characteristic of Voynichese--that the first words in a line tend to be longer than later ones--can be reproduced simply by putting most of the longer syllables on the left side of the table.
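The claim about slots on different rows can be checked exhaustively for a given grille. The sketch below uses the same assumed representation of a grille as three row offsets and tests every position on a 40-by-36 table; the function names are hypothetical.

```python
def slot_cells(grille, row: int, col: int, n_rows: int) -> list[tuple[int, int]]:
    """Return the (row, column) of each cell exposed by the grille at (row, col)."""
    return [((row + off) % n_rows, col + i) for i, off in enumerate(grille)]


def pairs_horizontal_neighbors(grille, n_rows: int = 40, n_cols: int = 36) -> bool:
    """True if any grille position joins two cells lying side by side in one row."""
    for row in range(n_rows):
        for col in range(0, n_cols, 3):
            cells = slot_cells(grille, row, col, n_rows)
            for (r1, c1), (r2, c2) in zip(cells, cells[1:]):
                if r1 == r2 and c2 == c1 + 1:
                    return True
    return False


print(pairs_horizontal_neighbors((0, 2, 1)))  # False: successive slots differ in row
print(pairs_horizontal_neighbors((0, 0, 1)))  # True: first two slots share a row
```

Under a grille of the first kind, a syllable ending in e can sit directly to the left of one beginning with l in the table and the pair will still never be joined, however common each syllable is on its own.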
The Cardan grille method therefore appears to be a mechanism by which the Voynich manuscript could have been created. My reconstructions suggest that one person could have produced the manuscript, including the illustrations, in just three or four months. But a crucial question remains: Does the manuscript contain only meaningless gibberish or a coded message?
I found two ways to employ the grilles and tables to encode and decode plaintext. The first was a substitution cipher that converted plaintext characters to midfix syllables that are then embedded within meaningless prefixes and suffixes using the method described above. The second encoding technique assigned a number to each plaintext character and then used these numbers to specify the placement of the Cardan grille on the table. Both techniques, however, produce scripts with much less repetition of words than Voynichese. This finding indicates that if the Cardan grille was indeed used to make the Voynich manuscript, the author was probably creating cleverly designed nonsense rather than a ciphertext. I found no evidence that the manuscript contains a coded message.
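The first of these schemes can be illustrated with a round-trip example. Everything specific here is an assumption: the letter-to-midfix mapping is generated mechanically, the padding is chosen at random instead of with a grille, and the syllables are picked so that prefixes use only q and o, suffixes only d and y, and midfixes neither, which keeps decoding unambiguous. It is a sketch of the idea, not the scheme actually tested.

```python
import random
import string
from itertools import product

# Hypothetical midfixes: 6 onsets x 5 nuclei = 30 distinct syllables,
# none of which starts with q/o or ends with d/y.
ONSETS = ["ch", "sh", "k", "t", "l", "r"]
NUCLEI = ["e", "a", "ee", "ai", "ea"]
MIDFIXES = [onset + nucleus for onset, nucleus in product(ONSETS, NUCLEI)]

ENCODE = dict(zip(string.ascii_lowercase, MIDFIXES))  # one midfix per letter
DECODE = {midfix: letter for letter, midfix in ENCODE.items()}

PREFIXES = ["qo", "o", ""]  # meaningless padding in front of the midfix
SUFFIXES = ["y", "dy", ""]  # meaningless padding after the midfix


def encode(plaintext: str, rng: random.Random) -> str:
    """Turn each letter into one word: random prefix + midfix + random suffix."""
    return " ".join(
        rng.choice(PREFIXES) + ENCODE[ch] + rng.choice(SUFFIXES)
        for ch in plaintext.lower()
        if ch in ENCODE  # anything that is not a letter is dropped
    )


def decode(ciphertext: str) -> str:
    """Strip the padding characters from each word and look up the midfix."""
    return "".join(DECODE[w.lstrip("qo").rstrip("dy")] for w in ciphertext.split())


rng = random.Random(7)
secret = encode("aurum", rng)
print(secret)
print(decode(secret))
```

Because the sequence of midfixes is dictated by the plaintext, a word repeats only where the message repeats a letter and the padding happens to match, which fits the observation that such ciphers show much less word repetition than Voynichese.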
This absence of evidence does not prove that the manuscript was a hoax, but my work shows that the construction of a hoax as complex as the Voynich manuscript was indeed feasible. This explanation dovetails with several intriguing historical facts: Elizabethan scholar John Dee and his disreputable associate Edward Kelley visited the court of Rudolph II during the 1580s. Kelley was a notorious forger, mystic and alchemist who was familiar with Cardan grilles. Some experts on the Voynich manuscript have long suspected that Kelley was the author.
My undergraduate student Laura Aylward is currently investigating whether more complex statistical features of the manuscript can be reproduced using the Cardan grille technique. Answering this question will require producing large amounts of text using different table and grille layouts, so we are writing software to automate the method.
This study yielded valuable insights into the process of reexamining difficult problems to determine whether any possible solutions have been overlooked. A good example of such a problem is the question of what causes Alzheimer's disease. We plan to examine whether our approach could be used to reevaluate previous research into this brain disorder. Our questions will include: Have the investigators neglected any field of relevant expertise? Have the key assumptions been tested sufficiently? And are there subtle misunderstandings between the different disciplines that are involved in this work? If we can use this process to help Alzheimer's researchers find promising new directions, then a medieval manuscript that looks like an alchemist's handbook may actually prove to be a boon to modern medicine.



GORDON RUGG became interested in the Voynich manuscript about four years ago. At first he viewed it as merely an intriguing puzzle, but later he saw it as a test case for reexamining complex problems. He earned his Ph.D. in psychology at the University of Reading in 1987. Now a senior lecturer in the School of Computing and Mathematics at Keele University in England, Rugg is editor in chief of Expert Systems: The International Journal of Knowledge Engineering and Neural Networks. His research interests include the nature of expertise and the modeling of information, knowledge and beliefs.
