Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis


By Marcelo A. Montemurro and Damián H. Zanette

PLoS ONE, Vol.8:6 (2013)

Abstract: The Voynich manuscript has so far remained a mystery for linguists and cryptologists. While the text, written on medieval parchment in an unknown script system, shows basic statistical patterns that bear resemblance to those of real languages, some of its features have suggested to some researchers that the manuscript is a forgery intended as a hoax. Here we analyse the long-range structure of the manuscript using methods from information theory. We show that the Voynich manuscript presents a complex organization in the distribution of words that is compatible with those found in real language sequences. We are also able to extract some of the most significant semantic word-networks in the text. These results, together with some previously known statistical features of the Voynich manuscript, give support to the presence of a genuine message inside the book.

Introduction: The Voynich manuscript, named after the Polish-American antiquarian Wilfrid Voynich, who owned it from 1912 until his death in 1930, is perhaps the most widely known example of a book written in an as yet undeciphered script. Its author and language are unknown, and no other document in the same script has ever been found. The manuscript's ownership history can be traced back to the seventeenth century, but carbon dating of its vellum and stylistic analysis of its illustrations suggest that it was written around the second half of the fifteenth century (Dr. Greg Hodgins, University of Arizona, personal communication). Presently, the book belongs to the Beinecke Rare Book and Manuscript Library of Yale University, where it is identified as Beinecke MS 408. Public-domain electronic images of the full manuscript are deposited in Wikimedia Commons.

The manuscript comprises 104 folios, organized into 18 quires bound to leather thongs. Both sides of most folios contain text, written from left to right. The text consists of discrete graphemes, chosen from an “alphabet” of some 40 symbols and organized into arrays or “words” of variable length. These arrays are separated by spaces, and lines are sometimes grouped into paragraphs; otherwise, no evident punctuation marks are used. Most pages also contain illustrations, which modern scholars have used to divide the manuscript “thematically” into five sections: Herbal, Astrological, Biological, Pharmacological, and Recipes. The Herbal section is the longest, and displays dozens of ravishingly coloured plant drawings. Oddly enough, however, not a single one of these pictures has been unquestionably recognized as an existing plant. Similarly, except for the Zodiac signs in the Astrological section, no illustration in the whole book has been unambiguously interpreted.

