By Sravana Reddy and Kevin Knight
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (2011)
Introduction: The Voynich manuscript, also referred to as the VMS, is an illustrated medieval folio written in an undeciphered script. There are several reasons why the study of the manuscript is of interest to the natural language processing community, besides its appeal as a long-enduring unsolved mystery. Since even the basic structure of the text is unknown, it provides a perfect opportunity for the application of unsupervised learning algorithms. Furthermore, while the manuscript has been examined by various scholars, it stands to benefit greatly from the attention of a community with the right tools and knowledge of linguistics, text analysis, and machine learning.
This paper presents a review of what is currently known about the VMS, as well as some original observations. Although the manuscript raises several questions about its origin, authorship, and illustrations, among other things, we focus on the text itself and ask questions about its properties. These range from the level of the letter (for example, are there vowels and consonants?) to the page (do pages have topics?) to the document as a whole (are the pages in order?).