Clustering a medieval social network by SOM using a kernel based distance measure
By Nathalie Villa and Romain Boulet
Proceedings of ESANN 2007 (Bruges, 2007)
Introduction: Social networks have been intensively studied through graphs during the past years: examples of such studies are given in for World Wide Web, scientifc networks or P2P networks. Most of these graphs come from modern social networks whereas we propose here to analyse the social organization of a medieval peasant community before the Hundred Years’ War. This social network has been built from a corpus of agrarian contracts.
A first study investigates this problem by the use of the algebraic properties of a non-weighted graph. We propose here a new approach, using an automatic neuronal method and more precisely an adaptation of the Kohonen Self Organizing Map (SOM) on data described by a dissimilarity matrix. The SOM algorithm, first introduced by Kohonen, is an unsupervised method which allows both clustering and visualization. The original data, usually living in a high dimensional space, are projected non linearly in a low dimensional space (generally, the projection dimension is set to 1 or 2) called a map; they are partionned into several clusters while preserving their initial topology. This algorithm has recently been adapted to non-vectorial data; we focus here on the adaptation proposed in; a variant of this Dissimilarity SOM (or median SOM) has been introduced and used for Web Usage Mining in and a faster version is then described in. The algorithm that we propose is the one described in but we are using a distance defined on a weighted graph by the diffusion kernel.
The paper is organized as follows. In section 2, we recall the Dissimilarity SOM algorithm (section 2.1) and describe how distances based on a kernel can be used to produce an unsupervised classification algorithm for weighted graphs (section 2.2). In section 3, we focus on the medieval data set: after describing it (section 3.1), we explain how we apply our method e±ciently and how we construct a final classification (section 3.2). Finally, in section 3.3, we compare this classification with algebraic or historical prior knowledge: some similarities prove that the results are consistent with previous work.
The graph on which we tested our approach has been obtained from a data base of approximately 10 000 agrarian contracts from four seignories of the Lot and the Tarn-et-Garonne (South West of France). These contracts were established between 1240 and 1520. Historians are mainly concerned with the analysis of country sociability during the Middle-Ages but the data base is too large for an exhaustive study so that data mining tools are required.
Here we focus on a part of this database, based at the Castelnau-Montratier seignory (Lot) between 1240 and 1350 (before the Hundred Years’ War). Based on this data base, we constructed a weighted graph having 226 vertices (the peasants) which are linked together if they appeared in the same contract. The weights were simply the number of common contracts in which two peasants appeared together. We cleaned the graph by deleting the nobilities because they were mentionned in almost every contract (as the legal authorities).