The Spanish Morphology Database as an aid to the preparation of the New Historical Dictionary of Spanish

Signatories: Royal Spanish Academy, University of Santiago de Compostela and University of Las Palmas de Gran Canaria

Lead researcher at the USC: Dr. Jesús Pena Seijas

Lead researcher at the ULPGC: Dr. Francisco Javier Carreras Riudavets

Funding entity: Royal Spanish Academy

Start date: 06/01/2017

End date: 06/01/2019

Total amount: 21,600 Euros


The MORFOGEN TIP application is a visualization tool that represents the lexical families and subfamilies of Spanish either as graphs (similar to tree diagrams) or in linear form. The application carries over into the visual representation part of the morpho-etymological information stored in the Spanish Morphological Database (BDME). By showing family relationships as a family tree, it aims to overcome a limitation of printed works, where the ordering of lexical families is constrained by the linear structure of the page.
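As a rough illustration of the two views described above, the following minimal sketch stores one lexical family as a tree and renders it both as an indented graph and as a single line. It is not the MORFOGEN implementation: the `Entry` class, the `render_tree` and `render_linear` functions, and the sample family built on "blanco" are all hypothetical and chosen only for illustration, not taken from the BDME.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One word in a lexical family (hypothetical structure, not the BDME schema)."""
    word: str
    children: list["Entry"] = field(default_factory=list)

    def add(self, word: str) -> "Entry":
        """Attach a derived word and return it so further derivatives can be chained."""
        child = Entry(word)
        self.children.append(child)
        return child


def render_tree(entry: Entry, depth: int = 0) -> list[str]:
    """Graph-like view: one word per line, indented by derivational depth."""
    lines = ["  " * depth + entry.word]
    for child in entry.children:
        lines.extend(render_tree(child, depth + 1))
    return lines


def render_linear(entry: Entry) -> str:
    """Linear view: the whole family on one line, with nesting shown by parentheses."""
    if not entry.children:
        return entry.word
    inner = ", ".join(render_linear(child) for child in entry.children)
    return f"{entry.word} > ({inner})"


# Illustrative family only; the derivational links are assumptions for the example.
root = Entry("blanco")
root.add("blancura")
verb = root.add("blanquear")
verb.add("blanqueo")
verb.add("blanqueador")

print("\n".join(render_tree(root)))
print(render_linear(root))
```

The tree view keeps sibling derivatives at the same indentation level, which a printed, strictly linear listing can only approximate with brackets, as the second rendering shows.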

The BDME was designed and organized by Jesús Pena (USC) in the mid-1980s. It was revised, extended and corrected between 2009 and 2016, with two objectives: a) to support the New Historical Dictionary of Spanish (NDHE) of the RAE with regard to its morphological and genetic configuration; b) to provide specialists in morphology and lexicon with more precise information on derivational relations and on lexical families. The revision of the BDME was carried out under two national projects (MICINN FFI2008-03532 and MINECO FFI2012-38550) and one regional project (Xunta de Galicia 10PXIB204249PR).

In the morphological database, Latin was given privileged status as the parent language of the Romance languages. More than 30,000 Latin words have been analysed alongside the 50,000 Spanish words, which makes it possible to contrast the Latin derivational series with the series inherited by Spanish. This comparison reveals the readjustments that Spanish underwent as a consequence of gaps and dislocations with respect to the Latin series, and allows the word-formation patterns of the parent language (Latin) and the daughter language (Spanish) to be compared exhaustively. The analysis of more than 5,000 Greek words supplies the origin of both the common lexicon and the scientific-technical vocabulary.

The design of the morphological database allows the analysis to be extended to other languages in the future. Any Romance vocabulary of Latin or Greek origin that is added will find a large lexicon already analysed in these two languages, together with the derivational series into which the newly analysed words will be integrated. From its inception, the database was conceived to include not only Spanish but also the other Romance languages, which required the prior analysis of many Greek and, above all, Latin lexical families.

Because of the link between BDME-MORFOGEN and the NDHE, which is directed by José Antonio Pascual, the Royal Spanish Academy signed a collaboration agreement (RAE-USC-ULPGC, 2017/2019). On behalf of the USC, professors Jesús Pena and María José Rodríguez Espiñeira coordinate the linguistic side of the project; on behalf of the IATEXT Computational Linguistics Division, professor Francisco Javier Carreras Riudavets coordinates the computational side.