Data Management Plan

The IATEXT Data Management Plan (DMP) outlines the general guidelines for data management across all its projects and applications. This plan covers the strategies and procedures for the comprehensive management of data, primarily textual, within a Digital Humanities project. It focuses on data collection, storage, processing, access, and preservation, while allowing flexibility for the specific needs of each project or research work. For the implementation and development of the more technical aspects of this DMP, IATEXT relies on its Computational Linguistics and Software Applications Division.

Data Identification

Each research project must define the origin of the data, the types and formats of the documents containing it, and the processes required for digitisation, if necessary. As a general rule, the data will be textual (words, sentences, paragraphs, textual fragments, or complete documents, depending on the case). However, depending on the characteristics of each study, data may also include images, geolocation data, audio, etc.

In all cases, a relational database is designed to meet the current and future needs of each research project. Its design is adapted to the volume of data, the collection requirements, and the technological skills of the researchers. When necessary, data stored in the relational database can be exported to other formats, preferably open ones such as XML, TEI, or CSV, in order to facilitate dissemination and preservation in the face of future technological changes. Exceptionally, data may be stored in other formats, provided that they allow computational processing and that the choice is justified on technical or computational grounds or by compatibility with external systems or organisations.
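
As an illustration of the export step, the following minimal sketch writes the rows of one table to CSV and to plain XML. It assumes an SQLite database and a hypothetical `fragment` table with `id`, `source`, and `text` columns; neither the engine nor the schema comes from this plan, and the XML produced is generic rather than TEI-conformant.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fragment ("
    "id INTEGER PRIMARY KEY, source TEXT NOT NULL, text TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO fragment (source, text) VALUES (?, ?)",
    [("Doc A", "First fragment"), ("Doc B", "Second, with a comma")],
)


def export_csv(conn):
    """Return every fragment as CSV text, with a header row first."""
    cur = conn.execute("SELECT id, source, text FROM fragment ORDER BY id")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur)
    return buf.getvalue()


def export_xml(conn):
    """Return every fragment as a simple XML document (not TEI)."""
    root = ET.Element("fragments")
    query = "SELECT id, source, text FROM fragment ORDER BY id"
    for row_id, source, text in conn.execute(query):
        element = ET.SubElement(root, "fragment", id=str(row_id), source=source)
        element.text = text
    return ET.tostring(root, encoding="unicode")


csv_text = export_csv(conn)
xml_text = export_xml(conn)
```

Because both exports read from the same query, the relational database remains the single source and the exported files are derived views of it.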

Data Organisation and Management

Projects developed by IATEXT consist of at least two types of applications created specifically for each project. Both applications use the same database designed for that purpose:

  • A web-based annotation application (data preparation/curation): in this environment, researchers classify and manage data in a controlled and secure way. Access is granted via username and password (stored in encrypted form). There are two user roles: researcher and reviewer. Researchers are responsible for entering data and defining relationships; reviewers have the same permissions and can additionally review and validate the work carried out by researchers. Each project decides whether researchers can view and/or manage only their own data or all project data. In any case, the application keeps an internal log of user access and actions so that irregularities can be detected.
  • A public, open-access web application for data consultation: this application provides access to the project database and displays only data that has been reviewed and validated by researchers with reviewer privileges. This ensures that data still under classification or not yet reviewed by a second researcher does not appear in search results. Search queries allow filtering by metadata or project-specific features, enabling users to obtain different “views” of the data according to their needs. Results can be downloaded in open formats. The data displayed in the consultation application is not version-controlled, because the application reads directly from the project’s single database, with no data replication. As researchers progress, new validated data becomes available in real time to the scientific community and the general public. However, the research team may “freeze” specific versions of the dataset when necessary.
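
The relationship between the two applications can be sketched as follows: both read the same database, and the public side simply filters on validation status. This is a minimal sketch under stated assumptions; the SQLite engine, the `app_user` and `record` tables, and the `validate` and `public_records` helpers are all hypothetical and are not taken from any IATEXT application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    role TEXT NOT NULL CHECK (role IN ('researcher', 'reviewer'))
);
CREATE TABLE record (
    id           INTEGER PRIMARY KEY,
    text         TEXT NOT NULL,
    entered_by   INTEGER NOT NULL REFERENCES app_user(id),
    validated_by INTEGER REFERENCES app_user(id)  -- NULL until reviewed
);
""")
conn.executemany(
    "INSERT INTO app_user (id, name, role) VALUES (?, ?, ?)",
    [(1, "ana", "researcher"), (2, "luis", "reviewer")],
)
conn.executemany(
    "INSERT INTO record (id, text, entered_by) VALUES (?, ?, ?)",
    [(1, "fragment one", 1), (2, "fragment two", 1)],
)


def validate(conn, record_id, user_id):
    """Mark a record as validated; only users with the reviewer role may do so."""
    row = conn.execute(
        "SELECT role FROM app_user WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None or row[0] != "reviewer":
        raise PermissionError("only reviewers can validate records")
    conn.execute(
        "UPDATE record SET validated_by = ? WHERE id = ?", (user_id, record_id)
    )


def public_records(conn):
    """Return what the consultation side would show: validated records only."""
    query = "SELECT text FROM record WHERE validated_by IS NOT NULL ORDER BY id"
    return [row[0] for row in conn.execute(query)]


validate(conn, 1, 2)            # the reviewer approves record 1
visible = public_records(conn)  # record 2 stays hidden until it is reviewed
```

Since the public query runs against the live table, a newly validated record appears in the next search without any copy or synchronisation step, which matches the real-time behaviour described above.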

Data Documentation

Data is documented, classified, and tagged through the annotation web application, developed ad hoc for each IATEXT project. The consultation web application includes a descriptive section about the project and its data, as well as a help section explaining how to use the platform. Researchers wishing to reuse the data for further studies must contact the project’s principal investigator and request the data in the required format, in accordance with the applicable data policy, ethics, and licensing conditions.

Data Quality

The annotation web application is the sole tool used by researchers to manage project data. Wherever possible, it enforces controlled input through predefined selection lists, which reduces human error and avoids inconsistencies in classification labels, and it requires researchers to complete all mandatory fields. Structured storage in the project-specific database supports data coherence and robustness. Researchers with reviewer privileges are responsible for validating and approving the data entered by others before it becomes publicly available. If errors are detected, corrections made through the annotation application are immediately reflected in the consultation application.
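
One common way to back selection lists and mandatory fields at the database level is with foreign keys and NOT NULL constraints. The sketch below illustrates that general technique only; the SQLite engine, the `category` and `annotation` tables, and the `add_annotation` helper are assumptions made for the example, not a description of the IATEXT schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE category (label TEXT PRIMARY KEY);  -- the predefined list
CREATE TABLE annotation (
    id       INTEGER PRIMARY KEY,
    text     TEXT NOT NULL,                      -- mandatory field
    category TEXT NOT NULL REFERENCES category(label)
);
INSERT INTO category (label) VALUES ('toponym'), ('anthroponym');
""")


def add_annotation(conn, text, category):
    """Insert an annotation; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO annotation (text, category) VALUES (?, ?)",
                (text, category),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_annotation(conn, "Madrid", "toponym"),  # label is in the list
    add_annotation(conn, "Madrid", "toponim"),  # misspelled label
    add_annotation(conn, None, "toponym"),      # mandatory field missing
]
```

With constraints of this kind, a misspelled label or an empty mandatory field is rejected by the database itself, so the check still holds even if a form in the annotation application were to miss it.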

Storage Strategy

Data is stored in relational databases designed specifically for each project. Both the applications and the databases are hosted on servers owned by IATEXT. These servers operate around the clock, which supports data preservation during the research period and long-term availability after project completion. All databases are backed up daily to a secondary server also owned by IATEXT. Additionally, full server backups are carried out quarterly using a NAS system. Applications are version-controlled and stored in a cloud repository throughout their lifecycle.
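
A dated database backup of the kind described above could look like the following sketch. It assumes SQLite purely for the sake of a self-contained example, since this plan does not name the database engine; the `backup_database` helper, the file naming scheme, and the temporary directory standing in for the secondary server are all hypothetical.

```python
import sqlite3
import tempfile
from datetime import date
from pathlib import Path


def backup_database(src, backup_dir, day):
    """Copy the whole database into a file named after the backup date."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"project-{day.isoformat()}.sqlite3"
    dest = sqlite3.connect(target)
    try:
        src.backup(dest)  # sqlite3's built-in online backup (Python 3.7+)
    finally:
        dest.close()
    return target


# Sample database standing in for a project database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
src.execute("INSERT INTO record (text) VALUES ('fragment one')")
src.commit()

# A temporary directory stands in for the secondary server.
workdir = Path(tempfile.mkdtemp())
backup_path = backup_database(src, workdir / "backups", date(2024, 1, 31))

# Reading the copy back is a cheap check that the backup is usable.
restored = sqlite3.connect(backup_path)
restored_rows = restored.execute("SELECT text FROM record").fetchall()
restored.close()
```

In practice a job like this would be run by a scheduler once a day, and a periodic restore test is a reasonable complement to it, since a backup is only useful if it can be read back.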

Data Policy, Ethics, and Licensing

Each project must define its own data policy regarding usage licenses and the handling of sensitive data, if applicable. In all cases, responsibility lies with the project’s principal investigator, and control is exercised through the consultation web application.

Data Dissemination

Data is disseminated automatically and immediately through the consultation web application once it has been validated. Query results can be downloaded in PDF format or other formats commonly used in Digital Humanities.

Roles and Responsibilities

The principal investigator is responsible for ensuring compliance with the Data Management Plan, in coordination with the rest of the research team.

Budget

The cost of data storage and preservation is covered by IATEXT and is free of charge for its own research projects.