The Ground Truth Guidelines

The OCR-D Ground Truth Guidelines provide instructions for creating ground truth. On this basis, ground truth can be technically validated, and existing transcriptions can be checked against the guidelines and, if necessary, converted into ground truth data.

The OCR-D ground truth uses PAGE-XML as its data format. This format was originally developed by the PRImA Research Lab at the University of Salford, Greater Manchester, and was substantially extended within the EU project IMPACT. It is currently maintained by the PRImA Research Lab. To ensure the further development and maintenance of the format, a PAGE-XML board was established on the initiative of OCR-D.
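To give a concrete impression of PAGE-XML ground truth, the following Python sketch parses a small hand-written PAGE-XML document and lists the transcribed text of each line. It is only an illustration under stated assumptions: it assumes the 2019-07-15 version of the PAGE schema namespace and uses only the standard library. The sample document, its file name and IDs, and the helper `extract_lines` are hypothetical and are not taken from the OCR-D Guidelines. Real ground truth files carry more metadata and may contain several `TextEquiv` alternatives per line.

```python
import xml.etree.ElementTree as ET

# Namespace of the PAGE-XML 2019-07-15 schema (assumed for this sketch;
# other schema versions use a different date in the namespace URI).
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Hypothetical, minimal PAGE-XML document: one text region with two lines.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Metadata>
    <Creator>example</Creator>
    <Created>2020-01-01T00:00:00</Created>
    <LastChange>2020-01-01T00:00:00</LastChange>
  </Metadata>
  <Page imageFilename="page_0001.tif" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="r1_l1">
        <Coords points="100,100 1900,100 1900,200 100,200"/>
        <TextEquiv><Unicode>Erste Zeile</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1_l2">
        <Coords points="100,250 1900,250 1900,350 100,350"/>
        <TextEquiv><Unicode>Zweite Zeile</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def extract_lines(xml_text: str) -> list[tuple[str, str]]:
    """Return (line id, transcribed text) for every TextLine in a PAGE-XML string."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iterfind(".//pc:TextLine", PAGE_NS):
        # Only the line's own TextEquiv is read (not word- or glyph-level ones);
        # if several alternatives exist, this simply takes the first one.
        unicode_el = line.find("./pc:TextEquiv/pc:Unicode", PAGE_NS)
        text = unicode_el.text if unicode_el is not None and unicode_el.text else ""
        lines.append((line.get("id"), text))
    return lines


if __name__ == "__main__":
    for line_id, text in extract_lines(SAMPLE):
        print(f"{line_id}: {text}")
    # prints:
    # r1_l1: Erste Zeile
    # r1_l2: Zweite Zeile
```

Reading the line-level `TextEquiv` keeps the sketch short; a tool that works with word- or glyph-level ground truth would descend further into `Word` and `Glyph` elements.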