OCR-D

Coordinated further development of processes for Optical Character Recognition (OCR-D)

Info

Applicants:
  • PD Dr. Alexander Geyken, since 12/2015
  • Professor Dr. Martin Grötschel, since 12/2015
  • Professor Dr. Günter Stock, to 12/2015
  • Professor Dr. Helwig Schmidt-Glintzer, to 8/2016
  • Dr. Klaus Ceynowa, to 11/2016
  • Professor Dr. Peter Burschel, since 8/2016
  • Barbara Schneider-Kempf, since 11/2016
  • Dr. Rainer Stotzka, to 2020
Subject Area: Data management, OCR, digitization
Term: 2015 to 2024
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - project numbers 274863866; 409784275; 460675868
Institutions:
  • Bavarian State Library Munich
  • Berlin-Brandenburg Academy of Sciences and Humanities in Berlin (BBAW)
  • Berlin State Library - Prussian Cultural Heritage
  • Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen
  • Göttingen State and University Library
  • Herzog August Library Wolfenbüttel
  • Karlsruhe Institute of Technology/Steinbuch Centre for Computing

Description

The main objective of the OCR-D project is the conceptual and technical preparation of the full-text transformation of the VD, the indexes of historical prints published in the German-speaking world. The task of automatic full-text recognition is broken down into its individual process steps, each of which can be reproduced in the open-source OCR-D software. This makes it possible to assemble workflows suited to the historical prints being processed and thus to generate full texts that are usable for research.

The OCR-D project was carried out in three phases: a concept phase, a module phase and a stabilization phase. In the first phase, the development needs were identified; in the second phase, these were addressed by eight module projects under the guidance of the coordination project. In the stabilization phase, which began in April 2021, the results of the module projects were consolidated and transferred to productive use. The overall aim is the practical and conceptual preparation for automatically generating full texts for the indexes of 16th-, 17th- and 18th-century prints published in the German-speaking world.

The coordination project provided:

  • long-term support and further development of the OCR-D software,
  • the results of the implementation work, which produced stable, ready-to-use OCR-D solutions for mass digitization,
  • the dissemination and availability of the software for a wide range of users.

Examples

OCR-D Function model and OCR-D GT-Guidelines