Matthias Boenig


For over twenty years I have been involved in the digitization of library collections, scientific publications and research data. I have worked on a wide range of projects, including the development and improvement of electronic publishing and the use of XML for digital editions and research data. I have also helped advance OCR technology and supported its application in various areas of the humanities and in the archiving of historical documents. A few projects are highlighted here.

OCR-D (2015-2024): In this project I worked on the standardization and cataloging of training material for automatic text and structure recognition. The main goal of the project is to digitize historical prints from the 16th to the 18th century in full and thus enable their conversion to full text, a significant contribution to the preservation and accessibility of cultural assets.

German Text Archive (DTA) (2010-2017): Today, the DTA offers an extensive collection of German-language texts that serve as the basis for a reference corpus of Modern High German. It comprises around 1500 titles and is characterized by a balanced selection of texts and the use of first editions for digitization. In the DTA project, I worked on the procedural and technological implementation of full-text digitization.

AEDIT (2012-2015): AEDIT is a prototype archive, edition and distribution platform for early modern works. In this repository, data from digitization and edition projects are to be catalogued, disseminated and made available in the long term. In this project, I digitized a corpus of 335 funeral sermons together with the Research Centre for Personal Writings at the Philipps University of Marburg (working group of the Academy of Sciences and Literature, Mainz). As part of the project, the DTA base format (DTABf) was updated for this type of text.

ProPrint (2000-2003): ProPrint is a prototypical print-on-demand service provider. It is technologically based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). ProPrint makes it possible to link document and publication servers and offers the option of ordering user-selected publications as print-on-demand publications.

Dissonline (1997-2000): This project specializes in the online publication of dissertations and habilitation theses. It offers a platform on which scholarly work can be made digitally accessible, which considerably facilitates the dissemination of and access to research results.

Selected publications

  1. OCR-D für die Massendigitalisierung: Projektstand und Ausblick
    Lena Hinrichsen, Konstantin Baierer, Clemens Neudecker, and 2 more authors
    2023
  2. Dokument, Transkription, Forschungsdatum
    Konstantin Baierer, Matthias Boenig, Elisabeth Engl, and 5 more authors
    2022
  3. Das DTABf in der Edition: zusammenfassender Evaluationsbericht
    Bernhard Fisseni, Simon Sendler, Daniela Schulz, and 3 more authors
    2021
  4. Volltexte – die Zukunft alter Drucke: Bericht zum Abschlussworkshop des OCR-D-Projekts
    Elisabeth Engl, Konstantin Baierer, Matthias Boenig, and 2 more authors
    o-bib. Das offene Bibliotheksjournal/Herausgeber VDB, 2020
  5. OCR-D: An end-to-end open source OCR framework for historical printed documents
    Clemens Neudecker, Konstantin Baierer, Maria Federbusch, and 4 more authors
    2019
  6. Ground Truth: Grundwahrheit oder Ad-Hoc-Lösung? Wo stehen die Digital Humanities?
    Matthias Boenig, Maria Federbusch, Elisa Herrmann, and 2 more authors
    2018
  7. Über den Mehrwert der Vernetzung von OCR-Verfahren zur Erfassung von Texten des 17. Jahrhunderts.
    Matthias Boenig, Kay-Michael Würzner, Arne Binder, and 1 more author
    2016
  8. Zeitliche Verlaufskurven in den DTA- und DWDS-Korpora: Wörter und Wortverbindungen über 400 Jahre (1600-2000).
    Alexander Geyken, Matthias Boenig, Susanne Haaf, and 4 more authors
    2015
  9. Standardized Information on historical Proper Names in Digital Full Text Transcriptions. Crowdsourcing ref= s for <placeName> and <persName> tags in the corpora of the German Text Archive/Deutsches Textarchiv
    Christian Thomas, Matthias Boenig, Alexander Geyken, and 5 more authors
    2015
  10. ‘Mehr als schmutzige OCR’: die Aufwertungen von historischen Volltextdigitalisaten zu Forschungsdaten
    Matthias Boenig and Alexander Geyken
    2015
  11. Historical newspapers & journals for the DTA
    Susanne Haaf and Matthias Schulz
    Proceedings of the LREC Workshop on Language Resources and Technologies for Processing and Linking Historical Documents and Archives—Deploying Linked Open Data in Cultural Heritage (LRT4HDA), 2014