Matthias Boenig

For over twenty years I have been involved in the digitization of library collections, scientific publications and research data. I have worked on a wide range of projects. My work includes developing and improving electronic publishing and the use of XML for digital editions and research data. I have also helped advance OCR technology and supported its application in various areas of the humanities and in the archiving of historical documents. A few projects are highlighted below.
OCR-D (2015-2024): In this project I worked on the standardization and cataloging of training material for automatic text and structure recognition. The main goal of the project is the complete digitization of historical prints from the 16th to the 18th century, including their conversion into full text, which is a significant contribution to the preservation and accessibility of cultural assets.
German Text Archive (DTA) (2010-2017): Today, the DTA offers an extensive collection of German-language texts that serve as the basis for a reference corpus of Modern High German. It comprises around 1500 titles and is characterized by a balanced selection of texts and the use of first editions for digitization. In the DTA project, I worked on the procedural and technological implementation of full-text digitization.
AEDIT (2012-2015): AEDIT is a prototype archive, edition and distribution platform for early modern works. In this repository, data holdings from digitization and edition projects are to be cataloged, disseminated and kept available in the long term. In this project, I digitized a corpus of 335 funeral sermons together with the Research Centre for Personal Writings at the Philipps University of Marburg (working group of the Academy of Sciences and Literature, Mainz). As part of the project, the DTA base format was updated for this type of text.
ProPrint (2000-2003): ProPrint is a prototypical print-on-demand service provider. It is technologically based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). ProPrint makes it possible to link document and publication servers and offers the option of ordering user-selected publications as print-on-demand publications.
Dissonline (1997-2000): This project specialized in the online publication of dissertations and habilitation theses. It offered a platform on which scholarly work could be made digitally accessible, which considerably facilitates the dissemination of and access to research results.
Selected publications
- Dokument, Transkription, Forschungsdatum. 2022
- Über den Mehrwert der Vernetzung von OCR-Verfahren zur Erfassung von Texten des 17. Jahrhunderts. 2016
- Zeitliche Verlaufskurven in den DTA- und DWDS-Korpora: Wörter und Wortverbindungen über 400 Jahre (1600-2000). 2015
- Standardized Information on historical Proper Names in Digital Full Text Transcriptions. Crowdsourcing ref=s for <placeName> and <persName> tags in the corpora of the German Text Archive/Deutsches Textarchiv. 2015