17_frak_complex

Metadata

Name:: 17_frak_complex
Description:: Ground truth for a collection of documents with the following characteristics: blackletter font, musical notation, handwritten additions, and a title page
Language:: deu, lat
Format:: Page-XML
Time:: 1600-1700
GT Type:: data_structure_and_text

More Information

Transcription Guidelines:: OCR-D Ground Truth Guidelines https://ocr-d.de/en/gt-guidelines/trans/
License:: CC0 1.0
Project:: OCR-D
Project-URL:: https://ocr-d.de/

Labelling

The GT data has been labeled. The labeling is based on an ontology defined by the Pattern Recognition and Image Analysis Research Lab (PRImA-Research-Lab) at the University of Salford. This normalized and semantic description of the OCR-GT data can be found in the METS metadata file. The labeling metadata is created for each available page. The following labeling metadata is available for the complete collection.

Here you will find a description and explanation of the labeling metadata.

activityDomain/computing/visual/analysisRecognition/layoutAnalysis

Description: In computer vision, document layout analysis is the process of identifying and categorizing the regions of interest in the scanned image of a text document. A reading system requires the segmentation of text zones from non-textual ones and the arrangement in their correct reading order.
Examples: Page layout analysis (segmentation into regions, classification into text, graphic, table etc.)
Related: "OCR": Often used as a synonym for layout analysis and text recognition, but strictly only the text recognition component.

activityDomain/computing/visual/analysisRecognition/ocr

Description:

activityDomain/computing/visual/analysisRecognition/text

Description: Translation of any kind of depicted symbols into a machine-readable format.
Examples: OCR, mathematical equation recognition.
Related: Text processing (separate category), table recognition, map reading.

condition/acquisition/content-or-background/included-objects/preceeding-or-proceeding

Description: Part of the preceding or succeeding object is included (e.g. another page)

condition/acquisition/geometric/page-curl

Description: Visible page curl (e.g. book scanning)

condition/acquisition/geometric/perspective-distortions

Description: Perspective distortions (e.g. due to camera-based acquisition)

condition/production-related/document-faults/ink-from-facing

Description: Ink from facing page was transferred to this page

condition/wear/additions/informative/annotations

Description: Annotations regarding the content

content-encoding/structured

Description: E.g. XML

content-type/corpus

Description: Corpus: a collection of written texts, especially the entire works of a particular author or a body of writing on a particular subject.
Examples: A text corpus, an image database

contentOfInterest/visual/composite/music

Description: Description coming soon.

contentOfInterest/visual/graphical

Description: Description coming soon.

contentOfInterest/visual/graphical/separator

Description: Description coming soon.

contentOfInterest/visual/text

Description: Description coming soon.

data-attributes/document-related/structural/running-titles

Description: Titles repeated on each page

data-attributes/document-related/visual/text/drop-caps

Description: Drop capitals (large capitals at beginning of paragraph)

data-attributes/document-related/visual/text/font/multi-font/font-sizes

Description: More than one font size used

data-attributes/document-related/visual/text/font/multi-font/typefaces

Description: More than one typeface used

data-attributes/document-related/visual/text/font/typeface/antiqua

Description: Antiqua font (more modern)

data-attributes/document-related/visual/text/font/typeface/blackletter

Description: Blackletter, gothic, Fraktur

data-attributes/language/mixed

Description: More than one language used

granularity/logical/document-related/paragraph

Description: Description coming soon.

granularity/physical/document-related/page

Description: Description coming soon.

granularity/physical/document-related/region

Description: Region, zone, block

granularity/physical/document-related/text-line

Description: Description coming soon.

granularity/physical/document-related/word

Description: Word, or partial word (for example, when a word is split by a line break)

platform/platform-independent

Description: Description coming soon.

Download

The complete data is available for download as a zip file, in which the components of the collection are themselves zip files. Metadata for the complete collection and for the components is in METS format.
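
Because the download is a zip file whose components are again zip files, unpacking takes two passes. The following is a minimal Python sketch of that, assuming only one level of nesting; the function name `extract_nested_zip` and the choice to unpack each inner archive into a folder named after it are illustrative assumptions, not part of the OCR-D tooling.

```python
import io
import zipfile
from pathlib import Path


def extract_nested_zip(archive, target_dir):
    """Extract a zip archive and also unpack zip files found directly inside it.

    `archive` may be a path or a binary file-like object.
    Returns the list of paths that were written.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(archive) as outer:
        for info in outer.infolist():
            if info.is_dir():
                continue
            if info.filename.lower().endswith(".zip"):
                # Inner archive (one component of the collection):
                # unpack it into a folder named after the archive (assumption).
                inner_dir = target / Path(info.filename).stem
                with zipfile.ZipFile(io.BytesIO(outer.read(info))) as inner:
                    inner.extractall(inner_dir)
                    written.extend(
                        str(inner_dir / name)
                        for name in inner.namelist()
                        if not name.endswith("/")
                    )
            else:
                # Regular member, e.g. a METS metadata file.
                written.append(outer.extract(info, target))
    return written
```

Calling `extract_nested_zip("collection.zip", "out")` (with a hypothetical file name) would leave the top-level files in `out/` and each component in its own subfolder.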

Total view

Legend

TextLine: TextLine
Page: Page
TxtRegion: TextRegion
GraphRegion: GraphicRegion
SepRegion: SeparatorRegion
MusicRegion: MusicRegion
NoiseRegion: NoiseRegion

TextLine	Page	TxtRegion	GraphRegion	SepRegion	MusicRegion	NoiseRegion
215	8	63	4	11	4	2
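
Counts like those in the table can be derived from the Page-XML files themselves. The sketch below tallies the element types named in the legend for a single file, using only the Python standard library. The function name `count_page_elements` is an illustrative assumption, and matching on local tag names (ignoring the namespace) is a simplification so that the sketch does not depend on a particular PAGE schema version.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Element types listed in the legend above.
COUNTED = {
    "Page", "TextRegion", "GraphicRegion", "SeparatorRegion",
    "MusicRegion", "NoiseRegion", "TextLine",
}


def count_page_elements(source):
    """Count selected PAGE-XML elements in one file.

    `source` is a file path or a binary file-like object.
    """
    root = ET.parse(source).getroot()
    counts = Counter()
    for elem in root.iter():
        # Tags look like "{namespace}TextLine"; keep only the local name.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name in COUNTED:
            counts[local_name] += 1
    return counts
```

Adding up the per-file counters with `sum(counters, Counter())` gives collection-wide totals that can be compared against the table.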
