Here you will find scientific publications written by Know-Center staff.


Kern Roman, Frey Matthias

Efficient Table Annotation for Digital Articles

4th International Workshop on Mining Scientific Publications, D-Lib, 2015

Table recognition and table extraction are important tasks in information extraction, especially in the domain of scholarly communication, where tables are commonplace and contain valuable information. Many different automatic approaches for table recognition and extraction exist. Many of them have one thing in common: they need ground-truth datasets, either to train algorithms or to evaluate their results. In this paper we present the PDF Table Annotator, a web-based tool for annotating elements and regions in PDF documents, in particular tables. The annotated data is intended to serve as ground truth for machine learning algorithms that detect table regions and table structure. To make manual table annotation as convenient as possible, the tool is designed for an efficient annotation process that may span multiple sessions by multiple users. In an evaluation, we compare our tool to three alternative ways of creating ground truth for tables in documents. We found that, overall, our tool provides an efficient and convenient way to annotate tables. It is particularly suitable for complex table structures, where it yielded the lowest annotation time and the highest accuracy. Furthermore, our tool allows tables to be annotated following either a logical or a functional model. Because our tool makes ground-truth datasets for table recognition and extraction easier to produce, the quality of automatic table extraction should benefit greatly.