Here you will find scientific publications authored by Know-Center staff


Seifert Christin, Bailer Werner, Orgel Thomas, Gantner Louis, Kern Roman, Ziak Hermann, Petit Albin, Schlötterer Jörg, Zwicklbauer Stefan, Granitzer Michael

Ubiquitous Access to Digital Cultural Heritage

Journal on Computing and Cultural Heritage (JOCCH) - Special Issue on Digital Infrastructure for Cultural Heritage, Part 1, Roberto Scopigno, ACM, New York, NY, US, 2017

The digitization initiatives of the past decades have led to a tremendous increase in digitized objects in the cultural heritage domain. Although digitally available, these objects are often not easily accessible for interested users because the content is distributed across different repositories that vary in data structures and standards. When users search for cultural content, they first need to identify the specific repository and then need to know how to search within this platform (e.g., the usage of a specific vocabulary). The goal of the EEXCESS project is to design and implement an infrastructure that enables ubiquitous access to digital cultural heritage content. Cultural content should be made available in the channels that users habitually visit and be tailored to their current context, without the need to manually search multiple portals or content repositories. To realize this goal, open-source software components and services have been developed that can be used either as an integrated infrastructure or as modular components suitable for integration into other products and services. The EEXCESS modules and components comprise (i) Web-based context detection, (ii) information retrieval-based, federated content aggregation, (iii) metadata definition and mapping, and (iv) a component responsible for privacy preservation. Various applications have been realized based on these components that bring cultural content to the user in content-consumption and content-creation scenarios. For example, content consumption is realized by a browser extension that generates automatic search queries from the current page context and the focus paragraph and presents related results aggregated from different data providers. A Google Docs add-on allows retrieval of relevant content aggregated from multiple data providers while collaboratively writing a document. These relevant resources can then be included in the current document as a citation, an image, or a link (with preview), without having to disrupt the current writing task for an explicit search in the various content providers' portals.
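The federated content aggregation described above can be illustrated with a minimal sketch: each provider is queried independently and the results are merged into one ranked list. The provider names and the scoring scheme here are illustrative assumptions, not the actual EEXCESS API.

```python
# Minimal sketch of federated content aggregation: query each provider,
# merge results, and rank by a (provider-supplied) relevance score.
# Provider names and scores below are toy stand-ins, not real endpoints.

def aggregate(query, providers):
    """Query each provider and merge results by descending score."""
    results = []
    for name, search in providers.items():
        for title, score in search(query):
            results.append({"provider": name, "title": title, "score": score})
    return sorted(results, key=lambda r: r["score"], reverse=True)

# Two toy providers standing in for real cultural-heritage repositories.
providers = {
    "europeana_stub": lambda q: [("Mona Lisa scan", 0.9), ("Louvre plan", 0.4)],
    "archive_stub": lambda q: [("Mona Lisa essay", 0.7)],
}

merged = aggregate("mona lisa", providers)
```

In a real setting each provider callback would wrap an HTTP search API, and scores from heterogeneous providers would need normalization before merging.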

Stegmaier Florian, Seifert Christin, Kern Roman, Höfler Patrick, Bayerl Sebastian, Granitzer Michael, Kosch Harald, Lindstaedt Stefanie, Mutlu Belgin, Sabol Vedran, Schlegel Kai

Unleashing semantics of research data

Specifying Big Data Benchmarks, Springer, Berlin, Heidelberg, 2014

Research depends to a large degree on the availability and quality of primary research data, i.e., data generated through experiments and evaluations. While the Web in general and Linked Data in particular provide a platform and the necessary technologies for sharing, managing, and utilizing research data, an ecosystem supporting those tasks is still missing. The vision of the CODE project is the establishment of a sophisticated ecosystem for Linked Data. Here, the extraction of knowledge encapsulated in scientific research papers, along with its public release as Linked Data, serves as the major use case. Further, Visual Analytics approaches empower end users to analyse, integrate, and organize data. During these tasks, specific Big Data issues arise.

Granitzer Michael, Kienreich Wolfgang, Seifert Christin

Visualizing Text Classification Models with Voronoi Word Clouds

Proceedings of the 15th International Conference on Information Visualisation (IV), 2011


Seifert Christin, Ulbrich Eva Pauline, Granitzer Michael

Word Clouds for Efficient Document Labeling

The Fourteenth International Conference on Discovery Science (DS 2011), Lecture Notes in Computer Science, Springer, 2011

In text classification, the amount and quality of training data is crucial for the performance of the classifier. The generation of training data is done by human labelers, a tedious and time-consuming task. We propose to use condensed representations of text documents instead of the full-text documents to reduce the labeling time for single documents. These condensed representations are key sentences and key phrases and can be generated in a fully unsupervised way. The key phrases are presented in a layout similar to a tag cloud. In a user study with 37 participants, we evaluated whether document labeling with these condensed representations can be done faster and equally accurately by the human labelers. Our evaluation shows that the users labeled word clouds twice as fast but as accurately as full-text documents. While further investigations for different classification tasks are necessary, this insight could potentially reduce the costs of the labeling process for text documents.
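A condensed word-cloud representation of a document can be sketched with plain term frequencies; the paper's actual unsupervised key-phrase extraction is more sophisticated, and the small stopword list here is an assumption for illustration only.

```python
# Illustrative sketch: derive word-cloud terms for a document from raw
# term frequencies. The stopword list is a toy assumption; real systems
# use proper key-phrase extraction.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def word_cloud_terms(text, k=5):
    """Return the k most frequent non-stopword terms with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)

terms = word_cloud_terms("The cat sat on the mat and the cat slept", k=2)
```

The resulting (term, count) pairs would then be laid out with font sizes proportional to the counts, as in a tag cloud.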

Kern Roman, Seifert Christin, Zechner Mario, Granitzer Michael

Vote/Veto Meta-Classifier for Authorship Identification

CLEF 2011: Proceedings of the 2011 Conference on Multilingual and Multimodal Information Access Evaluation (Lab and Workshop Notebook Papers), Amsterdam, The Netherlands, 2011

For the PAN 2011 authorship identification challenge we have developed a system based on a meta-classifier which selectively uses the results of multiple base classifiers. In addition, we also performed feature engineering based on the given domain of e-mails. We present our system as well as results on the evaluation dataset. Our system performed second and third best in the authorship attribution task on the large data sets, and ranked in the middle for the small data set in the attribution task and in the verification task.
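One simple way to read "vote/veto" is: base classifiers decide by majority vote, but a designated veto classifier may overrule them when it is sufficiently confident. The selection logic of the actual PAN 2011 system differs in detail; the threshold and interface below are assumptions for illustration.

```python
# Hedged sketch of a vote/veto combination scheme. Base classifiers vote
# by majority; a "veto" classifier overrules when confident enough.
from collections import Counter

def vote_veto(base_predictions, veto_prediction, veto_confidence,
              threshold=0.9):
    """Return the veto classifier's label if its confidence reaches the
    threshold, otherwise the majority vote of the base classifiers."""
    if veto_confidence >= threshold:
        return veto_prediction
    return Counter(base_predictions).most_common(1)[0][0]

# A confident veto overrules the majority; a hesitant one does not.
overruled = vote_veto(["author_a", "author_a", "author_b"], "author_b", 0.95)
majority = vote_veto(["author_a", "author_a", "author_b"], "author_b", 0.50)
```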

Kern Roman, Seifert Christin, Granitzer Michael

A Hybrid System for German Encyclopedia Alignment

International Journal on Digital Libraries, Springer, 2010

Collaboratively created on-line encyclopedias have become increasingly popular. Especially in terms of completeness, they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and started an initiative to merge their corpora to create a single, more complete encyclopedia. The crucial step in this merging process is the alignment of articles. We have developed a two-step hybrid system to provide highly accurate alignments with low manual effort. First, we apply an information retrieval-based, automatic alignment algorithm. Second, the articles with a low confidence score are revised using a manual alignment scheme carefully designed for quality assurance. Our evaluation shows that a combination of weighting and ranking techniques utilizing different facets of the encyclopedia articles allows us to effectively reduce the number of necessary manual alignments. Further, the setup of the manual alignment turned out to be robust against inter-indexer inconsistencies. As a result, the developed system empowered us to align four encyclopedias with high accuracy and low effort.
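The two-step idea, automatic matching first, manual review for low-confidence cases, can be sketched as follows. Plain token-overlap similarity stands in for the paper's IR weighting and ranking techniques, and the threshold value is an assumption.

```python
# Illustrative two-step alignment sketch: match each article in corpus A
# to its most similar article in corpus B; route low-confidence matches
# to manual review. Jaccard token overlap is a toy stand-in for the
# IR-based scoring used in the paper.

def similarity(a, b):
    """Jaccard overlap of the two articles' token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def align(corpus_a, corpus_b, threshold=0.5):
    """Return (automatic alignments, articles needing manual review)."""
    automatic, manual = {}, []
    for key_a, text_a in corpus_a.items():
        best = max(corpus_b, key=lambda k: similarity(text_a, corpus_b[k]))
        if similarity(text_a, corpus_b[best]) >= threshold:
            automatic[key_a] = best
        else:
            manual.append(key_a)
    return automatic, manual

corpus_a = {"A1": "the history of graz austria", "A2": "quantum entanglement basics"}
corpus_b = {"B1": "graz austria history overview", "B2": "cooking pasta"}
automatic, manual = align(corpus_a, corpus_b)
```

Raising the threshold trades more manual work for fewer automatic misalignments, which is exactly the knob the confidence score gives the editors.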