Seifert Christin, Bailer Werner, Orgel Thomas, Gantner Louis, Kern Roman, Ziak Hermann, Petit Albin, Schlötterer Jörg, Zwicklbauer Stefan, Granitzer Michael

Ubiquitous Access to Digital Cultural Heritage

Journal on Computing and Cultural Heritage (JOCCH) - Special Issue on Digital Infrastructure for Cultural Heritage, Part 1, Roberto Scopign, ACM, New York, NY, US, 2017

The digitization initiatives in the past decades have led to a tremendous increase in digitized objects in the cultural heritage domain. Although digitally available, these objects are often not easily accessible for interested users because of the distributed allocation of the content in different repositories and the variety in data structure and standards. When users search for cultural content, they first need to identify the specific repository and then need to know how to search within this platform (e.g., usage of specific vocabulary). The goal of the EEXCESS project is to design and implement an infrastructure that enables ubiquitous access to digital cultural heritage content. Cultural content should be made available in the channels that users habitually visit and be tailored to their current context without the need to manually search multiple portals or content repositories. To realize this goal, open-source software components and services have been developed that can either be used as an integrated infrastructure or as modular components suitable to be integrated in other products and services. The EEXCESS modules and components comprise (i) Web-based context detection, (ii) information retrieval-based, federated content aggregation, (iii) metadata definition and mapping, and (iv) a component responsible for privacy preservation. Various applications have been realized based on these components that bring cultural content to the user in content consumption and content creation scenarios. For example, content consumption is realized by a browser extension generating automatic search queries from the current page context and the focus paragraph and presenting related results aggregated from different data providers. A Google Docs add-on allows retrieval of relevant content aggregated from multiple data providers while collaboratively writing a document. These relevant resources then can be included in the current document either as a citation, an image, or a link (with preview) without having to disrupt the current writing task for an explicit search in various content providers’ portals.

Granitzer Michael, Veas Eduardo Enrique, Seifert C.

Linked Data Query Wizard: A Novel Interface for Accessing SPARQL Endpoints.

LDOW, 2014

In an interconnected world, Linked Data is more important than ever before. However, it is still quite difficult to access this new wealth of semantic data directly without having in-depth knowledge about SPARQL and related semantic technologies. Also, most people are currently used to consuming data as 2-dimensional tables. Linked Data is by definition always a graph, and not that many people are used to handling data in graph structures. Therefore we present the Linked Data Query Wizard, a web-based tool for displaying, accessing, filtering, exploring, and navigating Linked Data stored in SPARQL endpoints. The main innovation of the interface is that it turns the graph structure of Linked Data into a tabular interface and provides easy-to-use interaction possibilities by using metaphors and techniques from current search engines and spreadsheet applications that regular web users are already familiar with.
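The graph-to-table idea in this abstract can be illustrated with a minimal sketch. It is not the Query Wizard's implementation: the function name `triples_to_table`, the sample triples, and the choice of one row per subject with one column per predicate are all assumptions made for illustration.

```python
from collections import defaultdict


def triples_to_table(triples):
    """Pivot (subject, predicate, object) triples into one row per subject.

    Hypothetical helper: each distinct predicate becomes a column, and
    multiple objects for the same subject/predicate share one cell.
    """
    rows = defaultdict(lambda: defaultdict(list))
    columns = []  # predicates in order of first appearance
    for subject, predicate, obj in triples:
        if predicate not in columns:
            columns.append(predicate)
        rows[subject][predicate].append(obj)

    table = []
    for subject, cells in rows.items():
        row = {"subject": subject}
        for predicate in columns:
            # Missing values become empty cells, as in a spreadsheet.
            row[predicate] = ", ".join(cells.get(predicate, []))
        table.append(row)
    return columns, table


# Made-up example data, not taken from any real endpoint.
triples = [
    ("ex:Graz", "rdfs:label", "Graz"),
    ("ex:Graz", "ex:country", "ex:Austria"),
    ("ex:Passau", "rdfs:label", "Passau"),
]
columns, table = triples_to_table(triples)
for row in table:
    print(row)
```

A subject that lacks a predicate simply gets an empty cell, which is one plausible way to flatten an irregular graph into a regular table.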

Stegmaier Florian, Seifert Christin, Kern Roman, Höfler Patrick, Bayerl Sebastian, Granitzer Michael, Kosch Harald, Lindstaedt Stefanie, Mutlu Belgin, Sabol Vedran, Schlegel Kai

Unleashing semantics of research data

Specifying Big Data Benchmarks, Springer, Berlin, Heidelberg, 2014

Research depends to a large degree on the availability and quality of primary research data, i.e., data generated through experiments and evaluations. While the Web in general and Linked Data in particular provide a platform and the necessary technologies for sharing, managing and utilizing research data, an ecosystem supporting those tasks is still missing. The vision of the CODE project is the establishment of a sophisticated ecosystem for Linked Data. Here, the extraction of knowledge encapsulated in scientific research papers along with its public release as Linked Data serves as the major use case. Further, Visual Analytics approaches empower end users to analyse, integrate and organize data. During these tasks, specific Big Data issues are present.

Seifert Christin, Ulbrich Eva Pauline, Granitzer Michael

Word Clouds for Efficient Document Labeling

The Fourteenth International Conference on Discovery Science (DS 2011), Lecture Notes in Computer Science, Springer, 2011

In text classification the amount and quality of training data is crucial for the performance of the classifier. The generation of training data is done by human labelers, a tedious and time-consuming task. We propose to use condensed representations of text documents instead of the full-text document to reduce the labeling time for single documents. These condensed representations are key sentences and key phrases and can be generated in a fully unsupervised way. The key phrases are presented in a layout similar to a tag cloud. In a user study with 37 participants we evaluated whether document labeling with these condensed representations can be done faster and equally accurately by the human labelers. Our evaluation shows that the users labeled word clouds twice as fast but as accurately as full-text documents. While further investigations for different classification tasks are necessary, this insight could potentially reduce costs for the labeling process of text documents.

Beham Günter, Lindstaedt Stefanie, Ley Tobias, Kump Barbara, Seifert C.

MyExperiences: Visualizing Evidence in an Open Learner Model

Adjunct Proceedings of the 18th Conference on User Modeling, Adaptation, and Personalization, Posters and Demonstrations, Bohnert, B., Quiroga, L. M., 2010

When inferring a user’s knowledge state from naturally occurring interactions in adaptive learning systems, one has to make complex assumptions that may be hard to understand for users. We suggest MyExperiences, an open learner model designed for these specific requirements. MyExperiences is based on some of the key design principles of information visualization to help users understand the complex information in the learner model. It further allows users to edit their learner models in order to improve the accuracy of the information represented there.

Lex Elisabeth, Granitzer Michael, Juffinger A., Seifert C.

Cross-Domain Classification: Trade-Off between Complexity and Accuracy

Proceedings of the 4th International Conference for Internet Technology and Secured Transactions (ICITST) 2009, 2009

Text classification is one of the core applications in data mining due to the huge amount of uncategorized digital data available. Training a text classifier generates a model that reflects the characteristics of the domain. However, if no training data is available, labeled data from a related but different domain might be exploited to perform cross-domain classification. In our work, we aim to accurately classify unlabeled blogs into commonly agreed newspaper categories using labeled data from the news domain. The labeled news and the unlabeled blog corpus are highly dynamic and growing hourly with a topic drift, so a trade-off between accuracy and performance is required. Our approach is to apply a fast novel centroid-based algorithm, the Class-Feature-Centroid Classifier (CFC), to perform efficient cross-domain classification. Experiments showed that this algorithm achieves accuracy comparable to k-NN and slightly better than Support Vector Machines (SVM), yet at linear time cost for training and classification. The benefit of this approach is that the linear time complexity enables us to efficiently generate an accurate classifier, reflecting the topic drift, several times per day on a huge dataset.
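To make the linear-time claim concrete, here is a minimal sketch of a generic nearest-centroid text classifier. It is not the CFC algorithm itself: CFC uses its own term weighting, which the abstract does not specify, so plain term counts with cosine-style scoring are swapped in here as an assumption. The function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_centroids(labeled_docs):
    """Build one unit-length term vector per class in a single pass.

    Plain term counts are an assumption; CFC's actual weighting differs.
    """
    sums = defaultdict(Counter)
    for label, text in labeled_docs:
        sums[label].update(text.lower().split())

    centroids = {}
    for label, counts in sums.items():
        norm = math.sqrt(sum(v * v for v in counts.values()))
        centroids[label] = {term: v / norm for term, v in counts.items()}
    return centroids


def classify(centroids, text):
    """Return the class whose centroid has the largest dot product with text.

    The document vector is left unnormalized: its length is the same for
    every class, so it does not change which class scores highest.
    """
    counts = Counter(text.lower().split())

    def score(label):
        centroid = centroids[label]
        return sum(centroid.get(term, 0.0) * v for term, v in counts.items())

    return max(centroids, key=score)


# Toy "news" training data and "blog" test sentences, made up for illustration.
news = [
    ("sports", "team wins match goal"),
    ("politics", "election vote parliament minister"),
]
centroids = train_centroids(news)
print(classify(centroids, "the minister lost the vote"))
print(classify(centroids, "late goal wins the match"))
```

Training touches each token once and classification touches each document token once per class, which is the property that makes frequent retraining on a drifting corpus affordable.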

Granitzer Michael, Rath Andreas S., Kröll Mark, Ipsmiller D., Devaurs Didier, Weber Nicolas, Lindstaedt Stefanie, Seifert C.

Machine Learning based Work Task Classification

Journal of Digital Information Management, 2009

Increasing the productivity of a knowledge worker via intelligent applications requires the identification of a user’s current work task, i.e. the current work context a user resides in. In this work we present and evaluate machine learning based work task detection methods. By viewing a work task as a sequence of digital interaction patterns of mouse clicks and key strokes, we present (i) a methodology for recording those user interactions and (ii) an in-depth analysis of supervised classification models for classifying work tasks in two different scenarios: a task centric scenario and a user centric scenario. We analyze different supervised classification models, feature types and feature selection methods on a laboratory as well as a real-world data set. Results show satisfactory accuracy and high user acceptance by using relatively simple types of features.

Lex Elisabeth, Kienreich Wolfgang, Granitzer Michael, Seifert C.

A generic framework for visualizing the news article domain and its application to real-world data

Journal of Digital Information Management, 2008

