Seifert Christin, Bailer Werner, Orgel Thomas, Gantner Louis, Kern Roman, Ziak Hermann, Petit Albin, Schlötterer Jörg, Zwicklbauer Stefan, Granitzer Michael
2017
The digitization initiatives of the past decades have led to a tremendous increase in digitized objects in the cultural heritage domain. Although digitally available, these objects are often not easily accessible for interested users because of the distributed allocation of the content in different repositories and the variety in data structures and standards. When users search for cultural content, they first need to identify the specific repository and then need to know how to search within this platform (e.g., usage of specific vocabulary). The goal of the EEXCESS project is to design and implement an infrastructure that enables ubiquitous access to digital cultural heritage content. Cultural content should be made available in the channels that users habitually visit and be tailored to their current context, without the need to manually search multiple portals or content repositories. To realize this goal, open-source software components and services have been developed that can either be used as an integrated infrastructure or as modular components suitable for integration in other products and services. The EEXCESS modules and components comprise (i) Web-based context detection, (ii) information retrieval-based, federated content aggregation, (iii) metadata definition and mapping, and (iv) a component responsible for privacy preservation. Various applications have been realized based on these components that bring cultural content to the user in content consumption and content creation scenarios. For example, content consumption is realized by a browser extension generating automatic search queries from the current page context and the focus paragraph and presenting related results aggregated from different data providers. A Google Docs add-on allows retrieval of relevant content aggregated from multiple data providers while collaboratively writing a document. These relevant resources can then be included in the current document as a citation, an image, or a link (with preview) without having to disrupt the current writing task for an explicit search in the various content providers' portals.
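To make the browser-extension scenario concrete, here is a minimal sketch of how a query could be derived from the focus paragraph: rank its terms by TF-IDF against the other paragraphs on the page and use the top terms as an automatic search query. This is an assumed realization for illustration, not the actual EEXCESS context-detection component, and all names in it are hypothetical.

```python
# Minimal sketch (not the EEXCESS component): rank the focus paragraph's
# terms by TF-IDF against the page's paragraphs; the top terms form a query.
import math
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "in", "and", "to", "are", "for"}

def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]

def query_from_focus(paragraphs, focus_index, num_terms=4):
    docs = [tokenize(p) for p in paragraphs]
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    focus = docs[focus_index]
    tf = Counter(focus)
    scores = {t: (c / len(focus)) * math.log(len(docs) / df[t])
              for t, c in tf.items()}
    return " ".join(sorted(scores, key=scores.get, reverse=True)[:num_terms])

page = [
    "The museum hosts a large collection of baroque paintings.",
    "Caravaggio pioneered dramatic chiaroscuro lighting in baroque painting.",
    "Opening hours are Monday to Friday, nine to five.",
]
print(query_from_focus(page, focus_index=1))  # e.g. "caravaggio chiaroscuro ..."
```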
Stegmaier Florian, Seifert Christin, Kern Roman, Höfler Patrick, Bayerl Sebastian, Granitzer Michael, Kosch Harald, Lindstaedt Stefanie, Mutlu Belgin, Sabol Vedran, Schlegel Kai
2014
Research depends to a large degree on the availability and quality of primary research data, i.e., data generated through experiments and evaluations. While the Web in general and Linked Data in particular provide a platform and the necessary technologies for sharing, managing, and utilizing research data, an ecosystem supporting those tasks is still missing. The vision of the CODE project is the establishment of a sophisticated ecosystem for Linked Data. Here, the extraction of knowledge encapsulated in scientific research papers, along with its public release as Linked Data, serves as the major use case. Further, Visual Analytics approaches empower end users to analyse, integrate, and organize data. These tasks raise specific Big Data issues.
Sabol Vedran, Albert Dietrich, Veas Eduardo Enrique, Mutlu Belgin, Granitzer Michael
2014
Linked Data has grown to become one of the largest available knowledge bases. Unfortunately, this wealth of data remains inaccessible to those without in-depth knowledge of semantic technologies. We describe a toolchain enabling users without a semantic technology background to explore and visually analyse Linked Data. We demonstrate its applicability in scenarios involving data from the Linked Open Data Cloud and research data extracted from scientific publications. Our focus is on the Web-based front-end consisting of querying and visualisation tools. The performed usability evaluations unveil mainly positive results, confirming that the Query Wizard simplifies searching, refining and transforming Linked Data and, in particular, that people using the Visualisation Wizard quickly learn to perform interactive analysis tasks on the resulting Linked Data sets. In making Linked Data analysis effectively accessible to the general public, our tool has been integrated in a number of live services where people use it to analyse, discover and discuss facts with Linked Data.
Mutlu Belgin, Tschinkel Gerwald, Veas Eduardo Enrique, Sabol Vedran, Stegmaier Florian, Granitzer Michael
2014
Research papers are published in various digital libraries, which deploy their own meta-models and technologies to manage, query, and analyze the scientific facts therein. Commonly they only consider the meta-data provided with each article, but not the contents. Hence, reaching into the contents of publications is inherently a tedious task. On top of that, scientific data within publications are hardcoded in a fixed format (e.g., tables). So, even if one manages to get a glimpse of the data published in digital libraries, it is close to impossible to carry out any analysis on them other than what was intended by the authors. More effective querying and analysis methods are required to better understand scientific facts. In this paper, we present the web-based CODE Visualisation Wizard, which provides visual analysis of scientific facts with emphasis on automating the visualisation process, and present an experiment of its application. We also present the entire analytical process and the corresponding tool chain, including components for extraction of scientific data from publications, an easy-to-use user interface for querying RDF knowledge bases, and a tool for semantic annotation of scientific data sets.
Höfler Patrick, Granitzer Michael, Sabol Vedran, Lindstaedt Stefanie
2013
Linked Data has become an essential part of the Semantic Web. A lot of Linked Data is already available in the Linked Open Data cloud, which keeps growing due to an influx of new data from research and open government activities. However, it is still quite difficult to access this wealth of semantically enriched data directly without having in-depth knowledge about SPARQL and related semantic technologies. In this paper, we present the Linked Data Query Wizard, a prototype that provides a Linked Data interface for non-expert users, focusing on keyword search as an entry point and a tabular interface providing simple functionality for filtering and exploration.
Shahzad Syed K, Granitzer Michael, Helic Denis
2011
Ontologies and semantic frameworks have become pervasive in computer science. They have a huge impact on the database, business logic, and user interface layers of a range of computer applications. Such frameworks are also being introduced, presented, or plugged in at the user interfaces of various software products and websites. However, the establishment of a structured and standardized development environment for ontological model-based user interfaces is still a challenge. This paper discusses the necessity of such an environment based on a User Interface Ontology (UIO). To explore this phenomenon, this research focuses on user interface entities, their semantics, uses, and the relationships among them. The first part focuses on the development of the User Interface Ontology. In the second step, this ontology is mapped to the domain ontology to construct a User Interface Model. Finally, the resulting model is quantified and instantiated for a user interface development to support our framework. The UIO is an extendable framework that allows defining new sub-concepts with their ontological relationships and constraints.
Horn Christopher, Pimas Oliver, Granitzer Michael, Lex Elisabeth
2011
In this paper, we outline our experiments carried out at the TREC Microblog Track 2011. Our system is based on a plain text index extracted from Tweets crawled from twitter.com. This index has been used to retrieve candidate Tweets for the given topics. The resulting Tweets were post-processed and then analyzed using three different approaches: (i) a burst detection approach, (ii) a hashtag analysis, and (iii) a Retweet analysis. Our experiments consisted of four runs: firstly, a combination of the Lucene ranking with the burst detection; secondly, a combination of the Lucene ranking, the burst detection, and the hashtag analysis; thirdly, a combination of the Lucene ranking, the burst detection, the hashtag analysis, and the Retweet analysis; and fourthly, again a combination of the Lucene ranking with the burst detection, but in this case with a more sophisticated query language and post-processing. We achieved the best MAP values overall in the fourth run.
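The combination of Lucene ranking with burst detection could, for instance, look like the following hedged sketch: retrieval scores are boosted for tweets that fall into unusually high-volume time bins. The binning, the z-score threshold, and the boost factor are illustrative assumptions, not the parameters of the submitted runs.

```python
# Illustrative sketch (not the submitted TREC system): boost retrieval scores
# for tweets whose timestamps fall into "bursty" time bins.
from collections import Counter
from statistics import mean, pstdev

def burst_bins(timestamps, bin_size=3600, z=2.0):
    """Return time bins whose tweet volume exceeds mean + z * stddev."""
    counts = Counter(ts // bin_size for ts in timestamps)
    vals = list(counts.values())
    threshold = mean(vals) + z * pstdev(vals)
    return {b for b, c in counts.items() if c > threshold}

def rerank(results, bin_size=3600, boost=1.5):
    """results: list of (tweet_id, unix_timestamp, retrieval_score)."""
    bursts = burst_bins([ts for _, ts, _ in results], bin_size)
    rescored = [(tid, score * (boost if ts // bin_size in bursts else 1.0))
                for tid, ts, score in results]
    return sorted(rescored, key=lambda x: x[1], reverse=True)
```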
Seifert Christin, Ulbrich Eva Pauline, Granitzer Michael
2011
In text classification, the amount and quality of training data is crucial for the performance of the classifier. The generation of training data is done by human labelers, a tedious and time-consuming task. We propose to use condensed representations of text documents instead of the full-text document to reduce the labeling time for single documents. These condensed representations are key sentences and key phrases and can be generated in a fully unsupervised way. The key phrases are presented in a layout similar to a tag cloud. In a user study with 37 participants, we evaluated whether document labeling with these condensed representations can be done faster and equally accurately by the human labelers. Our evaluation shows that users labeled word clouds twice as fast as full-text documents, and equally accurately. While further investigations for different classification tasks are necessary, this insight could potentially reduce the costs of the labeling process for text documents.
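As an illustration of how such condensed representations can be produced in a fully unsupervised way, the sketch below scores sentences by the corpus frequency of their content words and keeps the top-scoring ones. This is a generic frequency-based heuristic chosen for illustration, not necessarily the extraction method used in the study.

```python
# Minimal sketch of unsupervised key-sentence extraction: score each sentence
# by the average document-level frequency of its content words.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "for"}

def key_sentences(text, k=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)
    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z]+", sentence.lower())
                  if w not in STOPWORDS]
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)
    return sorted(sentences, key=score, reverse=True)[:k]
```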
Granitzer Michael, Lindstaedt Stefanie
2011
Kern Roman, Zechner Mario, Granitzer Michael
2011
Author disambiguation is a prerequisite for utilizing bibliographic metadata in citation analysis. Automatic disambiguation algorithms mostly rely on cluster-based disambiguation strategies for identifying unique authors given their names and publications. However, most approaches rely on knowing the correct number of unique authors a priori, which is rarely the case in real-world settings. In this publication we analyse cluster-based disambiguation strategies and develop a model selection method to estimate the number of distinct authors based on co-authorship networks. We show that, given clean textual features, the developed model selection method provides accurate guesses of the number of unique authors.
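A hedged sketch of the co-authorship idea: if publications that share a co-author are assumed to belong to the same person, the number of distinct authors behind one ambiguous name can be estimated as the number of connected components in the co-author graph. This is a simplified heuristic for illustration, not the paper's model selection method.

```python
# Simplified heuristic (not the paper's method): estimate the number of
# distinct authors as connected components of the co-author graph.
def estimate_author_count(publications):
    """publications: list of co-author name sets (ambiguous author removed)."""
    parent = list(range(len(publications)))
    def find(i):                       # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    def union(i, j):
        parent[find(i)] = find(j)
    for i in range(len(publications)):
        for j in range(i + 1, len(publications)):
            if publications[i] & publications[j]:  # shared co-author links papers
                union(i, j)
    return len({find(i) for i in range(len(publications))})

pubs = [{"A. Smith"}, {"A. Smith", "B. Jones"}, {"C. Wu"}]
print(estimate_author_count(pubs))  # 2: papers 0 and 1 merge via A. Smith
```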
Granitzer Michael, Kienreich Wolfgang, Sabol Vedran, Lex Elisabeth
2010
Technological advances and paradigmatic changes in the utilization of the World Wide Web have transformed the information seeking strategies of media consumers and invalidated traditional business models of media providers. We discuss relevant aspects of this development and present a knowledge relationship discovery pipeline to address the requirements of media providers and media consumers. We also propose visually enhanced access methods to bridge the gap between complex media services and the information needs of the general public. We conclude that a combination of advanced processing methods and visualizations will enable media providers to take the step from content-centered to service-centered business models and, at the same time, will help media consumers to better satisfy their personal information needs.
Kern Roman, Granitzer Michael, Muhr M.
2010
Word sense induction and discrimination (WSID) identifies the senses of an ambiguous word and assigns instances of this word to one of these senses. We have built a WSID system that exploits syntactic and semantic features based on the results of a natural language parser component. To achieve high robustness and good generalization capabilities, we designed our system to work on a restricted, but grammatically rich, set of features. Based on the results of the evaluations, our system provides promising performance and robustness.
Granitzer Michael, Sabol Vedran, Onn K., Lukose D.
2010
Kern Roman, Granitzer Michael, Muhr M.
2010
Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence the labeling accuracy. However, most labeling algorithms ignore such structural properties, and therefore the impact of hierarchical structure on labeling accuracy is as yet unclear. In our work we integrate hierarchical information, i.e., sibling and parent-child relations, into the cluster labeling process. We adapt standard labeling approaches, namely Maximum Term Frequency, Jensen-Shannon Divergence, χ² Test, and Information Gain, to make use of those relationships and evaluate their impact on four different datasets, namely the Open Directory Project, Wikipedia, TREC Ohsumed, and the CLEF-IP European Patent dataset. We show that hierarchical relationships can be exploited to increase labeling accuracy, especially on high-level nodes.
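For the Jensen-Shannon Divergence baseline, a candidate label term can be scored by its contribution to the divergence between the cluster's term distribution and that of its siblings, which is one way the sibling relations mentioned above can enter the labeling process. The sketch below is a minimal assumed version of that baseline, without the paper's hierarchical adaptations.

```python
# Hedged sketch of JSD-based cluster labeling: rank terms by their per-term
# contribution to JSD(P, Q) = 0.5*KL(P||M) + 0.5*KL(Q||M), M = (P+Q)/2,
# where P is the cluster's term distribution and Q that of its siblings.
import math
from collections import Counter

def term_distribution(docs):
    counts = Counter(w for d in docs for w in d.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def jsd_label_ranking(cluster_docs, sibling_docs):
    p, q = term_distribution(cluster_docs), term_distribution(sibling_docs)
    scores = {}
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = (pw + qw) / 2
        contrib = 0.0
        if pw:
            contrib += 0.5 * pw * math.log(pw / m)
        if qw:
            contrib += 0.5 * qw * math.log(qw / m)
        # keep only terms more prominent in the cluster than in its siblings
        scores[w] = contrib if pw > qw else 0.0
    return sorted(scores, key=scores.get, reverse=True)
```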
Lex Elisabeth, Granitzer Michael, Juffinger A.
2010
In the blogosphere, the amount of digital content is expanding, imposing new challenges on search engines. Due to changing information needs, automatic methods are required to help blog search users filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus the rest. We also assess the emotionality facet in news-related blogs to enable users to identify people's feelings towards specific events. Our approach is to evaluate the performance of text classifiers with lexical and stylometric features to determine the best performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.
Lex Elisabeth, Granitzer Michael, Juffinger A.
2010
In this paper, we outline our experiments carried out at the TREC 2009 Blog Distillation Task. Our system is based on a plain text index extracted from the XML feeds of the TREC Blogs08 dataset. This index was used to retrieve candidate blogs for the given topics. The resulting blogs were classified using a Support Vector Machine that was trained on a manually labelled subset of the TREC Blogs08 dataset. Our experiments included three runs on different features: firstly on nouns, secondly on stylometric properties, and thirdly on punctuation statistics. The facet identification based on our approach was successful, although a significant number of candidate blogs were not retrieved at all.
Granitzer Michael, Kienreich Wolfgang
2010
Granitzer Michael
2010
Term weighting strongly influences the performance of text mining and information retrieval approaches. Usually term weights are determined through statistical estimates based on static weighting schemes. Such static approaches lack the capability to generalize to different domains and different data sets. In this paper, we introduce an on-line learning method for adapting term weights in a supervised manner. Via stochastic optimization we determine a linear transformation of the term space that approximates expected similarity values among documents. We evaluate our approach on 18 standard text data sets and show that the performance improvement of a k-NN classifier ranges between 1% and 12% when adaptive term weighting is used as a preprocessing step. Further, we provide empirical evidence that our approach copes efficiently with larger problems.
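A minimal sketch of the underlying idea, under the assumption that the linear transformation is restricted to a diagonal matrix (one weight per term): stochastic gradient descent adjusts the weights so that the weighted dot product of a document pair approaches a supervised target similarity (e.g., 1.0 for same-class pairs, 0.0 otherwise). The learning rate, the clipping, and the diagonal restriction are illustrative choices, not the paper's exact formulation.

```python
# Sketch of supervised, on-line term weighting with a diagonal transformation:
# minimize (sim(x, y; w) - target)^2 with sim(x, y; w) = sum_i w_i * x_i * y_i.
import numpy as np

def learn_term_weights(pairs, dim, lr=0.1, epochs=10):
    """pairs: list of (x, y, target) with x, y as dense term vectors."""
    w = np.ones(dim)
    for _ in range(epochs):
        for x, y, target in pairs:
            sim = np.dot(w * x, y)               # weighted similarity
            grad = 2.0 * (sim - target) * x * y  # d/dw of squared error
            w -= lr * grad
            np.clip(w, 0.0, None, out=w)         # keep weights non-negative
    return w

rng = np.random.default_rng(0)
x = rng.random(5)
y = x + 0.01 * rng.random(5)                      # a similar document pair
weights = learn_term_weights([(x, y, 1.0)], dim=5)
```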
Granitzer Michael, Rath Andreas S., Kröll Mark, Ipsmiller D., Devaurs Didier, Weber Nicolas, Lindstaedt Stefanie, Seifert C.
2009
Increasing the productivity of a knowledge worker via intelligent applications requires the identification of a user's current work task, i.e., the current work context a user resides in. In this work we present and evaluate machine learning based work task detection methods. By viewing a work task as a sequence of digital interaction patterns of mouse clicks and keystrokes, we present (i) a methodology for recording those user interactions and (ii) an in-depth analysis of supervised classification models for classifying work tasks in two different scenarios: a task-centric scenario and a user-centric scenario. We analyze different supervised classification models, feature types, and feature selection methods on a laboratory as well as a real-world data set. Results show satisfactory accuracy and high user acceptance when relatively simple types of features are used.
Lex Elisabeth, Granitzer Michael, Juffinger A., Seifert C.
2009
Text classification is one of the core applications in data mining due to the huge amount of uncategorized digital data available. Training a text classifier generates a model that reflects the characteristics of the domain. However, if no training data is available, labeled data from a related but different domain might be exploited to perform cross-domain classification. In our work, we aim to accurately classify unlabeled blogs into commonly agreed newspaper categories using labeled data from the news domain. The labeled news corpus and the unlabeled blog corpus are highly dynamic, growing hourly with a topic drift, so a trade-off between accuracy and performance is required. Our approach is to apply a fast novel centroid-based algorithm, the Class-Feature-Centroid Classifier (CFC), to perform efficient cross-domain classification. Experiments showed that this algorithm achieves accuracy comparable to k-NN and slightly better than Support Vector Machines (SVM), yet at linear time cost for training and classification. The benefit of this approach is that the linear time complexity enables us to efficiently generate an accurate classifier, reflecting the topic drift, several times per day on a huge dataset.
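The following sketch shows a centroid-based classifier in the spirit of CFC: each class centroid weights a term by an intra-class document-frequency factor and an inter-class rarity factor. The concrete weighting formula, the constant b, and the unnormalized scoring are our reading of the CFC idea and should be treated as assumptions, not the paper's verbatim definition.

```python
# Hedged sketch of a CFC-style centroid classifier; the weight combines how
# common a term is within a class with how few classes contain it at all.
import math
from collections import Counter, defaultdict

def train_cfc(docs, labels, b=1.7):
    """docs: list of token lists; labels: parallel list of class names."""
    classes = set(labels)
    df = defaultdict(Counter)          # df[c][t]: docs of class c containing t
    sizes = Counter(labels)
    for doc, c in zip(docs, labels):
        df[c].update(set(doc))
    cf = Counter()                     # cf[t]: number of classes containing t
    for c in classes:
        cf.update(df[c].keys())
    return {c: {t: (b ** (df[c][t] / sizes[c])) * math.log(len(classes) / cf[t])
                for t in df[c]}
            for c in classes}

def classify(doc, centroids):
    tf = Counter(doc)
    return max(centroids,
               key=lambda c: sum(tf[t] * w for t, w in centroids[c].items()))

docs = [["goal", "match"], ["stocks", "market"], ["match", "referee"]]
labels = ["sports", "finance", "sports"]
print(classify(["goal", "referee"], train_cfc(docs, labels)))  # -> "sports"
```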
Granitzer Michael, Lex Elisabeth, Juffinger A.
2009
People use weblogs to express thoughts, present ideas, and share knowledge. However, weblogs can also be misused to influence and manipulate readers. Therefore, the credibility of a blog has to be validated before the available information is used for analysis. The credibility of a blog entry is derived from the content, the credibility of the author or the blog itself, and the external references or trackbacks. In this work we introduce an additional dimension for assessing credibility, namely the quantity structure. Our blog analysis system therefore derives credibility from two dimensions: firstly, the quantity structure of a set of blogs is compared against a reference corpus, and secondly, we analyse each separate blog's content and examine its similarity with a verified news corpus. From the content similarity values we derive a ranking function. Our evaluation showed that one can sort out non-credible blogs by quantity structure without deeper analysis. In addition, the content-based ranking function sorts the blogs by credibility with high accuracy. Our blog analysis system is therefore capable of providing credibility levels per blog.
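The content-based ranking function could be realized, for example, as cosine similarity between each blog and the centroid of the verified news corpus. The sketch below is such an assumed realization for illustration, not the system's actual implementation.

```python
# Assumed realization of a content-similarity ranking: blogs closer to the
# centroid of a verified news corpus rank as more credible.
import math
from collections import Counter

def vec(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_credibility(blogs, news_corpus):
    centroid = Counter()
    for article in news_corpus:
        centroid.update(vec(article))
    return sorted(blogs, key=lambda blog: cosine(vec(blog), centroid),
                  reverse=True)
```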
Neidhart T., Granitzer Michael, Kern Roman, Weichselbraun A., Wohlgenannt G., Scharl A., Juffinger A.
2009
Lex Elisabeth, Kienreich Wolfgang, Granitzer Michael, Seifert C.
2008
Granitzer Michael, Granitzer Gisela, Lindstaedt Stefanie, Rath Andreas S., Groiss W.
2008
It is a well-known fact that a wealth of knowledge lies in the heads of employees, making them one of the most, or even the most, valuable asset of organisations. But often this knowledge is not documented and organised in knowledge systems as required by the organisation, but informally shared. Of course, this runs against the organisation's aim of keeping knowledge reusable as well as easily and permanently available, independent of individual knowledge workers. In this contribution we suggest a solution which captures this collective knowledge to the benefit of the organisation and the knowledge worker. By automatically identifying activity patterns and aggregating them into tasks, as well as by assigning resources to these tasks, our proposed solution fulfils the organisation's need for documentation and structuring of knowledge work. On the other hand, it fulfils the knowledge worker's need for relevant, currently needed knowledge by automatically mining the entire corporate knowledge base and providing relevant, context-dependent information based on his/her current task.
Granitzer Michael, Lux M., Spaniol M.
2008
Granitzer Michael
2008
Rath Andreas S., Weber Nicolas, Kröll Mark, Granitzer Michael, Dietzel O., Lindstaedt Stefanie
2008
Improving the productivity of knowledge workers is an open research challenge. Our approach is based on providing a large variety of knowledge services which take the current work task and information need (work context) of the knowledge worker into account. In the following we present the DYONIPOS application, which strives to automatically identify a user's work task and then contextualizes different types of knowledge services accordingly. These knowledge services then provide information (documents, people, locations) both from the user's personal as well as from the organizational environment. The utility and functionality are illustrated along a real-world application scenario at the Ministry of Finance in Austria.
Granitzer Michael, Kröll Mark, Seifert Christin, Rath Andreas S., Weber Nicolas, Dietzel O., Lindstaedt Stefanie
2008
’Context is key’ conveys the importance of capturing the digital environment of a knowledge worker. Knowing the user’s context offers various possibilities for support, like, for example, enhancing information delivery or providing work guidance. Hence, user interactions have to be aggregated and mapped to predefined task categories. Without machine learning tools, such an assignment has to be done manually. The identification of suitable machine learning algorithms is necessary in order to ensure accurate and timely classification of the user’s context without inducing additional workload. This paper provides a methodology for recording user interactions and an analysis of supervised classification models, feature types, and feature selection for automatically detecting the current task and context of a user. Our analysis is based on a real-world data set and shows the applicability of machine learning techniques.
Scheir Peter, Granitzer Michael, Lindstaedt Stefanie
2007
Evaluation of information retrieval systems is a critical aspect of information retrieval research. New retrieval paradigms, such as retrieval in the Semantic Web, present an additional challenge for system evaluation, as no off-the-shelf test corpora for evaluation exist. This paper describes the approach taken to evaluate an information retrieval system built for the Semantic Desktop and demonstrates how standard measures from information retrieval research are employed for evaluation.
Kröll Mark, Rath Andreas S., Weber Nicolas, Lindstaedt Stefanie, Granitzer Michael
2007
Knowledge-intensive work plays an increasingly important role in organisations of all types. Knowledge workers contribute their effort to achieve a common purpose; they are part of (business) processes. Workflow Management Systems support them during their daily work, featuring guidance and providing intelligent resource delivery. However, the emergence of richly structured, heterogeneous datasets requires a reassessment of existing mining techniques, which do not take possible relations between individual instances into account. Neglecting these relations might lead to inappropriate conclusions about the data. In order to uphold the quality of support for knowledge workers, the application of mining methods that consider structural information rather than content information is necessary. In the scope of the research project DYONIPOS, user interaction patterns, e.g., relations between users, resources, and tasks, are mapped in the form of graphs. We utilize graph kernels to exploit structural information and apply Support Vector Machines to classify task instances to task models.
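As a hedged illustration of that pipeline (not the DYONIPOS implementation), the sketch below represents each task instance as a set of interaction edges, uses the shared-edge set-intersection kernel (which is positive semi-definite, being an inner product of edge indicator vectors), and feeds the precomputed kernel matrix to scikit-learn's SVM. The graphs and edge names are invented for illustration.

```python
# Illustrative sketch: a simple graph kernel (count of shared edges) plugged
# into an SVM with a precomputed kernel matrix.
import numpy as np
from sklearn.svm import SVC

def edge_kernel(g1, g2):
    """g1, g2: sets of (node, node) edges, e.g. ('user', 'doc1')."""
    return float(len(g1 & g2))

def kernel_matrix(graphs_a, graphs_b):
    return np.array([[edge_kernel(a, b) for b in graphs_b] for a in graphs_a])

train = [
    {("user", "doc1"), ("doc1", "taskA")},
    {("user", "doc2"), ("doc2", "taskB")},
    {("user", "doc1"), ("doc1", "taskA"), ("user", "mail1")},
]
y = ["taskA", "taskB", "taskA"]
clf = SVC(kernel="precomputed").fit(kernel_matrix(train, train), y)

test = [{("user", "doc1"), ("doc1", "taskA")}]
print(clf.predict(kernel_matrix(test, train)))  # -> ['taskA']
```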
Rath Andreas S., Kröll Mark, Lindstaedt Stefanie, Granitzer Michael
2007
Knowledge-intensive organizations demand a rethinking of business process awareness. Their employees are knowledge workers, who perform their tasks in a weakly structured way. Stiff organizational processes have to be relaxed, adapted, and made more flexible to provide the essential freedom requested by knowledge workers. To support this type of creative worker effectively and efficiently, the hidden patterns, i.e., how they reach their goals, have to be discovered. This paper focuses on perceiving knowledge workers' work habits in an automatic way in order to bring their work patterns to the surface. Capturing low-level operating system events, observing user interactions at a fine-granular level, and performing in-depth application inspection provide the opportunity to interrelate the received data. In the scope of the research project DYONIPOS, these interrelation abilities are utilized to semantically relate and enrich the captured data in order to picture the actual task of a knowledge worker. Once the goal of a knowledge worker is clear, intelligent information delivery can be applied.
Strohmaier M., Lux M., Granitzer Michael, Scheir Peter, Liaskos S., Yu E.
2007
Scheir Peter, Granitzer Michael, Lindstaedt Stefanie, Hofmair P.
2006
In this contribution we present a tool for annotating documents, which are used for work-integrated learning, with concepts from an ontology. To allow for annotating directly while creating or editing an ontology, the tool was realized as a plug-in for the ontology editor Protégé. Annotating documents with semantic metadata is a laborious task: most of the time, knowledge representations are created independently of the resources that should be annotated, and additionally, in most work environments a high number of documents exist. To increase the efficiency of the person annotating, our tool supports the process of assigning concepts to text documents by automatic text classification.
Rath Andreas S., Kröll Mark, Andrews K., Lindstaedt Stefanie, Granitzer Michael
2006
In a knowledge-intensive business environment, knowledge workers perform their tasks in highly creative ways. This essential freedom required by knowledge workers often conflicts with their organization's need for standardization, control, and transparency. Within this context, the research project DYONIPOS aims to mitigate this contradiction by supporting the process engineer with insights into the process executer's working behavior. These insights constitute the basis for balanced process modeling. DYONIPOS provides a process engineer support environment with advanced process modeling services, such as process visualization, standard process validation, and ad-hoc process analysis and optimization services.
Granitzer Michael, Lindstaedt Stefanie, Tochtermann K., Kröll Mark, Rath Andreas S.
2006
Knowledge-intensive work plays an increasingly important role in organisations of all types. This work is characterized by a defined input and a defined output, but not by the way the input is transformed into the output. Within this context, the research project DYONIPOS aims at supporting the two crucial roles in a knowledge-intensive organization: the process executer and the process engineer. Ad-hoc support will be provided for the knowledge worker by synergizing the development of context-sensitive, intelligent, and agile semantic technologies with contextual retrieval. DYONIPOS provides process executers with guidance through business processes and just-in-time resource support based on the current user context, which is the focus of this paper.
Andrews K., Kienreich Wolfgang, Sabol Vedran, Granitzer Michael
2004
Granitzer Michael, Kienreich Wolfgang, Sabol Vedran, Andrews K.
2004
Lux M., Granitzer Michael, Kienreich Wolfgang, Sabol Vedran, Klieber Hans-Werner, Sarka W.
2004
Lux M., Klieber Hans-Werner, Granitzer Michael
2004
Granitzer Michael, Kienreich Wolfgang, Sabol Vedran, Dösinger G.
2003
Lux M., Granitzer Michael, Sabol Vedran, Kienreich Wolfgang, Becker J.
2003
Kienreich Wolfgang, Sabol Vedran, Granitzer Michael, Becker J.
2003
Andrews K., Kienreich Wolfgang, Sabol Vedran, Granitzer Michael
2003
Kappe F., Droschl G., Kienreich Wolfgang, Sabol Vedran, Andrews K., Granitzer Michael, Auer P.
2003
Kienreich Wolfgang, Sabol Vedran, Granitzer Michael, Kappe F., Andrews K.
2003
Sabol Vedran, Kienreich Wolfgang, Granitzer Michael, Becker J.
2003
Becker J., Granitzer Michael, Kienreich Wolfgang, Sabol Vedran
2002
Andrews K., Kienreich Wolfgang, Sabol Vedran, Becker J., Kappe F., Droschl G., Granitzer Michael, Auer P.
2002
Sabol Vedran, Kienreich Wolfgang, Granitzer Michael, Becker J.
2002
Kappe F., Droschl G., Kienreich Wolfgang, Sabol Vedran, Becker J., Andrews K., Granitzer Michael, Auer P.
2002
Sabol Vedran, Kienreich Wolfgang, Granitzer Michael, Becker J., Andrews K.
2002