Lovric Mario, Antunović Mario, Šunić Iva, Vuković Matej, Kecorius Simon, Kröll Mark, Bešlić Ivan, Godec Ranka, Pehnec Gordana, Geiger Bernhard, Grange Stuart K, Šimić Iva
2022
In this paper, the authors investigated changes in mass concentrations of particulate matter (PM) during the Coronavirus Disease 2019 (COVID-19) lockdown. Daily samples of the PM1, PM2.5 and PM10 fractions were measured at an urban background sampling site in Zagreb, Croatia, from 2009 to late 2020. For the purpose of meteorological normalization, the mass concentrations were fed alongside meteorological and temporal data to Random Forest (RF) and LightGBM (LGB) models tuned by Bayesian optimization. The models' predictions were subsequently de-weathered by meteorological normalization using repeated random resampling of all predictive variables except the trend variable. Three pollution periods in 2020 were examined in detail: January and February as the pre-lockdown period, April as the lockdown period, and June and July as the "new normal". An evaluation using normalized mass concentrations of particulate matter and analysis of variance (ANOVA) was conducted. The results showed no significant differences for PM1, PM2.5 and PM10 in April 2020 compared to the same period in 2018 and 2019. No significant changes were observed for the "new normal" either. The results thus indicate that the reduction in mobility during the COVID-19 lockdown in Zagreb, Croatia, did not significantly affect particulate matter concentrations in the long term.
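A minimal sketch of the meteorological-normalization ("de-weathering") step described in this abstract, on synthetic data; the variable names and model settings are illustrative assumptions, not the authors' code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "trend": np.arange(n),                       # time index, never resampled
    "temperature": rng.normal(15, 8, n),
    "wind_speed": rng.gamma(2.0, 1.5, n),
    "weekday": rng.integers(0, 7, n),
})
df["pm25"] = 20 - 0.5 * df["wind_speed"] + rng.normal(0, 2, n)  # toy target

features = ["trend", "temperature", "wind_speed", "weekday"]
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(df[features], df["pm25"])

# De-weathering: repeatedly resample all predictors except the trend variable,
# then average the predictions per time step.
n_rep = 50
preds = np.zeros((n_rep, n))
for i in range(n_rep):
    sampled = df[features].sample(n=n, replace=True, random_state=i)
    sampled = sampled.reset_index(drop=True)
    sampled["trend"] = df["trend"].values        # trend stays untouched
    preds[i] = model.predict(sampled[features])
normalized = preds.mean(axis=0)                  # de-weathered series
print(normalized[:5])
```

Because the trend variable is never resampled, averaging over the resampled predictions removes weather-driven variability while preserving the long-term signal.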
Reichel Robert, Gursch Heimo, Kröll Mark
2022
The trend in healthcare of moving from paper-based records to digital formats lays the foundation for the electronic processing of health data. This article describes the technical foundations for the semantic preparation and analysis of textual content in the medical domain. The particular characteristics of medical texts make extracting and aggregating relevant information more challenging than in other application areas. In addition, there is a need for specialized methods, especially for the anonymization and pseudonymization of personal data. Nevertheless, applying computational-linguistics methods in combination with advancing digitalization holds enormous potential for supporting healthcare personnel.
Gogolenko Sergiy, Groen Derek, Suleimenova Dian, Mahmood Imra, Lawenda Marcin, Nieto De Santos Javie, Hanley Joh, Vukovic Milana, Kröll Mark, Geiger Bernhard, Elsaesser Rober, Hoppe Dennis
2020
Accurate digital twinning of the global challenges (GC) leads to computationally expensive coupled simulations. These simulations bring together not only different models, but also various sources of massive static and streaming data sets. In this paper, we explore ways to bridge the gap between traditional high performance computing (HPC) and data-centric computation in order to provide efficient technological solutions for accurate policy-making in the domain of GC. GC simulations in HPC environments give rise to a number of technical challenges related to coupling. Being intended to reflect current and upcoming situations for policy-making, GC simulations extensively use recent streaming data coming from external data sources, which requires changing traditional HPC systems operation. Another common challenge stems from the necessity to couple simulations and exchange data across data centers in GC scenarios. By introducing a generalized GC simulation workflow, this paper shows the commonality of the technical challenges for various GC and reflects on the approaches to tackle these technical challenges in the HiDALGO project.
Lovric Mario, Šimić Iva, Godec Ranka, Kröll Mark, Beslic Ivan
2020
Narrow city streets surrounded by tall buildings favor a general "canyon" effect in which pollutants accumulate strongly in a relatively small area because of weak or nonexistent ventilation. In this study, levels of nitrogen dioxide (NO2), elemental carbon (EC) and organic carbon (OC) mass concentrations in PM10 particles were determined to compare seasons and years. Daily samples were collected at one such street canyon location in the center of Zagreb in 2011, 2012 and 2013. By applying machine learning methods, we showed seasonal and yearly variations of the mass concentrations of the carbon species in PM10 and of NO2, as well as their covariations and relationships. Furthermore, we compared the predictive capabilities of five regressors (Lasso, Random Forest, AdaBoost, Support Vector Machine and Partial Least Squares), with Lasso regression being the overall best performing algorithm. By showing the feature importance for each model, we revealed the true predictors per target. These measurements and this application of machine learning to pollutants were done for the first time at a street canyon site in the city of Zagreb, Croatia.
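A hedged sketch of the regressor comparison mentioned above, using scikit-learn on placeholder data; the hyperparameters and the synthetic target are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.svm import SVR
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))                      # e.g. meteorology, season
y = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=300)  # e.g. NO2

models = {
    "Lasso": Lasso(alpha=0.1),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "SVR": SVR(kernel="rbf", C=1.0),
    "PLS": PLSRegression(n_components=2),
}
for name, est in models.items():
    score = cross_val_score(est, X, y, cv=5, scoring="r2").mean()
    print(f"{name:12s} mean CV R^2 = {score:.3f}")
```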
Rexha Andi, Kröll Mark, Ziak Hermann, Kern Roman
2018
The goal of our work is inspired by the task of associating segments of text with their real authors. In this work, we focus on analyzing the way humans judge different writing styles. This analysis can help to better understand this process and thus to simulate or mimic such behavior accordingly. Unlike the majority of the work done in this field (i.e., authorship attribution, plagiarism detection, etc.), which uses content features, we focus only on the stylometric, i.e. content-agnostic, characteristics of authors. Therefore, we conducted two pilot studies to determine whether humans can identify authorship among documents with high content similarity. The first was a quantitative experiment involving crowd-sourcing, while the second was a qualitative one executed by the authors of this paper. Both studies confirmed that this task is quite challenging. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis on the results of the studies. In the first experiment, we compared the decisions against content features and stylometric features, while in the second, the evaluators described the process and the features on which their judgment was based. The findings of our detailed analysis could (i) help to improve algorithms such as automatic authorship attribution as well as plagiarism detection, (ii) assist forensic experts or linguists in creating profiles of writers, (iii) support intelligence applications in analyzing aggressive and threatening messages and (iv) help check editorial conformity, for instance, adherence to a journal-specific writing style.
Bassa Akim, Kröll Mark, Kern Roman
2018
Open Information Extraction (OIE) is the task of extracting relations from text without the need for domain-specific training data. Currently, most of the research on OIE is devoted to the English language, but little or no research has been conducted on other languages, including German. We tackled this problem and present GerIE, an OIE parser for the German language. We started by surveying the available literature on OIE with a focus on concepts which may also apply to the German language. Our system is built upon the output of a dependency parser, on which a number of hand-crafted rules are executed. For the evaluation we created two dedicated datasets, one derived from news articles and one based on texts from an encyclopedia. Our system achieves F-measures of up to 0.89 for sentences that have been correctly preprocessed.
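A heavily simplified sketch in the spirit of the rule-based approach described here: a single hand-crafted subject-verb-object rule run over a German dependency parse. It assumes spaCy with the de_core_news_sm model installed and is not GerIE's actual rule set.

```python
# Requires: python -m spacy download de_core_news_sm
import spacy

nlp = spacy.load("de_core_news_sm")
doc = nlp("Der Hund jagt die Katze.")

for token in doc:
    if token.dep_ == "ROOT" and token.pos_ == "VERB":
        subjects = [c for c in token.children if c.dep_ == "sb"]  # subject
        objects = [c for c in token.children if c.dep_ == "oa"]   # acc. object
        for s in subjects:
            for o in objects:
                # expected triple: ('Hund', 'jagen', 'Katze')
                print((s.text, token.lemma_, o.text))
```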
Rexha Andi, Kröll Mark, Ziak Hermann, Kern Roman
2017
In this pilot study, we tried to capture humans' behavior when identifying the authorship of text snippets. First, we selected textual snippets from the introductions of scientific articles written by single authors. We then presented a source and four target snippets to the evaluators and asked them to rank the target snippets from the most to the least similar in terms of writing style. The dataset is composed of 66 experiments, manually checked to ensure they contain no clear hints for the evaluators during ranking. For each experiment, we have evaluations from three different evaluators. We present each experiment in a single line (in the CSV file), first giving the metadata of the source article (Journal, Title, Authorship, Snippet), then the metadata for the four target snippets (Journal, Title, Authorship, Snippet, Written From the same Author, Published in the same Journal) and the ranking given by each evaluator. This task was performed on the crowdsourcing platform CrowdFlower. The headers of the CSV are self-explanatory. In the TXT file, you can find a human-readable version of the experiment. For more information about the extraction of the data, please consider reading our paper: "Extending Scientific Literature Search by Including the Author’s Writing Style" @BIR: http://www.gesis.org/en/services/events/events-archive/conferences/ecir-workshops/ecir-workshop-2017
Rexha Andi, Kröll Mark, Ziak Hermann, Kern Roman
2017
Our work is motivated by the idea of extending the retrieval of related scientific literature to cases where relatedness also incorporates the writing style of individual scientific authors. We therefore conducted a pilot study to answer the question whether humans can identify authorship once the topical clues have been removed. As a first result, we found that this task is challenging, even for humans. We also found some agreement between the annotators. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis in which we compared the decisions against a number of topical and stylometric features. The outcome of our work should help to improve automatic authorship identification algorithms and to shape potential follow-up studies.
Rexha Andi, Kern Roman, Dragoni Mauro, Kröll Mark
2016
On different social media and commercial platforms, users express their opinions about products in textual form. Automatically extracting the polarity (i.e. whether the opinion is positive or negative) of a user's opinion can be useful for both actors: the online platform, which can incorporate the feedback to improve its product, and the client, who might get recommendations according to his or her preferences. Different approaches for tackling the problem have been suggested, mainly using syntactic features. The "Challenge on Semantic Sentiment Analysis" aims to go beyond word-level analysis by using semantic information. In this paper we propose a novel approach that employs the semantic information of the grammatical unit called the proposition. We try to derive the target of the review from the summary information, which serves as an input to identify the proposition in it. Our implementation relies on the hypothesis that the proposition expressing the target of the summary usually contains the main polarity information.
Rexha Andi, Klampfl Stefan, Kröll Mark, Kern Roman
2016
To bring bibliometrics and information retrieval closer together, we propose to add the concept of author attribution into the pre-processing of scientific publications. Presently, common bibliographic metrics often attribute the entire article to all of its authors, affecting author-specific retrieval processes. We envision a more fine-grained analysis of scientific authorship by attributing particular segments to authors. To realize this vision, we propose a new feature representation of scientific publications that captures the distribution of stylometric features. In a classification setting, we then seek to predict the number of authors of a scientific article. We evaluate our approach on a data set of ~6,100 PubMed articles and achieve the best results by applying random forests, i.e., 0.76 precision and 0.76 recall averaged over all classes.
Rexha Andi, Kröll Mark, Kern Roman
2016
Monitoring (social) media represents one means for companies to gain access to knowledge about, for instance, competitors, products and markets. As a consequence, social media monitoring tools have been gaining attention as a way to handle the amounts of data nowadays generated in social media. These tools also include summarisation services. However, most summarisation algorithms tend to focus on (i) the first and last sentences or (ii) sentences containing keywords. In this work we approach the task of summarisation by extracting 4W (who, when, where, what) information from (social) media texts. Presenting 4W information allows for a more compact content representation than traditional summaries. In addition, we depart from mere named entity recognition (NER) techniques to answer these four question types by including non-rigid designators, i.e. expressions which do not refer to the same thing in all possible worlds, such as "at the main square" or "leaders of political parties". To do that, we employ dependency parsing to identify grammatical characteristics for each question type. Every sentence is then represented as a 4W block. We performed two preliminary studies: selecting sentences that better summarise texts, achieving an F1-measure of 0.343, and 4W block extraction, for which we achieve F1-measures of 0.932, 0.900, 0.803 and 0.861 for the "who", "when", "where" and "what" categories respectively. In a next step the 4W blocks are ranked by relevance; the top three ranked blocks, for example, then constitute a summary of the entire textual passage. The relevance metric can be customised to the user's needs, for instance ranking by up-to-dateness, where the sentences' tense is taken into account. In a user study we evaluate different ranking strategies including (i) up-to-dateness, (ii) text sentence rank, (iii) selecting the first and last sentences and (iv) coverage of named entities, i.e. based on the number of named entities in a sentence. Our 4W summarisation method presents a valuable addition to a company's (social) media monitoring toolkit, thus supporting decision making processes.
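An illustrative sketch of the 4W block representation and pluggable relevance ranking described above; the data class, the up-to-dateness proxy and the example blocks are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FourWBlock:
    who: str
    when: str
    where: str
    what: str
    sentence_rank: int            # position of the sentence in the text

def up_to_dateness(block: FourWBlock) -> float:
    # Toy proxy for tense-based ranking: prefer "when" slots with recent cues;
    # a real system would inspect the verb tense from the parse.
    recent_cues = ("today", "now", "yesterday")
    return 1.0 if any(c in block.when.lower() for c in recent_cues) else 0.0

blocks = [
    FourWBlock("party leaders", "yesterday", "at the main square", "met", 0),
    FourWBlock("the council", "in 1998", "in Vienna", "voted", 1),
]
summary = sorted(blocks, key=up_to_dateness, reverse=True)[:3]  # top-3 blocks
for b in summary:
    print(b.who, b.when, b.where, b.what)
```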
Pimas Oliver, Rexha Andi, Kröll Mark, Kern Roman
2016
The PAN 2016 author profiling task is a supervised classification problem on cross-genre documents (tweets, blog and social media posts). Our system makes use of concreteness, sentiment and syntactic information present in the documents. We train a random forest model to identify the gender and age of a document's author. We report the evaluation results received from the shared task.
Gursch Heimo, Ziak Hermann, Kröll Mark, Kern Roman
2016
Modern knowledge workers need to interact with a large number of different knowledge sources with restricted or public access. Knowledge workers are thus burdened with the need to familiarise themselves with and query each source separately. The EEXCESS (Enhancing Europe’s eXchange in Cultural Educational and Scientific reSources) project aims at developing a recommender system providing relevant and novel content to its users. Based on the user's work context, the EEXCESS system can either automatically recommend useful content, or support users by providing a single user interface for a variety of knowledge sources. In the design process of the EEXCESS system, recommendation quality, scalability and security were the three most important criteria. This paper investigates the scalability achieved by the federated design of the EEXCESS recommender system. This means that content in the different sources is not replicated but is managed in each source individually. Recommendations are generated based on the context describing the knowledge worker's information need. Each source offers result candidates, which are merged and re-ranked into a single result list. This merging is done in a vector representation space to achieve high recommendation quality. To ensure security, user credentials can be set individually by each user for each source. Hence, access to the sources can be granted and revoked for each user and source individually. The scalable architecture of the EEXCESS system handles up to 100 requests querying up to 10 sources in parallel without notable performance deterioration. The re-ranking and merging of results have a smaller influence on the system's responsiveness than the average source response rates. The EEXCESS recommender system offers knowledge workers a common entry point to a variety of different sources with only marginally higher response times than the individual sources on their own. Hence, familiarisation with individual sources and their query languages is not necessary.
Rexha Andi, Dragoni Mauro, Kern Roman, Kröll Mark
2016
Ontology matching in a multilingual environment consists of finding alignments between ontologies modeled using more than one language. This research topic combines traditional ontology matching algorithms with the use of multilingual resources, services, and capabilities to ease multilingual matching. In this paper, we present a multilingual ontology matching approach based on Information Retrieval (IR) techniques: ontologies are indexed through an inverted index algorithm, and candidate matches are found by querying these indexes. We also exploit the hierarchical structure of the ontologies by adopting the PageRank algorithm in our system. The approaches have been evaluated using a set of domain-specific ontologies belonging to the agricultural and medical domains. We compare our results with existing systems following an evaluation strategy that closely resembles a recommendation scenario. The version of our system using PageRank showed increased performance in our evaluations.
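A hedged sketch of the general idea: retrieve candidate matches from an index built over one ontology's labels and blend the retrieval score with a PageRank score over the concept hierarchy. The TF-IDF index stands in for the paper's inverted index, and the labels, weights and edges are invented.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

target_labels = ["crop disease", "soil chemistry", "agronomy"]   # one ontology
source_label = ["plant disease"]                                 # the other

vec = TfidfVectorizer()
index = vec.fit_transform(target_labels)      # stand-in for the inverted index
sims = cosine_similarity(vec.transform(source_label), index)[0]

# Concept hierarchy of the target ontology (edges point child -> parent).
g = nx.DiGraph([(0, 2), (1, 2)])
pr = nx.pagerank(g)

alpha = 0.8                                   # illustrative blending weight
scores = {i: alpha * sims[i] + (1 - alpha) * pr.get(i, 0.0)
          for i in range(len(target_labels))}
best = max(scores, key=scores.get)
print("candidate match:", target_labels[best], round(scores[best], 3))
```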
Fukazawa Yusuke, Kröll Mark, Strohmaier M., Ota Jun
2016
Task models concretize general requests to support users in real-world scenarios. In this paper, we present an IR-based algorithm (IRTML) to automate the construction of hierarchically structured task models. In contrast to other approaches, our algorithm is capable of assigning general tasks closer to the top and specific tasks closer to the bottom. Connections between tasks are established by extending Turney's PMI-IR measure. To evaluate our algorithm, we manually created a ground truth in the health-care domain consisting of 14 domains. We compared the IRTML algorithm to three state-of-the-art algorithms for generating hierarchical structures, i.e. BiSection K-means, Formal Concept Analysis and Bottom-Up Clustering. Our results show that IRTML achieves a 25.9% taxonomic overlap with the ground truth, a 32.0% improvement over the compared algorithms.
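A worked sketch of the PMI-IR measure that IRTML extends, with probabilities estimated from hypothetical search-engine hit counts.

```python
import math

def pmi_ir(hits_t1: int, hits_t2: int, hits_both: int, total: int) -> float:
    """log2( p(t1, t2) / (p(t1) * p(t2)) ), probabilities from hit counts."""
    p1, p2, p12 = hits_t1 / total, hits_t2 / total, hits_both / total
    return math.log2(p12 / (p1 * p2))

# Hypothetical counts: "lose weight" co-occurring with "exercise daily".
print(pmi_ir(hits_t1=120_000, hits_t2=80_000, hits_both=9_000,
             total=10_000_000))               # > 0 indicates association
```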
Dragoni Mauro, Rexha Andi, Kröll Mark, Kern Roman
2016
Twitter is one of the most popular micro-blogging services on the web. The service allows sharing, interaction and collaboration via short, informal and often unstructured messages called tweets. Polarity classification of tweets refers to the task of assigning a positive or a negative sentiment to an entire tweet. Quite similar is predicting the polarity of a specific target phrase, for instance @Microsoft or #Linux, which is contained in the tweet. In this paper we present a Word2Vec approach to automatically predict the polarity of a target phrase in a tweet. In our classification setting, we thus do not have any polarity information but use only semantic information provided by a Word2Vec model trained on Twitter messages. To evaluate our feature representation approach, we apply well-established classification algorithms such as the Support Vector Machine and Naive Bayes. For the evaluation we used the SemEval 2016 Task #4 dataset. Our approach achieves F1-measures of up to ~90% for the positive class and ~54% for the negative class without using polarity information about single words.
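A minimal sketch of the feature representation described above, assuming gensim and scikit-learn: word vectors trained on (toy) tweets, a target phrase represented by its averaged vectors, and an SVM on top; corpus, labels and dimensions are stand-ins.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC

tweets = [
    ["love", "the", "new", "linux", "kernel"],
    ["microsoft", "update", "broke", "my", "laptop"],
] * 50                                        # tiny corpus, just to run
w2v = Word2Vec(tweets, vector_size=50, window=5, min_count=1, sg=1, seed=0)

def phrase_vector(tokens):
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0)              # averaged word vectors

X = np.vstack([phrase_vector(["linux", "kernel"]),
               phrase_vector(["microsoft", "update"])] * 50)
y = np.array([1, 0] * 50)                     # toy polarity labels per target
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([phrase_vector(["linux"])]))
```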
Pimas Oliver, Klampfl Stefan, Kohl Thomas, Kern Roman, Kröll Mark
2016
Patents and patent applications are important parts of a company's intellectual property. Thus, companies put a lot of effort into designing and maintaining an internal structure for organizing their own patent portfolios, but also into keeping track of competitors' patent portfolios. Yet, official classification schemas offered by patent offices (i) are often too coarse and (ii) are not mappable, for instance, to a company's functions, applications, or divisions. In this work, we present a first step towards generating tailored classifications. To automate the generation process, we apply key term extraction and topic modelling algorithms to 2,131 publications of German patent applications. To infer categories, we apply topic modelling to the patent collection. We evaluate the mapping of the topics found via the Latent Dirichlet Allocation method to the classes present in the patent collection as assigned by the domain expert.
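A minimal sketch of the topic-modelling step, using scikit-learn's LDA on placeholder patent abstracts; the corpus, topic count and vectorizer settings are illustrative, not the paper's setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

patents = [
    "valve assembly for hydraulic brake systems",
    "brake pad wear sensor and control unit",
    "battery cell cooling circuit for electric vehicles",
    "thermal management of battery packs",
]
vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(patents)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {', '.join(top)}")     # candidate category labels
```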
Steinbauer Florian, Kröll Mark
2016
Social media monitoring has become an important means for business analytics and trend detection, for instance, analyzing the sentiment towards a certain product or decision. While a lot of work has been dedicated to analyzing sentiment in English texts, much less effort has been put into providing accurate sentiment classification for the German language. In this paper, we analyze three established classifiers for the German language with respect to Facebook posts. We then present our own hierarchical approach to classify sentiment and evaluate it using a data set of ~640 Facebook posts from corporate as well as governmental Facebook pages. We compare our approach to three sentiment classifiers for German, i.e. AlchemyAPI, Semantria and SentiStrength. With an accuracy of 70%, our approach performs better than the other classifiers. In an application scenario, we demonstrate our classifier's ability to monitor changes in sentiment with respect to the refugee crisis.
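An illustrative two-stage sketch of a hierarchical sentiment classifier in the spirit of the approach above: first subjective vs. neutral, then positive vs. negative; the toy German posts and the logistic-regression stages are assumptions, not the paper's pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

subjective = ["ich liebe dieses produkt", "das ist furchtbar schlecht",
              "öffnungszeiten montag bis freitag", "der bericht erscheint morgen"]
y_subj = [1, 1, 0, 0]                          # 1 = subjective, 0 = neutral
polar = ["ich liebe dieses produkt", "das ist furchtbar schlecht"]
y_pol = [1, 0]                                 # 1 = positive, 0 = negative

stage1 = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(subjective, y_subj)
stage2 = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(polar, y_pol)

def classify(post: str) -> str:
    if stage1.predict([post])[0] == 0:
        return "neutral"
    return "positive" if stage2.predict([post])[0] == 1 else "negative"

print(classify("das neue feature ist wirklich gut"))
```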
Wozelka Ralph, Kröll Mark, Sabol Vedran
2015
The analysis of temporal relationships in large amounts of graph data has gained significance in recent years. Information providers such as journalists seek to bring order into their daily work when dealing with temporally distributed events and the network of entities, such as persons, organisations or locations, which are related to these events. In this paper we introduce a time-oriented graph visualisation approach which maps temporal information to visual properties such as size, transparency and position and, combined with advanced graph navigation features, facilitates the identification and exploration of temporal relationships. To evaluate our visualisation, we compiled a dataset of ~120,000 news articles from international press agencies including Reuters, CNN, Spiegel and Aljazeera. Results from an early pilot study show the potential of our visualisation approach and its usefulness for analysing temporal relationships in large data sets.
Pimas Oliver, Kröll Mark, Kern Roman
2015
Our system for the PAN 2015 authorship verification challenge is based upon a two-step pre-processing pipeline. In the first step we extract different features that capture stylometric properties, grammatical characteristics and purely statistical features. In the second step of our pre-processing we merge all those features into a single meta feature space. We train an SVM classifier on the generated meta features to verify the authorship of an unseen text document. We report the results from the final evaluation as well as on the training datasets.
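A hedged sketch of the two-step idea: compute separate feature groups, merge them into a single meta feature space, and train an SVM; the concrete feature definitions and labels here are invented.

```python
import numpy as np
from sklearn.svm import SVC

def stylometric(doc):      # e.g. average word length, type-token ratio
    words = doc.split()
    return [float(np.mean([len(w) for w in words])), len(set(words)) / len(words)]

def statistical(doc):      # e.g. document length, punctuation rate
    return [float(len(doc)), sum(c in ".,;!?" for c in doc) / len(doc)]

docs = ["A short sample text.", "Another, rather different, sample text!"] * 20
labels = [0, 1] * 20       # toy same-author / different-author labels

# Merge the feature groups into one meta feature space.
X = np.array([stylometric(d) + statistical(d) for d in docs])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:2]))
```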
Rexha Andi, Klampfl Stefan, Kröll Mark, Kern Roman
2015
The overwhelming majority of scientific publications are authored by multiple persons; yet, bibliographic metrics are only assigned to individual articles as single entities. In this paper, we aim at a more fine-grained analysis of scientific authorship. We therefore adapt a text segmentation algorithm to identify potential author changes within the main text of a scientific article, which we obtain by using existing PDF extraction techniques. To capture stylistic changes in the text, we employ a number of stylometric features. We evaluate our approach on a small subset of PubMed articles consisting of an approximately equal number of research articles written by a varying number of authors. Our results indicate that the more authors an article has the more potential author changes are identified. These results can be considered as an initial step towards a more detailed analysis of scientific authorship, thereby extending the repertoire of bibliometrics.
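An illustrative sketch of the underlying segmentation idea, assuming a sliding window: compute stylometric features per window and flag a potential author change where adjacent windows differ most; the features and window sizes are invented, and the paper's actual algorithm differs.

```python
import numpy as np

def style_features(words):
    avg_len = np.mean([len(w) for w in words])
    ttr = len(set(words)) / len(words)         # type-token ratio
    return np.array([avg_len, ttr])

text = ("short plain words here . " * 30
        + "substantially elaborate vocabulary follows thereafter . " * 30).split()
win, step = 40, 20
feats = [style_features(text[i:i + win]) for i in range(0, len(text) - win, step)]

dists = [np.linalg.norm(feats[i + 1] - feats[i]) for i in range(len(feats) - 1)]
boundary = int(np.argmax(dists))               # strongest stylistic shift
print("potential author change near word", boundary * step + win)
```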
Kröll Mark, Strohmaier M.
2015
People willingly provide more and more information about themselves on social media platforms. This personal information about users’ emotions (sentiment) or goals (intent) is particularly valuable, for instance, for monitoring tools. So far, sentiment and intent analysis were conducted separately. Yet, both aspects can complement each other thereby informing processes such as explanation and reasoning. In this paper, we investigate the relation between intent and sentiment in weblogs. We therefore extract ~90,000 human goal instances from the ICWSM 2009 Spinn3r dataset and assign respective sentiments. Our results indicate that associating intent with sentiment represents a valuable addition to research areas such as text analytics and text understanding.
Lindstaedt Stefanie , Reiter, T., Cik, M., Haberl, M., Breitwieser, C., Scherer, R., Kröll Mark, Horn Christopher, Müller-Putz, G., Fellendorf, M.
2013
Today, proper traffic incident management (IM) increasingly has to deal with problems such as traffic congestion and environmental sustainability. IM therefore aims to clear the road for traffic as quickly as possible after an incident has happened. Electronic data verifiably has great potential for supporting traffic incident management. Consequently, this paper presents an innovative incident detection method using anonymized mobile communications data. The aim is to outline suitable methods for depicting the traffic situation of a designated test area. To be successful, the method needs to be able to calculate the traffic situation in time and report anomalies back to the motorway operator. The resulting procedures are compared to data from real incidents and thus validated. Special attention is paid to the question whether incidents can be detected more quickly with the aid of mobile phone data than with conventional methods. A further focus is the quicker deregistration of an incident, so that traffic management can respond more effectively.
Trattner Christoph, Smadi Mohammad, Theiler Dieter, Dennerlein Sebastian, Kowald Dominik, Rella Matthias, Kraker Peter, Barreto da Rosa Isaías, Tomberg Vladimir, Kröll Mark, Treasure-Jones Tamsin, Kerr Micky, Lindstaedt Stefanie , Ley Tobias
2013
Fellendorf Martin, Brandstätter Michael, Reiter Thomas, Lindstaedt Stefanie , Breitwieser Christian, Haberl Michael, Hebenstreit Cornelia, Scherer Reinhold, Kraschl-Hirschman Karin, Kröll Mark, Ruthner Thomas, Walther Bernhard
2012
The mobile traffic management system MOVEMENTS is to be developed as a simple and reliable system that can be deployed area-wide through mobile display units with decentralized control options and a central monitoring function. The display boards must ensure the legibility and comprehensibility of texts and pictograms so that they remain perceptible to road users even under poor visibility conditions. The mobile displays are intended both for plannable events (public events, road works, ...) and for unplanned events of longer duration (accidents that disrupt traffic, road closures due to natural events such as landslides, ...). Overall, MOVEMENTS is intended to improve ASFINAG's routing and information capabilities in parts of the network without traffic control systems.
Kröll Mark, Strohmaier M.
2010
In this paper, we introduce the idea of Intent Analysis, which is to create a profile of the goals and intentions present in textual content. Intent Analysis, similar to Sentiment Analysis, represents a type of document classification that differs from traditional topic categorization by focusing on classification by intent. We investigate the extent to which the automatic analysis of human intentions in text is feasible, report our preliminary results, and discuss potential applications. In addition, we present results from a study that focused on evaluating intent profiles generated from transcripts of American presidential candidate speeches in 2008.
Kröll Mark, Prettenhofer P., Strohmaier M.
2009
Access to knowledge about user goals represents a critical component for realizing the vision of intelligent agents acting upon user intent on the web. Yet, the manual acquisition of knowledge about user goals is costly and often infeasible. In a departure from existing approaches, this paper proposes Goal Mining as a novel perspective for knowledge acquisition. The research presented in this chapter makes the following contributions: (a) it presents Goal Mining as an emerging field of research and a corresponding automatic method for the acquisition of user goals from web corpora, in the case of this paper search query logs, (b) it provides insights into the nature and some characteristics of these goals and (c) it shows that the goals acquired from query logs exhibit traits of a long tail distribution, thereby providing access to a broad range of user goals. Our results suggest that search query logs represent a viable, yet largely untapped resource for acquiring knowledge about explicit user goals.
Körner C., Kröll Mark, Strohmaier M.
2009
Understanding search intent is often assumed to represent a critical barrier to the level of service that search engine providers can achieve. Previous research has shown that search queries differ with regard to intentional explicitness. We build on this observation and introduce Intentional Query Suggestion as a novel idea that aims to make searchers' intent more explicit during search. In this paper, we present an algorithm for Intentional Query Suggestion and corresponding data from comparative experiments with traditional query suggestion mechanisms. Our results suggest that Intentional Query Suggestion 1) diversifies search result sets (i.e. it reduces result set overlap) and 2) exhibits interesting differences in terms of click-through rates.
Kröll Mark, Strohmaier M.
2009
Knowledge about human goals has been found to be an important kind of knowledge for a range of challenging problems, such as goal recognition from people's actions or reasoning about human goals. Necessary steps towards conducting such complex tasks involve (i) acquiring a broad range of human goals and (ii) making them accessible by structuring and storing them in a knowledge base. In this work, we focus on extracting goal knowledge from weblogs, a largely untapped resource that can be expected to contain a broad variety of human goals. We annotate a small sample of weblogs and devise a set of simple lexico-syntactic patterns that indicate the presence of human goals. We then evaluate the quality of our patterns by conducting a human subject study. Resulting precision values favor patterns that are not merely based on part-of-speech tags. In future steps, we intend to improve these preliminary patterns based on our observations.
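A minimal sketch of what lexico-syntactic goal patterns can look like; these regular expressions are simplified stand-ins, not the paper's pattern set.

```python
import re

GOAL_PATTERNS = [
    r"\bI (?:want|would like|plan|hope) to ([a-z][\w ]+)",
    r"\bmy goal is to ([a-z][\w ]+)",
    r"\bin order to ([a-z][\w ]+)",
]

post = ("I want to learn Spanish this year. We left early in order to "
        "avoid the traffic.")
for pat in GOAL_PATTERNS:
    for match in re.finditer(pat, post, flags=re.IGNORECASE):
        print("goal:", match.group(1))
```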
Kröll Mark, Koerner C.
2009
Annotations represent an increasingly popular means for organizing, categorizing and finding resources on the "social" web. Yet, only a small portion of the total resources available on the web are annotated. In this paper, we describe a prototype, iTAG, for automatically annotating textual resources with human intent, a novel dimension of tagging. We investigate the extent to which the automatic analysis of human intentions in textual resources is feasible. To address this question, we present selected evidence from a study aiming to automatically annotate intent in a simplified setting, that is, transcripts of speeches given by US presidential candidates in 2008.
Kröll Mark
2009
Access to knowledge about common human goals has been found critical for realizing the vision of intelligent agents acting upon user intent on the web. Yet, the acquisition of knowledge about common human goals represents a major challenge. In a departure from existing approaches, this paper investigates a novel resource for knowledge acquisition: the utilization of search query logs for this task. By relating goals contained in search query logs with goals contained in existing commonsense knowledge bases such as ConceptNet, we aim to shed light on the usefulness of search query logs for capturing knowledge about common human goals. The main contribution of this paper is an empirical study comparing common human goals contained in two large search query logs (AOL and Microsoft Research) with goals contained in the commonsense knowledge base ConceptNet. The paper sketches ways in which goals from search query logs could be used to address the goal acquisition and goal coverage problems related to commonsense knowledge bases.
Jeanquartier Fleur, Kröll Mark, Strohmaier M.
2009
Getting a quick impression of the author's intention behind a text is a task often performed. An author's intention plays a major role in successfully understanding a text. To support readers in this task, we present an intentional approach to visual text analysis that makes use of tag clouds. The objective of tag clouds is to present meta-information in a visually appealing way. However, there is also much uncertainty associated with tag clouds, such as giving the wrong impression. It is not clear whether the author's intent can be grasped clearly while looking at a corresponding tag cloud. It is therefore interesting to ask to what extent tag clouds can support the user in understanding the intentions expressed. In order to answer this question, we construct an intentional perspective on textual content. Based on an existing algorithm for extracting intent annotations from textual content, we present a prototypical implementation that produces intent tag clouds, and describe a formative test illustrating how intent visualizations may support readers in understanding a text successfully. With the initial prototype, we conducted user studies of our intentional tag cloud visualization and a comparison with a traditional one that visualizes frequent terms. The evaluation's results indicate that intent tag clouds have a positive effect on supporting users in grasping an author's intent.
Granitzer Michael, Rath Andreas S., Kröll Mark, Ipsmiller D., Devaurs Didier, Weber Nicolas, Lindstaedt Stefanie, Seifert C.
2009
Increasing the productivity of a knowledge worker via intelligent applications requires the identification of a user's current work task, i.e. the current work context a user resides in. In this work we present and evaluate machine learning based work task detection methods. By viewing a work task as a sequence of digital interaction patterns of mouse clicks and key strokes, we present (i) a methodology for recording those user interactions and (ii) an in-depth analysis of supervised classification models for classifying work tasks in two different scenarios: a task-centric scenario and a user-centric scenario. We analyze different supervised classification models, feature types and feature selection methods on a laboratory as well as a real world data set. Results show satisfactory accuracy and high user acceptance by using relatively simple types of features.
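A hedged sketch of the task-detection setting: aggregate low-level interaction events into per-instance features and train a supervised classifier; event names, features and labels are invented placeholders.

```python
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

EVENTS = ["mouse_click", "key_stroke", "window_switch", "file_open"]

def featurize(event_log):
    counts = Counter(event_log)
    total = len(event_log)
    return [counts[e] / total for e in EVENTS]   # event-type distribution

logs = [["key_stroke"] * 80 + ["mouse_click"] * 20,
        ["mouse_click"] * 70 + ["window_switch"] * 30] * 10
tasks = ["writing", "browsing"] * 10             # task label per instance

X = [featurize(log) for log in logs]
clf = RandomForestClassifier(random_state=0).fit(X, tasks)
print(clf.predict([featurize(["key_stroke"] * 9 + ["mouse_click"])]))
```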
Strohmaier M., Prettenhofer P., Kröll Mark
2008
On the web, search engines represent a primary instrument through which users exercise their intent. Understanding the specific goals users express in search queries could improve our theoretical knowledge about strategies for search goal formulation and search behavior, and could equip search engine providers with better descriptions of users’ information needs. However, the degree to which goals are explicitly expressed in search queries can be suspected to exhibit considerable variety, which poses a series of challenges for researchers and search engine providers. This paper introduces a novel perspective on analyzing user goals in search query logs by proposing to study different degrees of intentional explicitness. To explore the implications of this perspective, we studied two different degrees of explicitness of user goals in the AOL search query log containing more than 20 million queries. Our results suggest that different degrees of intentional explicitness represent an orthogonal dimension to existing search query categories and that understanding these different degrees is essential for effective search. The overall contribution of this paper is the elaboration of a set of theoretical arguments and empirical evidence that makes a strong case for further studies of different degrees of intentional explicitness in search query logs.
Rath Andreas S., Weber Nicolas, Kröll Mark, Granitzer Michael, Dietzel O., Lindstaedt Stefanie
2008
Improving the productivity of knowledge workers is an open research challenge. Our approach is based on providing a large variety of knowledge services which take the current work task and information need (work context) of the knowledge worker into account. In the following we present the DYONIPOS application, which strives to automatically identify a user's work task and then contextualizes different types of knowledge services accordingly. These knowledge services then provide information (documents, people, locations) both from the user's personal as well as from the organizational environment. The utility and functionality are illustrated along a real world application scenario at the Ministry of Finance in Austria.
Granitzer Michael, Kröll Mark, Seifert Christin, Rath Andreas S., Weber Nicolas, Dietzel O., Lindstaedt Stefanie
2008
’Context is key’ conveys the importance of capturing the digital environment of a knowledge worker. Knowing the user's context offers various possibilities for support, for example enhancing information delivery or providing work guidance. Hence, user interactions have to be aggregated and mapped to predefined task categories. Without machine learning tools, such an assignment has to be done manually. The identification of suitable machine learning algorithms is necessary in order to ensure accurate and timely classification of the user's context without inducing additional workload. This paper provides a methodology for recording user interactions and an analysis of supervised classification models, feature types and feature selection for automatically detecting the current task and context of a user. Our analysis is based on a real world data set and shows the applicability of machine learning techniques.
Kröll Mark, Rath Andreas S., Weber Nicolas, Lindstaedt Stefanie, Granitzer Michael
2007
Knowledge-intensive work plays an increasingly important role in organisations of all types. Knowledge workers contribute their effort to achieve a common purpose; they are part of (business) processes. Workflow Management Systems support them during their daily work, featuring guidance and providing intelligent resource delivery. However, the emergence of richly structured, heterogeneous datasets requires a reassessment of existing mining techniques which do not take possible relations between individual instances into account. Neglecting these relations might lead to inappropriate conclusions about the data. In order to uphold the support quality of knowledge workers, the application of mining methods, that consider structure information rather than content information, is necessary. In the scope of the research project DYONIPOS, user interaction patterns, e.g., relations between users, resources and tasks, are mapped in the form of graphs. We utilize graph kernels to exploit structural information and apply Support Vector Machines to classify task instances to task models
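An illustrative sketch of classifying task instances with a graph kernel via scikit-learn's precomputed-kernel SVM; the vertex-label histogram kernel used here is a simple stand-in for the graph kernels in the paper.

```python
import numpy as np
from sklearn.svm import SVC

LABELS = ["user", "resource", "task"]

def label_histogram(graph_labels):
    return np.array([graph_labels.count(l) for l in LABELS], dtype=float)

# Each task instance is a small interaction graph, reduced here to its
# vertex labels; a real graph kernel would also use the edge structure.
graphs = [["user", "resource", "resource"], ["user", "task", "task"],
          ["user", "resource"], ["task", "task", "user"]]
y = [0, 1, 0, 1]                              # toy task-model labels

H = np.array([label_histogram(g) for g in graphs])
K = H @ H.T                                   # kernel matrix: histogram dots
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K))                         # kernel of train vs. train
```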
Burgsteiner H., Kröll Mark, Leopold A., Steinbauer G.
2007
The prediction of time series is an important task in finance, economy, object tracking, state estimation and robotics. Prediction is in general either based on a well-known mathematical description of the system behind the time series or learned from previously collected time series. In this work we introduce a novel approach to learn predictions of real world time series like object trajectories in robotics. In a sequence of experiments we evaluate whether a liquid state machine in combination with a supervised learning algorithm can be used to predict ball trajectories with input data coming from a video camera mounted on a robot participating in the RoboCup. The pre-processed video data is fed into a recurrent spiking neural network. Connections to some output neurons are trained by linear regression to predict the position of a ball in various time steps ahead. The main advantages of this approach are that due to the nonlinear projection of the input data to a high-dimensional space simple learning algorithms can be used, that the liquid state machine provides temporal memory capabilities and that this kind of computation appears biologically more plausible than conventional methods for prediction. Our results support the idea that learning with a liquid state machine is a generic powerful tool for prediction.
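A hedged, rate-based sketch of the reservoir idea: this uses an echo-state-style network with tanh units instead of the paper's spiking liquid state machine, training only a linear readout to predict a toy trajectory several steps ahead.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_res, ahead = 500, 100, 5
t = np.arange(T)
pos = np.sin(0.05 * t)                        # toy 1-D ball trajectory

W_in = rng.normal(scale=0.5, size=n_res)      # input weights
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1

states = np.zeros((T, n_res))
x = np.zeros(n_res)
for i in range(T):
    x = np.tanh(W @ x + W_in * pos[i])        # nonlinear high-dim projection
    states[i] = x

# Linear-regression readout: state at time i -> position at time i + ahead.
w = np.linalg.lstsq(states[:-ahead], pos[ahead:], rcond=None)[0]
pred = states @ w
print("mean squared error:", np.mean((pred[:-ahead] - pos[ahead:]) ** 2))
```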
Rath Andreas S., Kröll Mark, Lindstaedt Stefanie, Granitzer Michael
2007
Knowledge intensive organizations demand a rethinking of business process awareness. Their employees are knowledge workers who perform their tasks in a weakly structured way. Stiff organizational processes have to be relaxed, adapted and made more flexible in order to provide the essential freedom requested by knowledge workers. To support this type of creative worker effectively and efficiently, the hidden patterns, i.e. how they reach their goals, have to be discovered. This paper focuses on perceiving knowledge workers' work habits in an automatic way in order to bring their work patterns to the surface. Capturing low-level operating system events, observing user interactions at a fine granular level and performing in-depth application inspection provide the opportunity to interrelate the received data. In the scope of the research project DYONIPOS, these interrelation abilities are utilized to semantically relate and enrich the captured data in order to picture the actual task of a knowledge worker. Once the goal of a knowledge worker is clear, intelligent information delivery can be applied.
Rath Andreas S., Kröll Mark, Andrews K., Lindstaedt Stefanie, Granitzer Michael
2006
In a knowledge-intensive business environment, knowledge workers perform their tasks in highly creative ways. This essential freedom required by knowledge workers often conflicts with their organization's need for standardization, control, and transparency. Within this context, the research project DYONIPOS aims to mitigate this contradiction by supporting the process engineer with insights into the process executer's working behavior. These insights constitute the basis for balanced process modeling. DYONIPOS provides a process engineer support environment with advanced process modeling services, such as process visualization, standard process validation, and ad-hoc process analysis and optimization services.
Granitzer Michael, Lindstaedt Stefanie, Tochtermann K., Kröll Mark, Rath Andreas S.
2006
Knowledge-intensive work plays an increasingly important role in organisations of all types. This work is characterized by a defined input and a defined output, but not by the way the input is transformed into the output. Within this context, the research project DYONIPOS aims at supporting the two crucial roles in a knowledge-intensive organization: the process executer and the process engineer. Ad-hoc support is provided for the knowledge worker by synergizing the development of context-sensitive, intelligent, and agile semantic technologies with contextual retrieval. DYONIPOS provides process executers with guidance through business processes and just-in-time resource support based on the current user context, which are the focus of this paper.