Publications

Here you will find scientific publications authored by Know-Center staff

2018

Rexha Andi, Kröll Mark, Kern Roman

Multilingual Open Information Extraction using Parallel Corpora: The German Language Case

ACM Symposium on Applied Computing, Hisham M. Haddad, Roger L. Wainwright, ACM, 2018

Conference
In the past decade, the research community has been continuously improving the extraction quality of Open Information Extraction systems. This was done mainly for the English language; other languages such as German or Spanish followed, using shallow or deep parsing information to derive language-specific patterns. More recent efforts focused on language-agnostic approaches in an attempt to become less dependent on available tools and resources in that language. In line with these efforts, we present a language-agnostic approach which exploits manually aligned corpora as well as the solid performance of English OpenIE tools.
2018

Bassa Kevin, Kern Roman, Kröll Mark

On-the-fly Data Set Generation for Single Fact Validation

SAC 2018, 2018

Conference
On the web, massive amounts of information are available, including wrong (or conflicting) information. This spreading of erroneous or fake content makes it hard for users to distinguish between what is true and what is not. Fact finding algorithms represent a means to validate information. Yet, these algorithms require an already existing, structured data set to validate a single fact; an ad-hoc validation is thus not supported, making them impractical for usage in real world applications. This work presents an approach to generate these data sets on-the-fly. For three facts, we generate respective data sets and apply six state-of-the-art fact finding algorithms for evaluation purposes. In addition, our approach contributes to comparing fact finding algorithms in a more objective way.
2017

Rexha Andi, Kröll Mark, Ziak Hermann, Kern Roman

Extending Scientific Literature Search by Including the Author’s Writing Style

Fifth Workshop on Bibliometric-enhanced Information Retrieval, Atanassova, I.; Bertin, M.; Mayr, P., Springer, Aberdeen, UK, 2017

Conference
Our work is motivated by the idea to extend the retrieval of related scientific literature to cases where the relatedness also incorporates the writing style of individual scientific authors. Therefore, we conducted a pilot study to answer the question whether humans can identify authorship once the topological clues have been removed. As a first result, we found that this task is challenging, even for humans. We also found some agreement between the annotators. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis. Here, we compared the decisions against a number of topological and stylometric features. The outcome of our work should help to improve automatic authorship identification algorithms and to shape potential follow-up studies.
2016

Steinbauer Florian, Kröll Mark

Sentiment Analysis for German Facebook Pages

21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016, Springer-Verlag, Salford, UK, 2016

Conference
Social media monitoring has become an important means for business analytics and trend detection, for instance, analyzing the sentiment towards a certain product or decision. While a lot of work has been dedicated to analyzing sentiment for English texts, much less effort has been put into providing accurate sentiment classification for the German language. In this paper, we analyze three established classifiers for the German language with respect to Facebook posts. We then present our own hierarchical approach to classify sentiment and evaluate it using a data set of ~640 Facebook posts from corporate as well as governmental Facebook pages. We compare our approach to three sentiment classifiers for German, i.e. AlchemyAPI, Semantria and SentiStrength. With an accuracy of 70 %, our approach performs better than the other classifiers. In an application scenario, we demonstrate our classifier’s ability to monitor changes in sentiment with respect to the refugee crisis.
2016

Rexha Andi, Klampfl Stefan, Kröll Mark, Kern Roman

Towards a more fine grained analysis of scientific authorship: Predicting the number of authors using stylometric features

BIR 2016 Workshop on Bibliometric-enhanced Information Retrieval, Atanassova, I.; Bertin, M.; Mayr, P., Springer, Padova, Italy, 2016

Conference
To bring bibliometrics and information retrieval closer together, we propose to add the concept of author attribution into the pre-processing of scientific publications. Presently, common bibliographic metrics often attribute the entire article to all the authors, affecting author-specific retrieval processes. We envision a more fine-grained analysis of scientific authorship by attributing particular segments to authors. To realize this vision, we propose a new feature representation of scientific publications that captures the distribution of stylometric features. In a classification setting, we then seek to predict the number of authors of a scientific article. We evaluate our approach on a data set of ~6100 PubMed articles and achieve best results by applying random forests, i.e., 0.76 precision and 0.76 recall averaged over all classes.
2016

Dragoni Mauro, Rexha Andi, Kröll Mark, Kern Roman

Polarity Classification for Target Phrases in Tweets: A Word2Vec approach

The Semantic Web, ESWC 2016 Satellite Events, ESWC 2016, Springer-Verlag, Crete, Greece, 2016

Conference
Twitter is one of the most popular micro-blogging services on the web. The service allows sharing, interaction and collaboration via short, informal and often unstructured messages called tweets. Polarity classification of tweets refers to the task of assigning a positive or a negative sentiment to an entire tweet. Quite similar is predicting the polarity of a specific target phrase, for instance @Microsoft or #Linux, which is contained in the tweet. In this paper we present a Word2Vec approach to automatically predict the polarity of a target phrase in a tweet. In our classification setting, we thus do not have any polarity information but use only semantic information provided by a Word2Vec model trained on Twitter messages. To evaluate our feature representation approach, we apply well-established classification algorithms such as the Support Vector Machine and Naive Bayes. For the evaluation we used the SemEval 2016 Task #4 dataset. Our approach achieves F1-measures of up to ~90 % for the positive class and ~54 % for the negative class without using polarity information about single words.
2016

Rexha Andi, Kern Roman, Dragoni Mauro, Kröll Mark

Exploiting Propositions for Opinion Mining

ESWC-16 Challenge on Semantic Sentiment Analysis, Springer Link, Springer-Verlag, Crete, Greece, 2016

Conference
On different social media and commercial platforms, users express their opinions about products in textual form. Automatically extracting the polarity (i.e. whether the opinion is positive or negative) of a user's opinion can be useful for both actors: the online platform, which can incorporate the feedback to improve its product, as well as the client, who might get recommendations according to his or her preferences. Different approaches for tackling the problem have been suggested, mainly using syntactic features. The “Challenge on Semantic Sentiment Analysis” aims to go beyond word-level analysis by using semantic information. In this paper we propose a novel approach that employs the semantic information of a grammatical unit called a proposition. We try to derive the target of the review from the summary information, which serves as an input to identify the proposition in it. Our implementation relies on the hypothesis that the proposition expressing the target of the summary usually contains the main polarity information.
2016

Pimas Oliver, Klampfl Stefan, Kohl Thomas, Kern Roman, Kröll Mark

Generating Tailored Classification Schemas for German Patents

21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016, Springer-Verlag, Salford, UK, 2016

Conference
Patents and patent applications are important parts of a company’s intellectual property. Thus, companies put a lot of effort into designing and maintaining an internal structure for organizing their own patent portfolios, but also into keeping track of competitors’ patent portfolios. Yet, official classification schemas offered by patent offices (i) are often too coarse and (ii) are not mappable, for instance, to a company’s functions, applications, or divisions. In this work, we present a first step towards generating tailored classification. To automate the generation process, we apply key term extraction and topic modelling algorithms to 2,131 publications of German patent applications. To infer categories, we apply topic modelling to the patent collection. We evaluate the mapping of the topics found via the Latent Dirichlet Allocation method to the classes present in the patent collection as assigned by the domain expert.
2016

Rexha Andi, Dragoni Mauro, Kern Roman, Kröll Mark

An Information Retrieval Based Approach for Multilingual Ontology Matching

International Conference on Applications of Natural Language to Information Systems, Métais E., Meziane F., Saraee M., Sugumaran V., Vadera S., Springer, Salford, UK, 2016

Conference
Ontology matching in a multilingual environment consists of finding alignments between ontologies modeled by using more than one language. Such a research topic combines traditional ontology matching algorithms with the use of multilingual resources, services, and capabilities for easing multilingual matching. In this paper, we present a multilingual ontology matching approach based on Information Retrieval (IR) techniques: ontologies are indexed through an inverted index algorithm and candidate matches are found by querying such indexes. We also exploit the hierarchical structure of the ontologies by adopting the PageRank algorithm for our system. The approaches have been evaluated using a set of domain-specific ontologies belonging to the agricultural and medical domain. We compare our results with existing systems following an evaluation strategy closely resembling a recommendation scenario. The version of our system using PageRank showed an increase in performance in our evaluations.
2016

Gursch Heimo, Ziak Hermann, Kröll Mark, Kern Roman

Context-Driven Federated Recommendations for Knowledge Workers

Proceedings of the 17th European Conference on Knowledge Management (ECKM), Dr. Sandra Moffett and Dr. Brendan Galbraith, Academic Conferences and Publishing International Limited, Belfast, Northern Ireland, UK, 2016

Conference
Modern knowledge workers need to interact with a large number of different knowledge sources with restricted or public access. Knowledge workers are thus burdened with the need to familiarise themselves with and query each source separately. The EEXCESS (Enhancing Europe’s eXchange in Cultural Educational and Scientific reSources) project aims at developing a recommender system providing relevant and novel content to its users. Based on the user’s work context, the EEXCESS system can either automatically recommend useful content, or support users by providing a single user interface for a variety of knowledge sources. In the design process of the EEXCESS system, recommendation quality, scalability and security were the three most important criteria. This paper investigates the scalability aspect achieved by the federated design of the EEXCESS recommender system. This means that content in different sources is not replicated but its management is done in each source individually. Recommendations are generated based on the context describing the knowledge worker’s information need. Each source offers result candidates which are merged and re-ranked into a single result list. This merging is done in a vector representation space to achieve high recommendation quality. To ensure security, user credentials can be set individually by each user for each source. Hence, access to the sources can be granted and revoked for each user and source individually. The scalable architecture of the EEXCESS system handles up to 100 requests querying up to 10 sources in parallel without notable performance deterioration. The re-ranking and merging of results have a smaller influence on the system's responsiveness than the average source response rates. The EEXCESS recommender system offers a common entry point for knowledge workers to a variety of different sources with only marginally slower response times than the individual sources on their own. Hence, familiarisation with individual sources and their query language is not necessary.
2016

Pimas Oliver, Rexha Andi, Kröll Mark, Kern Roman

Profiling microblog authors using concreteness and sentiment - Know-Center at PAN 2016 author profiling

PAN 2016, Krisztian Balog, Linda Cappellato, Nicola Ferro, Craig Macdonald, Springer, Evora, Portugal, 2016

Conference
The PAN 2016 author profiling task is a supervised classification problem on cross-genre documents (tweets, blog and social media posts). Our system makes use of concreteness, sentiment and syntactic information present in the documents. We train a random forest model to identify gender and age of a document’s author. We report the evaluation results received by the shared task.
2016

Rexha Andi, Kröll Mark, Kern Roman

Social Media Monitoring for Companies: A 4W Summarisation Approach

European Conference on Knowledge Management, Dr. Sandra Moffett and Dr. Brendan Galbraith, Academic Conferences and Publishing International Limited, Belfast, Northern Ireland, UK, 2016

Conference
Monitoring (social) media represents one means for companies to gain access to knowledge about, for instance, competitors, products as well as markets. As a consequence, social media monitoring tools have been gaining attention to handle the amounts of data generated in social media nowadays. These tools also include summarisation services. However, most summarisation algorithms tend to focus on (i) first and last sentences respectively or (ii) sentences containing keywords. In this work we approach the task of summarisation by extracting 4W (who, when, where, what) information from (social) media texts. Presenting 4W information allows for a more compact content representation than traditional summaries. In addition, we depart from mere named entity recognition (NER) techniques to answer these four question types by including non-rigid designators, i.e. expressions which do not refer to the same thing in all possible worlds such as “at the main square” or “leaders of political parties”. To do that, we employ dependency parsing to identify grammatical characteristics for each question type. Every sentence is then represented as a 4W block. We perform two preliminary studies: sentence selection for summarisation, which achieves an F1-measure of 0.343, and 4W block extraction, which achieves F1-measures of 0.932, 0.900, 0.803 and 0.861 for the “who”, “when”, “where” and “what” categories, respectively. In a next step the 4W blocks are ranked by relevance. The top three ranked blocks, for example, then constitute a summary of the entire textual passage. The relevance metric can be customised to the user’s needs, for instance, ranked by up-to-dateness where the sentences’ tense is taken into account. In a user study we evaluate different ranking strategies including (i) up-to-dateness, (ii) text sentence rank, (iii) selecting the first and last sentences or (iv) coverage of named entities, i.e. based on the number of named entities in the sentence. Our 4W summarisation method presents a valuable addition to a company’s (social) media monitoring toolkit, thus supporting decision making processes.
2015

Wozelka Ralph, Kröll Mark, Sabol Vedran

Exploring Time Relations in Semantic Graphs

Proceedings of SIGRAD, SIGRAD, Linköping University Electronic Press, Stockholm, Sweden, 2015

Conference
The analysis of temporal relationships in large amounts of graph data has gained significance in recent years. Information providers such as journalists seek to bring order into their daily work when dealing with temporally distributed events and the network of entities, such as persons, organisations or locations, which are related to these events. In this paper we introduce a time-oriented graph visualisation approach which maps temporal information to visual properties such as size, transparency and position and, combined with advanced graph navigation features, facilitates the identification and exploration of temporal relationships. To evaluate our visualisation, we compiled a dataset of ~120,000 news articles from international press agencies including Reuters, CNN, Spiegel and Aljazeera. Results from an early pilot study show the potential of our visualisation approach and its usefulness for analysing temporal relationships in large data sets.
2015

Pimas Oliver, Kröll Mark, Kern Roman

Know-Center at PAN 2015 author identification

Lecture Notes in Computer Science, Working Notes Papers of the CLEF 2015 Evaluation Labs, Springer Link, Toulouse, France, 2015

Conference
Our system for the PAN 2015 authorship verification challenge is based upon a two-step pre-processing pipeline. In the first step we extract different features that observe stylometric properties, grammatical characteristics and pure statistical features. In the second step of our pre-processing we merge all those features into a single meta feature space. We train an SVM classifier on the generated meta features to verify the authorship of an unseen text document. We report the results from the final evaluation as well as on the training datasets.
2015

Rexha Andi, Klampfl Stefan, Kröll Mark, Kern Roman

Towards Authorship Attribution for Bibliometrics using Stylometric Features

Proc. of the Workshop Mining Scientific Papers: Computational Linguistics and Bibliometrics, Atanassova, I.; Bertin, M.; Mayr, P., ACL Anthology, Istanbul, Turkey, 2015

Conference
The overwhelming majority of scientific publications are authored by multiple persons; yet, bibliographic metrics are only assigned to individual articles as single entities. In this paper, we aim at a more fine-grained analysis of scientific authorship. We therefore adapt a text segmentation algorithm to identify potential author changes within the main text of a scientific article, which we obtain by using existing PDF extraction techniques. To capture stylistic changes in the text, we employ a number of stylometric features. We evaluate our approach on a small subset of PubMed articles consisting of an approximately equal number of research articles written by a varying number of authors. Our results indicate that the more authors an article has the more potential author changes are identified. These results can be considered as an initial step towards a more detailed analysis of scientific authorship, thereby extending the repertoire of bibliometrics.
2015

Kröll Mark, Strohmaier M.

Associating Intent with Sentiment in Weblogs

International Conference on Applications of Natural Language to Information Systems, NLDB'15, Springer-Verlag, Passau, Germany, 2015

Conference
People willingly provide more and more information about themselves on social media platforms. This personal information about users’ emotions (sentiment) or goals (intent) is particularly valuable, for instance, for monitoring tools. So far, sentiment and intent analysis were conducted separately. Yet, both aspects can complement each other thereby informing processes such as explanation and reasoning. In this paper, we investigate the relation between intent and sentiment in weblogs. We therefore extract ~90,000 human goal instances from the ICWSM 2009 Spinn3r dataset and assign respective sentiments. Our results indicate that associating intent with sentiment represents a valuable addition to research areas such as text analytics and text understanding.
2010

Kröll Mark, Strohmaier M.

Analyzing Human Intentions in Natural Language Text

The Fifth International Conference on Knowledge Capture (K-CAP'09), 2010

Conference
In this paper, we introduce the idea of Intent Analysis, which is to create a profile of the goals and intentions present in textual content. Intent Analysis, similar to Sentiment Analysis, represents a type of document classification that differs from traditional topic categorization by focusing on classification by intent. We investigate the extent to which the automatic analysis of human intentions in text is feasible, report our preliminary results, and discuss potential applications. In addition, we present results from a study that focused on evaluating intent profiles generated from transcripts of American presidential candidate speeches in 2008.
2009

Jeanquartier Fleur, Kröll Mark, Strohmaier M.

Intent Tag Clouds: An Intentional Approach To Visual Text Analysis

Proceedings of the Workshop on Semantic Multimedia Database Technologies, 10th International Workshop of the Multimedia Metadata Community (SeMuDaTe2009), CEUR Workshop Proceedings Volume 539, 2009

Conference
Getting a quick impression of the author's intention of a text is a task often performed. An author's intention plays a major role in successfully understanding a text. To support readers in this task, we present an intentional approach to visual text analysis, making use of tag clouds. The objective of tag clouds is to present meta-information in a visually appealing way. However, there is also much uncertainty associated with tag clouds, such as giving the wrong impression. It is not clear whether the author's intent can be grasped clearly while looking at a corresponding tag cloud. Therefore, it is interesting to ask to what extent tag clouds can support the user in understanding the intentions expressed. In order to answer this question, we construct an intentional perspective on textual content. Based on an existing algorithm for extracting intent annotations from textual content, we present a prototypical implementation to produce intent tag clouds, and describe a formative testing, illustrating how intent visualizations may support readers in understanding a text successfully. With the initial prototype, we conducted user studies of our intentional tag cloud visualization and a comparison with a traditional one that visualizes frequent terms. The evaluation's results indicate that intent tag clouds have a positive effect on supporting users in grasping an author's intent.
2009

Kröll Mark

Studying Databases of Intentions: Do Search Query Logs Capture Knowledge about Common Human Goals?

The Fifth International Conference on Knowledge Capture (K-CAP'09), 2009

Conference
2009

Kröll Mark, Koerner C.

Automatically Annotating Textual Resources with Human Intentions

Hypertext 2009, 20th ACM Conference on Hypertext and Hypermedia (HT'09), 2009

Conference
2009

Kröll Mark, Strohmaier M.

Extracting Human Goals from Weblogs

Workshop on Knowledge Discovery, Data Mining and Machine Learning (KDML) 2009, 2009

Conference
2009

Körner C., Kröll Mark, Strohmaier M.

Intentional Query Suggestion: Making User Goals More Explicit During Search

Workshop on Web Search Click Data WSCD'09, 2009

Conference
2008

Granitzer Michael, Kröll Mark, Seifer Christin, Rath Andreas S., Weber Nicolas, Dietzel O., Lindstaedt Stefanie

Analysis of Machine Learning Techniques for Context Extraction

Proceedings of 2008 International Conference on Digital Information Management (ICDIM08), IEEE Computer Society Press, 2008

Conference
’Context is key’ conveys the importance of capturing the digital environment of a knowledge worker. Knowing the user’s context offers various possibilities for support, for example enhancing information delivery or providing work guidance. Hence, user interactions have to be aggregated and mapped to predefined task categories. Without machine learning tools, such an assignment has to be done manually. The identification of suitable machine learning algorithms is necessary in order to ensure accurate and timely classification of the user’s context without inducing additional workload. This paper provides a methodology for recording user interactions and an analysis of supervised classification models, feature types and feature selection for automatically detecting the current task and context of a user. Our analysis is based on a real world data set and shows the applicability of machine learning techniques.
2008

Rath Andreas S., Weber Nicolas, Kröll Mark, Granitzer Michael, Dietzel O., Lindstaedt Stefanie

Context-Aware Knowledge Services

Workshop on Personal Information Management (PIM2008) at the 26th Computer Human Interaction Conference (CHI2008), Florence, Italy, 2008

Conference
Improving the productivity of knowledge workers is an open research challenge. Our approach is based on providing a large variety of knowledge services which take the current work task and information need (work context) of the knowledge worker into account. In the following we present the DYONIPOS application, which strives to automatically identify a user’s work task and then contextualizes different types of knowledge services accordingly. These knowledge services then provide information (documents, people, locations) both from the user’s personal as well as from the organizational environment. The utility and functionality are illustrated along a real world application scenario at the Ministry of Finance in Austria.
2008

Strohmaier M., Prettenhofer P., Kröll Mark

Different Degrees of Explicitness in Intentional Artifacts - Studying User Goals in a Large Search Query Log

International Workshop on Agents and Data Mining Interaction ADMI'08, 2008

Conference
2007

Rath Andreas S., Kröll Mark, Lindstaedt Stefanie, Granitzer Michael

Low-Level Event Relationship Discovery for Knowledge Work Support

Proceedings of the 4th Conference on Professional Knowledge Management WM2007, ProKW2007, 28-30 March 2007, Potsdam, Germany, Gronau, N., GITO-Verlag, Berlin, 2007

Conference
2006

Granitzer Michael, Lindstaedt Stefanie, Tochtermann K., Kröll Mark, Rath Andreas S.

Contextual Retrieval in Knowledge Intensive Business Environments

Proceedings LWA 2006 - Lernen - Wissensentdeckung - Adaptivität, Hildesheim, Germany, October 9-11, 2006, Schaaf, M., Althoff, D., Universität Hildesheim, Hildesheim, 2006

Conference
Knowledge-intensive work plays an increasingly important role in organisations of all types. This work is characterized by a defined input and a defined output, but not by the way in which the input is transformed into an output. Within this context, the research project DYONIPOS aims at encouraging the two crucial roles in a knowledge-intensive organization: the process executer and the process engineer. Ad-hoc support will be provided for the knowledge worker by synergizing the development of context-sensitive, intelligent, and agile semantic technologies with contextual retrieval. DYONIPOS provides process executers with guidance through business processes and just-in-time resource support based on the current user context, which are the focus of this paper.
2006

Rath Andreas S., Kröll Mark, Andrews K., Lindstaedt Stefanie, Granitzer Michael

Synergizing Standard and Ad-Hoc Processes

Lecture Notes in Computer Science LNAI 4333, International Conference on Practical Aspects of Knowledge Management, Springer Berlin, Berlin Heidelberg, 2006

Conference
In a knowledge-intensive business environment, knowledge workers perform their tasks in highly creative ways. This essential freedom required by knowledge workers often conflicts with their organization’s need for standardization, control, and transparency. Within this context, the research project DYONIPOS aims to mitigate this contradiction by supporting the process engineer with insights into the process executer’s working behavior. These insights constitute the basis for balanced process modeling. DYONIPOS provides a process engineer support environment with advanced process modeling services, such as process visualization, standard process validation, and ad-hoc process analysis and optimization services.