Publications

Here you will find scientific publications written by Know-Center staff

2018

Bassa Akim, Kröll Mark, Kern Roman

GerIE - An Open Information Extraction System for the German Language

Journal of Universal Computer Science, 2018

Journal
Open Information Extraction (OIE) is the task of extracting relations from text without the need of domain-specific training data. Currently, most of the research on OIE is devoted to the English language, but little or no research has been conducted on other languages including German. We tackled this problem and present GerIE, an OIE parser for the German language. To this end, we started by surveying the available literature on OIE with a focus on concepts which may also apply to the German language. Our system is built upon the output of a dependency parser, on which a number of hand-crafted rules are executed. For the evaluation we created two dedicated datasets, one derived from news articles and one based on texts from an encyclopedia. Our system achieves F-measures of up to 0.89 for sentences that have been correctly preprocessed.
2018

Bassa Kevin, Kern Roman, Kröll Mark

On-the-fly Data Set Generation for Single Fact Validation

SAC 2018, 2018

Conference
On the web, massive amounts of information are available, including wrong (or conflicting) information. This spreading of erroneous or fake contents makes it hard for users to distinguish between what is true and what is not. Fact finding algorithms represent a means to validate information. Yet, these algorithms require an already existing, structured data set to validate a single fact; an ad-hoc validation is thus not supported, making them impractical for usage in real-world applications. This work presents an approach to generate these data sets on-the-fly. For three facts, we generate respective data sets and apply six state-of-the-art fact finding algorithms for evaluation purposes. In addition, our approach contributes to comparing fact finding algorithms in a more objective way.
2018

Rexha Andi, Kröll Mark, Kern Roman

Multilingual Open Information Extraction using Parallel Corpora: The German Language Case

ACM Symposium on Applied Computing, Hisham M. Haddad, Roger L. Wainwright, ACM, 2018

Conference
In the past decade the research community has been continuously improving the extraction quality of Open Information Extraction systems. This was done mainly for the English language; other languages such as German or Spanish followed using shallow or deep parsing information to derive language-specific patterns. More recent efforts focused on language agnostic approaches in an attempt to become less dependent on available tools and resources in that language. In line with these efforts, we present a language agnostic approach which exploits manually aligned corpora as well as the solid performance of English OpenIE tools.
2018

Rexha Andi, Kröll Mark, Ziak Hermann, Kern Roman

Authorship Identification of Documents with High Content Similarity

Scientometrics, Wolfgang Glänzel, Springer Link, 2018

Journal
The goal of our work is inspired by the task of associating segments of text to their real authors. In this work, we focus on analyzing the way humans judge different writing styles. This analysis can help to better understand this process and thus to simulate or mimic such behavior accordingly. Unlike the majority of the work done in this field (i.e., authorship attribution, plagiarism detection, etc.), which uses content features, we focus only on the stylometric, i.e. content-agnostic, characteristics of authors. Therefore, we conducted two pilot studies to determine if humans can identify authorship among documents with high content similarity. The first was a quantitative experiment involving crowd-sourcing, while the second was a qualitative one executed by the authors of this paper. Both studies confirmed that this task is quite challenging. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis on the results of the studies. In the first experiment, we compared the decisions against content features and stylometric features, while in the second, the evaluators described the process and the features on which their judgment was based. The findings of our detailed analysis could (i) help to improve algorithms such as automatic authorship attribution as well as plagiarism detection, (ii) assist forensic experts or linguists to create profiles of writers, (iii) support intelligence applications to analyze aggressive and threatening messages and (iv) help editor conformity by adhering to, for instance, journal-specific writing style.
2018

Hojas Sebastian, Kröll Mark, Kern Roman

GerMeter - A Corpus for Measuring Text Reuse in the Austrian Journalistic Domain

Language Resources and Evaluation, Springer, 2018

Journal
2018

Rexha Andi, Kröll Mark, Kern Roman, Dragoni Mauro

The CLAUSY System at ESWC-2018 Challenge on Semantic Sentiment Analysis

Springer, 2018

Conference
With different social media and commercial platforms, users express their opinions about products in textual form. Automatically extracting the polarity (i.e. whether the opinion is positive or negative) of a user can be useful for both actors: the online platform incorporating the feedback to improve its product as well as the client who might get recommendations according to his or her preferences. Different approaches for tackling the problem have been suggested, mainly using syntactic features. The “Challenge on Semantic Sentiment Analysis” aims to go beyond the word-level analysis by using semantic information. In this paper we propose a novel approach by employing the semantic information of a grammatical unit called the proposition. We try to derive the target of the review from the summary information, which serves as an input to identify the proposition in it. Our implementation relies on the hypothesis that the proposition expressing the target of the summary usually contains the main polarity information.