Publications - Know Center - AI & Data Science

Rexha Andi, Kröll Mark, Ziak Hermann, Kern Roman

2017

Pilot study: Ranking of textual snippets based on the writing style

Zenodo

In this pilot study, we tried to capture how humans behave when identifying the authorship of text snippets. First, we selected textual snippets from the introductions of scientific articles written by single authors. We then presented the evaluators with a source snippet and four target snippets and asked them to rank the target snippets from most to least similar in writing style. The dataset is composed of 66 experiments, each manually checked to ensure that it contains no clear hint that could guide the evaluators' ranking. For each experiment, we have evaluations from three different evaluators. Each experiment is presented on a single line of the CSV file: first the metadata of the source article (Journal, Title, Authorship, Snippet), then the metadata of the four target snippets (Journal, Title, Authorship, Snippet, Written From the same Author, Published in the same Journal), and finally the ranking given by each evaluator. This task was performed in the open source platform, Crowd Flower. The headers of the CSV file are self-explanatory. The TXT file contains a human-readable version of the experiments. For more information about the extraction of the data, please consider reading our paper: "Extending Scientific Literature Search by Including the Author’s Writing Style" @BIR: http://www.gesis.org/en/services/events/events-archive/conferences/ecir-workshops/ecir-workshop-2017

Dragoni Mauro, Federici Marco, Rexha Andi

2017

Extracting Aspects From User-generated Content For Supporting Opinion Mining Systems

Journal of Intelligent Information Systems Kerschberg; Z. Ras Springer

One of the most important opinion mining research directions falls in the extraction of polarities referring to specific entities (aspects) contained in the analyzed texts. The detection of such aspects may be very critical, especially when documents come from unknown domains. Indeed, while in some contexts it is possible to train domain-specific models for improving the effectiveness of aspect extraction algorithms, in others the most suitable solution is to apply unsupervised techniques by making such algorithms domain-independent. Moreover, an emerging need is to exploit the results of aspect-based analysis for triggering actions based on these data. This led to the necessity of providing solutions supporting both an effective analysis of user-generated content and an efficient and intuitive way of visualizing collected data. In this work, we implemented an opinion monitoring service comprising (i) a set of unsupervised strategies for aspect-based opinion mining together with (ii) a monitoring tool supporting users in visualizing analyzed data. The aspect extraction strategies are based on the use of semantic resources for performing the extraction of aspects from texts. The effectiveness of the platform has been tested on benchmarks provided by the SemEval campaign and has been compared with the results obtained by domain-adapted techniques.

Kern Roman, Falk Stefan, Rexha Andi

2017

Know-Center at SemEval-2017 Task 10: Sequence Classification with the CODE Annotator

Proceedings of the 11th International Workshop on Semantic Evaluations (SemEval-2017) Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, Andrew McCallum ACL Vancouver, Canada

This paper describes our participation in SemEval-2017 Task 10, named ScienceIE (Machine Reading for Scientist). We competed in Subtask 1 and 2 which consist respectively in identifying all the key phrases in scientific publications and label them with one of the three categories: Task, Process, and Material. These scientific publications are selected from Computer Science, Material Sciences, and Physics domains. We followed a supervised approach for both subtasks by using a sequential classifier (CRF - Conditional Random Fields). For generating our solution we used a web-based application implemented in the EU-funded research project, named CODE. Our system achieved an F1 score of 0.39 for the Subtask 1 and 0.28 for the Subtask 2.

Rexha Andi, Kern Roman, Ziak Hermann, Dragoni Mauro

2017

A semantic federated search engine for domain-specific document retrieval

SAC '17 Proceedings of the Symposium on Applied Computing Sung Y. Shin, Dongwan Shin, Maria Lencastre ACM Marrakech, Morocco

Retrieval of domain-specific documents became attractive for the Semantic Web community due to the possibility of integrating classic Information Retrieval (IR) techniques with semantic knowledge. Unfortunately, the gap between the construction of a full semantic search engine and the possibility of exploiting a repository of ontologies covering all possible domains is far from being filled. Recent solutions focused on the aggregation of different domain-specific repositories managed by third-parties. In this paper, we present a semantic federated search engine developed in the context of the EEXCESS EU project. Through the developed platform, users are able to perform federated queries over repositories in a transparent way, i.e. without knowing how their original queries are transformed before being actually submitted. The platform implements a facility for plugging new repositories and for creating, with the support of general purpose knowledge bases, knowledge graphs describing the content of each connected repository. Such knowledge graphs are then exploited for enriching queries performed by users.

Rexha Andi, Kröll Mark, Ziak Hermann, Kern Roman

2017

Extending Scientific Literature Search by Including the Author’s Writing Style

Fifth Workshop on Bibliometric-enhanced Information Retrieval Atanassova, I.; Bertin, M.; Mayr, P. Springer Aberdeen, UK

Our work is motivated by the idea of extending the retrieval of related scientific literature to cases where the relatedness also incorporates the writing style of individual scientific authors. We therefore conducted a pilot study to answer the question of whether humans can identify authorship once the topological clues have been removed. As a first result, we found that this task is challenging, even for humans. We also found some agreement between the annotators. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis, in which we compared the decisions against a number of topological and stylometric features. The outcome of our work should help to improve automatic authorship identification algorithms and to shape potential follow-up studies.