Publications

Here you will find scientific publications written by Know-Center staff members.

2018

Bassa Akim, Kröll Mark, Kern Roman

GerIE - An Open Information Extraction System for the German Language

Journal of Universal Computer Science, 2018

Journal
Open Information Extraction (OIE) is the task of extracting relations from text without the need of domain-specific training data. Currently, most of the research on OIE is devoted to the English language, but little or no research has been conducted on other languages including German. We tackled this problem and present GerIE, an OIE parser for the German language. To this end, we started by surveying the available literature on OIE with a focus on concepts which may also apply to the German language. Our system is built upon the output of a dependency parser, on which a number of hand-crafted rules are executed. For the evaluation we created two dedicated datasets, one derived from news articles and one based on texts from an encyclopedia. Our system achieves F-measures of up to 0.89 for sentences that have been correctly preprocessed.
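
To give a flavour of what running hand-crafted rules on top of a dependency parse looks like, here is a minimal sketch using spaCy's German model; the single subject-verb-object rule and the dependency labels are illustrative assumptions, not the rules implemented in GerIE.

```python
# Minimal sketch: extract (subject, relation, object) triples from a German
# dependency parse with one hand-crafted rule. The label sets cover both the
# TIGER-style tags used by spaCy's German models ("sb", "oa") and Universal
# Dependencies fallbacks ("nsubj", "obj"); real systems use many more rules.
import spacy

nlp = spacy.load("de_core_news_sm")  # assumes the German model is installed

SUBJ = {"sb", "nsubj"}
OBJ = {"oa", "obj", "dobj"}

def extract_triples(text):
    triples = []
    for sent in nlp(text).sents:
        for verb in (t for t in sent if t.pos_ == "VERB"):
            subjects = [c for c in verb.children if c.dep_ in SUBJ]
            objects = [c for c in verb.children if c.dep_ in OBJ]
            for s in subjects:
                for o in objects:
                    triples.append((s.text, verb.lemma_, o.text))
    return triples

print(extract_triples("Die Katze jagt die Maus."))
```
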
2018

Bassa Kevin, Kern Roman, Kröll Mark

On-the-fly Data Set Generation for Single Fact Validation

SAC 2018, 2018

Conference
On the web, massive amounts of information are available, including wrong (or conflicting) information. This spreading of erroneous or fake contents makes it hard for users to distinguish between what is true and what is not. Fact finding algorithms represent a means to validate information. Yet, these algorithms require an already existing, structured data set to validate a single fact; an ad-hoc validation is thus not supported, making them impractical for usage in real world applications. This work presents an approach to generate these data sets on-the-fly. For three facts, we generate respective data sets and apply six state-of-the-art fact finding algorithms for evaluation purposes. In addition, our approach contributes to comparing fact finding algorithms in a more objective way.
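
For context, fact finding algorithms typically iterate between source trustworthiness and claim confidence over a source-claim matrix. The sketch below implements the classic Sums baseline on a toy matrix; it is not necessarily one of the six algorithms evaluated in the paper.

```python
# Sketch of the Sums fact finder: sources gain trust from the claims they
# support, claims gain confidence from the sources supporting them.
import numpy as np

def sums(source_claim, iterations=50):
    """source_claim: binary matrix, rows = sources, columns = claims."""
    trust = np.ones(source_claim.shape[0])
    for _ in range(iterations):
        confidence = source_claim.T @ trust
        confidence /= confidence.max()          # normalise to avoid overflow
        trust = source_claim @ confidence
        trust /= trust.max()
    return trust, confidence

# Three sources, two mutually exclusive claims about the same fact.
matrix = np.array([[1, 0],
                   [1, 0],
                   [0, 1]])
trust, confidence = sums(matrix)
print(confidence)  # the claim backed by more (trusted) sources scores higher
```
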
2018

Rexha Andi, Kröll Mark, Kern Roman

Multilingual Open Information Extraction using Parallel Corpora: The German Language Case

ACM Symposium on Applied Computing, Hisham M. Haddad, Roger L. Wainwright, ACM, 2018

Conference
In the past decade the research community has been continuously improving the extraction quality of Open Information Extraction systems. This was done mainly for the English language; other languages such as German or Spanish followed, using shallow or deep parsing information to derive language-specific patterns. More recent efforts focused on language-agnostic approaches in an attempt to become less dependent on available tools and resources in that language. In line with these efforts, we present a language-agnostic approach which exploits manually aligned corpora as well as the solid performance of English OpenIE tools.
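
The underlying idea of projecting English extractions onto the aligned German side can be sketched as follows; the alignment format and the helper function are hypothetical and only illustrate the projection step, not the approach's actual pipeline.

```python
# Sketch: project the token spans of an English OIE triple onto the German side
# of a sentence pair, given word alignments as (english_index, german_index) pairs.
def project_span(span, alignments):
    """span: set of English token indices; returns aligned German indices."""
    return sorted({de for en, de in alignments if en in span})

english = "The cat chases the mouse".split()
german = "Die Katze jagt die Maus".split()
alignments = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

triple = {"subj": {0, 1}, "rel": {2}, "obj": {3, 4}}   # English OIE output
projected = {k: [german[i] for i in project_span(v, alignments)]
             for k, v in triple.items()}
print(projected)  # {'subj': ['Die', 'Katze'], 'rel': ['jagt'], 'obj': ['die', 'Maus']}
```
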
2018

Rexha Andi, Kröll Mark, Ziak Hermann, Kern Roman

Authorship Identification of Documents with High Content Similarity

Scientometrics, Wolfgang Glänzel, Springer Link, 2018

Journal
The goal of our work is inspired by the task of associating segments of text with their real authors. In this work, we focus on analyzing the way humans judge different writing styles. This analysis can help to better understand this process and thus to simulate/mimic such behavior accordingly. Unlike the majority of the work done in this field (i.e., authorship attribution, plagiarism detection, etc.), which uses content features, we focus only on the stylometric, i.e. content-agnostic, characteristics of authors. Therefore, we conducted two pilot studies to determine if humans can identify authorship among documents with high content similarity. The first was a quantitative experiment involving crowd-sourcing, while the second was a qualitative one executed by the authors of this paper. Both studies confirmed that this task is quite challenging. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis on the results of the studies. In the first experiment, we compared the decisions against content features and stylometric features, while in the second, the evaluators described the process and the features on which their judgment was based. The findings of our detailed analysis could (i) help to improve algorithms such as automatic authorship attribution as well as plagiarism detection, (ii) assist forensic experts or linguists to create profiles of writers, (iii) support intelligence applications to analyze aggressive and threatening messages and (iv) help ensure editorial conformity by adhering to, for instance, a journal-specific writing style.
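
Content-agnostic stylometric characteristics of the kind analyzed here can be computed along these lines; the concrete feature set below is a common textbook choice and an assumption on our part, not the one used in the studies.

```python
# Sketch of simple content-agnostic stylometric features for a text:
# average sentence length, average word length, punctuation rate and the
# relative frequency of a few function words.
import re

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "is"]

def stylometric_features(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    punctuation = re.findall(r"[,;:.!?]", text)
    n_words = max(len(words), 1)
    features = {
        "avg_sentence_len": n_words / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "punctuation_rate": len(punctuation) / n_words,
    }
    lowered = [w.lower() for w in words]
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = lowered.count(fw) / n_words
    return features

print(stylometric_features("It is a truth universally acknowledged, that a single "
                           "man in possession of a good fortune must be in want of a wife."))
```
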
2018

Hojas Sebastian, Kröll Mark, Kern Roman

GerMeter - A Corpus for Measuring Text Reuse in the Austrian Journalistic Domain

Language Resources and Evaluation, Springer, 2018

Journal
2018

Urak Günter, Ziak Hermann, Kern Roman

Source Selection of Long Tail Sources for Federated Search in an Uncooperative Setting

SAC, 2018

Conference
The task of federated search is to combine results from multiple knowledge bases into a single, aggregated result list, where the items typically range from textual documents to images. These knowledge bases are also called sources, and the process of choosing the actual subset of sources for a given query is called source selection. A scenario where these sources do not provide information about their content in a standardized way is called an uncooperative setting. In our work we focus on knowledge bases providing long tail content, i.e., rather specialized sources offering a low number of relevant documents. These sources are often neglected in favor of more popular knowledge sources, both by today's Web users as well as by most of the existing source selection techniques. We propose a system for source selection which i) could be utilized to automatically detect long tail knowledge bases and ii) generates aggregated search results that tend to incorporate results from these long tail sources. Starting from the current state-of-the-art we developed components that allow adjusting the amount of contribution from long tail sources. Our evaluation is conducted on the TREC 2014 Federated Web Search dataset. As this dataset also favors the most popular sources, systems that include many long tail knowledge bases will yield low performance measures. Here, we propose a system where just a few relevant long tail sources are integrated into the list of more popular knowledge bases. Additionally, we evaluated the implications of an uncooperative setting, where only minimal information about the sources is available to the federated search system. Here a severe drop in performance is observed once the share of long tail sources is higher than 40%. Our work is intended to steer the development of federated search systems that aim at increasing the diversity and coverage of the aggregated search result.
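
As a toy illustration of the trade-off described above, the sketch below ranks sources by a relevance score and reserves a small quota for long tail sources; both the scoring and the quota are hypothetical and not the components developed in the paper.

```python
# Sketch: pick k sources for a query, reserving a few slots for long tail
# sources (here simply defined as sources with a small collection size).
def select_sources(sources, k=5, long_tail_slots=2, long_tail_max_size=1000):
    """sources: list of dicts with 'name', 'relevance' and 'size' keys."""
    ranked = sorted(sources, key=lambda s: s["relevance"], reverse=True)
    long_tail = [s for s in ranked if s["size"] <= long_tail_max_size]
    selected = long_tail[:long_tail_slots]
    for source in ranked:
        if len(selected) >= k:
            break
        if source not in selected:
            selected.append(source)
    return [s["name"] for s in selected]

sources = [
    {"name": "big_news", "relevance": 0.9, "size": 5_000_000},
    {"name": "wiki", "relevance": 0.8, "size": 2_000_000},
    {"name": "niche_forum", "relevance": 0.5, "size": 400},
    {"name": "lab_blog", "relevance": 0.4, "size": 150},
]
print(select_sources(sources, k=3))
```
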
2018

Santos Tiago, Kern Roman

Understanding semiconductor production with variational auto-encoders

European Symposium on Artificial Neural Networks (ESANN) 2018, 2018

Conference
Semiconductor manufacturing processes critically depend on hundreds of highly complex process steps, which may cause critical deviations in the end-product. Hence, a better understanding of wafer test data patterns, which represent stress tests conducted on devices in semiconductor material slices, may lead to an improved production process. However, the shapes and types of these wafer patterns, as well as their relation to single process steps, are unknown. In a first step to address these issues, we tailor and apply a variational auto-encoder (VAE) to wafer pattern images. We find the VAE's generator allows for explorative wafer pattern analysis, and its encoder provides an effective dimensionality reduction algorithm, which, in a clustering application, performs better than several baselines such as t-SNE and yields interpretable clusters of wafer patterns.
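
A minimal sketch of the general approach, i.e. training a VAE on wafer-map images and clustering the latent codes, is given below; the fully connected architecture, image size and use of k-means are illustrative assumptions, not the tailored model from the paper.

```python
# Sketch: a small VAE whose encoder is used as a dimensionality reduction step
# for clustering wafer-map images, assumed to be 52x52 single-channel maps
# flattened to vectors with values in [0, 1].
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

class WaferVAE(nn.Module):
    def __init__(self, in_dim=52 * 52, hidden=256, latent=16):
        super().__init__()
        self.enc = nn.Linear(in_dim, hidden)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec1 = nn.Linear(latent, hidden)
        self.dec2 = nn.Linear(hidden, in_dim)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def decode(self, z):
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        return self.decode(self.reparameterize(mu, logvar)), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the unit Gaussian prior.
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

def cluster_wafers(model, wafers, n_clusters=8):
    # Use the latent means as low-dimensional codes and cluster them.
    with torch.no_grad():
        mu, _ = model.encode(wafers)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(mu.numpy())
```
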
2018

Lovric Mario, Krebs Sarah, Cemernek David, Kern Roman

BIG DATA IN INDUSTRIAL APPLICATION

XII Meeting of Young Chemical Engineers, Zagreb, Croatia, 2018

Conference
The use of big data technologies has a deep impact on today’s research (Tetko et al., 2016) and industry (Li et al., n.d.), but also on public health (Khoury and Ioannidis, 2014) and economy (Einav and Levin, 2014). These technologies are particularly important for manufacturing sites, where complex processes are coupled with large amounts of data, for example in the chemical and steel industry. This data originates from sensors, processes, and quality-testing. Typical applications of these technologies are related to predictive maintenance and the optimisation of production processes. The media have made the term “big data” a hot buzzword without going too deep into the topic. We noted a lack in users’ understanding of the technologies and techniques behind it, making the application of such technologies challenging. In practice the data is often unstructured (Gandomi and Haider, 2015) and a lot of resources are devoted to cleaning and preparation, but also to understanding causalities and relevance among features. The latter requires domain knowledge, making big data projects challenging not only from a technical perspective, but also from a communication perspective. Therefore, there is a need to rethink the big data concept among researchers and manufacturing experts, including topics like data quality, knowledge exchange and the technology required. The scope of this presentation is to present the main pitfalls in applying big data technologies amongst users from industry, explain scaling principles in big data projects, and demonstrate common challenges in an industrial big data project.
2018

Santos Tiago, Walk Simon, Kern Roman, Strohmaier M., Helic Denis

Activity in Questions & Answers Websites

ACM Transactions on Social Computing, 2018

Journal
Millions of users on the Internet discuss a variety of topics on Question and Answer (Q&A) instances. However, not all instances and topics receive the same amount of attention, as some thrive and achieve self-sustaining levels of activity while others fail to attract users and either never grow beyond being a small niche community or become inactive. Hence, it is imperative to not only better understand but also to distill deciding factors and rules that define and govern sustainable Q&A instances. We aim to empower community managers with quantitative methods for them to better understand, control and foster their communities, and thus contribute to making the Web a more efficient place to exchange information. To that end, we extract, model and cluster user activity-based time series from 50 randomly selected Q&A instances from the StackExchange network to characterize user behavior. We find four distinct types of user activity temporal patterns, which vary primarily according to the users' activity frequency. Finally, by breaking down total activity in our 50 Q&A instances by the previously identified user activity profiles, we classify those 50 Q&A instances into three different activity profiles. Our categorization of Q&A instances aligns with the stage of development and maturity of the underlying communities, which can potentially help operators of such instances not only to quantitatively assess status and progress, but also to optimize community building efforts.
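
The core analysis step, i.e. clustering user activity time series, can be sketched as follows; the monthly binning, log scaling and k-means with four clusters are illustrative choices rather than the exact setup of the paper.

```python
# Sketch: cluster per-user activity time series (e.g. monthly post counts)
# to obtain a small number of activity profiles.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy data: 200 users x 24 months of post counts with user-specific rates.
activity = rng.poisson(lam=rng.uniform(0.2, 5.0, size=(200, 1)), size=(200, 24))

# Log-scale each series so clusters are not dominated by a few heavy users.
X = np.log1p(activity)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
for label in range(4):
    profile = X[kmeans.labels_ == label].mean(axis=0)
    print(f"cluster {label}: mean log-activity per month {profile.round(2)}")
```
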
2018

Santos Tiago, Walk Simon, Kern Roman, Helic Denis

Evolution of Collaborative Web Communities

ACM Hypertext 2018, 2018

Conference
Each day, millions of users visit collaborative Web communities, such as Wikipedia or StackExchange, either as large knowledge repositories or as up-to-date news sources. However, not all Web communities are as successful as Wikipedia and, except for a few initial research results, our research community still knows only a little about what separates a successful from an unsuccessful community. Thus, we still need to (i) gain a better understanding of the underlying community evolution dynamics, and (ii) based on this understanding, support activity and growth on such platforms. To that end, we distill temporal dynamics of community activity and thereby identify key factors leading to success or failure of communities. In particular, we study the differences between growing and declining communities by leveraging multivariate Hawkes processes. Furthermore, we compare communities hosted on different platforms such as StackExchange and Reddit, as well as topically diverse communities such as STEM and humanities. We find that all growing communities exhibit (i) an active core of power users reacting to the community as a whole, and (ii) numerous casual users strongly interacting with other casual users, suggesting community openness towards less active users. Moreover, our results suggest that communities in the humanities are centered around power users, whereas in STEM communities activity is more evenly distributed among power and casual users. These results are of practical importance for community managers to quantitatively assess the status of their communities and guide them towards thriving community structures.
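
For reference, the conditional intensity of dimension i in a multivariate Hawkes process with exponential excitation kernels (the exponential kernel being a common modelling assumption, not necessarily the one used here) reads:

```latex
% Baseline rate plus contributions from all past events t_k^j in every dimension j.
\lambda_i(t) = \mu_i + \sum_{j=1}^{D} \sum_{t_k^j < t} \alpha_{ij}\,\beta\, e^{-\beta\,(t - t_k^j)}
```

The matrix of excitation coefficients alpha_ij then quantifies how strongly activity in one user group (e.g. casual users) triggers further activity in another (e.g. power users).
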
2018

Andrusyak Bohdan, Kugi Thomas, Kern Roman

Daily Prediction of Foreign Exchange Rates Based on the Stock Market

Proceedings of the PEFNet 2017 conference, Jana Stávková, Mendel University Press, Brno, 2018

Conference
The stock and foreign exchange markets are the two fundamental financial markets in the world and play a crucial role in international business. This paper examines the possibility of predicting the foreign exchange market via machine learning techniques, taking the stock market into account. We compare prediction models based on algorithms from the fields of shallow and deep learning. Our models of foreign exchange markets based on information from the stock market have been shown to be able to predict the future of foreign exchange markets with an accuracy of over 60%. This can be seen as an indicator of a strong link between the two markets. Our insights offer a chance of a better understanding guiding the future of market predictions. We found the accuracy depends on the time frame of the forecast and the algorithms used, where deep learning tends to perform better for farther-reaching forecasts.
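
A shallow baseline in the spirit of this setup could look as follows; the lagged stock-return features and the logistic regression classifier are illustrative assumptions, not the models compared in the paper.

```python
# Sketch: predict the next-day direction of an exchange rate from lagged
# stock index returns with a shallow model, evaluated with time-ordered splits.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(1)
n_days, n_lags = 1000, 5
stock_returns = rng.normal(0, 0.01, size=n_days)
# Toy FX returns that partly follow the previous day's stock returns.
fx_returns = 0.3 * np.roll(stock_returns, 1) + rng.normal(0, 0.01, size=n_days)

# Features: the last n_lags stock returns; target: sign of today's FX return.
X = np.column_stack([np.roll(stock_returns, lag) for lag in range(1, n_lags + 1)])[n_lags:]
y = (fx_returns[n_lags:] > 0).astype(int)

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=TimeSeriesSplit(n_splits=5), scoring="accuracy")
print(f"mean directional accuracy: {scores.mean():.2f}")
```
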
2018

Cuder Gerald, Breitfuß Gert, Kern Roman

E-Mobility and Big Data - Data Utilization of Charging Operations

Proceedings of XXIX ISPIM Conference, Stockholm, 2018

Conference
Electric vehicles have enjoyed substantial growth in recent years. One essential part of ensuring their success in the future is a well-developed and easy-to-use charging infrastructure. Since charging stations generate a lot of (big) data, gaining useful information out of this data can help to push the transition to E-Mobility. In a joint research project, the Know-Center, together with has.to.be GmbH, applied data analytics methods and visualization technologies to the provided data sets. One objective of the research project is to provide a consumption forecast based on the historical consumption data. Based on this information, the operators of charging stations are able to optimize the energy supply. Additionally, the infrastructure data were analysed with regard to "predictive maintenance", aiming to optimize the availability of the charging stations. Furthermore, advanced prediction algorithms were applied to provide services to the end user regarding the availability of charging stations.
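
In its simplest form, a consumption forecast of the kind mentioned above could be set up as below; the calendar features and the gradient boosting regressor are generic choices and not the methods used in the project.

```python
# Sketch: forecast daily energy consumption of a charging station from
# calendar features, trained on historical data.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Toy history: one year of daily consumption (kWh), higher on weekdays.
dates = pd.date_range("2017-01-01", periods=365, freq="D")
kwh = 80 + 20 * (dates.dayofweek < 5).astype(int) + 0.05 * np.arange(365)

def calendar_features(index):
    # index: a pandas DatetimeIndex
    return pd.DataFrame({"dayofweek": index.dayofweek, "month": index.month})

model = GradientBoostingRegressor().fit(calendar_features(dates), kwh)

# Forecast the next two weeks.
future = pd.date_range(dates[-1] + pd.Timedelta(days=1), periods=14, freq="D")
print(model.predict(calendar_features(future)).round(1))
```
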
2018

Gursch Heimo, Silva Nelson, Reiterer Bernhard, Paletta Lucas, Bernauer Patrick, Fuchs Martin, Veas Eduardo Enrique, Kern Roman

Flexible Scheduling for Human Robot Collaboration in Intralogistics Teams

Mensch und Computer 2018, Gesellschaft für Informatik e.V., Gesellschaft für Informatik e.V., Bonn, Germany, 2018

Conference
The project Flexible Intralogistics for Future Factories (FlexIFF) investigates human-robot collaboration in intralogistics teams in the manufacturing industry, which form a cyber-physical system consisting of human workers, mobile manipulators, manufacturing machinery, and manufacturing information systems. The workers use Virtual Reality (VR) and Augmented Reality (AR) devices to interact with the robots and machinery. The right information at the right time is key for making this collaboration successful. Hence, task scheduling for mobile manipulators and human workers must be closely linked with the enterprise’s information systems, offering all actors on the shop floor a common view of the current manufacturing status. FlexIFF will provide useful, well-tested, and sophisticated solutions for cyber-physical systems in intralogistics, with humans and robots making the most of their strengths, working collaboratively and helping each other.
2018

Fernández Alonso, Miguel Yuste, Kern Roman

Tinkerforge environmental datasets

2018

Collection of environmental datasets recorded with Tinkerforge sensors and used in the development of a bachelor thesis on the topic of frequent pattern mining. The data was collected in several locations in the city of Graz, Austria, as well as an additional dataset recorded in Santander, Spain.
2018

Schrunner Stefan, Bluder Olivia, Zernig Anja, Kaestner Andre, Kern Roman

A Comparison of Supervised Approaches for Process Pattern Recognition in Analog Semiconductor Wafer Test Data

2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018

Conference
The semiconductor industry is currently exploiting machine learning techniques to improve and automate the manufacturing process. An essential step is the wafer test, where each single device is measured electrically, resulting in an image of the wafer. Our work is based on the hypothesis that deviations of production processes can be detected via spatial patterns on these wafer maps. Supervised learning methods are one possibility to recognize such patterns in an automated way; however, the training sample size is very low. In our work, we present and compare several methods for multiclass classification which can deal with this limitation: multiclass decision trees, as well as decomposition methods like round robin and error-correcting output coding (ECOC). As elementary classifiers, we compare binary decision trees and logistic regression using an elastic net regularization. The evaluation shows that the decomposition methods outperform the multiclass decision tree regarding both accuracy and practical demands.
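
The compared model families are readily expressed with scikit-learn; the sketch below (with round robin realised as one-vs-one decomposition and arbitrary hyperparameters) illustrates these families and is not a reproduction of the paper's setup.

```python
# Sketch: multiclass decision tree vs. decomposition methods (one-vs-one as
# "round robin", error-correcting output codes) with two elementary classifiers.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OutputCodeClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for a small labelled set of wafer-map feature vectors.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)

elastic_net_lr = LogisticRegression(penalty="elasticnet", solver="saga",
                                    l1_ratio=0.5, max_iter=5000)
models = {
    "multiclass tree": DecisionTreeClassifier(random_state=0),
    "round robin + tree": OneVsOneClassifier(DecisionTreeClassifier(random_state=0)),
    "ECOC + elastic-net LR": OutputCodeClassifier(elastic_net_lr, code_size=2.0,
                                                  random_state=0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {score:.2f}")
```
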