Fernández Alonso, Miguel Yuste, Kern Roman
2018
Collection of environmental datasets recorded with Tinkerforge sensors and used in the development of a bachelor thesis on the topic of frequent pattern mining. The data was collected at several locations in the city of Graz, Austria; an additional dataset was recorded in Santander, Spain.
Gursch Heimo, Silva Nelson, Reiterer Bernhard, Paletta Lucas, Bernauer Patrick, Fuchs Martin, Veas Eduardo Enrique, Kern Roman
2018
The project Flexible Intralogistics for Future Factories (FlexIFF) investigates human-robot collaboration in intralogistics teams in the manufacturing industry, which form a cyber-physical system consisting of human workers, mobile manipulators, manufacturing machinery, and manufacturing information systems. The workers use Virtual Reality (VR) and Augmented Reality (AR) devices to interact with the robots and machinery. The right information at the right time is key for making this collaboration successful. Hence, task scheduling for mobile manipulators and human workers must be closely linked with the enterprise’s information systems, offering all actors on the shop floor a common view of the current manufacturing status. FlexIFF will provide useful, well-tested, and sophisticated solutions for cyber-physical systems in intralogistics, with humans and robots making the most of their strengths, working collaboratively and helping each other.
Cuder Gerald, Breitfuß Gert, Kern Roman
2018
Electric vehicles have enjoyed substantial growth in recent years. One essential part of ensuring their future success is a well-developed and easy-to-use charging infrastructure. Since charging stations generate a lot of (big) data, extracting useful information from this data can help to push the transition to E-Mobility. In a joint research project, the Know-Center, together with has.to.be GmbH, applied data analytics methods and visualization technologies to the provided data sets. One objective of the research project is to provide a consumption forecast based on historical consumption data. Based on this information, the operators of charging stations are able to optimize the energy supply. Additionally, the infrastructure data were analysed with regard to "predictive maintenance", aiming to optimize the availability of the charging stations. Furthermore, advanced prediction algorithms were applied to provide services to the end user regarding the availability of charging stations.
Andrusyak Bohdan, Kugi Thomas, Kern Roman
2018
The stock and foreign exchange markets are the two fundamental financial markets in the world and play a crucial role in international business. This paper examines the possibility of predicting the foreign exchange market via machine learning techniques, taking the stock market into account. We compare prediction models based on algorithms from the fields of shallow and deep learning. Our models of foreign exchange markets based on information from the stock market have been shown to be able to predict the future of foreign exchange markets with an accuracy of over 60%. This can be seen as an indicator of a strong link between the two markets. Our insights offer a chance of a better understanding guiding the future of market predictions. We found the accuracy depends on the time frame of the forecast and the algorithms used, where deep learning tends to perform better for farther-reaching forecasts.
Lovric Mario, Krebs Sarah, Cemernek David, Kern Roman
2018
The use of big data technologies has a deep impact on today’s research (Tetko et al., 2016) and industry (Li et al., n.d.), but also on public health (Khoury and Ioannidis, 2014) and economy (Einav and Levin, 2014). These technologies are particularly important for manufacturing sites, where complex processes are coupled with large amounts of data, for example in the chemical and steel industries. This data originates from sensors, processes, and quality testing. Typical applications of these technologies relate to predictive maintenance and the optimisation of production processes. The media has made the term “big data” a hot buzzword without going too deep into the topic. We noted a lack of understanding among users of the technologies and techniques behind it, making the application of such technologies challenging. In practice the data is often unstructured (Gandomi and Haider, 2015) and a lot of resources are devoted to cleaning and preparation, but also to understanding causalities and relevance among features. The latter requires domain knowledge, making big data projects challenging not only from a technical perspective, but also from a communication perspective. Therefore, there is a need to rethink the big data concept among researchers and manufacturing experts, including topics like data quality, knowledge exchange, and the technology required. The scope of this presentation is to present the main pitfalls in applying big data technologies amongst users from industry, explain scaling principles in big data projects, and demonstrate common challenges in an industrial big data project.
Santos Tiago, Kern Roman
2018
Semiconductor manufacturing processes critically depend on hundreds of highly complex process steps, which may cause critical deviations in the end-product. Hence, a better understanding of wafer test data patterns, which represent stress tests conducted on devices in semiconductor material slices, may lead to an improved production process. However, the shapes and types of these wafer patterns, as well as their relation to single process steps, are unknown. In a first step to address these issues, we tailor and apply a variational auto-encoder (VAE) to wafer pattern images. We find the VAE's generator allows for explorative wafer pattern analysis, and its encoder provides an effective dimensionality reduction algorithm, which, in a clustering application, performs better than several baselines such as t-SNE and yields interpretable clusters of wafer patterns.
Urak Günter, Ziak Hermann, Kern Roman
2018
The task of federated search is to combine results from multiple knowledge bases into a single, aggregated result list, where the items typically range from textual documents to images. These knowledge bases are also called sources, and the process of choosing the actual subset of sources for a given query is called source selection. A scenario where these sources do not provide information about their content in a standardized way is called an uncooperative setting. In our work we focus on knowledge bases providing long tail content, i.e., rather specialized sources offering a low number of relevant documents. These sources are often neglected in favor of more popular knowledge sources, both by today’s Web users and by most of the existing source selection techniques. We propose a system for source selection which i) could be utilized to automatically detect long tail knowledge bases and ii) generates aggregated search results that tend to incorporate results from these long tail sources. Starting from the current state of the art, we developed components that allow the contribution from long tail sources to be adjusted. Our evaluation is conducted on the TREC 2014 Federated Web Search dataset. As this dataset also favors the most popular sources, systems that include many long tail knowledge bases will yield low performance measures. Here, we propose a system where just a few relevant long tail sources are integrated into the list of more popular knowledge bases. Additionally, we evaluated the implications of an uncooperative setting, where only minimal information about the sources is available to the federated search system. Here a severe drop in performance is observed once the share of long tail sources is higher than 40%. Our work is intended to steer the development of federated search systems that aim at increasing the diversity and coverage of the aggregated search result.
Rexha Andi, Kröll Mark, Ziak Hermann, Kern Roman
2018
The goal of our work is inspired by the task of associating segments of text with their real authors. In this work, we focus on analyzing the way humans judge different writing styles. This analysis can help to better understand this process and thus to simulate or mimic such behavior accordingly. Unlike the majority of the work done in this field (i.e., authorship attribution, plagiarism detection, etc.), which uses content features, we focus only on the stylometric, i.e., content-agnostic, characteristics of authors. Therefore, we conducted two pilot studies to determine whether humans can identify authorship among documents with high content similarity. The first was a quantitative experiment involving crowd-sourcing, while the second was a qualitative one executed by the authors of this paper. Both studies confirmed that this task is quite challenging. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis on the results of the studies. In the first experiment, we compared the decisions against content features and stylometric features, while in the second, the evaluators described the process and the features on which their judgment was based. The findings of our detailed analysis could (i) help to improve algorithms such as automatic authorship attribution as well as plagiarism detection, (ii) assist forensic experts or linguists in creating profiles of writers, (iii) support intelligence applications in analyzing aggressive and threatening messages, and (iv) help editors ensure conformity with, for instance, a journal-specific writing style.
Bassa Akim, Kröll Mark, Kern Roman
2018
Open Information Extraction (OIE) is the task of extracting relations from text without the need of domain-specific training data. Currently, most of the research on OIE is devoted to the English language, but little or no research has been conducted on other languages including German. We tackled this problem and present GerIE, an OIE parser for the German language. Therefore, we started by surveying the available literature on OIE with a focus on concepts which may also apply to the German language. Our system is built upon the output of a dependency parser, on which a number of hand-crafted rules are executed. For the evaluation we created two dedicated datasets, one derived from news articles and one based on texts from an encyclopedia. Our system achieves F-measures of up to 0.89 for sentences that have been correctly preprocessed.