Publikationen - Know Center - KI & Data Science

Stanisavljevic Darko, Cemernek David, Gursch Heimo, Urak Günter, Lechner Gernot

2019

Detection of Interferences in an Additive Manufacturing Process: An Experimental Study Integrating Methods of Feature Selection and Machine Learning

International Journal of Production Research Taylor & Francis Londo

Additive manufacturing becomes a more and more important technology for production, mainly driven by the ability to realise extremely complex structures using multiple materials but without assembly or excessive waste. Nevertheless, like any high-precision technology additive manufacturing responds to interferences during the manufacturing process. These interferences – like vibrations – might lead to deviations in product quality, becoming manifest for instance in a reduced lifetime of a product or application issues. This study targets the issue of detecting such interferences during a manufacturing process in an exemplary experimental setup. Collection of data using current sensor technology directly on a 3D-printer enables a quantitative detection of interferences. The evaluation provides insights into the effectiveness of the realised application-oriented setup, the effort required for equipping a manufacturing system with sensors, and the effort for acquisition and processing the data. These insights are of practical utility for organisations dealing with additive manufacturing: the chosen approach for detecting interferences shows promising results, reaching interference detection rates of up to 100% depending on the applied data processing configuration.

Santos Tiago, Schrunner Stefan, Geiger Bernhard, Pfeiler Olivia, Zernig Anja, Kaestner Andre, Kern Roman

2019

Feature Extraction From Analog Wafermaps: A Comparison of Classical Image Processing and a Deep Generative Model

IEEE Transactions on Semiconductor Manufacturing IEEE IEEE

Semiconductor manufacturing is a highly innovative branch of industry, where a high degree of automation has already been achieved. For example, devices tested to be outside of their specifications in electrical wafer test are automatically scrapped. In this paper, we go one step further and analyze test data of devices still within the limits of the specification, by exploiting the information contained in the analog wafermaps. To that end, we propose two feature extraction approaches with the aim to detect patterns in the wafer test dataset. Such patterns might indicate the onset of critical deviations in the production process. The studied approaches are: 1) classical image processing and restoration techniques in combination with sophisticated feature engineering and 2) a data-driven deep generative model. The two approaches are evaluated on both a synthetic and a real-world dataset. The synthetic dataset has been modeled based on real-world patterns and characteristics. We found both approaches to provide similar overall evaluation metrics. Our in-depth analysis helps to choose one approach over the other depending on data availability as a major aspect, as well as on available computing power and required interpretability of the results.

Silva Nelson, Madureira, Luis

2019

SUPPLY CHAIN INTELLIGENCE E A ANÁLISE ESTRATÉGICA DE RELAÇÕES COMPLEXAS

Supply Chain Magazine Luis Filipe

Uncover hidden suppliers and their complex relationships across the entire Supply Chain is quite complex. Unexpected disruptions, e.g. earthquakes, volcanoes, bankruptcies or nuclear disasters have a huge impact on major Supply Chain strategies. It is very difficult to predict the real impact of these disruptions until it is too late. Small, unknown suppliers can hugely impact the delivery of a product. Therefore, it is crucial to constantly monitor for problems with both direct and indirect suppliers.

di Sciascio Maria Cecilia, Strohmaier David, Errecalde Marcelo Luis, Veas Eduardo Enrique

2019

Interactive Quality Analytics of User-generated Content: An Integrated Toolkit for the Case of Wikipedia

ACM Transactions on Interactive Intelligent System ACM

Digital libraries and services enable users to access large amounts of data on demand. Yet, quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success but also a hindrance to good quality. Although Wikipedia has established guidelines for the “perfect article,” authors find it difficult to assert whether their contributions comply with them and reviewers cannot cope with the ever-growing amount of articles pending review. Great efforts have been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. Instead, our contribution is an interactive tool that combines automatic classification methods and human interaction in a toolkit, whereby experts can experiment with new quality metrics and share them with authors that need to identify weaknesses to improve a particular article. A design study shows that experts are able to effectively create complex quality metrics in a visual analytics environment. In turn, a user study evidences that regular users can identify flaws, as well as high-quality content based on the inspection of automatic quality scores.

di Sciascio Maria Cecilia, Brusilovsky Peter, Trattner Christoph, Veas Eduardo Enrique

2019

A Roadmap to User-Controllable Social Exploratory Search

ACM Transactions on Interactive Intelligent System ACM

Information-seeking tasks with learning or investigative purposes are usually referred to as exploratory search. Exploratory search unfolds as a dynamic process where the user, amidst navigation, trial and error, and on-the-fly selections, gathers and organizes information (resources). A range of innovative interfaces with increased user control has been developed to support the exploratory search process. In this work, we present our attempt to increase the power of exploratory search interfaces by using ideas of social search—for instance, leveraging information left by past users of information systems. Social search technologies are highly popular today, especially for improving ranking. However, current approaches to social ranking do not allow users to decide to what extent social information should be taken into account for result ranking. This article presents an interface that integrates social search functionality into an exploratory search system in a user-controlled way that is consistent with the nature of exploratory search. The interface incorporates control features that allow the user to (i) express information needs by selecting keywords and (ii) to express preferences for incorporating social wisdom based on tag matching and user similarity. The interface promotes search transparency through color-coded stacked bars and rich tooltips. This work presents the full series of evaluations conducted to, first, assess the value of the social models in contexts independent to the user interface, in terms of objective and perceived accuracy. Then, in a study with the full-fledged system, we investigated system accuracy and subjective aspects with a structural model revealing that when users actively interacted with all of its control features, the hybrid system outperformed a baseline content-based–only tool and users were more satisfied.

Geiger Bernhard, Koch Tobias

2019

On the Information Dimension of Stochastic Processes

IEEE Transactions on Information Theory IEEE IEEE nicht angegeben

In 1959, Rényi proposed the information dimension and the d-dimensional entropy to measure the information content of general random variables. This paper proposes a generalization of information dimension to stochastic processes by defining the information dimension rate as the entropy rate of the uniformly quantized stochastic process divided by minus the logarithm of the quantizer step size 1/m in the limit as m → ∞. It is demonstrated that the information dimension rate coincides with the rate-distortion dimension, defined as twice the rate-distortion function R(D) of the stochastic process divided by - log(D) in the limit as D ↓ 0. It is further shown that among all multivariate stationary processes with a given (matrixvalued) spectral distribution function (SDF), the Gaussian process has the largest information dimension rate and the information dimension rate of multivariate stationary Gaussian processes is given by the average rank of the derivative of the SDF. The presented results reveal that the fundamental limits of almost zero-distortion recovery via compressible signal pursuit and almost lossless analog compression are different in general.

Jorge Guerra Torres, Carlos Catania, Veas Eduardo Enrique

2019

Active learning approach to label network traffic datasets

Journal of Information Security and Applications Elsevier Elsevier

Modern Network Intrusion Detection systems depend on models trained with up-to-date labeled data. Yet, the process of labeling a network traffic dataset is specially expensive, since expert knowledge is required to perform the annotations. Visual analytics applications exist that claim to considerably reduce the labeling effort, but the expert still needs to ponder several factors before issuing a label. And, most often the effect of bad labels (noise) in the final model is not evaluated. The present article introduces a novel active learning strategy that learns to predict labels in (pseudo) real-time as the user performs the annotation. The system called RiskID, presents several innovations: i) a set of statistical methods summarize the information, which is illustrated in a visual analytics application, ii) that interfaces with the active learning strategy forbuilding a random forest model as the user issues annotations; iii) the (pseudo) real-time predictions of the model are fed back visually to scaffold the traffic annotation task. Finally, iv) an evaluation framework is introduced that represents a complete methodology for evaluating active learning solutions, including resilience against noise.

Barreiros Carla, Pammer-Schindler Viktoria, Veas Eduardo Enrique

2019

Planting the Seed of Positive Human-IoT Interaction

International Journal of Human–Computer Interaction Taylor and Francis

We present a visual interface for communicating the internal state of a coffee machine via a tree metaphor. Nature-inspired representations have a positive impact on human well-being. We also hypothesize that representing the coffee machine asa tree stimulates emotional connection to it, which leads to better maintenance performance.The first study assessed the understandability of the tree representation, comparing it with icon-based and chart-based representations. An online survey with 25 participants indicated no significant mean error difference between representations.A two-week field study assessed the maintenance performance of 12 participants, comparing the tree representation with the icon-based representation. Based on 240 interactions with the coffee machine, we concluded that participants understood themachine states significantly better in the tree representation. Their comments and behavior indicated that the tree representation encouraged an emotional engagement with the machine. Moreover, the participants performed significantly more optional maintenance tasks with the tree representation.

Toller Maximilian, Santos Tiago, Kern Roman

2019

SAZED: parameter-free domain-agnostic season length estimation in time series data

Data Mining and Knowledge Discovery Springer US

Season length estimation is the task of identifying the number of observations in the dominant repeating pattern of seasonal time series data. As such, it is a common pre-processing task crucial for various downstream applications. Inferring season length from a real-world time series is often challenging due to phenomena such as slightly varying period lengths and noise. These issues may, in turn, lead practitioners to dedicate considerable effort to preprocessing of time series data since existing approaches either require dedicated parameter-tuning or their performance is heavily domain-dependent. Hence, to address these challenges, we propose SAZED: spectral and average autocorrelation zero distance density. SAZED is a versatile ensemble of multiple, specialized time series season length estimation approaches. The combination of various base methods selected with respect to domain-agnostic criteria and a novel seasonality isolation technique, allow a broad applicability to real-world time series of varied properties. Further, SAZED is theoretically grounded and parameter-free, with a computational complexity of O(𝑛log𝑛), which makes it applicable in practice. In our experiments, SAZED was statistically significantly better than every other method on at least one dataset. The datasets we used for the evaluation consist of time series data from various real-world domains, sterile synthetic test cases and synthetic data that were designed to be seasonal and yet have no finite statistical moments of any order.

Stepputat Kendra, Kienreich Wolfgang, Dick Christopher S.

2019

Digital Methods in Intangible Cultural Heritage Research: A Case Study in Tango Argentino

Journal on Computing and Cultural Heritage (JOCCH) ACM ACM New York, NY, USA

With this article, we present the ongoing research project “Tango Danceability of Music in European Perspective” and the transdisciplinary research design it is built upon. Three main aspects of tango argentino are in focus—the music, the dance, and the people—in order to understand what is considered danceable in tango music. The study of all three parts involves computer-aided analysis approaches, and the results are examined within ethnochoreological and ethnomusicological frameworks. Two approaches are illustrated in detail to show initial results of the research model. Network analysis based on the collection of online tango event data and quantitative evaluation of data gathered by an online survey showed significant results, corroborating the hypothesis of gatekeeping effects in the shaping of musical preferences. The experiment design includes incorporation of motion capture technology into dance research. We demonstrate certain advantages of transdisciplinary approaches in the study of Intangible Cultural Heritage, in contrast to conventional studies based on methods from just one academic discipline.

Adolfo Ruiz Calleja, Dennerlein Sebastian, Kowald Dominik, Theiler Dieter, Lex Elisabeth, Tobias Ley

2019

An Infrastructure for Workplace Learning Analytics: Tracing Knowledge Creation with the Social Semantic Server

Journal of Learning Analytics Society for Learning Analytics Research (SoLAR) UTS ePress

In this paper, we propose the Social Semantic Server (SSS) as a service-based infrastructure for workplace andprofessional Learning Analytics (LA). The design and development of the SSS has evolved over 8 years, startingwith an analysis of workplace learning inspired by knowledge creation theories and its application in differentcontexts. The SSS collects data from workplace learning tools, integrates it into a common data model based ona semantically-enriched Artifact-Actor Network and offers it back for LA applications to exploit the data. Further,the SSS design promotes its flexibility in order to be adapted to different workplace learning situations. Thispaper contributes by systematizing the derivation of requirements for the SSS according to the knowledge creationtheories, and the support offered across a number of different learning tools and LA applications integrated to it.It also shows evidence for the usefulness of the SSS extracted from four authentic workplace learning situationsinvolving 57 participants. The evaluation results indicate that the SSS satisfactorily supports decision making indiverse workplace learning situations and allow us to reflect on the importance of the knowledge creation theoriesfor such analysis.

Renner Bettina, Wesiak Gudrun, Pammer-Schindler Viktoria, Prilla Michael, Müller Lars, Morosini Dalia, Mora Simone, Faltin Nils, Cress Ulrike

2019

Computer-supported reflective learning: How apps can foster reflection at work.

Behaviour & Information Technology Taylor & Francis Taylor & Francis

Fruhwirth Michael, Breitfuß Gert, Müller Christiana

2019

Mit Daten Wert schaffen: Datengetriebene Geschäftsmodelle als Weg in die Zukunft

WINGbusiness Österreichischer Verband der Wirtschaftsingenieure Österreichischer Verband der Wirtschaftsingenieure Graz

Die Nutzung von Daten in Unternehmen zur Analyse und Beantwortung vielfältiger Fragestellungen ist “daily business”. Es steckt aber noch viel mehr Potenzial in Daten abseits von Prozessoptimierungen und Business Intelligence Anwendungen. Der vorliegende Beitrag gibt einen Überblick über die wichtigsten Aspekte bei der Transformation von Daten in Wert bzw. bei der Entwicklung datengetriebener Geschäftsmodelle. Dabei werden die Charakteristika von datengetriebenen Geschäftsmodellen und die benötigten Kompetenzen näher beleuchtet. Vier Fallbeispiele österreichischer Unternehmen geben Einblicke in die Praxis und abschließend werden aktuelle Herausforderungen und Entwicklungen diskutiert.

Clemens Bloechl, Rana Ali Amjad, Geiger Bernhard

2019

Co-Clustering via Information-Theoretic Markov Aggregation

IEEE Transactions on Knowledge and Data Engineering IEEE nicht angegeben

We present an information-theoretic cost function for co-clustering, i.e., for simultaneous clustering of two sets based on similarities between their elements. By constructing a simple random walk on the corresponding bipartite graph, our cost function is derived from a recently proposed generalized framework for information-theoretic Markov chain aggregation. The goal of our cost function is to minimize relevant information loss, hence it connects to the information bottleneck formalism. Moreover, via the connection to Markov aggregation, our cost function is not ad hoc, but inherits its justification from the operational qualities associated with the corresponding Markov aggregation problem. We furthermore show that, for appropriate parameter settings, our cost function is identical to well-known approaches from the literature, such as “Information-Theoretic Co-Clustering” by Dhillon et al. Hence, understanding the influence of this parameter admits a deeper understanding of the relationship between previously proposed information-theoretic cost functions. We highlight some strengths and weaknesses of the cost function for different parameters. We also illustrate the performance of our cost function, optimized with a simple sequential heuristic, on several synthetic and real-world data sets, including the Newsgroup20 and the MovieLens100k data sets.

Lovric Mario, Molero Perez Jose Manuel, Kern Roman

2019

PySpark and RDKit: moving towards Big Data in QSAR

Molecular Informatics Wiley

The authors present an implementation of the cheminformatics toolkit RDKit in a distributed computing environment, Apache Hadoop. Together with the Apache Spark analytics engine, wrapped by PySpark, resources from commodity scalable hardware can be employed for cheminformatic calculations and query operations with basic knowledge in Python programming and understanding of the resilient distributed datasets (RDD). Three use cases of cheminfomatical computing in Spark on the Hadoop cluster are presented; querying substructures, calculating fingerprint similarity and calculating molecular descriptors. The source code for the PySpark‐RDKit implementation is provided. The use cases showed that Spark provides a reasonable scalability depending on the use case and can be a suitable choice for datasets too big to be processed with current low‐end workstations