Publications

Here you will find scientific publications written by Know-Center staff

2019

Lovric Mario, Molero Perez Jose Manuel, Kern Roman

PySpark and RDKit: moving towards Big Data in QSAR

Molecular Informatics, Wiley, 2019

Journal
The authors present an implementation of the cheminformatics toolkit RDKit in a distributed computing environment, Apache Hadoop. Together with the Apache Spark analytics engine, wrapped by PySpark, resources from commodity scalable hardware can be employed for cheminformatic calculations and query operations with basic knowledge of Python programming and an understanding of resilient distributed datasets (RDDs). Three use cases of cheminformatics computing in Spark on the Hadoop cluster are presented: querying substructures, calculating fingerprint similarity and calculating molecular descriptors. The source code for the PySpark-RDKit implementation is provided. The use cases showed that Spark provides reasonable scalability depending on the use case and can be a suitable choice for datasets too big to be processed with current low-end workstations.
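
As a rough illustration of the approach described above (not the authors' published source code), the following sketch distributes an RDKit substructure query over SMILES strings with PySpark; the molecule list, SMARTS pattern and application name are made up for the example, and RDKit must be available on every Spark worker.

```python
# Minimal sketch, assuming PySpark and RDKit are installed on the driver and
# on every worker node; molecules and the query pattern are toy examples.
from pyspark.sql import SparkSession
from rdkit import Chem

spark = SparkSession.builder.appName("rdkit-substructure-demo").getOrCreate()
sc = spark.sparkContext

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # toy input data
QUERY_SMARTS = "c1ccccc1"                              # benzene ring as query

def matches_substructure(smi):
    """Return True if the molecule contains the query substructure."""
    mol = Chem.MolFromSmiles(smi)
    pattern = Chem.MolFromSmarts(QUERY_SMARTS)  # built per call to keep the closure simple
    return mol is not None and mol.HasSubstructMatch(pattern)

# Run the substructure query as a distributed RDD filter.
hits = sc.parallelize(smiles).filter(matches_substructure).collect()
print(hits)  # ['c1ccccc1', 'CC(=O)Oc1ccccc1C(=O)O']
```
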
2019

Jorge Guerra Torres, Carlos Catania, Veas Eduardo Enrique

Active learning approach to label network traffic datasets

Journal of Information Security and Applications, Elsevier, 2019

Journal
Modern Network Intrusion Detection systems depend on models trained with up-to-date labeled data. Yet, the process of labeling a network traffic dataset is especially expensive, since expert knowledge is required to perform the annotations. Visual analytics applications exist that claim to considerably reduce the labeling effort, but the expert still needs to ponder several factors before issuing a label. And, most often the effect of bad labels (noise) on the final model is not evaluated. The present article introduces a novel active learning strategy that learns to predict labels in (pseudo) real-time as the user performs the annotation. The system, called RiskID, presents several innovations: i) a set of statistical methods summarize the information, which is illustrated in a visual analytics application; ii) the application interfaces with the active learning strategy for building a random forest model as the user issues annotations; iii) the (pseudo) real-time predictions of the model are fed back visually to scaffold the traffic annotation task. Finally, iv) an evaluation framework is introduced that represents a complete methodology for evaluating active learning solutions, including resilience against noise.
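
The following is a hedged sketch of the general idea only, not the RiskID implementation: pool-based active learning in which a random forest is retrained after each annotation and the most uncertain item is queried next. All data and variable names are synthetic stand-ins.

```python
# Generic uncertainty-sampling loop with a random forest (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 10))        # stand-in for traffic features
y_true = (X_pool[:, 0] > 0).astype(int)    # stand-in for the expert "oracle"

# Seed the process with a few labeled examples from each class.
seed_idx = np.concatenate([np.flatnonzero(y_true == 0)[:5],
                           np.flatnonzero(y_true == 1)[:5]])
labels = {int(i): int(y_true[i]) for i in seed_idx}

for _ in range(20):  # annotation rounds
    idx = list(labels)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_pool[idx], [labels[i] for i in idx])

    # Query the unlabeled item whose predicted probability is closest to 0.5.
    unlabeled = [i for i in range(len(X_pool)) if i not in labels]
    proba = clf.predict_proba(X_pool[unlabeled])[:, 1]
    query = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]
    labels[query] = int(y_true[query])     # the expert issues the label
```
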
2019

Santos Tiago, Schrunner Stefan, Geiger Bernhard, Pfeiler Olivia, Zernig Anja, Kaestner Andre, Kern Roman

Feature Extraction From Analog Wafermaps: A Comparison of Classical Image Processing and a Deep Generative Model

IEEE Transactions on Semiconductor Manufacturing, IEEE, 2019

Journal
Semiconductor manufacturing is a highly innovative branch of industry, where a high degree of automation has already been achieved. For example, devices tested to be outside of their specifications in electrical wafer test are automatically scrapped. In this paper, we go one step further and analyze test data of devices still within the limits of the specification, by exploiting the information contained in the analog wafermaps. To that end, we propose two feature extraction approaches with the aim to detect patterns in the wafer test dataset. Such patterns might indicate the onset of critical deviations in the production process. The studied approaches are: 1) classical image processing and restoration techniques in combination with sophisticated feature engineering and 2) a data-driven deep generative model. The two approaches are evaluated on both a synthetic and a real-world dataset. The synthetic dataset has been modeled based on real-world patterns and characteristics. We found both approaches to provide similar overall evaluation metrics. Our in-depth analysis helps to choose one approach over the other depending on data availability as a major aspect, as well as on available computing power and required interpretability of the results.
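
Purely as an illustration of the first of the two approaches (classical image processing with hand-engineered features), and not the paper's actual feature set, a minimal sketch could compute a few summary features from a 2D analog wafermap array; the map, filter settings and feature names below are assumptions.

```python
# Illustrative only (not the paper's feature set): a few classical
# image-processing features computed from a 2D analog wafermap.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
wafermap = rng.normal(size=(64, 64))          # stand-in for analog test values

def wafermap_features(wmap):
    """Return a small dictionary of hand-engineered summary features."""
    smoothed = ndimage.gaussian_filter(wmap, sigma=2)   # suppress measurement noise
    gy, gx = np.gradient(smoothed)                      # spatial gradients
    return {
        "mean": float(wmap.mean()),
        "std": float(wmap.std()),
        "gradient_energy": float(np.mean(gx**2 + gy**2)),
        "row_profile_var": float(smoothed.mean(axis=1).var()),
        "col_profile_var": float(smoothed.mean(axis=0).var()),
    }

print(wafermap_features(wafermap))
```
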
2019

Lex Elisabeth, Kowald Dominik, Schedl Markus

Modeling Popularity and Temporal Drift of Music Genre Preference

Transactions of the International Society for Music Information Retrieval (TISMIR), 2019

Journal
2019

Santos Tiago, Walk Simon, Kern Roman, Strohmaier Markus, Helic Denis

Activity Archetypes in Question-and-Answer (Q&A) Websites—A Study of 50 Stack Exchange Instances

ACM Transactions on Social Computing, 2019

Journal
Millions of users on the Internet discuss a variety of topics on Question-and-Answer (Q&A) instances. However, not all instances and topics receive the same amount of attention, as some thrive and achieve self-sustaining levels of activity, while others fail to attract users and either never grow beyond being a small niche community or become inactive. Hence, it is imperative to not only better understand but also to distill deciding factors and rules that define and govern sustainable Q&A instances. We aim to empower community managers with quantitative methods for them to better understand, control, and foster their communities, and thus contribute to making the Web a more efficient place to exchange information. To that end, we extract, model, and cluster a user activity-based time series from 50 randomly selected Q&A instances from the Stack Exchange network to characterize user behavior. We find four distinct types of user activity temporal patterns, which vary primarily according to the users' activity frequency. Finally, by breaking down total activity in our 50 Q&A instances by the previously identified user activity profiles, we classify those 50 Q&A instances into three different activity profiles. Our parsimonious categorization of Q&A instances aligns with the stage of development and maturity of the underlying communities, and can potentially help operators of such instances: We not only quantitatively assess progress of Q&A instances, but we also derive practical implications for optimizing Q&A community building efforts, as we, e.g., recommend which user types to focus on at different developmental stages of a Q&A community.
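
As an illustration of the kind of pipeline the abstract describes (not the paper's actual extraction and clustering procedure), one could cluster normalized per-user activity series to surface a handful of temporal archetypes; the synthetic data and cluster count below are assumptions.

```python
# Illustrative sketch: k-means over normalized per-user activity time series.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
n_users, n_weeks = 300, 52
# Synthetic stand-in for weekly activity counts of each user.
activity = rng.poisson(lam=rng.uniform(0.2, 5.0, size=(n_users, 1)),
                       size=(n_users, n_weeks)).astype(float)

# Normalize each series so clusters reflect temporal shape rather than volume.
profiles = activity / np.maximum(activity.sum(axis=1, keepdims=True), 1.0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(profiles)
print(np.bincount(kmeans.labels_))  # size of each candidate activity archetype
```
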
2019

Silva Nelson, Madureira Luis

SUPPLY CHAIN INTELLIGENCE E A ANÁLISE ESTRATÉGICA DE RELAÇÕES COMPLEXAS

Supply Chain Magazine, Luis Filipe, 2019

Journal
Uncovering hidden suppliers and their complex relationships across the entire Supply Chain is quite difficult. Unexpected disruptions, e.g. earthquakes, volcanoes, bankruptcies or nuclear disasters, have a huge impact on major Supply Chain strategies. It is very difficult to predict the real impact of these disruptions until it is too late. Small, unknown suppliers can hugely impact the delivery of a product. Therefore, it is crucial to constantly monitor for problems with both direct and indirect suppliers.
2019

di Sciascio Maria Cecilia, Strohmaier David, Errecalde Marcelo Luis, Veas Eduardo Enrique

Interactive Quality Analytics of User-generated Content: An Integrated Toolkit for the Case of Wikipedia

ACM, 2019

Journal
Digital libraries and services enable users to access large amounts of data on demand. Yet, quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success but also a hindrance to good quality. Although Wikipedia has established guidelines for the “perfect article,” authors find it difficult to assert whether their contributions comply with them and reviewers cannot cope with the ever-growing amount of articles pending review. Great efforts have been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. Instead, our contribution is an interactive tool that combines automatic classification methods and human interaction in a toolkit, whereby experts can experiment with new quality metrics and share them with authors that need to identify weaknesses to improve a particular article. A design study shows that experts are able to effectively create complex quality metrics in a visual analytics environment. In turn, a user study evidences that regular users can identify flaws, as well as high-quality content based on the inspection of automatic quality scores.
2019

di Sciascio Maria Cecilia, Brusilovsky Peter, Trattner Christoph, Veas Eduardo Enrique

A Roadmap to User-Controllable Social Exploratory Search

ACM, 2019

Journal
Information-seeking tasks with learning or investigative purposes are usually referred to as exploratory search. Exploratory search unfolds as a dynamic process where the user, amidst navigation, trial and error, and on-the-fly selections, gathers and organizes information (resources). A range of innovative interfaces with increased user control has been developed to support the exploratory search process. In this work, we present our attempt to increase the power of exploratory search interfaces by using ideas of social search—for instance, leveraging information left by past users of information systems. Social search technologies are highly popular today, especially for improving ranking. However, current approaches to social ranking do not allow users to decide to what extent social information should be taken into account for result ranking. This article presents an interface that integrates social search functionality into an exploratory search system in a user-controlled way that is consistent with the nature of exploratory search. The interface incorporates control features that allow the user to (i) express information needs by selecting keywords and (ii) express preferences for incorporating social wisdom based on tag matching and user similarity. The interface promotes search transparency through color-coded stacked bars and rich tooltips. This work presents the full series of evaluations conducted to, first, assess the value of the social models in contexts independent of the user interface, in terms of objective and perceived accuracy. Then, in a study with the full-fledged system, we investigated system accuracy and subjective aspects with a structural model revealing that when users actively interacted with all of its control features, the hybrid system outperformed a baseline content-based-only tool and users were more satisfied.
2019

Geiger Bernhard, Koch Tobias

On the Information Dimension of Stochastic Processes

IEEE Transactions on Information Theory, IEEE, 2019

Journal
2019

Barreiros Carla, Pammer-Schindler Viktoria, Veas Eduardo Enrique

Planting the Seed of Positive Human-IoT Interaction

International Journal of Human–Computer Interaction, Taylor and Francis, 2019

Journal
We present a visual interface for communicating the internal state of a coffee machine via a tree metaphor. Nature-inspired representations have a positive impact on human well-being. We also hypothesize that representing the coffee machine as a tree stimulates emotional connection to it, which leads to better maintenance performance. The first study assessed the understandability of the tree representation, comparing it with icon-based and chart-based representations. An online survey with 25 participants indicated no significant mean error difference between representations. A two-week field study assessed the maintenance performance of 12 participants, comparing the tree representation with the icon-based representation. Based on 240 interactions with the coffee machine, we concluded that participants understood the machine states significantly better in the tree representation. Their comments and behavior indicated that the tree representation encouraged an emotional engagement with the machine. Moreover, the participants performed significantly more optional maintenance tasks with the tree representation.
2019

Clemens Bloechl, Rana Ali Amjad, Geiger Bernhard

Co-Clustering via Information-Theoretic Markov Aggregation

IEEE Transactions on Knowledge and Data Engineering, IEEE, 2019

Journal
We present an information-theoretic cost function for co-clustering, i.e., for simultaneous clustering of two sets based on similarities between their elements. By constructing a simple random walk on the corresponding bipartite graph, our cost function is derived from a recently proposed generalized framework for information-theoretic Markov chain aggregation. The goal of our cost function is to minimize relevant information loss, hence it connects to the information bottleneck formalism. Moreover, via the connection to Markov aggregation, our cost function is not ad hoc, but inherits its justification from the operational qualities associated with the corresponding Markov aggregation problem. We furthermore show that, for appropriate parameter settings, our cost function is identical to well-known approaches from the literature, such as “Information-Theoretic Co-Clustering” by Dhillon et al. Hence, understanding the influence of this parameter admits a deeper understanding of the relationship between previously proposed information-theoretic cost functions. We highlight some strengths and weaknesses of the cost function for different parameters. We also illustrate the performance of our cost function, optimized with a simple sequential heuristic, on several synthetic and real-world data sets, including the Newsgroup20 and the MovieLens100k data sets.
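
For reference, the mutual-information loss minimized by Dhillon et al.'s information-theoretic co-clustering, which the abstract identifies as a special case of the proposed cost (the paper's full parametrized cost function is not reproduced here):

```latex
% Co-cluster rows X into \hat{X} and columns Y into \hat{Y} so that the
% loss in mutual information is minimal (Dhillon et al.'s ITCC objective):
\min_{\hat{X},\,\hat{Y}} \; I(X;Y) - I(\hat{X};\hat{Y})
```
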
2019

Toller Maximilian, Santos Tiago, Kern Roman

SAZED: parameter-free domain-agnostic season length estimation in time series data

Data Mining and Knowledge Discovery, Springer US, 2019

Journal
Season length estimation is the task of identifying the number of observations in the dominant repeating pattern of seasonal time series data. As such, it is a common pre-processing task crucial for various downstream applications. Inferring season length from a real-world time series is often challenging due to phenomena such as slightly varying period lengths and noise. These issues may, in turn, lead practitioners to dedicate considerable effort to preprocessing of time series data since existing approaches either require dedicated parameter-tuning or their performance is heavily domain-dependent. Hence, to address these challenges, we propose SAZED: spectral and average autocorrelation zero distance density. SAZED is a versatile ensemble of multiple, specialized time series season length estimation approaches. The combination of various base methods selected with respect to domain-agnostic criteria and a novel seasonality isolation technique allows broad applicability to real-world time series of varied properties. Further, SAZED is theoretically grounded and parameter-free, with a computational complexity of O(n log n), which makes it applicable in practice. In our experiments, SAZED was statistically significantly better than every other method on at least one dataset. The datasets we used for the evaluation consist of time series data from various real-world domains, sterile synthetic test cases and synthetic data that were designed to be seasonal and yet have no finite statistical moments of any order.
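
As a hedged illustration of the underlying task (not SAZED itself), the snippet below implements the classic autocorrelation-peak baseline for season length estimation that ensemble methods of this kind build on; the function name and toy series are made up.

```python
# Baseline season length estimate via the highest autocorrelation peak.
import numpy as np

def estimate_season_length(x, max_lag=None):
    """Return the lag (>= 2) with the highest autocorrelation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf = acf / acf[0]                       # normalize so that acf[0] == 1
    max_lag = max_lag or len(x) // 2
    return int(2 + np.argmax(acf[2:max_lag]))

# Toy example: a noisy sine wave with a period of 24 observations.
t = np.arange(480)
series = np.sin(2 * np.pi * t / 24) + 0.3 * np.random.default_rng(1).normal(size=t.size)
print(estimate_season_length(series))        # expected to be close to 24
```
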
2019

Egger Peter, Dominik Geringer, Gerwald Gindra-Vady, Christina Gruber, Elisabeth Paar, Lukas Reiter, Karl Stöger, Stefan Thalmann

Challenges of a Digital Single Market from an Austrian perspective – towards Smart Regulations

ALJ - Austrian Law Journal, 2019

Journal
This paper discusses various legal challenges of the “digitisation of the single market”. The question arises to which extent the current regulatory framework appears suitable to deal with the presented challenges of digitisation and where additional regulation is required. In the field of autonomous decision-making by AI, we identified the most pressing need for new regulation. While the EU (and increasingly Austria, as well) is aware of this need, regulation to date remains scarce. Though the EU legislator has already taken specific precautions for the use of algorithms in the GDPR, such regulatory approaches are missing in most other fields of law. In contrast to this, antitrust law and product liability law already appear to be well suited to meet the challenges posed by digitisation. This is especially true for product liability law, which is in principle apt to cover the specific challenges of the convergence of software and hardware in smart products. However, uncertainty about its applicability to incorporeal goods would make clarification of current product liability legislation advisable – a view shared by the European Commission. Two more fields very recently received some legislative attention due to the changing needs of a digital society: the postal sector on the one hand, and e-government on the other hand. In both fields, new legislation – tellingly in the form of (partially) directly applicable regulations – has recently been passed by the EU – a sharp contrast to the case of self-learning AI. However, while the integration of the new regulation on cross-border parcel delivery will probably not pose major challenges for domestic markets, the implementation of the Single Digital Gateway will raise serious organisational and legal challenges for national administrations (especially when taking into account the limited success of the previous related initiative on the points of single contact under the Services Directive).
2019

Stepputat Kendra, Kienreich Wolfgang, Dick Christopher S.

Digital Methods in Intangible Cultural Heritage Research: A Case Study in Tango Argentino

Journal on Computing and Cultural Heritage (JOCCH), ACM, New York, NY, USA, 2019

Journal
With this article, we present the ongoing research project “Tango Danceability of Music in European Perspective” and the transdisciplinary research design it is built upon. Three main aspects of tango argentino are in focus—the music, the dance, and the people—in order to understand what is considered danceable in tango music. The study of all three parts involves computer-aided analysis approaches, and the results are examined within ethnochoreological and ethnomusicological frameworks. Two approaches are illustrated in detail to show initial results of the research model. Network analysis based on the collection of online tango event data and quantitative evaluation of data gathered by an online survey showed significant results, corroborating the hypothesis of gatekeeping effects in the shaping of musical preferences. The experiment design includes incorporation of motion capture technology into dance research. We demonstrate certain advantages of transdisciplinary approaches in the study of Intangible Cultural Heritage, in contrast to conventional studies based on methods from just one academic discipline.
2019

Adolfo Ruiz Calleja, Dennerlein Sebastian, Kowald Dominik, Theiler Dieter, Lex Elisabeth, Tobias Ley

An Infrastructure for Workplace Learning Analytics: Tracing Knowledge Creation with the Social Semantic Server

Journal of Learning Analytics, Society for Learning Analytics Research (SoLAR), UTS ePress, 2019

Journal
In this paper, we propose the Social Semantic Server (SSS) as a service-based infrastructure for workplace and professional Learning Analytics (LA). The design and development of the SSS has evolved over 8 years, starting with an analysis of workplace learning inspired by knowledge creation theories and its application in different contexts. The SSS collects data from workplace learning tools, integrates it into a common data model based on a semantically-enriched Artifact-Actor Network and offers it back for LA applications to exploit the data. Further, the SSS design promotes its flexibility in order to be adapted to different workplace learning situations. This paper contributes by systematizing the derivation of requirements for the SSS according to the knowledge creation theories, and the support offered across a number of different learning tools and LA applications integrated to it. It also shows evidence for the usefulness of the SSS extracted from four authentic workplace learning situations involving 57 participants. The evaluation results indicate that the SSS satisfactorily supports decision making in diverse workplace learning situations and allow us to reflect on the importance of the knowledge creation theories for such analysis.
2019

Renner Bettina, Wesiak Gudrun, Pammer-Schindler Viktoria, Prilla Michael, Müller Lars, Morosini Dalia, Mora Simone, Faltin Nils, Cress Ulrike

Computer-supported reflective learning: How apps can foster reflection at work.

Behaviour & Information Technology, Taylor & Francis, 2019

Journal
2019

Fruhwirth Michael, Breitfuß Gert, Müller Christiana

Mit Daten Wert schaffen: Datengetriebene Geschäftsmodelle als Weg in die Zukunft

Österreichischer Verband der Wirtschaftsingenieure, Graz, 2019

Journal
Using data in companies to analyze and answer a wide range of questions is “daily business”. Yet data hold far more potential beyond process optimization and business intelligence applications. This article provides an overview of the most important aspects of transforming data into value, i.e., of developing data-driven business models. The characteristics of data-driven business models and the competencies they require are examined in more detail. Four case studies of Austrian companies provide insights into practice, and finally current challenges and developments are discussed.
2019

Stanisavljevic Darko, Cemernek David, Gursch Heimo, Urak Günter, Lechner Gernot

Detection of Interferences in an Additive Manufacturing Process: An Experimental Study Integrating Methods of Feature Selection and Machine Learning

International Journal of Production Research, Taylor & Francis, 2019

Journal
Additive manufacturing is becoming a more and more important technology for production, mainly driven by the ability to realise extremely complex structures using multiple materials but without assembly or excessive waste. Nevertheless, like any high-precision technology, additive manufacturing responds to interferences during the manufacturing process. These interferences – like vibrations – might lead to deviations in product quality, becoming manifest for instance in a reduced lifetime of a product or application issues. This study targets the issue of detecting such interferences during a manufacturing process in an exemplary experimental setup. Collection of data using current sensor technology directly on a 3D-printer enables a quantitative detection of interferences. The evaluation provides insights into the effectiveness of the realised application-oriented setup, the effort required for equipping a manufacturing system with sensors, and the effort for acquisition and processing of the data. These insights are of practical utility for organisations dealing with additive manufacturing: the chosen approach for detecting interferences shows promising results, reaching interference detection rates of up to 100% depending on the applied data processing configuration.
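
The sketch below is only a generic stand-in for the kind of pipeline the abstract outlines (feature selection followed by a machine-learning classifier on sensor features); the synthetic data, feature counts and model choices are assumptions, not the study's configuration.

```python
# Generic interference-detection pipeline: univariate feature selection
# followed by a random forest, evaluated with cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 30))                          # stand-in sensor features
y = (X[:, 3] + 0.5 * X[:, 7] + 0.3 * rng.normal(size=400) > 0).astype(int)

model = make_pipeline(SelectKBest(f_classif, k=10),
                      RandomForestClassifier(n_estimators=200, random_state=0))
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(round(scores.mean(), 3))                          # estimated detection accuracy
```
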