Hier finden Sie von Know-Center MitarbeiterInnen verfasste wissenschaftliche Publikationen


Santos Tiago, Schrunner Stefan, Geiger Bernhard, Pfeiler Olivia, Zernig Anja, Kaestner Andre, Kern Roman

Feature Extraction From Analog Wafermaps: A Comparison of Classical Image Processing and a Deep Generative Model

IEEE Transactions on Semiconductor Manufacturing, IEEE, 2019

Semiconductor manufacturing is a highly innovative branch of industry, where a high degree of automation has already been achieved. For example, devices tested to be outside of their specifications in electrical wafer test are automatically scrapped. In this paper, we go one step further and analyze test data of devices still within the limits of the specification, by exploiting the information contained in the analog wafermaps. To that end, we propose two feature extraction approaches with the aim to detect patterns in the wafer test dataset. Such patterns might indicate the onset of critical deviations in the production process. The studied approaches are: 1) classical image processing and restoration techniques in combination with sophisticated feature engineering and 2) a data-driven deep generative model. The two approaches are evaluated on both a synthetic and a real-world dataset. The synthetic dataset has been modeled based on real-world patterns and characteristics. We found both approaches to provide similar overall evaluation metrics. Our in-depth analysis helps to choose one approach over the other depending on data availability as a major aspect, as well as on available computing power and required interpretability of the results

Santos Tiago, Walk Simon, Kern Roman, Strohmaier Markus, Helic Denis

Activity Archetypes in Question-and-Answer (Q8A) Websites—A Study of 50 Stack Exchange Instances

ACM Transactions on Social Computing, 2019

Millions of users on the Internet discuss a variety of topics on Question-and-Answer (Q&A) instances. However, not all instances and topics receive the same amount of attention, as some thrive and achieve selfsustaining levels of activity, while others fail to attract users and either never grow beyond being a smallniche community or become inactive. Hence, it is imperative to not only better understand but also to distilldeciding factors and rules that define and govern sustainable Q&A instances. We aim to empower communitymanagers with quantitative methods for them to better understand, control, and foster their communities,and thus contribute to making the Web a more efficient place to exchange information. To that end, we extract, model, and cluster a user activity-based time series from 50 randomly selected Q&A instances from theStack Exchange network to characterize user behavior. We find four distinct types of user activity temporalpatterns, which vary primarily according to the users’ activity frequency. Finally, by breaking down totalactivity in our 50 Q&A instances by the previously identified user activity profiles, we classify those 50 Q&Ainstances into three different activity profiles. Our parsimonious categorization of Q&A instances aligns withthe stage of development and maturity of the underlying communities, and can potentially help operatorsof such instances: We not only quantitatively assess progress of Q&A instances, but we also derive practicalimplications for optimizing Q&A community building efforts, as we, e.g., recommend which user types tofocus on at different developmental stages of a Q&A community

Toller Maximilian, Santos Tiago, Kern Roman

SAZED: parameter-free domain-agnostic season length estimation in time series data

Data Mining and Knowledge Discovery, Springer US, 2019

Season length estimation is the task of identifying the number of observations in the dominant repeating pattern of seasonal time series data. As such, it is a common pre-processing task crucial for various downstream applications. Inferring season length from a real-world time series is often challenging due to phenomena such as slightly varying period lengths and noise. These issues may, in turn, lead practitioners to dedicate considerable effort to preprocessing of time series data since existing approaches either require dedicated parameter-tuning or their performance is heavily domain-dependent. Hence, to address these challenges, we propose SAZED: spectral and average autocorrelation zero distance density. SAZED is a versatile ensemble of multiple, specialized time series season length estimation approaches. The combination of various base methods selected with respect to domain-agnostic criteria and a novel seasonality isolation technique, allow a broad applicability to real-world time series of varied properties. Further, SAZED is theoretically grounded and parameter-free, with a computational complexity of O( log ), which makes it applicable in practice. In our experiments, SAZED was statistically significantly better than every other method on at least one dataset. The datasets we used for the evaluation consist of time series data from various real-world domains, sterile synthetic test cases and synthetic data that were designed to be seasonal and yet have no finite statistical moments of any order.

Lovric Mario, Molero Perez Jose Manuel, Kern Roman

PySpark and RDKit: moving towards Big Data in QSAR

Molecular Informatics, Wiley, 2019

The authors present an implementation of the cheminformatics toolkit RDKit in a distributed computing environment, Apache Hadoop. Together with the Apache Spark analytics engine, wrapped by PySpark, resources from commodity scalable hardware can be employed for cheminformatic calculations and query operations with basic knowledge in Python programming and understanding of the resilient distributed datasets (RDD). Three use cases of cheminfomatical computing in Spark on the Hadoop cluster are presented; querying substructures, calculating fingerprint similarity and calculating molecular descriptors. The source code for the PySpark‐RDKit implementation is provided. The use cases showed that Spark provides a reasonable scalability depending on the use case and can be a suitable choice for datasets too big to be processed with current low‐end workstations
Kontakt Karriere

Hiermit erkläre ich ausdrücklich meine Einwilligung zum Einsatz und zur Speicherung von Cookies. Weiter Informationen finden sich unter Datenschutzerklärung

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.