Here you will find scientific publications written by Know-Center staff.


Toller Maximilian, Santos Tiago, Kern Roman

SAZED: parameter-free domain-agnostic season length estimation in time series data

Data Mining and Knowledge Discovery, Springer US, 2019

Season length estimation is the task of identifying the number of observations in the dominant repeating pattern of seasonal time series data. As such, it is a common pre-processing task crucial for various downstream applications. Inferring season length from a real-world time series is often challenging due to phenomena such as slightly varying period lengths and noise. These issues may, in turn, lead practitioners to dedicate considerable effort to preprocessing of time series data, since existing approaches either require dedicated parameter tuning or perform in a heavily domain-dependent way. Hence, to address these challenges, we propose SAZED: spectral and average autocorrelation zero distance density. SAZED is a versatile ensemble of multiple, specialized time series season length estimation approaches. The combination of various base methods, selected with respect to domain-agnostic criteria, and a novel seasonality isolation technique allows broad applicability to real-world time series of varied properties. Further, SAZED is theoretically grounded and parameter-free, with a computational complexity of O(n log n), which makes it applicable in practice. In our experiments, SAZED was statistically significantly better than every other method on at least one dataset. The datasets we used for the evaluation consist of time series data from various real-world domains, sterile synthetic test cases and synthetic data that were designed to be seasonal and yet have no finite statistical moments of any order.
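
To illustrate the task itself (not SAZED), the following minimal Python sketch estimates a season length as the lag of the highest local peak of the sample autocorrelation function. The function names `autocorrelation` and `estimate_season_length` are hypothetical. The approach is a single naive baseline rather than the ensemble described in the abstract, and this pure-Python version runs in quadratic time.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series has no seasonal structure
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def estimate_season_length(series):
    """Return the lag of the highest local autocorrelation peak, or None.

    Assumption: this is a naive illustrative baseline, not the SAZED method.
    """
    max_lag = len(series) // 2
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    best_lag, best_value = None, 0.0
    for lag in range(2, max_lag):
        is_peak = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if is_peak and acf[lag] > best_value:
            best_lag, best_value = lag, acf[lag]
    return best_lag


# Hypothetical usage: a noise-free sine wave that repeats every 12 observations.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
print(estimate_season_length(series))
```

A single estimator of this kind tends to be sensitive to noise and to slightly varying period lengths, which are the difficulties the abstract names as motivation for combining several base methods.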

Santos Tiago, Kern Roman

Understanding semiconductor production with variational auto-encoders

European Symposium on Artificial Neural Networks (ESANN) 2018, 2018

Semiconductor manufacturing processes critically depend on hundreds of highly complex process steps, which may cause critical deviations in the end-product. Hence, a better understanding of wafer test data patterns, which represent stress tests conducted on devices in semiconductor material slices, may lead to an improved production process. However, the shapes and types of these wafer patterns, as well as their relation to single process steps, are unknown. In a first step to address these issues, we tailor and apply a variational auto-encoder (VAE) to wafer pattern images. We find the VAE's generator allows for explorative wafer pattern analysis, and its encoder provides an effective dimensionality reduction algorithm, which, in a clustering application, performs better than several baselines such as t-SNE and yields interpretable clusters of wafer patterns.

Santos Tiago, Walk Simon, Kern Roman, Strohmaier M., Helic Denis

Activity in Questions & Answers Websites

ACM Transactions on Social Computing, 2018

Millions of users on the Internet discuss a variety of topics on Question and Answer (Q&A) instances. However, not all instances and topics receive the same amount of attention, as some thrive and achieve self-sustaining levels of activity while others fail to attract users and either never grow beyond being a small niche community or become inactive. Hence, it is imperative not only to better understand but also to distill deciding factors and rules that define and govern sustainable Q&A instances. We aim to empower community managers with quantitative methods for them to better understand, control and foster their communities, and thus contribute to making the Web a more efficient place to exchange information. To that end, we extract, model and cluster user activity-based time series from 50 randomly selected Q&A instances from the StackExchange network to characterize user behavior. We find four distinct types of user activity temporal patterns, which vary primarily according to the users' activity frequency. Finally, by breaking down total activity in our 50 Q&A instances by the previously identified user activity profiles, we classify those 50 Q&A instances into three different activity profiles. Our categorization of Q&A instances aligns with the stage of development and maturity of the underlying communities, which can potentially help operators of such instances not only to quantitatively assess status and progress, but also to optimize community building efforts.

Santos Tiago, Walk Simon, Kern Roman, Helic Denis

Evolution of Collaborative Web Communities

ACM Hypertext 2018, 2018

Each day, millions of users visit collaborative Web communities, such as Wikipedia or StackExchange, either as large knowledge repositories or as up-to-date news sources. However, not all Web communities are as successful as Wikipedia and, except for a few initial research results, our research community still knows only a little about what separates a successful from an unsuccessful community. Thus, we still need to (i) gain a better understanding of the underlying community evolution dynamics, and (ii) based on this understanding, support activity and growth on such platforms. To that end, we distill temporal dynamics of community activity and thereby identify key factors leading to success or failure of communities. In particular, we study the differences between growing and declining communities by leveraging multivariate Hawkes processes. Furthermore, we compare communities hosted on different platforms such as StackExchange and Reddit, as well as topically diverse communities such as STEM and humanities. We find that all growing communities exhibit (i) an active core of power users reacting to the community as a whole, and (ii) numerous casual users strongly interacting with other casual users, suggesting community openness towards less active users. Moreover, our results suggest that communities in the humanities are centered around power users, whereas in STEM communities activity is more evenly distributed among power and casual users. These results are of practical importance for community managers to quantitatively assess the status of their communities and guide them towards thriving community structures.
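
For readers unfamiliar with multivariate Hawkes processes, the sketch below computes the conditional intensity of one dimension (for example, casual users) given past events in all dimensions. It assumes an exponential decay kernel with a single shared decay rate, which is a common textbook choice and not necessarily the one used in the paper. The function name `hawkes_intensity` and all parameter values are hypothetical.

```python
import math


def hawkes_intensity(t, dim, events, baseline, excitation, decay):
    """Conditional intensity of dimension `dim` at time `t`.

    events:     list of (timestamp, source_dimension) pairs
    baseline:   baseline[dim] is the background event rate of `dim`
    excitation: excitation[dim][src] is the rate jump in `dim` caused by
                one event in `src`
    decay:      exponential decay rate of that jump (assumed shared)
    """
    rate = baseline[dim]
    for timestamp, src in events:
        if timestamp < t:  # only past events can excite the process
            rate += excitation[dim][src] * math.exp(-decay * (t - timestamp))
    return rate


# Hypothetical two-dimensional example: 0 = power users, 1 = casual users.
events = [(1.0, 0), (2.0, 1)]
baseline = [0.1, 0.2]
excitation = [[0.5, 0.0], [0.3, 0.4]]
print(hawkes_intensity(3.0, 1, events, baseline, excitation, decay=1.0))
```

In this parameterization, a large `excitation[1][1]` would correspond to casual users reacting strongly to other casual users, the kind of cross-group influence the abstract compares between growing and declining communities.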

Santos Tiago, Walk Simon, Helic Denis

Nonlinear Characterization of Activity Dynamics in Online Collaboration Websites

WWW '17 Companion Proceedings of the 26th International Conference on World Wide Web Companion, International World Wide Web Conferences Steering Committee, Perth, Australia, 2017

Modeling activity in online collaboration websites, such as StackExchange Question and Answering portals, is becoming increasingly important, as the success of these websites critically depends on the content contributed by its users. In this paper, we represent user activity as time series and perform an initial analysis of these time series to obtain a better understanding of the underlying mechanisms that govern their creation. In particular, we are interested in identifying latent nonlinear behavior in online user activity as opposed to a simpler linear operating mode. To that end, we apply a set of statistical tests for nonlinearity as a means to characterize activity time series derived from 16 different online collaboration websites. We validate our approach by comparing activity forecast performance from linear and nonlinear models, and study the underlying dynamical systems we derive with nonlinear time series analysis. Our results show that nonlinear characterizations of activity time series help to (i) improve our understanding of activity dynamics in online collaboration websites, and (ii) increase the accuracy of forecasting experiments.

Santos Tiago, Kern Roman

A Literature Survey of Early Time Series Classification and Deep Learning

SamI40 workshop at i-KNOW'16, 2016

This paper provides an overview of current literature on time series classification approaches, in particular of early time series classification. A very common and effective time series classification approach is the 1-Nearest Neighbor classifier, with different distance measures such as the Euclidean or dynamic time warping distances. This paper starts by reviewing these baseline methods. More recently, with the gain in popularity in the application of deep neural networks to the field of computer vision, research has focused on developing deep learning architectures for time series classification as well. The literature in the field of deep learning for time series classification has shown promising results. Early time series classification aims to classify a time series with as few temporal observations as possible, while keeping the loss of classification accuracy at a minimum. Prominent early classification frameworks reviewed by this paper include, but are not limited to, ECTS, RelClass and ECDIRE. These works have shown that early time series classification may be feasible and performant, but they also show room for improvement.
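
The 1-Nearest Neighbor baseline with dynamic time warping mentioned in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: an absolute-difference local cost, no warping window, and hypothetical names (`dtw_distance`, `classify_1nn`) and toy data that do not come from the paper.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Assumption: absolute difference as local cost and no warping window.
    """
    inf = float("inf")
    # prev[j] holds the cost of the best alignment of the prefix of `a`
    # processed so far with the first j elements of `b`.
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def classify_1nn(query, training_set, distance=dtw_distance):
    """Return the label of the training series closest to `query`.

    training_set is a list of (series, label) pairs.
    """
    _, label = min(training_set, key=lambda pair: distance(query, pair[0]))
    return label


# Hypothetical toy data: the query is a time-shifted copy of the "bump" series.
training_set = [
    ([0, 0, 1, 2, 1, 0, 0], "bump"),
    ([0, 1, 0, 1, 0, 1, 0], "zigzag"),
]
print(classify_1nn([0, 0, 0, 1, 2, 1, 0], training_set))
```

Early classification would apply such a classifier to a prefix of the query instead of the full series. The frameworks named in the abstract differ in how they decide when a prefix is long enough, which this sketch does not attempt to model.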