Lovric Mario, Duricic Tomislav, Tran Thi Ngoc Han, Hussain Hussain, Lacic Emanuel, Morten A. Rasmussen, Kern Roman
Dimensionality reduction methods contribute significantly to knowledge generation in high-dimensional modeling scenarios across many disciplines. A lower-dimensional representation (also called an embedding) requires fewer computing resources in downstream machine learning tasks, leading to faster training, lower complexity, and statistical flexibility. In this work, we investigate the utility of three prominent unsupervised embedding techniques, namely principal component analysis (PCA), uniform manifold approximation and projection (UMAP), and variational autoencoders (VAEs), for solving classification tasks in the domain of toxicology. To this end, we compare these embedding techniques against a set of molecular fingerprint-based models that do not utilize additional preprocessing of features. Inspired by the success of transfer learning in several fields, we further study the performance of embedders when trained on an external dataset of chemical compounds. To gain a better understanding of their characteristics, we evaluate the embedders with different embedding dimensionalities and with different sizes of the external dataset. Our findings show that the recently popularized UMAP approach can be used alongside established techniques such as PCA and VAE as a pre-compression technique in the toxicology domain. Nevertheless, the generative VAE model shows an advantage in pre-compressing the data with respect to classification accuracy.
Lacic Emanuel, Markus Reiter-Haas, Kowald Dominik, Reddy Dareddy Mano, Cho Junghoo, Lex Elisabeth
In this work, we address the problem of providing job recommendations in an online session setting, in which we do not have full user histories. We propose a recommendation approach that uses different autoencoder architectures to encode sessions from the job domain. The inferred latent session representations are then used in a k-nearest neighbor manner to recommend jobs within a session. We evaluate our approach on three datasets: (1) a proprietary dataset we gathered from the Austrian student job portal Studo Jobs, (2) a dataset released by XING after the RecSys 2017 Challenge, and (3) anonymized job applications released by CareerBuilder in 2012. Our results show that autoencoders provide relevant job recommendations while maintaining high coverage and, at the same time, can outperform state-of-the-art session-based recommendation techniques in terms of system-based and session-based novelty.
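The k-nearest neighbor step described above can be illustrated with a minimal sketch. It assumes the latent session vectors have already been produced by some autoencoder (not shown here); the function names `cosine` and `recommend_jobs`, the use of cosine similarity, the similarity-weighted scoring, and the toy vectors are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_jobs(query_vec, query_jobs, past_sessions, k=2, top_n=3):
    """Recommend jobs from the k past sessions closest to the query session.

    past_sessions is a list of (latent_vector, job_ids) pairs. Each job
    from a neighboring session is scored by the summed similarity of the
    neighbors it appears in (an assumed scoring rule); jobs already seen
    in the query session are skipped.
    """
    neighbors = sorted(
        ((cosine(query_vec, vec), jobs) for vec, jobs in past_sessions),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, jobs in neighbors:
        for job in jobs:
            if job not in query_jobs:
                scores[job] += similarity
    # Sort by descending score, breaking ties by job id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [job for job, _ in ranked[:top_n]]


# Toy latent vectors (hypothetical, standing in for autoencoder output).
past_sessions = [
    ([1.0, 0.0], ["job_a", "job_b"]),
    ([0.9, 0.1], ["job_b", "job_c"]),
    ([0.0, 1.0], ["job_d"]),
]
recommendations = recommend_jobs([1.0, 0.05], {"job_a"}, past_sessions, k=2)
print(recommendations)
```

In this toy setting, the two nearest sessions both contain `job_b`, so it ranks ahead of `job_c`, while `job_a` is excluded because the query session already includes it.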
Lacic Emanuel, Traub Matthias, Duricic Tomislav, Haslauer Eva, Lex Elisabeth
A challenge for importers in the automobile industry is adjusting to rapidly changing market demands. In this work, we describe a practical study of car import planning based on the monthly car registrations in Austria. We model the task as a data-driven forecasting problem and implement four different prediction approaches. One uses a seasonal ARIMA model and another is based on an LSTM-RNN; both are compared to a linear baseline and a seasonal baseline. In our experiments, we evaluate 33 different brands by predicting the number of registrations for the next month and for the year to come.
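The abstract does not specify how the linear and seasonal baselines are defined. As a rough sketch of two common choices, the following assumes a seasonal-naive forecast (repeat the value from the same month one year earlier) and an ordinary least-squares trend line extrapolated one step ahead. The function names and the registration counts are made up for illustration and are not data from the study.

```python
def seasonal_naive_forecast(history, season_length=12):
    """Forecast the next period as the value one full season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    return history[-season_length]


def linear_trend_forecast(history):
    """Fit y = intercept + slope * t by least squares and extrapolate one step."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    # The next month has time index n (history covers indices 0..n-1).
    return intercept + slope * n


# Hypothetical monthly registrations for one brand over two years.
monthly_registrations = [100 + 10 * month for month in range(24)]

seasonal_prediction = seasonal_naive_forecast(monthly_registrations)
linear_prediction = linear_trend_forecast(monthly_registrations)
print(seasonal_prediction, linear_prediction)
```

Baselines of this kind give a reference point: a seasonal ARIMA or LSTM model is only worth its added complexity if it beats such simple predictors on held-out months.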