Lacic Emanuel, Duricic Tomislav, Fadljevic Leon, Theiler Dieter, Kowald Dominik
Uptrendz: API-Centric Real-Time Recommendations in Multi-Domain Settings
Lacic Emanuel, Kowald Dominik
In this industry talk at ECIR'2022, we illustrate how to build a modern recommender system that can serve recommendations in real-time for a diverse set of application domains. Specifically, we present our system architecture that utilizes popular recommendation algorithms from the literature such as Collaborative Filtering, Content-based Filtering as well as various neural embedding approaches (e.g., Doc2Vec, Autoencoders, etc.). We showcase the applicability of our system architecture using two real-world use-cases, namely providing recommendations for the domains of (i) job marketplaces, and (ii) entrepreneurial start-up founding. We strongly believe that our experiences from both research- and industry-oriented settings should be of interest for practitioners in the field of real-time multi-domain recommender systems.
Lacic Emanuel, Fadljevic Leon, Weissenböck Franz, Lindstaedt Stefanie, Kowald Dominik
Personalized news recommender systems support readers in finding the right and relevant articles in online news platforms. In this paper, we discuss the introduction of personalized, content-based news recommendations on DiePresse, a popular Austrian online news platform, focusing on two specific aspects: (i) user interface type, and (ii) popularity bias mitigation. To this end, we conducted a two-week online study that started in October 2020, in which we analyzed the impact of recommendations on two user groups, i.e., anonymous and subscribed users, and three user interface types, i.e., desktop, mobile and tablet devices. With respect to user interface types, we find that the probability of a recommendation being seen is the highest for desktop devices, while the probability of interacting with recommendations is the highest for mobile devices. With respect to popularity bias mitigation, we find that personalized, content-based news recommendations can lead to a more balanced distribution of news articles' readership popularity in the case of anonymous users. Apart from that, we find that significant events (e.g., the COVID-19 lockdown announcement in Austria and the Vienna terror attack) influence the general consumption behavior of popular articles for both anonymous and subscribed users.
Kowald Dominik, Lacic Emanuel
Multimedia recommender systems suggest media items, e.g., songs, (digital) books and movies, to users by utilizing concepts of traditional recommender systems such as collaborative filtering. In this paper, we investigate a potential issue of such collaborative filtering-based multimedia recommender systems, namely popularity bias, which leads to the underrepresentation of unpopular items in the recommendation lists. To this end, we study four multimedia datasets, i.e., LastFm, MovieLens, BookCrossing and MyAnimeList, each of which we split into three user groups differing in their inclination to popularity, i.e., LowPop, MedPop and HighPop. Using these user groups, we evaluate four collaborative filtering-based algorithms with respect to popularity bias on the item and the user level. Our findings are three-fold: firstly, we show that users with little interest in popular items tend to have large user profiles and thus are important data sources for multimedia recommender systems. Secondly, we find that popular items are recommended more frequently than unpopular ones. Thirdly, we find that users with little interest in popular items receive significantly worse recommendations than users with medium or high interest in popularity.
Lovric Mario, Duricic Tomislav, Tran Thi Ngoc Han, Hussain Hussain, Lacic Emanuel, Morten A. Rasmussen, Kern Roman
Methods for dimensionality reduction are showing significant contributions to knowledge generation in high-dimensional modeling scenarios throughout many disciplines. By achieving a lower dimensional representation (also called embedding), fewer computing resources are needed in downstream machine learning tasks, thus leading to a faster training time, lower complexity, and statistical flexibility. In this work, we investigate the utility of three prominent unsupervised embedding techniques, namely principal component analysis (PCA), uniform manifold approximation and projection (UMAP), and variational autoencoders (VAEs), for solving classification tasks in the domain of toxicology. To this end, we compare these embedding techniques against a set of molecular fingerprint-based models that do not utilize additional preprocessing of features. Inspired by the success of transfer learning in several fields, we further study the performance of embedders when trained on an external dataset of chemical compounds. To gain a better understanding of their characteristics, we evaluate the embedders with different embedding dimensionalities, and with different sizes of the external dataset. Our findings show that the recently popularized UMAP approach can be utilized alongside known techniques such as PCA and VAE as a pre-compression technique in the toxicology domain. Nevertheless, the generative model of VAE shows an advantage in pre-compressing the data with respect to classification accuracy.
Duricic Tomislav, Hussain Hussain, Lacic Emanuel, Kowald Dominik, Lex Elisabeth, Helic Denis
In this work, we study the utility of graph embeddings to generate latent user representations for trust-based collaborative filtering. In a cold-start setting, on three publicly available datasets, we evaluate approaches from four method families: (i) factorization-based, (ii) random walk-based, (iii) deep learning-based, and (iv) the Large-scale Information Network Embedding (LINE) approach. We find that across the four families, random walk-based approaches consistently achieve the best accuracy. Besides, they result in highly novel and diverse recommendations. Furthermore, our results show that the use of graph embeddings in trust-based collaborative filtering significantly improves user coverage.
Reiter-Haas Markus, Wittenbrink Davi, Lacic Emanuel
Finding the right job is a difficult task for anyone as it usually depends on many factors like salary, job description, or geographical location. Students with almost no prior experience, especially, have a hard time on the job market, which is very competitive in nature. Additionally, students often suffer from a lack of orientation, as they do not know what kind of job is suitable for their education. At Talto, we realized this and have built a platform to help Austrian university students find their career paths and to provide them with content that is relevant to their career possibilities. This is mainly achieved by guiding the students toward different types of entities that are related to their career, i.e., job postings, company profiles, and career-related articles. In this talk, we share our experiences with solving the recommendation problem for university students. One trait of the student-focused job domain is that the behaviour of the students differs depending on their study progression. At the beginning of their studies, they need study-specific career information and part-time jobs to earn additional money, whereas when they are nearing graduation, they require information about their potential future employers and entry-level full-time jobs. Moreover, we can observe seasonal patterns in user activity in addition to the need to handle both logged-in and anonymous session users at the same time. To cope with the requirements of the job domain, we built hybrid models based on a microservice architecture that utilizes popular algorithms from the literature such as Collaborative Filtering, Content-based Filtering as well as various neural embedding approaches (e.g., Doc2Vec, Autoencoders, etc.). We further adapted our architecture to calculate relevant recommendations in real-time (i.e., after a recommendation is requested) as individual user sessions in Talto are usually short-lived and context-dependent.
Here we found that the online performance of the utilized approach also depends on the location context [1]. Hence, the current location of a user on the mobile or web application impacts the expected recommendations. One optimization criterion on the Talto career platform is to provide relevant cross-entity recommendations as well as to explain why those were shown. Recently, we started to tackle this by learning embeddings of entities that lie in the same embedding space [2]. Specifically, we pre-train word embeddings and link different entities by shared concepts, which we use for training the network embeddings. This embeds both the concepts and the entities into a common vector space, which results from considering the textual content as well as the network information (i.e., links to concepts). This way, different entity types (e.g., job postings, company profiles, and articles) are directly comparable and are suited for a real-time recommendation setting. Interestingly enough, with such an approach we also end up with individual words sharing the same embedding space. This, in turn, can be leveraged to enhance the textual search functionality of a platform, which is most commonly based just on a TF-IDF model. Furthermore, we found that such embeddings allow us to tackle the problem of explainability in an algorithm-agnostic way. Since the Talto platform utilizes various recommendation algorithms and continuously conducts A/B tests, an algorithm-agnostic explainability model would be best suited to provide the students with meaningful explanations. As such, we will also go into the details of how we can adapt our explanation model so that it does not rely on the utilized recommendation algorithm.
Lacic Emanuel, Markus Reiter-Haas, Kowald Dominik, Reddy Dareddy Mano, Cho Junghoo, Lex Elisabeth
In this work, we address the problem of providing job recommendations in an online session setting, in which we do not have full user histories. We propose a recommendation approach, which uses different autoencoder architectures to encode sessions from the job domain. The inferred latent session representations are then used in a k-nearest neighbor manner to recommend jobs within a session. We evaluate our approach on three datasets: (1) a proprietary dataset we gathered from the Austrian student job portal Studo Jobs, (2) a dataset released by XING after the RecSys 2017 Challenge, and (3) anonymized job applications released by CareerBuilder in 2012. Our results show that autoencoders provide relevant job recommendations as well as maintain a high coverage and, at the same time, can outperform state-of-the-art session-based recommendation techniques in terms of system-based and session-based novelty.
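The k-nearest neighbor step described in the abstract above can be illustrated with a minimal sketch. It assumes that latent session vectors have already been produced by some trained autoencoder (not shown here); the cosine similarity, the summed-similarity scoring, and all names (`recommend_jobs`, `past_sessions`, the toy vectors) are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_jobs(
    current_latent: Sequence[float],
    current_jobs: Set[str],
    past_sessions: List[Tuple[Sequence[float], Set[str]]],
    k: int,
    top_n: int,
) -> List[str]:
    """Rank jobs seen in the k most similar past sessions (hypothetical scoring)."""
    scored_sessions = [
        (cosine(current_latent, latent), jobs) for latent, jobs in past_sessions
    ]
    # Keep the k past sessions whose latent vectors are closest to the current one.
    neighbors = sorted(scored_sessions, key=lambda pair: -pair[0])[:k]

    job_scores: Dict[str, float] = defaultdict(float)
    for similarity, jobs in neighbors:
        for job in jobs - current_jobs:  # skip jobs already seen in this session
            job_scores[job] += similarity

    ranked = sorted(job_scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [job for job, _ in ranked[:top_n]]


# Toy latent vectors standing in for autoencoder outputs (assumed values).
past_sessions = [
    ([1.0, 0.0], {"j1", "j2"}),
    ([0.9, 0.1], {"j2", "j3"}),
    ([0.0, 1.0], {"j4"}),
]
recommendations = recommend_jobs([1.0, 0.05], {"j1"}, past_sessions, k=2, top_n=2)
print(recommendations)
```

In this toy example, the two sessions closest to the current one contribute their unseen jobs, and a job that appears in both neighbors ranks first.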
Lacic Emanuel, Reiter-Haas Markus, Duricic Tomislav, Slawicek Valentin, Lex Elisabeth
In this work, we present the findings of an online study, where we explore the impact of utilizing embeddings to recommend job postings under real-time constraints. On the Austrian job platform Studo Jobs, we evaluate two popular recommendation scenarios: (i) providing similar jobs, and (ii) personalizing the job postings that are shown on the homepage. Our results show that for recommending similar jobs, we achieve the best online performance in terms of Click-Through Rate when we employ embeddings based on the most recent interaction. To personalize the job postings shown on a user's homepage, however, combining embeddings based on the frequency and recency with which a user interacts with job postings results in the best online performance.
Duricic Tomislav, Lacic Emanuel, Kowald Dominik, Lex Elisabeth
User-based Collaborative Filtering (CF) is one of the most popular approaches to create recommender systems. CF, however, suffers from data sparsity and the cold-start problem since users often rate only a small fraction of available items. One solution is to incorporate additional information into the recommendation process such as explicit trust scores that are assigned by users to others or implicit trust relationships that result from social connections between users. Such relationships typically form a very sparse trust network, which can be utilized to generate recommendations for users based on people they trust. In our work, we explore the use of regular equivalence applied to a trust network to generate a similarity matrix, from which we select the k-nearest neighbors used for item recommendation. Two vertices in a network are regularly equivalent if their neighbors are themselves equivalent. By using the iterative approach of calculating regular equivalence, we can study the impact of strong and weak ties on item recommendation. We evaluate our approach on cold-start users on a dataset crawled from Epinions and find that by using weak ties in addition to strong ties, we can improve the performance of a trust-based recommender in terms of recommendation accuracy.
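The iterative computation mentioned in the abstract above can be sketched as follows. The abstract does not give the formula, so this sketch assumes the Katz-style formulation of regular equivalence known from network science, sigma = alpha * A * sigma + I, iterated from sigma = I. It further assumes that a low iteration count only captures direct (strong) ties while more iterations add indirect (weak) ties. The function names, the value of `alpha`, and the toy network are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def regular_equivalence(adj: Matrix, alpha: float = 0.1, iterations: int = 3) -> Matrix:
    """Iterate sigma <- alpha * A @ sigma + I, starting from sigma = I."""
    n = len(adj)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    sigma = [row[:] for row in identity]
    for _ in range(iterations):
        sigma = [
            [
                alpha * sum(adj[i][k] * sigma[k][j] for k in range(n)) + identity[i][j]
                for j in range(n)
            ]
            for i in range(n)
        ]
    return sigma


def top_k_neighbors(sigma: Matrix, user: int, k: int) -> List[int]:
    """Return up to k other users with the highest non-zero similarity to `user`."""
    scores = [
        (other, score)
        for other, score in enumerate(sigma[user])
        if other != user and score > 0
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scores[:k]]


# Toy directed trust network (assumed data): user 0 trusts 1, user 1 trusts 2.
trust = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]

strong_only = regular_equivalence(trust, alpha=0.1, iterations=1)
with_weak = regular_equivalence(trust, alpha=0.1, iterations=2)
print(top_k_neighbors(strong_only, user=0, k=2))
print(top_k_neighbors(with_weak, user=0, k=2))
```

With one iteration, user 0 is only similar to the directly trusted user 1; a second iteration adds the indirectly reachable user 2 with a smaller score, which enlarges the neighborhood available to a cold-start user.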
Kowald Dominik, Traub Matthias, Theiler Dieter, Gursch Heimo, Lacic Emanuel, Lindstaedt Stefanie, Kern Roman, Lex Elisabeth
Kowald Dominik, Lacic Emanuel, Theiler Dieter, Traub Matthias, Kuffer Lucky, Lindstaedt Stefanie, Lex Elisabeth
Lacic Emanuel, Kowald Dominik, Lex Elisabeth
In this paper, we present work-in-progress on applying user pre-filtering to speed up and enhance recommendations based on Collaborative Filtering. We propose to pre-filter users in order to extract a smaller set of candidate neighbors, who exhibit a high number of overlapping entities, and to compute the final user similarities based on this set. To realize this, we exploit features of the high-performance search engine Apache Solr and integrate them into a scalable recommender system. We have evaluated our approach on a dataset gathered from Foursquare and our evaluation results suggest that our proposed user pre-filtering step can help to achieve both a better runtime performance as well as an increase in overall recommendation accuracy.
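The two-stage idea in the abstract above (first narrow down candidate neighbors by entity overlap, then compute similarities only on that set) can be sketched in plain Python. This is not the Apache Solr integration the abstract refers to: an in-memory inverted index stands in for the search engine, Jaccard similarity is an assumed choice for the final similarity, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def build_inverted_index(profiles: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each entity (e.g., a venue) to the users who interacted with it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for user, entities in profiles.items():
        for entity in entities:
            index[entity].add(user)
    return index


def prefilter_candidates(
    target: str,
    profiles: Dict[str, Set[str]],
    index: Dict[str, Set[str]],
    max_candidates: int,
) -> List[str]:
    """Stage 1: keep the users sharing the most entities with the target."""
    overlap: Counter = Counter()
    for entity in profiles[target]:
        for other in index.get(entity, ()):
            if other != target:
                overlap[other] += 1
    ranked = sorted(overlap.items(), key=lambda pair: (-pair[1], pair[0]))
    return [user for user, _ in ranked[:max_candidates]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbors(
    target: str,
    profiles: Dict[str, Set[str]],
    index: Dict[str, Set[str]],
    max_candidates: int,
    k: int,
) -> List[Tuple[str, float]]:
    """Stage 2: compute the final similarity only on the pre-filtered set."""
    candidates = prefilter_candidates(target, profiles, index, max_candidates)
    scored = [(user, jaccard(profiles[target], profiles[user])) for user in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Toy check-in profiles (assumed data).
profiles = {
    "alice": {"cafe", "museum", "park"},
    "bob": {"cafe", "museum", "gym"},
    "carol": {"park"},
    "dave": {"cinema"},
}
index = build_inverted_index(profiles)
neighbors = nearest_neighbors("alice", profiles, index, max_candidates=2, k=2)
print(neighbors)
```

Users without any shared entity (here `dave`) never reach the similarity computation, which is where the runtime saving of pre-filtering would come from in this sketch.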
Kowald Dominik, Lacic Emanuel, Theiler Dieter, Lex Elisabeth
In this paper, we present preliminary results of AFEL-REC, a recommender system for social learning environments. AFEL-REC is built upon a scalable software architecture to provide recommendations of learning resources in near real-time. Furthermore, AFEL-REC can cope with any kind of data that is present in social learning environments such as resource metadata, user interactions or social tags. We provide a preliminary evaluation of three recommendation use cases implemented in AFEL-REC and we find that utilizing social data in the form of tags is helpful for not only improving recommendation accuracy but also coverage. This paper should be valuable for both researchers and practitioners interested in providing resource recommendations in social learning environments.
Duricic Tomislav, Lacic Emanuel, Kowald Dominik, Lex Elisabeth
User-based Collaborative Filtering (CF) is one of the most popular approaches to create recommender systems. This approach is based on finding the most relevant k users from whose rating history we can extract items to recommend. CF, however, suffers from data sparsity and the cold-start problem since users often rate only a small fraction of available items. One solution is to incorporate additional information into the recommendation process such as explicit trust scores that are assigned by users to others or implicit trust relationships that result from social connections between users. Such relationships typically form a very sparse trust network, which can be utilized to generate recommendations for users based on people they trust. In our work, we explore the use of a measure from network science, i.e., regular equivalence, applied to a trust network to generate a similarity matrix that is used to select the k-nearest neighbors for recommending items. We evaluate our approach on Epinions and we find that we can outperform related methods for tackling cold-start users in terms of recommendation accuracy.
Lacic Emanuel, Traub Matthias, Duricic Tomislav, Haslauer Eva, Lex Elisabeth
A challenge for importers in the automobile industry is adjusting to rapidly changing market demands. In this work, we describe a practical study of car import planning based on the monthly car registrations in Austria. We model the task as a data-driven forecasting problem and we implement four different prediction approaches. One utilizes a seasonal ARIMA model and another is based on an LSTM-RNN; both are compared to a linear and a seasonal baseline. In our experiments, we evaluate 33 different brands by predicting the number of registrations for the next month and for the year to come.
Lacic Emanuel, Kowald Dominik, Reiter-Haas Markus, Slawicek Valentin, Lex Elisabeth
In this work, we address the problem of recommending jobs to university students. For this, we explore the impact of using item embeddings for a content-based job recommendation system. Furthermore, we utilize a model from human memory theory to integrate the factors of frequency and recency of job posting interactions for combining item embeddings. We evaluate our job recommendation system on a dataset of the Austrian student job portal Studo using prediction accuracy, diversity as well as adapted novelty, which is introduced in this work. We find that utilizing frequency and recency of interactions with job postings for combining item embeddings results in a robust model with respect to accuracy and diversity, but also provides the best adapted novelty results.
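One way to combine item embeddings by frequency and recency is sketched below. The abstract does not name the memory model, so this sketch assumes a power-law decay in the spirit of the base-level learning equation from the ACT-R cognitive architecture: every interaction with an item contributes (now - timestamp)^(-d), so items that were interacted with often and recently get a larger weight. The decay `d = 0.5`, the normalization, the function names, and the toy data are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def activation_weights(
    interactions: Sequence[Tuple[str, float]],
    now: float,
    decay: float = 0.5,
) -> Dict[str, float]:
    """Sum a power-law decayed contribution per interaction, then normalize."""
    raw: Dict[str, float] = {}
    for item, timestamp in interactions:
        age = max(now - timestamp, 1.0)  # clamp to avoid division by zero
        raw[item] = raw.get(item, 0.0) + age ** (-decay)
    total = sum(raw.values())
    if total == 0.0:
        return {}
    return {item: weight / total for item, weight in raw.items()}


def combine_embeddings(
    weights: Dict[str, float],
    embeddings: Dict[str, Sequence[float]],
) -> List[float]:
    """Weighted sum of item embeddings, giving a single user profile vector."""
    dim = len(next(iter(embeddings.values())))
    profile = [0.0] * dim
    for item, weight in weights.items():
        for d, value in enumerate(embeddings[item]):
            profile[d] += weight * value
    return profile


# Toy data (assumed): job_a was viewed twice but long ago, job_b once very recently.
interactions = [("job_a", 0.0), ("job_a", 90.0), ("job_b", 99.0)]
embeddings = {"job_a": [1.0, 0.0], "job_b": [0.0, 1.0]}

weights = activation_weights(interactions, now=100.0)
profile = combine_embeddings(weights, embeddings)
print(weights)
print(profile)
```

The resulting profile vector could then be compared against job posting embeddings (e.g., by cosine similarity) to rank candidates; in the toy data, the single recent interaction outweighs the two older ones.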
Reiter-Haas Markus, Slawicek Valentin, Lacic Emanuel
Lacic Emanuel, Kowald Dominik, Lex Elisabeth
Recommender systems are acknowledged as an essential instrument to support users in finding relevant information. However, the adaptation of recommender systems to multiple domain-specific requirements and data models still remains an open challenge. In the present paper, we contribute to this sparse line of research with guidance on how to design a customizable recommender system that accounts for multiple domains with heterogeneous data. Using concrete showcase examples, we demonstrate how to set up a multi-domain system on the item and system level, and we report evaluation results for the domains of (i) LastFM, (ii) FourSquare, and (iii) MovieLens. We believe that our findings and guidelines can support developers and researchers of recommender systems to easily adapt and deploy a recommender system in distributed environments, as well as to develop and evaluate algorithms suited for multi-domain settings.
Traub Matthias, Lacic Emanuel, Kowald Dominik, Kahr Martin, Lex Elisabeth
In this paper, we present work-in-progress on a recommender system designed to help people in need find the best suited social care institution for their personal issues. A key requirement in such a domain is to assure and to guarantee the person's privacy and anonymity in order to reduce inhibitions and to establish trust. We present how we aim to tackle this barely studied domain using a hybrid content-based recommendation approach. Our approach leverages three data sources containing textual content, namely (i) metadata from social care institutions, (ii) institution specific FAQs, and (iii) questions that a specific institution has already resolved. Additionally, our approach considers the time context of user questions as well as negative user feedback to previously provided recommendations. Finally, we demonstrate an application scenario of our recommender system in the form of a real-world Web system deployed in Austria.
Lacic Emanuel
Recommender systems are acknowledged as an essential instrument to support users in finding relevant information. However, adapting to different domain-specific data models is a challenge, which many recommender frameworks neglect. Moreover, the advent of the big data era has posed the need for high scalability and real-time processing of frequent data updates, and thus, has brought new challenges for the recommender systems' research community. In this work, we show how different item, social and location data features can be utilized and supported to provide real-time recommendations. We further show how to process data updates online and capture a user's real-time interest without recalculating recommendations. The presented recommendation framework provides a scalable and customizable architecture suited for providing real-time recommendations to multiple domains. We further investigate the impact of an increasing request load and show how the runtime can be decreased by scaling the framework.
Lacic Emanuel, Kowald Dominik, Lex Elisabeth
Air travel is one of the most frequently used means of transportation in our everyday life. Thus, it is not surprising that an increasing number of travelers share their experiences with airlines and airports in the form of online reviews on the Web. In this work, we strive to explain and uncover the features of airline reviews that contribute most to traveler satisfaction. To that end, we examine reviews crawled from the Skytrax air travel review portal. Skytrax provides four review categories to review airports, lounges, airlines and seats. Each review category consists of several five-star ratings as well as free-text review content. In this paper, we conducted a comprehensive feature study and we find that not only five-star rating information such as airport queuing time and lounge comfort highly correlates with traveler satisfaction but also textual features in the form of the inferred review text sentiment. Based on our findings, we created classifiers to predict traveler satisfaction using the best performing rating features. Our results reveal that given our methodology, traveler satisfaction can be predicted with high accuracy. Additionally, we find that training a model on the sentiment of the review text provides a competitive alternative when no five-star rating information is available. We believe that our work is of interest for researchers in the area of modeling and predicting user satisfaction based on available review data on the Web.
Traub Matthias, Kowald Dominik, Lacic Emanuel, Lex Elisabeth, Schoen Pepjin, Supp Gernot
In this paper, we present a scalable hotel recommender system for TripRebel, a new online booking portal. On the basis of the open-source enterprise search platform Apache Solr, we developed a system architecture with Web-based services to interact with indexed data at large scale as well as to provide hotel recommendations using various state-of-the-art recommender algorithms. We demonstrate the efficiency of our system directly using the live TripRebel portal where, in its current state, hotel alternatives for a given hotel are calculated based on data gathered from the Expedia AffiliateNetwork (EAN).
Dennerlein Sebastian, Kowald Dominik, Lex Elisabeth, Lacic Emanuel, Theiler Dieter, Ley Tobias
Informal learning at the workplace includes a multitude of processes. Respective activities can be categorized into multiple perspectives on informal learning, such as reflection, sensemaking, help seeking and maturing of collective knowledge. Each perspective raises requirements with respect to the technical support, which is why an integrated solution relying on social, adaptive and semantic technologies is needed. In this paper, we present the Social Semantic Server, an extensible, open-source application server that equips client-side tools with services to support and scale informal learning at the workplace. More specifically, the Social Semantic Server semantically enriches social data that is created at the workplace in the context of user-to-user or user-artifact interactions. This enriched data can then in turn be exploited in informal learning scenarios to, e.g., foster help seeking by recommending collaborators, resources, or experts. Following the design-based research paradigm, the Social Semantic Server has been implemented based on design principles, which were derived from theories such as Distributed Cognition and Meaning Making. We illustrate the applicability and efficacy of the Social Semantic Server in the light of three real-world applications that have been developed using its social semantic services. Furthermore, we report preliminary results of two user studies that have been carried out recently.
Lacic Emanuel, Traub Matthias, Kowald Dominik, Lex Elisabeth
In this paper, we present our approach towards an effective scalable recommender framework termed ScaR. Our framework is based on the microservices architecture and exploits search technology to provide real-time recommendations. Since it is our aim to create a system that can be used in a broad range of scenarios, we designed it to be capable of handling various data streams and sources. We show its efficacy and scalability with an initial experiment on how the framework can be used in a large-scale setting.
Lacic Emanuel, Luzhnica Granit, Simon Jörg Peter, Traub Matthias, Lex Elisabeth, Kowald Dominik
In this paper, we present work-in-progress on a recommender system based on Collaborative Filtering that exploits location information gathered by indoor positioning systems. This approach allows us to provide recommendations for "extreme" cold-start users with absolutely no item interaction data available, where methods based on Matrix Factorization would not work. We simulate and evaluate our proposed system using data from the location-based FourSquare system and show that we can provide substantially better recommender accuracy results than a simple MostPopular baseline that is typically used when no interaction data is available.
Lacic Emanuel, Kowald Dominik, Eberhard Lukas, Trattner Christoph, Parra Denis, Marinho Leandro
Recent research has unveiled the importance of online social networks for improving the quality of recommender systems and encouraged the research community to investigate better ways of exploiting the social information for recommendations. To contribute to this sparse field of research, in this paper we exploit users' interactions along three data sources (marketplace, social network and location-based) to assess their performance in a barely studied domain: recommending products and domains of interests (i.e., product categories) to people in an online marketplace environment. To that end, we defined sets of content- and network-based user similarity features for each data source and studied them in isolation using a user-based Collaborative Filtering (CF) approach and in combination via a hybrid recommender algorithm, to assess which one provides the best recommendation performance. Interestingly, in our experiments conducted on a rich dataset collected from SecondLife, a popular online virtual world, we found that recommenders relying on user similarity features obtained from the social network data clearly yielded the best results in terms of accuracy when predicting products, whereas the features obtained from the marketplace and location-based data sources also obtained very good results when predicting categories. This finding indicates that all three types of data sources are important and should be taken into account depending on the level of specialization of the recommendation task.