Arbeiten - Know Center - KI & Data Science

Lovric Mario

2020

Development and application of models for ecotoxicological risk assessment of bioactive chemical compound

PhD/ Dissertation

Reiter-Haas Markus

2020

Evaluation of Job Recommendations for the Studo Jobs Platform

Master

Evaluation of Job Recommendations for the Studo Jobs Platfor

Mauerhofer Thomas

2020

Using IMRaD Structure Features in Information Retrieval Raking Functions

Master

Today the internet is growing fast as users generate an increasing amount of data. Therefore, finding relevant information is getting more and more time- consuming. This happens as the internet consists of a larger amount of data that is distributed over various information sources. Search engines filter data, and reduce the time required to find relevant information. We focus on scientific literature search where search engines help to find scientific articles. An advantage of scientific articles is that they share a common structure to increase their readability. This structure is known is IMRaD (Introduction, Method, Results and Discussion). We tackle the question whether it is possible to improve the search result quality while searching for scientific works by leveraging IMRaD structure information. We use several state-of-the-art ranking algorithms, and compare them against each other in our experiments. Our results show that the importance of IMRaD chapter features depends on the complexity of the query. Finally, we focus on structured text retrieval and the influence of single chapters on the search result. We set out to tackle the problem to improve the quality of the results produced by state-of-the-art ranking algorithms for scientific literature research.

Mashkina Olena

2020

Linear Text Segmentation with Neural OIE on Novels and Subtitles

Master

Automatically separating text into coherent segments sharing the same topic is a nontrivial task in research area of Natural Language Processing. Over the course of time text segmentation approaches were improved by applying existing knowledge from various science fields including linguistics, statistics and graph theory. At the same time obtaining a corpus of textual data varying in structure and vocabulary is problematic. Currently emerging application of neural network models in Natural Language Processing shows promise, which particularly can be seen on an example of Open Information Extraction. However the influence of knowledge obtained by an Open Information Extraction system on a text segmentation task remains unknown. This thesis introduces text segmentation pipeline supported by word embeddings and Open Information Extraction. Additionally, a fictional text corpus consisting of two parts, novels and subtitles, is presented. Given a baseline text segmentation algorithm, the effect of replacing word tokens with word embeddings is examined. Consequently, neural Open Information Extraction is applied to the corpus and the information contained in the extractions is transformed into word token weighting used on top of the baseline text segmentation algorithm. The evaluation shows that application of the pipeline to the corpus increased the performance for more than a half of novels and less than a half of subtitle files in comparison to the baseline text segmentation algorithm. Similar results are observed in a preliminary step in which word tokens were substituted by their word embedding representations. Taking into account complex structural features of the corpus, this work demonstrates that text segmentation may benefit from incorporating knowledge provided by an Open Information Extraction system.

Ronacher Lisa

2020

Machine Learning-based Location Detection of Mathematical Expressions in PDF

Master

Portable Document Format (PDF) is one of the most commonly used file formats. Many current PDF viewers support copy-and-paste for ordinary text, but not for mathematical expressions, which appear frequently in scientific documents. If one were able to extract a mathematical expression and convert them into another format, such as L A TEX or MathML, the information contained in this expression would become accessible for a wide array of applications, for instance screen readers. An important step to achieve this goal is finding the precise location of mathematical expressions, since this is the only unsolved step in the formula extraction pipeline. Accurately performing this crucial step is the main objective of this thesis. Unlike previous research, we use a novel whitespace analysis technique to demarcate coherent regions within a PDF page. We then use the identified regions to compute carefully selected features from two sources: the grayscale matrix of the rendered PDF file and the list of objects within the parsed PDF file. The computed features can be used as input for various classifiers based on machine learning techniques. In our experiments we contrast four different variants of our method, where each uses a different machine learning algorithm for classification. Further, we also aim to compare our approach with three state of the art formula detectors. However, the low reproducibility of these three methods combined with logical inconsistencies in their documentation greatly complicated a faithful comparison with our method, leaving the true state of the art unclear, which warrants further research.

Bechtold Oskar

2020

AI Cruciverbalist - Artificial Intelligence (Machine Learning and Constraint Satisfaction Programming) for Grid Based Word Puzzle Generation

Master

This thesis presents a novel way of creating grid-based word puzzles, named the AI Cruciverbalist. These word puzzles have a large fan base of recreational players and are widespread in education. The puzzle creation process, an NP-hard problem, is not an effortless task, and even though some algorithms exist, manual puzzle creation achieved the best results so far. Since new technologies arose, es- pecially in the field of data science and machine learning, the time had come to evaluate new possibilities, replace existing algorithms and improve the quality and performance of puzzle generation. In particular neural networks and constraint programming were evalu- ated towards feasibility, and the results were compared. The black box of a trained model makes it hard to ensure positive results, and due to the impossibility of modelling some requirements and con- straints, neural networks are rated unsuitable for puzzle generation. The significance of correct values in puzzle fields, the approximative nature of neural networks, and the need for an extensive training set additionally make neural networks impractical. On the other hand, precisely modelling requirements for a constraint satisfaction prob- lem has shown to create excellent results, finding an exact solution, if a solution exists. The presented results achieved with the constraint programming approach are rated as successful by domain experts, and the algorithm has been successfully integrated into an existing puzzle generator software for use in production.

Leitner Lorenz

2020

Automatic Detection of Idiosyncratic Phrases as Features for Authorship Attribution

Master

People use different styles of writing according to their personalities. These dis- tinctions can be used to find out who wrote an unknown text, given some texts of known authorship. Many different parts of the texts and writing style can be used as features for this. The focus in this thesis lies on topic-agnostic phrases that are used mostly unconsciously by authors. Two methods to extract these phrases from texts of authors are proposed, which work for different types of input data. The first method uses n-gram tf-idf calculations to weight phrases while the second method detects them using sequential pattern mining algorithms. The text data set used is gathered from a source of unstructured text with a plethora of topics, the online forum called Reddit. The first of the two proposed methods achieves average F1-scores (correct author predictions) per section of the data set ranging from 0.961 to 0.92 within the same topic and from 0.817 to 0.731 when different topics were used for attribution testing. The second method scores in the range from 0.652 to 0.073, depending on configuration parameters. In current times, due to the massive amount of content creation on such platforms, using a data set like this and using features that work for authorship attribution with texts of such nature is worth exploring. Since these phrases have been shown to work for specific configurations, they can now be used as a viable option or in addition to other commonly used features.

Rutter Fabio

2020

Email Search Tool Based on Associative Memory

Bakk

A problem that came up during the last twenty-five years is the re-finding of emails. Many different groups of people have thousands of emails in their inboxes, which often causes frustration during the search for older emails. This fact is reason enough to think about new solutions for this issue. Is the continually managing of your emails with folders and labels the best answer? Or is it more efficient to use a memory-based attempt? In this thesis, we planned and implemented a search tool for Mozilla Thun- derbird to test if it is reasonable to use the human’s associative memory for re-finding. The first step was to investigate which different things, besides the conventional text and name, people potentially remember to an email. The decision fell on the separation into three additional searching features. They focus on the email partner’s primary data, on side facts to the date, and the option to search for a second email, which the user possibly associates with the wanted email. To check if the tool is applicable, we evaluated it with several test persons by giving them tasks to complete in a test email environment. The results showed a positive attitude toward these new searching ways. Especially the date-related features were rated very high. These results lead to the motivation of potentially starting further research on the topic. By discovering that dates tend to be remembered quite well, we can improve the tool in this direction before starting a large-scale evaluation with real email data.

Milchrahm Manfred

2020

Improving Browser Bookmark Practicability

Bakk

This present work elaborates on how a browser’s bookmark functionality, a common tool to aid revisitation of web pages, can be improved concerning performance and user experience. After identifying and investigating issues arising with state-of-the- art approaches, solutions to that issues were elaborated and a browser extension for the Google Chrome browser was implemented based on the gathered insight. A special focus was put on developing novel functions that allow for incorporating temporal relations between bookmarks of a given bookmark collection as well as a feature that supports searching for bookmarked web pages by colour. Ten participants completed an evaluation of the implemented browser extension in order to investigate its performance and usability. The study showed that users familiarise quickly with the proposed novel functions and rated their ease of use and helpfulness positively. However, though the suggested functions were commented positively on by participants and showed advantages over traditional full-text search for special cases where some (temporal) context is required, full-text search extended by widespread functions like autocomplete suffice for most of the basic use cases.

Tumbul Erwin

2020

Test case prioritization in limited environments

Bakk

Test case prioritization is a common approach to improve the rate of fault detection. In this scenario, we only have access to very limited data in terms of quantity and quality. The development of an useable method in such a limited environment was the focus of this thesis. For this purpose, we made use of log output and requirement information to create a cluster-based prioritization method. For evaluation, we applied the method to regressions of a device currently in development. The results indicate no impactful improvement, based on the simple and limited metrics used. To show the importance of fault knowledge, we generated a simplified dataset and applied the same prioritization method. With the now existing awareness of faults we were able to evaluate the method using a well established fault-based metric. The results of the generated dataset indicate a great improvement in the rate of fault detection. Despite the restrictions of this limited environment the implemented method is a solid foundation for future exploration.

Cody Seán Enis

2020

Topology Based Anomaly Detection in Time Series Data

Bakk

Anomaly detection on sequential time series data is a research topic of great relev- ance with a long standing history of publications. In the context of time series data, anomalies are subsequences of data that differ from the general pattern. Frequently, these specific areas represent the most interesting regions in the data, as they often correspond to the influence of external factors. Problems which conventional anomaly detection frameworks face are the limita- tion to highly domain specific applications and the requirement for pre-processing steps in order to function as intended. Through the use of the Recurrence Plot, the algorithm proposed in this thesis, initially seeks to capture the pattern of recurrence found in sequential time series data. An ensuing step for vector quantization by Growing Neural Gas ensures more efficient computation of collective anomalies. Furthermore, the usual preprocessing steps for noise removal are bypassed by the topology preservation aspects the Growing Neural Gas provides. Recurrence Plot construction is done according to a sliding window approach. The results indicate that both the noise removal by Growing Neural Gas and the pattern preservation by the Recurrence Plot, lead to highly accurate results, with the proposed Anomaly Detector finding all anomalies in a real world data set of Austria’s Power Consumption in the year 2017. Having demonstrated the applicability and potential of combining the Growing Neural Gas with the Recurrence Plot, it seems likely that these concepts could also be adapted to detect further anomalies such as contextual ones.

Gärber Daniel

2020

Classification of Wikipedia Articles using BERT in Combination with Metadata

Bakk

Wikipedia is the biggest online encyclopedia and it is continually growing. As its complexity increases, the task of assigning the appropriate categories to articles becomes more difficult for authors. In this work we used machine learning to auto- matically classify Wikipedia articles from specific categories. The classification was done using a variation of text and metadata features, including the revision history of the articles. The backbone of our classification model was a BERT model that was modified to be combined with metadata. We conducted two binary classification experiments and in each experiment compared various feature combinations. In the first experiment we used articles from the category ”Emerging technologies” and ”Biotechnology”, where the best feature combination achieved an F1 score of 91.02%. For the second experiment the ”Biotechnology” articles are exchanged with random Wikipedia articles. Here the best feature combination achieved an F1 score of 97.81%. Our work demonstrates that language models in combination with metadata pose a promising option for document classification.

Haid Florian

2020

Mobile Managementlösung für ein umfassendes echtzeitnahes Lagebild zur optimierten Koordination von Sicherheitskräften im Rahmen von unterschiedlichen Einsatzszenarien

Bakk

Bei Großveranstaltungen entsteht ein sehr hoher Managementaufwand um die Sicherheit für alle Besucher gewährleisten zu können. Es sind nämlich nicht nur private Sicherheitskräfte im Einsatz, sondern oft auch Polizei, Rettung, oder auch die Feuerwehr. Aus diesem Grund ist es sehr wichtig, dass alle beteiligten Organisationen effizient und ohne organisatorische Pro- bleme zusammenarbeiten können. Bei Notfällen, kann es durch schlechten Informationsaustausch schnell zu einer kritischen Situation kommen. Um dieses Problem zu beheben, wurde eine Managementlösung entwickelt mit der es möglich ist den Informationsaustausch zu optimieren, so dass alle Beteiligten schnellen und einfachen Zugriff auf alle Informationen haben. Das System besteht aus einer Web-Anwendung für die Einsatzzentrale, so wie einer Android-Applikation für alle mobilen Einheiten. Da es mit diesem Managementsystem nun möglich ist, dass alle Organisationen, das selbe System verwenden, können die Informationen direkt an alle zuständigen Organisationen ohne Umwege gesendet werden. Durch die Verwendung einer eigenen Android-Applikation verfügen außerdem auch alle mobilen Einsatzkräfte über die notwendigen Informationen und nicht mehr nur die Einsatzzentrale. Somit können durch den optimierten Informations- austausch zwischen allen beteiligten Organisationen, kritische Situation effizient und ohne organisatorische Problem gelöst werden. Dieses Projekt ist zwar lediglich ein Prototyp, aber es zeigt bereits sehr gut, was alles möglich ist, und wie es eingesetzt werden kann.

Wagner Mario

2020

Diversity-Aware Recommendation of Tweets

Master

Political debates today are increasingly being held online, through social media andother channels. In times of Donald Trump, the American president, who mostlyannounces his messages via Twitter, it is important to clearly separate facts fromfalsehoods. Although there is an almost infinite amount of information online, toolssuch as recommender systems, filters and search encourage the formation of so-called filter bubbles. People who have similar opinions on polarizing topics groupthemselves and block other, challenging opinions. This leads to a deterioration ofthe general debate, as false facts are difficult to disprove for these groups.With this thesis, we want to provide an approach on how to propose different opin-ions to users in order to increase the diversity of viewpoints regarding a politicaltopic. We classify users into a politic spectrum, either pro-Trump or contra-Trump,and then suggest Tweets from the other spectrum. We then measure the impact ofthis process on diversity and serendipity.Our results show that the diversity and serendipity of the recommendations can beincreased by including opinions from the other political spectrum. In doing so, wewant to contribute to improving the overall discussion and reduce the formation ofgroups that tend to be radical in extreme cases

Moser Mario

2020

Anwendung von Data-Mining Algorithmen zur Informationsgewinnung im Software Support

Master

Diese Arbeit beschäftigt sich mit der Anwendung von Data Mining-Algorithmen zur Informati-onsgewinnung im Softwaresupport. Data Mining-Algorithmen sind Tools der sogenannten „Knowledge Discovery“, der interaktiven und iterativen Entdeckung von nützlichem Wissen. Sie werden eingesetzt, um Daten zu analysieren und über statistische Modelle wertvolle In-formationen einer Domäne zu finden. Die Domäne in dieser Arbeit ist der Softwaresupport, jene Abteilung in Softwareentwicklungs-Unternehmen, die Kundinnen und Kunden bei der Lösung von Problemen unterstützt. Meist sind diese Supportabteilungen als Callcenter organisiert und arbeiten zusätzlich mit Ticketsys-temen (einem E-Mail-basierten Kommunikationssystem). Zweck dieser Arbeit ist es zu prüfen, inwiefern Data Mining-Algorithmen im Softwaresupport angewendet werden und ob tatsächlich wertvolle Informationen identifiziert werden können. Erwartet wird, Informationen über das Supportverhalten von KundInnen sowie den Einfluss von externen Faktoren wie Wetter, Feiertage und Urlaubszeiten zu entdecken. Die Literaturrecherche dieser Arbeit, beinhaltet unter anderem die Themen Personaleinsatz-planung im Softwaresupport und Data Science (Zusammenfassender Begriff für Data Mining, Data Engineering oder Data-Driven Decision Making, etc.). Im „experimental Setup“ finden Interviews zum Thema Status quo- und Kennzahlen im Softwaresupport mit führenden öster-reichischen Softwarehäusern sowie eine Fallstudie zur Anwendung eines Data Mining-Vorgehensmodells statt. Letztlich wird in einem Feldexperiment geprüft, ob es mit Data Mi-ning-Algorithmen tatsächlich möglich ist, Informationen für den Softwaresupport zu entdecken. Als Ergebnis dieser Arbeit zählen einerseits die Identifikation von Möglichkeiten, um im Sup-port Kosten zu sparen und Effizienz zu gewinnen und andererseits das Finden von wertvollen Informationen über Abläufe und Zusammenhänge im Support. Die gewonnenen Informationen können in weiterer Folge in den Supportprozess einfließen, um effektivere und effizientere Prozesse zu schaffen. Ein weiteres Resultat des Informationsgewinns ist auch die Qualitäts-steigerung von Managemententscheidungen sein

Kolb Linda

2020

Using Artificial Intelligence to Classify Activities Captured in Smart Homes

Master

Due to a rapid increase in the development of information technology, adding computing power to everyday objects has become a major discipline of computer science, known as “The Internet of Things”. Smart environments such as smart homes are a network of connected devices with sensors attached to detect what is going on inside the house and what actions can be taken automatically to assist the resident of the house. In this thesis, artificial intelligence algorithms to classify human activities of daily living (having breakfast, playing video games etc.) are investigated. The problem is a time series classification for sensor-based human activity recognition. In total, nine different standard machine learning algorithms (support vector machine, logistic regression, decision trees etc.) and three deep learning models (multilayer perceptron, long short-term neural network, convolu- tional neural network) were compared. The algorithms were trained and tested on the ucami Cup 2018 data set from sensor inputs captured in a smart lab over ten days. The data set contains sensor data from four different sources: intelligent floor, proximity, binary sensors and acceleration data from a smart watch. The mutlilayer perceptron reported a testing accuracy of 50.31%. The long short-term neural network showed an accuracy of 57.41% (+/-13.4), the convolutional neural network in 70.06% (+/-2.3) on average - resulting in only slightly higher scores than the best standard algorithm logistic regression with 65.63%. To sum up the observations of this thesis, deep learning is indeed suitable for human activity recognition. However, the convolutional neural network did not significantly outperform the best standard machine learning algorithm when using this particular data set. Unexpectedly, the long short-term neural network and the basic multilayer perceptron performed poorly. The key drawback of finding a fitting machine learning algorithm to solve a problem such as the one presented in this thesis is that there is no trivial solution. Experiments have to be conducted to empirically evaluate which technique and which hyperparameters yield the best results. Thus the results found in this thesis are valuable for other researchers to build on and develop further approaches based on the new insights.

Kuhs Stefan Claudio

2020

Where am I? Acoustic Location Classification with Temporal Lags

Master

The artificial classification of audio samples to an abstraction of the recorded location (e.g., Park, Public Square, etc.), denoted as Acoustic Scene Classification (ASC), represents an active field of research, popularized, inter alia, as part of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. Nevertheless, we are more concerned to artificially assign audio samples directly to the location of origin, i.e., to the location where the recording of the corresponding audio sample is conducted, which we denote as Acoustic Location Classification (ALC). The evidence for the feasibility of ALC contributes a supplementary challenge for acoustics-based Artificial Intelligence (AI), and enhances the capabilities of location dependent applications in terms of context-aware computing. Thus, we established a client-server infrastructure with an Android application as recording solution, and proposed a dataset which provides audio samples recorded at different locations on multiple consecutive dates. Based on this dataset, and on the dataset proposed for the DCASE 2019 ASC challenge, we evaluated the application of ALC, along with ASC, providing a special focus on constraining training and test sets temporally, and locally, respectively, to ensure reasonable generalization estimates with respect to the underlying Convolutional Neural Network (CNN). As indicated by our outcomes, employing ALC constitutes a comprehensive challenge, resulting in decent classification estimates, and hence motivates further research. However, increasing the number of samples within the proposed dataset, thus, providing daily recordings over a comparatively long period of time, e.g., several weeks or months, seems necessary to investigate the practicality and limitations of ALC to a sufficient degree.

Toth Christian

2020

Synthesizing Infomap - Ein Kullback-Leibler Divergenz-basierter Ansatz zur Erkennung von Communities

Master

Die Erkennung von Communities ist ein essenzielles Werkzeug für die Analyse von komplexen sozialen und biologischen Netzwerken, sowie von Informationsnetzwerken. Unter den bislang veröffentlichten, zahlreichen Community-Erkennungsalgorithmen ist Infomap ein prominentes und etabliertes Framework. In dieser Masterarbeit präsentieren wir eine neue Methode zur Erkennung von Communities, welche von Infomap inspiriert ist. Infomap wählt eine analytische Herangehensweise an das Community-Erkennungsproblem, indem die erwartete Beschreibungslänge eines Zufallslaufs auf einem Netzwerk minimiert wird. Im Gegensatz dazu minimiert unsere Methode die Unterschiedlichkeit, quantifiziert via Kullback-Leibler Divergenz, zwischen einem Graph-induzierten und einem synthetischen Zufallsläufer, um eine Partition in Communities zu erhalten. Daher nennen wir unsere Methode Synthesizing Infomap. Spezifischer behandeln wir Community-Erkennung in ungerichteten Netzwerken mit nicht-überlappenden Communities und zweischichtigen Hierarchien. In dieser Arbeit präsentieren wir eine Formalisierung sowie eine ausführliche Herleitung der Synthesizing Infomap Zielfunktion. Anhand der Anwendung von Synthesizing Infomap auf eine Gruppe von Standardgraphen erkunden wir dessen Eigenschaften und qualitatives Verhalten. Unsere Experimente an künstlich generierten Benchmark-Netzwerken zeigen, dass Synthesizing Infomap dessen ursprüngliche Version bezüglich „Adjusted Mutual Information“ auf Netzwerken mit schwacher Community-Struktur übertrifft. Beide Methoden zeigen gleichwertiges Verhalten bei Anwendung an einer Auswahl von realen Netzwerken. Dies indiziert, dass Synthesizing Infomap auch in praktischen Anwendungsfällen sinnvolle Ergebnisse liefert. Die vielversprechenden Resultate von Synthesizing Infomap motivieren eine weiterführende Evaluierung anhand von realen Netzwerken, sowie mögliche Erweiterungen für mehrstufige Hierarchien und überlappende Communities.

Haslinger Kevin

2020

Evolution of Software Measures - a Small-Scale Case Study Using Two Web Frameworks

Master

As the complexity of a software projects rises it can become difficult to add new features. Additionally to the maintainability, other quality attributes such as reliab- ility and usability may suffer from the increased complexity. To prevent complexity from becoming an overwhelming issue we use principles of good programming and reside to well known software architectures. We often do so, by choosing to use specific frameworks. However, we can only subjectively judge whether or not the usage of a specific framework resulted in less perceived complexity and an improvement in other quality attributes. In our work, we investigated the applicability of existing software measurements for measuring desired quality attributes and their applicability for framework com- parison. We chose a set of quantitative software measurements which are aimed at specific quality attributes, namely maintainability and flexibility. Additionally, we used well established software measurements such as McCabes Cyclomatic Com- plexity [44] and Halsteads Metrics [32] to measure the complexity of a software. By developing the same application using two different web frameworks, namely ReactJS and Laravel, over a set of predefined ‘sprints’, each containing a specific set of features, we were able to investigate the evolution of different software measurements. Our results show that some of the measurements are more applic- able to the frameworks chosen than others. Especially measurements aimed at quantitative attributes of the code such as the coupling measures by Martin [43] and the Cyclomatic Complexity by McCabe [44] proved particularly useful as there is a clear connection between the results of the measurements and attributes of the code. However, there is still the need for additional work which focuses on defining the exact scale each of the measurements operates on, as well as need for the development of tools which can be used to seamlessly integrate software measurements into existing software projects.

Jantscher Michael

2020

Big Data Analysis for Road Accident Risk Prediction in Graz

Master

Traffic accident prediction has been a hot research topic in the last decades. With the rise of Big Data, Machine Learning, Deep Learning and the real- time availability of traffic flow data, this research field becomes more and more interesting. In this thesis different data sources as traffic flow, weather, population and the crash data set from the city of Graz are collected over 3 years between 01.01.2015 and 31.12.2017. In this period 5416 accidents, which were recored by Austrian police officers, happened. Further these data sets are matched to two different spatial road networks. Beside feature engineering and the crash likelihood prediction also different imputation strategies are applied for missing values in the data sets. Especially missing value prediction for traffic flow measurements is a big topic. To tackle the imbalance class problem of crash and no-crash samples, an informative sampling strategy is applied. Once the inference model is trained, the crash likelihood for a given street link at a certain hour of the day can be estimated. Experiment results reveal the efficiency of the Gradient Boosting approach by incorporating with these data sources. Especially the different districts of Graz and street graph related features like centrality measurements and the number of road lanes play an important role. Against that, including traffic flow measurements as pointwise explanatory variables can not lead to a more accurate output accuracy.

Fussenegger Daniel

2020

BttFS - Back to the Future Search: Context-Based Ahead-Of-Time Information Retrieval

Master

The entry point of this master thesis is the context-based Web-Information- Agent Back to the Future Search (bttfs) which was developed with the goal of shortening the period of vocational adjustment while working on different projects at once as well as providing different functionalities for finding and re-finding relevant sources of information. bttfs supports the learning of a context-based user profile in two different ways. The first way is to learn the user profile by the use of a cosine-distance function applied on the Term Frequency-Inverse Document Frequency (tf-idf) document vectors and the second approach is to learn the user profile with a one-class Support Vector Machine (svm). Furthermore, the Information Retrieval methods Best Matching 25 (bm25), Term Frequency (tf), and tf-idf, are used on the created model, to determine the most relevant search queries for the user’s context. The central question answered in this thesis is stated as follows: ”Is it possible to anticipate a users future information need by exploiting the past browsing behavior regarding a defined context of information need?” To answer this question the methods above were applied to the AOL- dataset1, which is a collection of query logs, that consists of roughly 500.000 anonymous user sessions. The evaluation showed that a combination of the cosine-distance learning function and the tf weighting function yielded promising results ranging between 18.22% - 19.85% matching rate on av- erage, for the first three single word queries that appeared in advancing order on the timeline of the user actions. While the difference in perfor- mance between the cosine-distance method and the svm method appeared to be insignificant, tf and tf-idf outperformed bm25 in both of the tested scenarios. Regarding to the gained results, it can be stated, that the future information need of a particular user can be derived from prior browsing behavior in many cases, when the context of information need remained in the same context. Therefore, there are scenarios in which systems like bttfs can aid and accelerate the user’s information generation process by providing automated context-based queries.

Steyer Patrick

2020

Rebo - A feasibility study on moving from a static dialogue structure for reflection for apprentices to a conversational agent

Master

Marina Maitz

2020

Ladeinfrastruktur in der Elektromobilität – Einflussfaktoren für die Auslastung von Ladestationen

Master

Andreas Wöls

2020

Reflection on Task & Time Management: A Conversational Agent Supporting Behavioural Change

Master

Milchrahm Manfred

2020

Improving Browser Bookmark Practicability

Bakk

Silva Nelson

2020

Methods for the Creation of an Assistive and Adaptive User Interface by Performing Analysis of User's Behaviours

PhD/ Dissertation

The analysis of users’ behaviours when working with user interfaces is a complex task. It requires various sensing technologies and complex modelling of input/response relationships. A huge amount of data is collected and analysed today but there are multiple crucial factors that play an unknown role in improving human decision processes. The development of new user interfaces and the usage of suitable techniques to recognise interaction patterns, is crucial for creating adaptive systems. Our work is focused on fault tolerance of Human Machine Interfaces and we develop systems that accept physical user measurements as additional inputs. This can be used to create assistive and adaptive user interfaces and as a way to improve recommendations.

Know Center Research

Wissenschaftliche Arbeiten