Arbeiten - Know Center - KI & Data Science

Fernández Alonso, Miguel Yuste

2018

Mining Frequent Patterns in Environmental Sensor Data

Bakk

The advances in data science provide us with a vast array of tools to analyse and better understand our environment. Of special interest to us is the topic of sequential pattern mining, in which statistic patterns are found within sequences of discrete data. In this work, we review some of the major techniques currently offered by the pattern mining field. We also develop a proof of concept tool for frequent itemset mining in Tinkerforge sensor data, showing how the application of the FP-Growth algorithm to Tinkerforge sensor data can provide valuable observations and offer an inexpensive yet powerful setting for further knowledge discovery processes. Lastly, we discuss some of the possible future lines of development of the presented problem.

Polz Hans Georg

2018

Is Google’s Wisdom-Of-The-Crowd a Valid Approach to Discerning Truth in the Age of Fake News

Bakk

Fake News and misinformation are widely discussed topics in our modern information society. A multitude approaches have been taken to filter out false information, ranging from manual research to artificial intelligence. Most of these projects, however, focus on the English language. To fill this gap, we introduce Crowd Fact Finder, a fact-checking tool for German language text, which uses Google search results alongside Open Information Extraction to distinguish fact from fake. We use a wisdom-of-the-crowd approach, deciding that what is popular opinion must be the truth. Crowd Fact Checker is based on the idea that true statements, as a search engine query, will produce more results related to the query than untrue statements. Crowd Fact Checker was evaluated in different categories, achieving an accuracy of 0.633 overall, and 0.7 when categorizing news. The informative value of wisdom-of-the-crowd depends strongly on the popularity of the discussed topic than its validity.

Leitner Lorenz

2018

Implementation and Evaluation of a Bookmark and History Content Search Browser Add-on

Bakk

Schiestl Andreas

2018

Businesssuite, an Affordable and Secure Toolsuite for Managing Customer Data and Invoices

Bakk

Since the new regulations of 2016, nearly all businesses in Austria are required to manage their invoices digitally and hand out digitally signed receipts. Existing solutions are mostly aimed at bigger companies or lack in usability and performance. In this paper, we describe a modern platform independent application to manage invoices, customers and room bookings. This was implemented using state of the art techniques to create a web application built on the Grails framework. Aimed at being deployed as system as a service, the application makes use of a hybrid multi tenancy database concept which allows many customers on a single server without compromising data security. Due to its responsive design, the application can be used on devices of nearly all screen sizes with little compromises. The system is nearly production ready and is already used in a productive environment by one customer. By fully integrating the invoice component with the hotel component, our application achieves great performance when billing hotel rooms. As soon as the system is fully production ready, it will offer small and medium sized enterprises a modern and affordable solution for digitally managing their invoices and room bookings in full compliance with the law.

Lackner Patrick

2018

Computing Cluster for Big Data Analysis

Bakk

The goal of this thesis was to test if a raspberry pi cluster is suitable for big data analysis. The frameworks Hadoop and Spark were used. For clarification, if the raspberry pi cluster is a good choice for big data analysis, the same calculations were tested on a reference laptop. The tested test programs were programed in Java for Hadoop and in Scala for Spark. The files were stored on Hadoops distributed file system. The test programs tried to address strengths and weaknesses of the frameworks and ranged from simple data analysis to the random forest machine learning algorithm. At last, the resource usages of the frameworks and the distributed file system were monitored. The raspberry pi cluster was faster with the test programs for Spark, if they worked on the cluster, because many of Sparks features were not usable on the cluster. Map Reduce worked fine on the cluster, but the reference laptop clearly outperformed the cluster for this test programs. The test programs for Spark were except in one case faster than the test programs for Map Reduce.

Friedrich Matthias

2018

Businesssuite, an Affordable and Secure Toolsuite for Managing Customer Data and Invoices

Bakk

Since the new regulations of 2016, nearly all businesses in Austria are required to manage their invoices digitally and hand out digitally signed receipts. Existing solutions are mostly aimed at bigger companies or lack in usability and performance. In this paper, we describe a modern platform independent application to manage invoices, customers and room bookings. This was implemented using state of the art techniques to create a web application built on the Grails framework. Aimed at being deployed as system as a service, the application makes use of a hybrid multi tenancy database concept which allows many customers on a single server without compromising data security. Due to its responsive design, the application can be used on devices of nearly all screen sizes with little compromises. The system is nearly production ready and is already used in a productive environment by one customer. By fully integrating the invoice component with the hotel component, our application achieves great performance when billing hotel rooms. As soon as the system is fully production ready, it will offer small and medium sized enterprises a modern and affordable solution for digitally managing their invoices and room bookings in full compliance with the law.

Schaffer Robert

2018

Evaluation of Vote/Veto Classifier

Bakk

Authorship identification techniques are used to determine whether a document or text was written by a specific author or not. This includes discovering the rightful author from a finite list of authors for a previously unseen text or to verify if a text was written by a specific author. As digital media continues to get more important every day these techniques need to be also applied to shorter texts like emails, newsgroup posts, social media entries, forum posts and other forms of text. Especially because of the anonymity of the Internet this has become an important task. The existing Vote/Veto framework evaluated in this thesis is a system for authorship identification. The evaluation covers experiments to find reasonable settings for the framework and of course all tests to determine the accuracy and runtime of it. The same tests for accuracy and runtime have been carried out by a number of inbuilt classifiers of the existing software Weka to compare the results. All results have been written to tables and were compared to each other. In terms of accuracy Vote/Veto mostly delivered better results than Weka’s inbuilt classifiers even though the runtime was longer and more memory was necessary. Some settings provided good accuracy results with reasonable runtimes.

Leitgeb Martin

2018

GPS Car Insurance: Assumptions versus Facts

Bakk

In recent years, the variety of car insurance models rose increasingly. Including the range of GPS supported contracts that observe the driving behavior of the insured, assisted by GPS locators, and transfer them to the insurance company. By analyzing the data, the insurance companies try to create a profile of the policyholder and to adjust the insurance fee to the respective driving behavior such as speeding, breaking, turn speeds and much more. However, this calculation assumes that people who spend more time in cars are automatically more vulnerable to accidents and small damages. They assume that there is a direct correlation between time spent in the car and the risk of an accident. Here, however, it was forgotten that experience plays a very important role. The more time you spend driving, the more experience you have gained with hazards or problem situations. The handling of the vehicle itself is best learned by experience and thus reduces the chance of parking damage or similar. The aim of the thesis is to verify or disproof the current approach of insurance companies. To this end, several methods are used to combine multiple perspectives on the topic as possible. In addition to a survey, data is automatically collected by means of web scraping and also manually by means of several random sampling tests. After evaluating the data quality, the results obtained are summarized and evaluated. In addition to statistical evaluations in PSPP, the focus is also on logical or obvious relationships. Finally, all aspects are merged and the underlying assumption was mostly refuted as studies showed that people driving regularly also have the highest percentage of accidents. But this group of drivers also shows the most stable and predictable values while people driving irregularly show much bigger irregularities. Most surveillants stood up against permanent monitoring of driving habits including all types of test groups. During the data collection of the thesisit had to be stated that web scapping of RSS Feeds provides very little usable data.

Breuß Mathias

2018

Favoret Bookmarks: A Browser Extension Support My Users to Get Information Fast and Targeted

Bakk

Bruchmann Andreas

2018

Privacy Protection via Pseudo Relevance Feedback

Bakk

Resch Sebastian

2018

Implementation and Evaluation of a Bookmark and History Content Search Browser Add-on

Bakk

Schlacher Jan Peter

2018

Neo4-js: Object-Graph Mapping with Typed JavaScript and Neo4j

Bakk

In this thesis I present a novel object graph mapper for Neo4j written in modern statically typed JavaScript. The aim of this library, namely neo4-js, is to reduce the code size while still preserving readability, maintainability and code quality when writing backend applications and communicating with a Neo4j database in JavaScript. Readability is a key factor for maintainable code. Hence neo4-js provides a declarative and natural way of defining a data scheme. Better code quality is reached by supporting the developer with good error messages and providing a well tested library. Furthermore, neo4-js fully supports Flow type definitions to be able to find type errors without running the code itself, resulting in better code quality. Neo4-js is specifically targeted for backend JavaScript applications running on Node.js. With the basic neo4-js library it is possible to reduce the code size by up to 1200%. Additionally, I will discuss an effective way of test driven development for database libraries written in JavaScript with a Docker container. Finally, we will have a look at a new way of expressing a schema definition with a new schema definition language and its own compiler to reduce the code size more.

Know Center Research

Wissenschaftliche Arbeiten