Rebol Manuel
2017
Automatic Classification of Business Intent on Social Platforms
Bakk
People spend hours each day on social media and similar web platforms. They express many of their feelings and desires in the texts they post online, and data analysts are always trying to find clever ways to make use of this information.
The aim of this thesis is first to detect business intent in the different types of information users post on the internet. In a second step, the identified business intent is grouped into two classes: buyers and sellers. This supports the idea of linking the two groups.
Machine learning algorithms are used for classification. All data needed to train the classifiers is retrieved and preprocessed by a Python tool developed for this purpose. The data was taken from the web platforms Twitter and HolidayCheck. A sketch of a typical classification pipeline follows below.
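As an illustration of the classification step, here is a minimal sketch assuming a scikit-learn setup; the example posts, labels, and model choice are hypothetical and not taken from the thesis:

```python
# Minimal sketch of business-intent classification (assumed setup with
# scikit-learn; the thesis' own Python tool, features, and data are
# not reproduced here): posts are vectorized and labeled as
# buyer, seller, or no business intent.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

posts = ["looking to buy a used road bike",      # hypothetical examples
         "selling my old laptop, good price",
         "what a beautiful sunny day"]
labels = ["buyer", "seller", "none"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(posts, labels)
print(clf.predict(["anyone selling concert tickets?"]))
```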
Results show that classification works accurately when focusing on a specific platform and domain. On Twitter 96 % of test data is classified correctly, whereas on HolidayCheck the accuracy reaches 67 %. When considering cross-platform multiclass classification, the scores drop to 50 %. Although individual scores increase up to 95 % when performing binary classification, the findings suggest that features need to be improved further in order to achieve acceptable accuracy for cross-platform multiclass classification.
The challenge for future work is to fully link buyers and sellers automatically. This would create business opportunities without the parties needing to know about each other beforehand.
Valentan Stephan
2017
How Design Patterns Impact Code Quality: A Controlled Experiment
Bakk
While design patterns are proposed as a standard way to achieve good software design, little research has been done on the actual impact of these strategies on code quality. Many books suggest that such methods increase flexibility and maintainability; however, they often lack supporting evidence.
This bachelor thesis intends to empirically demonstrate that the use of
design patterns actually improves code quality.
To gather data about the code, two applications designed to meet the same requirements were implemented. One application was developed following widespread guidelines and principles of object-oriented programming, while the other was implemented without paying attention to software maintainability. After the basic requirements were met, a number of additional features were implemented in two phases: first, support for a new graphical user interface was added; then, a different data tier. The sketch below illustrates the kind of structure the pattern-based version relies on.
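As an illustration only (not the thesis code), here is a minimal sketch of how depending on an abstract data tier, one of the object-oriented principles in question, keeps a later data-tier swap from rippling through existing files; all class and method names are hypothetical:

```python
# Minimal sketch: the application depends on an abstract data tier,
# so adding a different storage backend in phase two means adding a
# class, not modifying existing files.
from abc import ABC, abstractmethod

class DataTier(ABC):
    @abstractmethod
    def save(self, record: dict) -> None: ...

class FileTier(DataTier):
    def save(self, record: dict) -> None:
        with open("records.txt", "a") as f:
            f.write(repr(record) + "\n")

class MemoryTier(DataTier):                 # the later, swapped-in tier
    def __init__(self) -> None:
        self.records: list[dict] = []
    def save(self, record: dict) -> None:
        self.records.append(record)

class Application:
    def __init__(self, tier: DataTier) -> None:
        self.tier = tier                    # injected, never constructed here
    def add(self, record: dict) -> None:
        self.tier.save(record)

Application(MemoryTier()).add({"id": 1})    # swapping tiers touches one line
```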
The results show that the initial effort of implementing the program version following object-oriented programming guidelines is noticeably higher in terms of lines of code and number of files. However, during the implementation of additional features fewer files needed to be modified; in one phase transition considerably less code had to be written while the other performed no worse, and furthermore the cyclomatic complexity of the code increased less rapidly.
Suppan Johannes
2017
Semantic RESTful Web Service for the Visualization and Management of Automotive Development Activities in an Information Cockpit
Bakk
Product development starts with the product requirements. Once these are defined, solutions are created for the individual components, which together fulfill the overall product requirements. Solution approaches are proposed and refined over many iterations until an adequate quality with respect to the product requirements is achieved. This entire "knowledge process" is to be transferred into a knowledge management system. We therefore show ways to make the new information technologies of Web 2.0 usable for knowledge management in the automotive industry.
The work is based on a research project of the Virtual Vehicle Competence Center, which includes a software prototype (the "information cockpit"). The "information cockpit" links both the product requirements and the development tasks with the project organization. Thus, both a Product Data Management (PDM) and a Requirements Management (RQM) system are mapped. This networking succeeds in uniting the individual systems, which represents a novelty in this area. By networking the product data, requirements data, and project organization, the user is able to obtain a quick overview of different data in automotive development. As a result, both management and the design department are able to use existing knowledge quickly and to make newly generated knowledge available to others in an uncomplicated manner. At present, only the visualization is implemented. The data to be used is made available by "Link-Nodes" from the data system.
The goal is to transfer the demonstrator into the application "information cockpit". The ontology PROTARES (PROject TAsks RESources) is used as a basis here; it covers the entire data schema. A semantic RESTful (Representational State Transfer) Web Service was designed and implemented accordingly, with a triple-store database as the data storage layer. The "information cockpit" can be used to query the system, which displays the information to the user graphically and structurally. Through the use of these technologies it was possible to create a modular overall system architecture. A sketch of how such a service can be queried follows below.
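As an illustration, here is a minimal sketch of querying a triple store through a RESTful SPARQL endpoint, assuming the `requests` library and the standard SPARQL protocol; the endpoint URL and the PROTARES property names are hypothetical placeholders, not taken from the project:

```python
# Minimal sketch: send a SPARQL query to a triple store over HTTP and
# print the task/requirement links. Endpoint and vocabulary are assumed.
import requests

ENDPOINT = "http://localhost:3030/cockpit/sparql"   # hypothetical endpoint
QUERY = """
PREFIX pt: <http://example.org/protares#>
SELECT ?task ?requirement WHERE {
  ?task pt:fulfills ?requirement .
}
"""

resp = requests.get(ENDPOINT,
                    params={"query": QUERY},
                    headers={"Accept": "application/sparql-results+json"})
for row in resp.json()["results"]["bindings"]:
    print(row["task"]["value"], "->", row["requirement"]["value"])
```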
In the near future, data management can be tackled: not just visualizing the data, but also changing it. After that, user administration, access control, and similar concerns can be addressed.
Veigl Robert
2017
Multiplatform Mobile App for Data Acquisition from External Sensors
Bakk
Mobile apps are becoming more and more important for companies, because apps are needed to sell or operate their products. To be able to serve a wide range of customers, apps must be available for the most common platforms, at least Android and iOS. Considering Windows Phone as well, a company would need to provide three identical apps, one for each platform. As each platform comes with its own tools for app development, the apps must be implemented separately. That means development costs may rise by a factor of three in the worst case.
The Qt framework promises multi-platform capability: an app needs to be implemented just once but still runs on several platforms.
This bachelor's thesis puts that promise to the test by developing such a multi-platform app using the Qt framework. The app collects data from sensors connected to the mobile device and stores the retrieved data on the phone. For this proof of concept, the supported platforms are limited to the most common ones, Android and iOS. Using the app to record data in a real-life scenario demonstrates its proper functioning. A sketch of the data-acquisition idea follows below.
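As an illustration of the acquisition-and-storage loop, here is a minimal sketch assuming Qt's Python bindings (PySide6) and its QtSensors module; the thesis app itself is built with the Qt framework directly, and the sensor choice and database schema here are assumptions:

```python
# Minimal sketch: read accelerometer samples via Qt's sensor API and
# store them in an on-device SQLite database. The platform-specific
# sensor backends are resolved by Qt itself, which is what makes the
# single code base portable.
import sqlite3
from PySide6.QtCore import QCoreApplication, QTimer
from PySide6.QtSensors import QAccelerometer

app = QCoreApplication([])

db = sqlite3.connect("samples.db")
db.execute("CREATE TABLE IF NOT EXISTS accel (ts REAL, x REAL, y REAL, z REAL)")

sensor = QAccelerometer()

def on_reading():
    r = sensor.reading()  # current accelerometer reading
    db.execute("INSERT INTO accel VALUES (?, ?, ?, ?)",
               (r.timestamp(), r.x(), r.y(), r.z()))
    db.commit()

sensor.readingChanged.connect(on_reading)
sensor.start()

QTimer.singleShot(10_000, app.quit)  # record for ten seconds, then stop
app.exec()
```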
Frank Sarah
2017
Automatic Generation of Regular Expressions
Bakk
This thesis deals with the creation of regular expressions from a list of input strings that the resulting expression should match. Since regular expressions match a pattern, they can be used to speed up work involving large amounts of data, under the assumption that the user knows some examples of the pattern that should be matched.
In the program discussed herein, a regular expression is created iteratively, starting from a very rudimentary regular expression and refining it step by step. An adjustable threshold mitigates the effect of not having any negative matches as input. A sketch of the underlying idea follows below.
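As an illustration, here is a minimal sketch of generalizing positive examples into a regular expression; this run-based approach is an assumption for illustration and not the thesis' actual algorithm, which works iteratively with a threshold:

```python
# Minimal sketch: each example is split into runs of digits, letters,
# and other characters; runs are mapped to character-class patterns,
# and differing per-example patterns are merged by alternation.
import re

def runs(s):
    # "AB-123" -> ["[A-Za-z]+", "\\-", "\\d+"]
    out = []
    for m in re.finditer(r"[0-9]+|[A-Za-z]+|.", s):
        t = m.group()
        if t.isdigit():
            out.append(r"\d+")
        elif t.isalpha():
            out.append(r"[A-Za-z]+")
        else:
            out.append(re.escape(t))
    return out

def generalize(examples):
    patterns = {"".join(runs(e)) for e in examples}
    if len(patterns) == 1:
        return patterns.pop()
    return "|".join(sorted(patterns))  # fall back to alternation

samples = ["AB-123", "XY-9", "error 42", "error 7"]
rx = generalize(samples)
print(rx)  # e.g. two alternatives: letters-space-digits | letters-dash-digits
assert all(re.fullmatch(rx, s) for s in samples)
```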
The result is the easy creation of a sufficiently well-working regular expression, assuming a representative collection of input strings, while requiring no negative examples from the user.
Kuhs Stefan Claudio
2017
DART: The Dataset Anonymization and Replication Tool
Bakk
Due to persistent issues concerning sensitive information when working with big data, we present a new approach to generating artificial data in the form of datasets. For this purpose, we define the term dataset to represent a UNIX directory structure consisting of various files and folders.
Especially in computer science, there is a distinct need for data. Mostly, this data already exists but contains sensitive information, so such critical data is supposed to stay protected from third parties. This withholding of data leads to a lack of available data for open source developers as well as for researchers.
Therefore, we devised a way to produce replicated datasets, given an origin dataset as input. Such replicated datasets represent the origin dataset as accurately as possible without leaking any sensitive information.
Thus, we introduce the Dataset Anonymization and Replication Tool, DART for short, a Python-based framework that allows the replication of datasets. Since we aim to encourage the data science community to participate in our work, we constructed DART as a framework with a high degree of adaptability and extensibility.
We started with the analysis of datasets and various file and MIME types to find suitable properties which characterize datasets. We defined a broad range of properties, or characteristics, ranging from the number of files to file-specific characteristics like permissions. In the next step, we explored several mathematical and statistical approaches to replicate the selected characteristics, choosing to model them using relative frequency distributions (unigrams) as well as discrete and continuous random variables (see the sketch below). Finally, we produced replicated datasets and analyzed the replicated characteristics against the characteristics of the corresponding origin dataset. The comparison between origin and replicated datasets is thus based exclusively on the selected characteristics.
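For illustration, here is a minimal sketch of the collect-and-sample idea, assuming two simple characteristics (file-extension and permission frequencies); it is not DART's actual implementation, and the directory names are hypothetical:

```python
# Minimal sketch: collect relative frequencies of file extensions and
# permission bits from an origin tree, then sample them to generate an
# artificial directory tree (file contents are left empty).
import os
import random
from collections import Counter

def collect(origin):
    exts, perms, count = Counter(), Counter(), 0
    for root, _dirs, files in os.walk(origin):
        for name in files:
            count += 1
            exts[os.path.splitext(name)[1]] += 1
            perms[os.stat(os.path.join(root, name)).st_mode & 0o777] += 1
    return exts, perms, count

def replicate(target, exts, perms, count):
    os.makedirs(target, exist_ok=True)
    ext_pool, ext_w = zip(*exts.items())      # assumes a non-empty origin
    perm_pool, perm_w = zip(*perms.items())
    for i in range(count):
        ext = random.choices(ext_pool, weights=ext_w)[0]
        path = os.path.join(target, f"file_{i}{ext}")
        open(path, "w").close()
        os.chmod(path, random.choices(perm_pool, weights=perm_w)[0])

exts, perms, n = collect("origin_dataset")    # hypothetical input directory
replicate("replicated_dataset", exts, perms, n)
```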
The achieved results depend highly on the origin dataset as well as on the characteristics of interest. Origin datasets with a simple structure are more likely to deliver usable results, whereas large and complex origin datasets might be hard to replicate sufficiently well. Nevertheless, the results give hope that tools like DART will be utilized to provide artificial data for such persistent use cases.
Kurzmann Lukas
2017
Data Mining - Variables and Feature Selection with Greedy and Non Greedy Algorithm
Bakk
This paper compares variable and feature selection with greedy and non-greedy algorithms. For the greedy solution, the ID3 algorithm [J. Quinlan, 1986] is used, which serves as a baseline. This algorithm is fast and provides good results for smaller datasets. However, if the dataset gets larger and the information we want to extract from it has to be more precise, several combinations should be checked. A non-greedy solution is a possible way to achieve that goal.

This way of extracting information from data tries every possible combination to find the optimal result. These results may contain combinations of variables: one variable on its own possibly provides no information about the dataset, but in combination with another variable it does. That is one reason why it is useful to check every combination. Besides its very good precision, the algorithm needs higher computational time, at least Ω(n!); the more attributes a dataset has, the higher the computational complexity.

The results have shown, even for smaller datasets, that the non-greedy algorithm finds more precise results, especially with regard to combinations of several attributes/variables. Taken together, if the dataset needs to be analysed more precisely and the hardware allows it, the non-greedy version of the algorithm is a tool that provides more precise results, especially from a combinatorial point of view. A sketch contrasting the two strategies follows below.
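As an illustration of why exhaustive search can find attribute combinations that greedy selection misses, here is a minimal sketch (illustrative, not the thesis implementation) using an XOR-style dataset where neither attribute is informative alone:

```python
# Minimal sketch: greedy selection picks one attribute at a time by
# information gain, as ID3 does, while exhaustive search scores every
# attribute subset and can find combinations whose members are useless
# in isolation.
from itertools import combinations
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attrs):
    # entropy reduction when splitting on the value tuple of `attrs`
    groups = Counter(tuple(r[a] for a in attrs) for r in rows)
    rem = 0.0
    for key, size in groups.items():
        sub = [l for r, l in zip(rows, labels)
               if tuple(r[a] for a in attrs) == key]
        rem += size / len(rows) * entropy(sub)
    return entropy(labels) - rem

# XOR-style data: neither attribute helps alone, only the pair does.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 1, 1, 0]

greedy = max(["a", "b"], key=lambda at: info_gain(rows, labels, [at]))
print(greedy, info_gain(rows, labels, [greedy]))      # gain is 0.0

best = max((c for k in (1, 2) for c in combinations(["a", "b"], k)),
           key=lambda c: info_gain(rows, labels, list(c)))
print(best, info_gain(rows, labels, list(best)))      # ("a", "b"), gain 1.0
```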