In this thesis, we present a system to recognise naturally performed gestures using a self-built smartglove prototype. We explain the nature of gestures and the anatomy of the human arm and introduce the theory of gesture recognition. A user study serves as the basis of a data-driven approach to gesture recognition, in which all candidate features from human activity recognition are generated and automatic methods for selecting a good feature set are explored. We extend this approach further with a novel algorithm for selecting sensors for a specific target system: Recursive Sensor Elimination (RSE) selects sensors recursively using a heuristic function to find the best configuration for a given subset of gestures. We explain its use cases, the details of the RSE algorithm and first experimental results. A smartwatch experiment then shows the problems that arise when the insights of this work are applied to consumer hardware and which design decisions have to be made. Within this experiment, we present a possible method to augment IMU time series data, provided the labels are not corrupted, by speeding up or slowing down the time series and adding some noise. With this, it is possible to train a simple system that allows, for example, controlling a slide set with a watch.
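As a rough illustration of the augmentation idea described above (not the thesis's actual implementation), the following sketch stretches or compresses an IMU window in time and adds Gaussian noise; the array layout, parameter ranges and function name are assumptions.

```python
import numpy as np

def augment_imu_window(window, scale_range=(0.8, 1.2), noise_std=0.02, rng=None):
    """Return a time-scaled, noise-perturbed copy of one IMU window.

    window: array of shape (timesteps, channels), e.g. accelerometer + gyroscope.
    A scale < 1 speeds the gesture up (fewer samples), > 1 slows it down.
    The class label is assumed to remain valid under both changes.
    """
    rng = rng or np.random.default_rng()
    n_steps, n_channels = window.shape

    # Resample the window onto a stretched/compressed time axis.
    scale = rng.uniform(*scale_range)
    new_len = max(2, int(round(n_steps * scale)))
    src = np.linspace(0, n_steps - 1, new_len)   # positions sampled from the original window
    warped = np.stack([np.interp(src, np.arange(n_steps), window[:, c])
                       for c in range(n_channels)], axis=1)

    # Add small Gaussian noise; crop or pad to a fixed length afterwards if needed.
    return warped + rng.normal(0.0, noise_std, size=warped.shape)
```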

Propaganda is one of the biggest problems in the modern world because it provokes conflicts which can lead to a great loss of human life. The annexation of Crimea and the subsequent conflict in Eastern Ukraine are a prime example: the conflict has led to thousands of lost lives and millions of displaced people. The lack of research on unsupervised propaganda detection led us to devise methods for analysing propaganda that rely neither on fact checking nor on a dedicated ground truth. Instead, we base our measures on a set of guiding principles that constitute the intention of a propagandist author. For each of these principles we propose techniques from the fields of Natural Language Processing and Machine Learning. We have chosen the Russian military intervention in Ukraine as our focus, and the Russian News and Information Agency as our data source. We found the representation of Ukraine to be remarkably different from that of other countries, hinting that the principles of propaganda may be applicable in this case. Our quantitative analysis paves the way for more in-depth qualitative analysis.

The rapid development of technology has opened up a diverse range of possibilities, but it also brings new challenges. These developments have made it possible to move towards Industry 4.0 and the so-called Smart Factories: manufacturing systems in which everything is supposed to be connected. Used correctly, the resulting data can have a big impact, for example in supporting decision making, in shortening the production life-cycle or in enabling highly customizable product manufacturing. The data that flows within a Smart Factory can be of enormous volume, is heterogeneous and does not come from a single data source. However, the systems have to put the created data to use somehow. The challenge is to transform the created Big Data into more valuable Smart Data, so that analytics such as Predictive Maintenance or Retrospective Analysis can later be performed successfully on these data. This is the aim of this Master's Thesis. To solve this problem, a prototype service called Smart Data Service has been developed, which aggregates the raw incoming data streams into a more compact but valuable format known as Smart Data. For testing and evaluation purposes, it was additionally necessary to develop a Smart Factory Simulator, which emulates different scenarios of a manufacturing setup. Two use cases were considered for evaluating the Smart Data Service: aggregating data useful for Retrospective Analysis and aggregating data useful for Predictive Maintenance. The results show that the aggregated Smart Data can have considerable value for performing Retrospective Analysis as well as Predictive Maintenance.
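To make the idea of reducing Big Data to Smart Data concrete, here is a minimal sketch of a windowed aggregation of raw sensor readings; the record layout, field names and window size are illustrative assumptions, not the actual Smart Data Service.

```python
from collections import defaultdict
from statistics import mean

def aggregate_readings(readings, window_s=60):
    """Reduce raw (machine_id, timestamp, value) readings to per-machine,
    per-window summaries (count, min, max, mean) -- one simple notion of
    turning voluminous raw data into compact 'Smart Data'."""
    buckets = defaultdict(list)
    for machine_id, ts, value in readings:
        buckets[(machine_id, int(ts // window_s))].append(value)

    return [
        {"machine": m, "window_start": w * window_s,
         "count": len(vals), "min": min(vals), "max": max(vals), "mean": mean(vals)}
        for (m, w), vals in sorted(buckets.items())
    ]

# Example: three raw readings collapse into two summary records.
raw = [("press_1", 3, 20.1), ("press_1", 42, 20.7), ("press_1", 75, 31.4)]
print(aggregate_readings(raw))
```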

The modern economy heavily relies on data as a resource for advancement and growth. A huge amount of data is produced continuously, yet only a fraction of it is handled properly and efficiently. Data marketplaces are increasingly gaining attention. They provide possibilities to exchange, trade and access different kinds of datasets across organizations, between interested data providers and data buyers. Data marketplaces need stable and efficient infrastructure for their operations, and a suitable business model in order to provide and gain value. Due to the rapid development of the field and its recent surge in popularity, research on business models of data marketplaces is fragmented. This thesis aims to address the issue by identifying the dimensions and characteristics of data marketplaces that outline their business models. Following a rigorous taxonomy-building process, a business model taxonomy for data marketplaces is proposed. Using the evidence from a final sample of twenty available data marketplaces, the frequency of these characteristics is analyzed. In addition, four data marketplace business model archetypes are identified. The findings reveal the impact of the structure of data marketplaces as well as the relevance of infrastructure, regulation and security handling for the identified business model archetypes. This study thus contributes to the growing body of literature on digital business strategies.

The automotive industry is undergoing significant change driven by technological developments such as autonomous driving and the electrification of the powertrain. These changes are accompanied by a marked growth in the data generated across all phases of the automotive value chain. Many companies aim to exploit this available data commercially. The two most important ways of doing so are data-based revenue growth, which includes, for example, the sale of data or the offering of data-based services, and cost reduction based on the knowledge generated from the available data. The great economic potential predicted by various companies and institutions, including McKinsey (2016c, p. 7ff), is prompting companies from a range of business areas to become active in this field. Besides the established players in the automotive industry, such as OEMs and engineering service providers, new market participants such as IT companies and start-ups are trying to gain a foothold in the automotive data business. The aim of this thesis is to identify a selection of engineering service providers, IT companies and start-ups relevant to AVL, to analyse their market offering of data-based services, products, platforms and other data-based activities such as research, cooperations or acquisitions, and to interpret the results. The companies to be analysed were selected on the basis of rankings identifying the highest-revenue engineering service providers in the automotive industry and the highest-revenue IT companies in the German automotive industry. Relevant start-ups were identified with the help of a start-up query by the company Innospot. Companies from these three groups were analysed on the basis of publicly available information. Relevant information on data-based services, products and other data-based activities was categorized using clusters and recorded together with additional information. In this thesis, a cluster can be understood as a topic area, such as "autonomous driving" or "testing". The evaluation of the data obtained through the analysis produced a large number of results. The clustering method revealed the companies' areas of activity as well as those areas in which no activity was detected. A comparison of the areas of activity of the analysed companies with those of AVL identifies companies by their cluster overlap with AVL. Clusters in which no AVL activity could be detected were analysed separately in order to identify companies that are active in these areas. A separate analysis shows the activity of the analysed company groups along the phases of the automotive value chain: engineering service providers are active in the development, validation, production and after-sales phases, IT companies focus on production and after-sales, and start-ups concentrate mainly on after-sales.
This thesis also addresses the question of whether engineering service providers and IT companies work on the same data-based topics or whether a clear differentiation is possible. To answer this question, a competitive landscape was created which depicts the current position of the previously defined engineering service providers, IT companies and start-ups. Larger engineering service providers in particular, which are active in many clusters, are increasingly also active in IT areas.

The subject area of automated Information Extraction from PDF documents is highly relevant since the PDF standard is still one of the most popular document formats for information representation and exchange. There is no structuring blueprint for PDF documents, which makes automated information gathering a complex task. Since tables are structuring elements with a very high information density, the field of Table Detection is highly relevant in the context of Information Extraction. Due to the high variety of formats and layouts, it is hard to choose a tool that is optimally suited to every specific scenario. In this thesis, the added value of techniques used to identify table structures in scanned PDF documents is evaluated. To this end, two algorithms were implemented to allow an objective comparison of Table Extraction applied to different types of PDF documents. While the algorithm developed to treat native PDFs is based on heuristics, the second approach relies on deep-learning techniques. The evaluation of both implementations showed that the heuristic approach performs excellently in detecting tables. However, it shows weaknesses in distinguishing non-tabular areas that resemble table structures from tabular areas. Therefore, the Recall metric shows better results than the Precision for the heuristic method. When applying Table Detection to scanned PDFs using the second approach, the low number of False Positives and therefore the superior Precision value compared to the first approach is notable. On the other hand, the number of undetected tables, as a trade-off for the high Precision, results in a lower Recall for single- as well as multi-column documents if partial detections are classified as correct results. Furthermore, limitations that reduce the detection ratio were identified. This concerns structures that share similarities with tables, such as figures, formulas and pseudo-code. These limitations are particularly relevant for the heuristic approach and less so for the deep-learning based approach. All in all, there were several findings concerning advantages and disadvantages of applying Table Detection to scanned and native documents. Based on the evaluation results, strategies were elaborated for when to prefer a specific approach depending on the document type, layout and structuring elements.
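Since the comparison hinges on Precision and Recall, the following sketch shows one common way such detection metrics can be computed by matching detected table regions against ground-truth regions via an IoU threshold; the threshold and box format are assumptions, not necessarily the evaluation protocol used in the thesis.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def precision_recall(detected, ground_truth, iou_threshold=0.5):
    """A detection counts as a true positive if it overlaps some still
    unmatched ground-truth table with IoU above the threshold."""
    matched, tp = set(), 0
    for det in detected:
        for i, gt in enumerate(ground_truth):
            if i not in matched and iou(det, gt) >= iou_threshold:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```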

Forecasts in today's supply chains depend on more and more influencing factors, which makes it increasingly difficult to predict delivery times. For this reason, external systems often have to be queried, which is usually resource-intensive. The goals of this thesis are the development and introduction of a decision tree that eliminates the direct dependency on external services and performs the prediction based on historical data. A data generator can produce synthetic as well as constant test data, allowing the performance of the developed decision tree to be tested. The tree itself distinguishes between decision questions and manual questions. Decision questions are defined entirely during the learning phase based on parameter objects, whereas manual questions are programmed in advance. Decision making follows the principle of creating as few levels as possible. The tree is simplified using mathematical operations and statistical tools, such as ignoring unlikely outcomes. This thesis shows that it is possible to use a NoSQL database for storing decision models. Furthermore, it demonstrates that predicting the delivery date in an online shop by means of a decision tree is feasible.
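The thesis builds its own tree with decision questions and manual questions; purely as an illustration of the underlying idea of predicting delivery times from historical data with a shallow tree, here is a hedged scikit-learn sketch with made-up features.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Hypothetical historical shipments: features and observed delivery time in days.
history = pd.DataFrame({
    "distance_km":   [12, 340, 80, 900, 45, 560],
    "carrier_id":    [1, 2, 1, 3, 2, 3],
    "weekend_order": [0, 1, 0, 0, 1, 1],
    "delivery_days": [1, 4, 2, 6, 3, 5],
})

X, y = history.drop(columns="delivery_days"), history["delivery_days"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# A shallow tree keeps the number of levels small, mirroring the thesis's goal
# of creating as few levels as possible.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)
print(tree.predict(X_test))
```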

In order to provide accurate statistics and information on how much work is published by its institutes and researchers, Graz University of Technology uses a commercial research management system called PURE. The university would like to have all work published by its institutes and researchers registered in this system. However, registering older publications in this system is a daunting task because missing meta-information has to be entered manually. The project behind this thesis was to develop an application which makes it easier to import meta-information provided by other research portals into this system. This problem had to be tackled by developing smart algorithms to infer missing meta-information, and a user interface which supports the definition of default values for information where no inference is possible. Those tasks involved working with public and private APIs, parsing and generating large XML files, and implementing an architecture which supports multiple different sources of meta-information on publications. The development of this application was successful, and the generation of XML for a bulk import of meta-information from another research portal called DBLP is now possible. The application is easily extensible with respect to the addition of other research portals and provides versatile settings to fine-tune the generation of the import XML. Users with administrative access to the university's PURE server can now select publications from supported research portals and generate large XML files for a bulk import of meta-information. Only a long-term field test of this application will show whether or not the problem has been completely solved by this work.

In automatised warehouses, unwanted situations, here called problems, often occur. In this bachelor's thesis, a system component was developed which collects information about these problems and offers solutions to overcome them. This component was integrated into an existing warehouse management system. From ten common problematic scenarios, 26 requirements defining functional and non-functional attributes of the desired system component were worked out. Process details such as the recognition of problems, the definition of problems and their solutions, and their handling by users are covered in this thesis. A chosen set of requirements was then implemented in a proof-of-concept solution. Additionally, the introduced scenarios were implemented in a demonstration warehouse. In the provided framework, the implemented scenarios can be observed and handled by users. Handling problems is more than 68 per cent faster using this framework. Even though adding new problems to handle is not simple and the required calculations are very time-consuming, this thesis offers a big first step from a user-guided system to a system-guided user.

Machine learning is widely used in the field of condensed matter, especially in connection with traditional quantum mechanical methods such as density functional theory (DFT). One possible application is learning the potential energy surface of solids for crystal structure prediction. In general, the efficiency and accuracy of machine learning depend on the available data, the learning algorithm and the data representation. The data representation is needed to capture the relevant information about the system quantitatively so that it can be processed by the learning algorithm. In this work, we apply different machine learning methods to learn the internal energies of polymorphic mono-elemental crystal structures of carbon and boron, which were previously generated by crystal structure prediction. We investigate different learning algorithms and develop a physically motivated data representation describing the crystal structure. We optimize and evaluate the performance of the learning algorithms on datasets containing relaxed as well as mixed, i.e. relaxed and unrelaxed, crystal structures. Our results show that kernel-based regression methods combined with the developed data representation deliver accurate predictions of the energies of mixed crystal structures, comparable to quantum mechanical methods. With a mean absolute error (MAE) of approximately 10 meV/atom, the developed method could replace expensive calculations required in costly crystal structure predictions.
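As an illustration of the kernel-based regression family named above, the following sketch fits kernel ridge regression on placeholder descriptor vectors and reports a cross-validated MAE; the descriptor, data and hyperparameter grid are assumptions, not the representation developed in the thesis.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

# Placeholder data: each row stands in for the descriptor of one crystal structure,
# y holds the corresponding energies per atom.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))              # physically motivated descriptor would go here
y = X @ rng.normal(size=30) + 0.05 * rng.normal(size=200)

# Kernel ridge regression with an RBF kernel; hyperparameters chosen by cross-validation.
search = GridSearchCV(
    KernelRidge(kernel="rbf"),
    {"alpha": [1e-3, 1e-2, 1e-1], "gamma": [1e-3, 1e-2, 1e-1]},
    scoring="neg_mean_absolute_error", cv=5,
).fit(X, y)

print("CV MAE on placeholder data:", -search.best_score_)
```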

Thermal processes in the manufacturing industry involve highly optimized production equipment. In order to run the process, the equipment has to be maintained, replaced and adjusted regularly, which requires considerable effort in terms of cost and time. The goal of this thesis was to propose an approach for further improving equipment efficiency based on data-driven methods. Initially, historic product and process data were collected, mapped and pre-processed. In order to train selected machine learning algorithms, features were engineered and extracted. To verify that the state of the equipment can be represented through the available data, several models were trained and evaluated. The presented heuristic approach dealt with the quality of the collected data and included a predictive maintenance model. This model was further analyzed to identify the parameters influencing the lifespan of the equipment. Besides the prediction of maintenance actions, a proposal to optimize the utilization of the equipment was presented. Given that the state of the equipment can be represented with the appropriate techniques, there appears to be potential for further process improvement through data-driven models.

Dramatic tragedies with many deaths at major events in recent years have shown how important it is to develop a security solution to prevent such catastrophes. In the context of this master's thesis, a development concept for a mobile multi-sensor solution to support safety and risk management tasks at major events was developed, tested and evaluated. After detailed hardware research, a first prototype was developed and tested at the Frequency Festival in St. Pölten. The impressions and results from this test were evaluated, and a second prototype was then developed, tested and subsequently evaluated. In addition to the detailed research on the various hardware components, Global Positioning System (GPS) and Inertial Measurement Unit (IMU) accuracy tests were conducted comparing professional sensors and smartphone sensors. Finally, a ready-to-use mobile multi-sensor solution was developed to support security and risk tasks at major events, designed to help security personnel with security tasks at urban locations and major events and thereby avoid potentially dramatic tragedies.

Informal learning is the key to solving ill-defined problems in the English healthcare system, such as translating official recommendations into practice. However, the stressful daily work routine prevents the interdisciplinary community of practice from reflecting on its experiences and collaboratively negotiating the best solution. Developing supporting tools requires an understanding of the cognitive processes of sensemaking and meaning making in experiential learning, which so far have only been studied in formal learning contexts or without taking workplace experiences into account. To investigate these cognitive processes in informal workplace learning while simultaneously developing technical support, I chose design-based research and devised a systematic method for the collaborative design of tools. The method puts practice at the centre, guides the analysis of the appropriation of latent action possibilities and aims at reproducible, cross-validated research insights into the domain, cognitive theories and design. Involving the end users ensures high practical relevance and acceptance of the designed tool. After establishing the practical, technical and theoretical state of the art, the "Bits & Pieces" tool was developed through collaborative design and the analysis of appropriation, from paper to software prototypes, over eight iterations. In parallel, this process led to an understanding of work and learning practice in the English healthcare system as well as a cognitive model of sensemaking, meaning making and interdisciplinary teamwork in informal learning. The results can be used in future research and in the development of learning technologies. Furthermore, the study increased the digital literacy of the participating experts, which also empowers them to improve their situation on their own.

Data virtualization is an emergent technology for implementing data-driven business intelligence solutions. With new technologies come new challenges: the complex security and data models within business data applications require sophisticated methods for efficient, scalable and accurate information retrieval via full text search. The challenge we faced was to find a solution covering all required steps, from bringing data into the index of a search engine to retrieving the data afterwards, without enabling users to bypass the company's security policy, thus preserving confidentiality. We researched state-of-the-art solutions to similar problems and elaborated different concepts for security enforcement. We also implemented a prototype as a proof of concept, provided suggestions for follow-up implementations and guidelines on how the encountered problems may be solved. Finally, we discussed our proposed solution and examined the drawbacks and benefits arising from the chosen approach. We found that a late binding approach for access control within the index delivers a fully generic, zero-stale solution that, as we show in the evaluation, is sufficient for a small set of documents with high average visibility density. However, to facilitate scalability, our proposed solution incorporates both early binding for pre-filtering and late binding for post-filtering.
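A minimal sketch of the two enforcement concepts discussed above, assuming a group-based ACL model: early binding narrows the query before it reaches the index, late binding re-checks every hit against the live permission system. The filter clause format and function names are illustrative, not the prototype's API.

```python
def early_binding_filter(user_groups):
    """Pre-filter: restrict the query so the index only returns documents
    whose stored ACL intersects the user's groups (Elasticsearch-style clause
    used here purely as an example)."""
    return {"terms": {"acl_groups": sorted(user_groups)}}

def late_binding_filter(hits, user_groups, acl_lookup):
    """Post-filter: re-check every hit against the live permission system,
    so results are never stale even if the index lags behind."""
    return [h for h in hits if acl_lookup(h["doc_id"]) & user_groups]

def secure_search(query_fn, acl_lookup, user_groups):
    """Combined approach: cheap pre-filtering for scalability,
    authoritative post-filtering for correctness."""
    hits = query_fn(early_binding_filter(user_groups))
    return late_binding_filter(hits, user_groups, acl_lookup)
```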

Decision trees are one of the most intuitive models for decision making used in machine learning. However, the greedy nature of state-of-the-art decision tree building algorithms can lead to subpar results. This thesis aimed to use the non-greedy nature of reinforcement learning to overcome this limitation. The novel approach of using reinforcement learning to grow decision trees for classification tasks resulted in a new algorithm that is competitive with state-of-the-art methods and is able to produce optimal trees for simple problems requiring a non-greedy solution. We argue that it is well suited for data exploration purposes due to its diverse results and the direct influence it offers on the trade-off between tree size and performance.

Whether it is a posting spreading hate about a group of people, a comment insulting another person or a status containing obscenities, such types of toxic content have become a common issue for many online platforms. Owners of platforms like blogs, forums or social networks are highly interested in detecting this negative content. The goal of this thesis is to evaluate the general suitability of convolutional neural networks (CNNs) for classifying toxicity in textual online comments. For this purpose, different CNN architectures are developed and their performance is compared to state-of-the-art methods on a data set containing comments from Wikipedia discussion pages. For a better understanding of this type of neural network, this thesis addresses three subquestions: a) Which patterns do CNNs learn and which features are important for the classification when applied to this task? b) Which preprocessing techniques are beneficial to the performance? c) Are CNNs well suited for comments from sources other than Wikipedia discussion pages? The evaluation showed a performance similar to other classifiers on the same data set. Moreover, the model showed comparable performance on a second data set created for this thesis. The best single preprocessing technique in this work improved the F1 score from 0.636 to 0.645 compared to the baseline. An analysis of a trained model revealed that some patterns detected by the convolutional layer are interpretable by humans. The analysis of the influence of words on the prediction highlighted struggles with negations in the text and also revealed a severe bias in the model.
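For orientation, a minimal Keras sketch of the kind of 1D-CNN text classifier evaluated in the thesis; vocabulary size, sequence length and layer settings are illustrative and not the architectures actually compared.

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LEN = 20000, 200   # illustrative values, not the thesis configuration

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),                     # token ids -> dense vectors
    tf.keras.layers.Conv1D(128, kernel_size=5, activation="relu"),  # n-gram-like pattern detectors
    tf.keras.layers.GlobalMaxPooling1D(),                           # keep strongest response per filter
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),                 # toxic vs. non-toxic
])
model.build(input_shape=(None, MAX_LEN))
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```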

In order to meet the current trends and challenges in the industrial sector, production logistics is one of the focal points in the optimization of assembly systems. To increase the efficiency of internal material supply, milk-run systems were introduced. The milk-run is responsible for the replenishment and transport of parts from the warehouse to the workplaces within a company and is part of the intralogistics system. The aim of this thesis was to digitize such a milk-run system with the help of an RFID system and to test it afterwards. In the course of this digitization, software was developed that simulates the complete production and logistics process of an assembly line. In order to test this simulation, a suitable institution had to be found where the digitized milk-run system could be implemented and tested to generate a meaningful comparative value for the simulator. With the IIM LEAD Factory, a suitable learning factory was found in which the digitized milk-run system could be implemented. The digitized milk-run system consists of an order management sub-system, which gives the logistics employee an overview of open orders and suggests where the parts to be picked are located on the shelf. The picking process is completed in combination with a pick-to-light system, which visually indicates to the employee exactly the compartment in the warehouse needed for the active order. In addition, the digitized milk-run system was enhanced by a route calculation, which finds the most suitable path from the warehouse to the workplace. One of the tasks of the simulator is to simulate real production in such a way that it can suggest to the employee orders that would ideally be placed in the near future. In order to verify that these simulated orders are correct, it was important to compare them with real orders from the learning factory. The result is not only a fully functional digitized milk-run system, but also an evaluation of how well the digitized system works in comparison to the old system and how precise the results of the simulator are. With the completion of this project, a digitized milk-run system is available which has been tested and evaluated in a university institution.

Transport mode detection (TMD) is the process of recognizing the means of transportation (such as walking, cycling, driving, taking a bus or riding a metro) from a given sensory input. When this input consists exclusively of audio data, it is called acoustic TMD. This thesis researches and presents the methodology for creating datasets that fulfill all critical requirements for the highly complex task of acoustic TMD. It provides a step-by-step guideline on what needs to be considered when designing, producing and enhancing the dataset. In order to compile this guideline, a recording application was developed, a 9-class dataset with 245 hours of recordings was created, and experiments were run using this dataset. Those experiments aimed to shed light on the required number and diversity of recordings, the ideal number of classes, the appropriate sample length, how to remove samples of low quality and which evaluation strategy should be used. Finally, existing external datasets were used to evaluate the classification capabilities. With the help of our findings, it should be easier for future projects to create their own acoustic datasets, especially for TMD.

The Portable Document Format (PDF) plays an important role in industry, academia and personal life. The purpose of this file format is to exchange documents in a platform-independent manner. The PDF standard includes a standardized way to add annotations to a document, enabling users to highlight text, add notes and add images. However, those annotations are meant to be added manually in a PDF reader application, resulting in tedious manual work for large documents. The aim of this bachelor thesis was to create an application that enables users to annotate PDF documents in a semi-automatic way. First, users add annotations manually. Then, the application provides functionality to repeat the annotation automatically based on certain rules. For instance, annotations can be repeated on all, even or odd pages. Additionally, annotations can be repeated based on font and font size. The application was built using modern web technologies, such as HTML5 DOM elements, front-end web frameworks, REST APIs and Node.js. The system component responsible for automatic annotation repetition was implemented as a separate service, resulting in a small-scale microservice architecture. Evaluation showed that the application fulfills all use cases specified beforehand. However, it also revealed some major problems regarding usability and discoverability. Furthermore, performance tests showed that in some browsers memory consumption can be an issue when handling large documents.

Efficient siting of public charging infrastructure is critical for the economic success of the expansion and utilization of electromobility. The research questions posed by this thesis are, firstly, what are the key criteria for siting charging points (CP) today, and secondly, what characterizes optimal locations for future charging stations (CS) in Austria and Germany? To answer these questions, a literature review was conducted to understand existing approaches to siting charging infrastructure and to identify tools and practices already in use. Secondly, nine expert interviews were held with planners, operators and promoters of charging infrastructure from Germany and Austria. How existing companies and public authorities plan and develop charging infrastructure is currently the subject of scientific research; various approaches and models exist, but they still require empirical and practical validation. The aim of the thesis is to ascertain whether a predefined procedure exists for positioning future charging infrastructure in public space, and to examine which quality criteria are most important for siting charging infrastructure that is both profitable and customer-oriented. To accomplish this, the results from the interviews are contrasted with the current literature. The findings show that no predefined procedure exists for the positioning of charging infrastructure. However, there are criteria of particular relevance for efficient positioning. The aspects considered most relevant by both literature and experts for finding the right location of future charging infrastructure for EVs are: points of interest nearby, participation of society (demand-based positioning) and use-case orientation (normal vs. fast charging). Once a CP is set up, three key parameters define a profitable CP: high workload, high fluctuation and high energy turnover.

In most companies, business management software has become omnipresent in recent years. These systems have been introduced to streamline productivity and handle data in a more centralized fashion. While younger staff, who grew up with computers and smartphones, navigate newly introduced IT services with ease, it can be challenging for more mature employees to understand and use those systems efficiently. To increase efficiency of use, we propose the introduction of a chatbot to assist users in performing complex tasks. Users can achieve their goals by writing messages in natural language to the conversational system. We focus on the German language in order to deploy the chatbot at a mid-sized Austrian company. To build a meaningful and helpful chatbot, we first elaborate on the background of customer-relationship management (CRM) software, the general structure of conversations and related work regarding chatbots. With this information in mind, we outline useful features a chatbot for a German CRM software should exhibit. We evaluate existing Natural Language Processing (NLP) components for German and choose to implement a hybrid approach consisting of machine learning for intent classification and rule-based methods within a frame-based approach. After an evaluation period, we conducted a technical and an empirical evaluation. For the empirical evaluation, questionnaires were sent out to collect seven metrics. A major finding was that, although the system was text-based only, users wished for voice-based interaction to use the otherwise dead time when driving to and from the customer. The empirical evaluation also found that users preferred a more rigid syntax over natural text, as this reduced ambiguity for the chatbot and therefore improved conversation efficiency.
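A minimal sketch of the hybrid idea described above: a machine-learned intent classifier combined with rule-based slot filling into a frame. The training utterances, intents and the slot rule are invented for illustration and are not the thesis's components.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set (German utterances -> intents); not the thesis data.
utterances = ["Lege einen neuen Kontakt an", "Zeige mir den Kunden Müller",
              "Neuen Termin für morgen eintragen", "Suche den Kunden Huber"]
intents = ["create_contact", "find_customer", "create_appointment", "find_customer"]

intent_clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(utterances, intents)

def fill_frame(text):
    """ML step picks the intent, rule-based step fills the frame slots."""
    frame = {"intent": intent_clf.predict([text])[0], "slots": {}}
    if match := re.search(r"Kunden\s+(\w+)", text):   # simple rule for a customer-name slot
        frame["slots"]["customer"] = match.group(1)
    return frame

print(fill_frame("Suche den Kunden Maier"))
```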

Semiconductor manufacturing is a highly complex and competitive branch of industry, comprising hundreds of process steps that do not allow any deviations from the specification. Depending on the application area of the products, the production chain is subject to strict quality requirements. On the way towards Industry 4.0, automation of production workflows is required and hence even more effort must be spent on controlling the processes accordingly. The need for data-driven indicators supporting human experts by monitoring the production process is inevitable, but adequate solutions exploiting both profound academic methodologies and domain-specific know-how are lacking. In many cases, process deviations cannot be detected automatically during semiconductor frontend production. Hence, the wafer test stage at the end of frontend manufacturing plays a key role in determining whether preceding process steps were executed with the necessary precision. The analysis of these wafer test data is challenging, since process deviations can only be detected by investigating spatial dependencies (patterns) over the wafer. Such patterns become visible if devices on the wafer violate specification limits of the product. In this work, we go one step further and investigate the automated detection of process patterns in data from analog wafer test parameters, i.e. the electrical measurements, instead of pass/fail classifications. This brings the benefit that deviations can be recognized before they result in yield loss, a clear difference to state-of-the-art research, where merely specification violations are observed. For this purpose, an indicator for the level of concern associated with process patterns on the wafer, a so-called Health Factor for Process Patterns, is presented. The indicator combines machine learning techniques and expert knowledge. In order to develop such a Health Factor, the problem is divided into three major components, which are investigated separately: recognition of the pattern type, quantification of the intensity of a pattern, and specification of the criticality associated with each pattern type. Since the first two components are intrinsically present in the wafer test data, machine learning systems are deployed for both, while criticality is specified by introducing expert and domain knowledge into the concept. The proposed decision support system is semi-automated and thus unifies pattern recognition and expert knowledge in a promising way. The effectiveness of the proposed Health Factor is underlined by experiments conducted on simulated as well as real-world datasets. The evaluations show that the system is not only mathematically valid but also practically applicable and fulfills the demands of a real-world production environment. Moreover, the indicator can be transferred to various product types or even related problem setups given a reliable training dataset.
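As a purely illustrative sketch (the actual combination rule is part of the thesis and not reproduced here), the three components could be combined multiplicatively: classifier confidence for the pattern type, a learned intensity score, and an expert-assigned criticality weight.

```python
def health_factor(pattern_probs, intensity, criticality):
    """Combine the three components of the indicator:
    - pattern_probs: classifier output, probability per pattern type
    - intensity:     learned intensity score in [0, 1] for the detected pattern
    - criticality:   expert-assigned weight in [0, 1] per pattern type
    Returns the detected pattern and a score in [0, 1]; higher means more concerning."""
    pattern = max(pattern_probs, key=pattern_probs.get)
    return pattern, pattern_probs[pattern] * intensity * criticality[pattern]

# Illustrative numbers only.
probs = {"ring": 0.7, "scratch": 0.2, "none": 0.1}
crit = {"ring": 0.9, "scratch": 0.5, "none": 0.0}
print(health_factor(probs, intensity=0.8, criticality=crit))  # ('ring', 0.504)
```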

Nowadays, more and more devices are being connected to the internet, so it is important to provide a reliable bridge between them. Gathering and routing the data is the foundation for many different business processes and is therefore highly important. The goal of this thesis was to build a scalable infrastructure for sensor data that only uses open source components and is easy to use for users who provide sensor data. To make the system scalable, different container orchestrators were evaluated; the container orchestration tool Kubernetes was chosen as the basis. Additional system components for system maintenance were selected to improve maintainability. Further components include a load balancer, certificates for secure communication, and monitoring. For the persistence of data, a solution was evaluated and included. The platform can be deployed to different IaaS providers via a Terraform script. The web UI for user and application management is written in Java and based on the high-performance web framework Vert.x; its performance was evaluated using current web frameworks as a reference point. Applications from categories such as data input, data output and data computation/processing can be consumed by users, and for every application category at least one reference application is configured. For the data input category, available MQTT servers were tested with regard to performance and the most suitable server solution was selected. The data output layer was evaluated and the best databases were used. For the data computation layer, an HSTM-based computational intelligence library was selected to showcase inter-connectivity between the components. The framework is extensible, so new applications can be included to provide additional functionality to the users of the system. The system was tested end-to-end with two sensor types for input and output. Additional hardware sensors can be included by providing a template and base values; code can then be uploaded to these sensors based on the values the user provided. Thus the developed system allows and facilitates the setup of a full-blown scalable sensor data framework on multiple cloud providers.

The problem of information overload is widely recognized today. Living in an information society, we are all affected by the increasing amounts of information becoming available every day. The impact of this phenomenon shows itself in several information-related tasks, such as conducting a literature search, by making it difficult for people to find information relevant to their interests. In this work, we develop a recommender system capable of providing relevant literature recommendations for a pending citation in a scientific paper. We employ a content-based recommendation approach built on information retrieval techniques. The input to our system consists of the citation context around the pending citation, while the output comprises a ranked list of documents serving as citation candidates. Within our experimental setup, we explore different query formulation strategies and retrieval models in order to improve the performance of the system. The evaluation of our system shows the potential of this approach, reaching a peak MRR of 0.416. This is further emphasized by the results gained from our contribution to the CL-SciSumm Shared Task 2017, where we achieve top results among all participating systems.
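The reported MRR of 0.416 is the mean reciprocal rank over the evaluated citations; a minimal sketch of how this metric is computed from ranked candidate lists follows, with made-up data.

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """MRR over queries: for each query take 1 / rank of the first relevant
    candidate (0 if no candidate is relevant), then average over all queries."""
    total = 0.0
    for query_id, candidates in ranked_lists.items():
        for rank, doc in enumerate(candidates, start=1):
            if doc in relevant[query_id]:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Illustrative example: first query hit at rank 2, second at rank 1 -> MRR = 0.75.
runs = {"q1": ["d5", "d3", "d9"], "q2": ["d1", "d7"]}
gold = {"q1": {"d3"}, "q2": {"d1", "d4"}}
print(mean_reciprocal_rank(runs, gold))
```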

As monolithic applications are becoming rarer, a new problem arises: how do these smaller applications communicate with each other? This becomes especially significant for reporting, which usually requires data from multiple sources. We introduce Kafka as a distributed messaging system into our environment as a means of inter-service communication. Additionally, two ways of storing data are provided: MySQL for structured data and MongoDB for unstructured data. The system is then evaluated in several categories: it is tested in terms of resiliency, and performance tests are run with a high number of messages and increasing sizes of individual messages. The bottlenecks of this system are assessed to determine whether it is useful for reporting data to customers. The experiments indicate that this system circumvents many problems of a monolithic infrastructure. Nevertheless, it creates a performance bottleneck when storing data received from Kafka. Storing structured data turned out to be more problematic than storing unstructured data by an order of magnitude. Despite this, we have been using a distributed messaging setup in production for some years now and are also using it for reports with structured data. Storing unstructured data in this new setup has not made it to production yet; we are currently working on this.
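A minimal sketch of the inter-service messaging pattern described above, using the kafka-python client; topic name, broker address and payload are placeholders and assume a broker running locally.

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # kafka-python client

# Service A publishes a domain event instead of calling the reporting service directly.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("order-events", {"order_id": 42, "status": "shipped"})
producer.flush()

# The reporting service consumes the same topic and writes to its own store
# (e.g. MySQL for structured, MongoDB for unstructured payloads).
consumer = KafkaConsumer(
    "order-events",
    bootstrap_servers="localhost:9092",
    group_id="reporting",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)   # hand off to the reporting data store here
```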