To provide accurate statistics on how much work is published by its institutes and researchers, Graz University of Technology uses a commercial research management system called PURE. The university would like all work published by its institutes and researchers to be registered in this system. However, registering older publications is a daunting task because missing meta-information has to be entered manually. The project behind this thesis was to develop an application which simplifies the import of meta-information provided by other research portals into this system. This problem was tackled by developing algorithms to infer missing meta-information and a user interface which supports the definition of default values for information where no inference is possible. These tasks involved working with public and private APIs, parsing and generating large XML files, and implementing an architecture which supports multiple different sources of meta-information on publications. The development of this application was successful, and the generation of XML for a bulk import of meta-information from another research portal, DBLP, is now possible. The application is easily extensible with respect to adding further research portals and provides versatile settings to fine-tune the generation of the import XML. Users with administrative access to the university's PURE server can now select publications from supported research portals and generate large XML files for a bulk import of meta-information. Only a long-term field test of this application will show whether the problem has been completely solved by this work.
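
As an illustration of the kind of meta-information import described above, the following sketch fetches publication data from DBLP's public search API and emits a simplified import XML. The assumed JSON response shape and the XML element names are illustrative only and do not reflect the actual PURE bulk-import schema.

```typescript
// Sketch: fetch publication metadata from DBLP's public search API and emit a
// simplified import XML. Element names ("publications", "publication", ...) are
// illustrative; the real PURE bulk-import schema is far richer.
interface DblpHit {
  info: {
    title: string;
    year?: string;
    authors?: { author: { text: string } | { text: string }[] };
  };
}

const escapeXml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

async function generateImportXml(query: string): Promise<string> {
  // Assumed DBLP endpoint and response shape.
  const res = await fetch(
    `https://dblp.org/search/publ/api?q=${encodeURIComponent(query)}&format=json`
  );
  const body = await res.json();
  const hits: DblpHit[] = body?.result?.hits?.hit ?? [];

  const entries = hits.map(({ info }) => {
    const authors = info.authors ? [info.authors.author].flat().map((a) => a.text) : [];
    return [
      "  <publication>",
      `    <title>${escapeXml(info.title)}</title>`,
      `    <year>${info.year ?? "unknown"}</year>`, // default value where inference fails
      ...authors.map((a) => `    <author>${escapeXml(a)}</author>`),
      "  </publication>",
    ].join("\n");
  });

  return `<publications>\n${entries.join("\n")}\n</publications>`;
}
```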

In automated warehouses, unwanted situations, referred to as problems, occur frequently. In this bachelor's thesis, a system component which collects information about these problems and offers solutions to resolve them was developed. This component was integrated into an existing warehouse management system. From ten common problematic scenarios, 26 requirements defining functional and non-functional attributes of the desired system component were worked out. Process details such as the recognition of problems, the definition of problems and their solutions, and their handling by users are covered in this thesis. A selected set of these requirements was then implemented in a proof-of-concept solution. Additionally, the introduced scenarios were implemented in a demonstration warehouse. In the provided framework, the implemented scenarios can be observed and handled by users. Handling problems is more than 68 per cent faster using this framework. Even though adding new problems to handle is not simple and the required calculations are time-consuming, this thesis offers a big first step from a user-guided system towards a system-guided user.
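
The following sketch shows one way such a component could model problem definitions, their recognition and the solutions offered to users; all type and field names are hypothetical and not taken from the thesis.

```typescript
// Hypothetical sketch of a problem registry: a problem definition bundles a
// recognition predicate with candidate solutions that a user can apply.
interface WarehouseState {
  blockedConveyors: string[];
  pendingOrders: number;
}

interface Solution {
  description: string;
  apply(state: WarehouseState): void;
}

interface ProblemDefinition {
  name: string;
  isPresent(state: WarehouseState): boolean; // recognition of the problem
  solutions: Solution[];                     // offered to the user for handling
}

const blockedConveyor: ProblemDefinition = {
  name: "Blocked conveyor",
  isPresent: (s) => s.blockedConveyors.length > 0,
  solutions: [
    {
      description: "Reroute transports around the blocked segment",
      apply: (s) => { s.blockedConveyors = []; },
    },
  ],
};

// The component would periodically evaluate all definitions against the current
// warehouse state and present detected problems, with their solutions, to the user.
function detectProblems(state: WarehouseState, defs: ProblemDefinition[]): ProblemDefinition[] {
  return defs.filter((d) => d.isPresent(state));
}
```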

Data virtualization is an emerging technology for implementing data-driven business intelligence solutions. With new technologies come new challenges: the complex security and data models within business data applications require sophisticated methods for efficient, scalable and accurate information retrieval via full-text search. The challenge we faced was to find a solution for all required steps, from bringing data into the index of a search engine to retrieving it afterwards, without enabling users to bypass the company's security policy, thus preserving confidentiality. We researched state-of-the-art solutions for similar problems and elaborated different concepts for security enforcement. We also implemented a prototype as a proof of concept, provided suggestions for follow-up implementations, and offered guidelines on how the encountered problems may be solved. Finally, we discussed our proposed solution and examined the drawbacks and benefits arising from our chosen approach. We found that a late-binding approach for access control within the index delivers a fully generic, zero-staleness solution that, as we show in the evaluation, is sufficient for a small set of documents with high average visibility density. However, to facilitate scalability, our proposed solution incorporates both early binding for pre-filtering and late binding for post-filtering.
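
A minimal sketch of the combined enforcement strategy, assuming abstracted search-engine and policy interfaces (all names are hypothetical): early binding narrows the index query with the user's access groups, while late binding re-checks every hit against the live policy before returning it.

```typescript
// Sketch of combined security enforcement for full-text search: early binding
// pre-filters the query with the user's access terms, late binding post-filters
// the retrieved hits against the current policy so stale permissions cannot leak.
interface Doc { id: string; aclGroups: string[]; text: string; }

interface SearchEngine {
  // e.g. backed by a Lucene/Elasticsearch-style index
  query(fullText: string, filterGroups: string[]): Promise<Doc[]>;
}

interface SecurityPolicy {
  groupsOf(userId: string): Promise<string[]>;               // for early binding
  mayRead(userId: string, docId: string): Promise<boolean>;  // for late binding
}

async function secureSearch(
  engine: SearchEngine,
  policy: SecurityPolicy,
  userId: string,
  fullText: string
): Promise<Doc[]> {
  // Early binding: restrict candidates to documents indexed with one of the user's groups.
  const groups = await policy.groupsOf(userId);
  const candidates = await engine.query(fullText, groups);

  // Late binding: verify each candidate against the live policy before returning it.
  const checks = await Promise.all(candidates.map((d) => policy.mayRead(userId, d.id)));
  return candidates.filter((_, i) => checks[i]);
}
```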

The Portable Document Format, also called PDF, plays an important role in industry, academia and personal life. The purpose of this file format is to exchange documents in a platform-independent manner. The PDF standard includes a standardized way to add annotations to a document, enabling users to highlight text, add notes and add images. However, these annotations are meant to be added manually in a PDF reader application, resulting in tedious manual work for large documents. The aim of this bachelor's thesis was to create an application that enables users to annotate PDF documents in a semi-automatic way. First, users add annotations manually. Then, the application repeats these annotations automatically based on certain rules. For instance, annotations can be repeated on all, even or odd pages, or based on font and font size. The application was built using modern web technologies, such as HTML5 DOM elements, front-end web frameworks, REST APIs and Node.js. The system component responsible for automatic annotation repetition was implemented as a separate service, resulting in a small-scale microservice architecture. The evaluation showed that the application fulfills all use cases that were specified beforehand. However, it also showed that there were some major problems regarding usability and discoverability. Furthermore, performance tests showed that in some browsers, memory consumption can be an issue when handling large documents.
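
A minimal sketch of the page-based repetition rule described above (all, even or odd pages); the annotation shape and function names are illustrative and not the actual service implementation.

```typescript
// Sketch of rule-based annotation repetition: a manually created annotation is
// duplicated onto all, even or odd pages. Types and names are illustrative only.
type PageRule = "all" | "even" | "odd";

interface Annotation {
  page: number;                                 // 1-based page index
  rect: [number, number, number, number];       // position on the page
  kind: "highlight" | "note" | "image";
  content: string;
}

function repeatOnPages(source: Annotation, rule: PageRule, pageCount: number): Annotation[] {
  const matches = (p: number): boolean =>
    rule === "all" || (rule === "even" ? p % 2 === 0 : p % 2 === 1);

  const copies: Annotation[] = [];
  for (let page = 1; page <= pageCount; page++) {
    if (page !== source.page && matches(page)) {
      copies.push({ ...source, page });
    }
  }
  return copies;
}

// In a microservice setup, this logic could sit behind a small REST endpoint of the
// repetition service, e.g. POST /repeat with { annotation, rule, pageCount } (hypothetical).
```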

As monolithic applications become rarer, a new problem arises: how these smaller applications communicate with each other. This becomes especially significant for reporting, which usually requires combining data from multiple sources. We introduce Kafka as a distributed messaging system into our environment as a means of inter-service communication. Additionally, two ways of storing data are provided: MySQL for structured data and MongoDB for unstructured data. The system is then evaluated in several categories: it is tested in terms of resiliency and its performance with a high number of messages and increasing sizes of individual messages. The bottlenecks of the system are assessed to determine whether it is useful for reporting data to customers. The experiments indicate that this system circumvents many problems of a monolithic infrastructure. Nevertheless, it creates a performance bottleneck when storing data received from Kafka. Storing structured data turned out to be more problematic than storing unstructured data by an order of magnitude. Despite this, we have been using a distributed messaging setup in production for some years now and are also using it for reports with structured data. Storing unstructured data in this new setup has not yet made it to production; we are currently working on this.
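
A minimal sketch of such a setup, assuming kafkajs, mysql2 and the official MongoDB Node.js driver; the topic, table and collection names as well as the message format are assumptions for illustration only.

```typescript
// Sketch: a Kafka consumer receives messages and stores structured payloads in
// MySQL and unstructured payloads in MongoDB. Names and schema are illustrative.
import { Kafka } from "kafkajs";
import { MongoClient } from "mongodb";
import mysql from "mysql2/promise";

async function run(): Promise<void> {
  const kafka = new Kafka({ clientId: "reporting", brokers: ["localhost:9092"] });
  const consumer = kafka.consumer({ groupId: "reporting-consumers" });

  const db = await mysql.createConnection({
    host: "localhost", user: "report", password: "secret", database: "reporting",
  });
  const mongo = await new MongoClient("mongodb://localhost:27017").connect();
  const events = mongo.db("reporting").collection("events");

  await consumer.connect();
  await consumer.subscribe({ topic: "orders" });      // structured messages
  await consumer.subscribe({ topic: "raw-events" });  // unstructured messages

  await consumer.run({
    eachMessage: async ({ topic, message }) => {
      const payload = JSON.parse(message.value?.toString() ?? "{}");
      if (topic === "orders") {
        // Structured data: fixed schema, written to a relational table.
        await db.execute(
          "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
          [payload.id, payload.customer, payload.amount]
        );
      } else {
        // Unstructured data: stored as-is in a document collection.
        await events.insertOne(payload);
      }
    },
  });
}

run().catch(console.error);
```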