Knowledge Discovery

Knowledge Discovery: The exploitation of knowledge

Roman Kern, Area Head
Matthias Heise, Deputy Area Head

Do you ever feel like a treasure hunter searching for hidden information and necessary knowledge? Indeed, research activities and dealing with large amounts of data take up a lot of our working time. In the "Knowledge Discovery" area, we develop automated methods for analyzing, enriching and linking complex data sources.

We never run out of ideas!
Methods and Applications

Information Retrieval

In the field of information retrieval (search technologies), we facilitate search in enterprise environments for our customers and tackle challenges such as multilinguality and synonyms. We also address challenges specific to enterprise search, e.g., restricting the search results to those documents the respective employee has access rights to. Our solutions are based on technologies that do not require recording user interactions.
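To make the two enterprise-search challenges above concrete, here is a minimal sketch of an inverted index that expands query terms with synonyms and then drops every hit the searching employee is not allowed to read. The class `SecureIndex`, the `SYNONYMS` table and the group-based access lists are hypothetical illustrations, not the team's actual implementation; ranking is omitted, and nothing in the sketch depends on recorded user interactions.

```python
from collections import defaultdict

# Hypothetical synonym table; in practice this might come from a domain
# thesaurus (an assumption, not described in the source).
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


class SecureIndex:
    """Toy inverted index whose results are filtered by access rights."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.acl = {}  # document id -> groups allowed to read it

    def add(self, doc_id, text, allowed_groups):
        self.acl[doc_id] = set(allowed_groups)
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query, user_groups):
        user_groups = set(user_groups)
        hits = set()
        for term in tokenize(query):
            # Expand each query term with its synonyms before the lookup.
            for variant in {term} | SYNONYMS.get(term, set()):
                hits |= self.postings.get(variant, set())
        # Keep only documents shared with at least one of the user's groups.
        return sorted(d for d in hits if self.acl[d] & user_groups)


index = SecureIndex()
index.add("d1", "Company car policy", {"staff"})
index.add("d2", "Automobile leasing contracts", {"management"})
print(index.search("car", {"staff"}))  # ['d1']
print(index.search("car", {"staff", "management"}))  # ['d1', 'd2']
```

A query for "car" also finds the "Automobile" document through the synonym table, but only users in the `management` group get to see it.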

Machine Learning

The second field of research is dedicated to the categorization, grouping and sorting of data. For example, machine learning algorithms can be applied to assign documents to various categories, a process that can additionally be steered by supplying patterns. These categories can either be defined in advance or computed automatically from the data itself. Related approaches allow automated or guided tagging of documents and their associated metadata. Our technologies are not limited to textual resources; they can be applied to a wide range of data, e.g., sensor data or time series.
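For the case where categories are defined in advance, a minimal sketch of document categorization is a nearest-centroid classifier over bag-of-words vectors. The training snippets, category names and helper functions below are made up for illustration, and this simple technique stands in for whatever algorithms the team actually uses. If the categories were instead to be computed from the data itself, a clustering algorithm such as k-means would take the place of the labelled centroids.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term frequencies (no stemming or stop-word removal)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train_centroids(labelled_docs):
    """Sum the term vectors of each predefined category's examples.

    Cosine similarity ignores vector length, so summing is equivalent
    to averaging here.
    """
    centroids = {}
    for text, label in labelled_docs:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def classify(text, centroids):
    """Assign the category whose centroid is most similar to the text."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Tiny, made-up training set with two predefined categories.
TRAIN = [
    ("contract breach damages claim", "civil law"),
    ("tenant lease contract dispute", "civil law"),
    ("theft charge criminal court sentence", "criminal law"),
    ("fraud charge prosecution sentence", "criminal law"),
]
centroids = train_centroids(TRAIN)
print(classify("the court imposed a sentence for fraud", centroids))  # criminal law
```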

Natural Language Processing

In the field of Natural Language Processing, we extract information from unstructured natural language data, ranging from simple information such as person names and locations to more complex, domain-specific technical terms. To achieve satisfactory quality with these methods, it is often necessary to support the algorithms with representative training data. To that end, we develop tools that create such training data and that simplify and automate this process as much as possible. We also extract information from unstructured documents, in particular PDF documents, which are prevalent in the enterprise environment; the extracted content can then be processed further using our methods.
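One common way to reduce the manual effort of creating training data is to pre-annotate text automatically with a dictionary (gazetteer) of known names and let annotators correct the result. The sketch below assumes a hypothetical `GAZETTEER` and the widely used BIO tagging scheme; it illustrates the general idea and is not the team's tooling.

```python
# Hypothetical gazetteer mapping lower-cased token sequences to entity types.
GAZETTEER = {
    ("ada", "lovelace"): "PER",
    ("graz",): "LOC",
    ("new", "york"): "LOC",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def bio_tag(tokens):
    """Label tokens with BIO tags using greedy longest-match lookup."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            entity_type = GAZETTEER.get(tuple(lowered[i:i + n]))
            if entity_type:
                tags[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + entity_type
                i += n
                break
        else:
            # No gazetteer entry starts at this token.
            i += 1
    return tags


sentence = "Ada Lovelace never visited New York or Graz".split()
print(list(zip(sentence, bio_tag(sentence))))
```

The resulting token/tag pairs are draft training examples: a human only needs to fix the mistakes instead of labelling every token from scratch.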

Big Data

The methods from the three fields above can be applied to small document collections, but they also scale gracefully to big data by making use of a distributed environment.
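The source does not name the distributed framework used. One common pattern for this kind of scaling is map-reduce: the collection is split into shards, each shard is processed independently (the map step), and the partial results are merged (the reduce step). A minimal sketch for term counting, with all function names hypothetical, follows.

```python
from collections import Counter
from functools import reduce


def partition(docs, n_shards):
    """Split the collection into n_shards roughly equal shards."""
    return [docs[i::n_shards] for i in range(n_shards)]


def map_shard(shard):
    """Map step: count terms within one shard, independently of the others."""
    counts = Counter()
    for doc in shard:
        counts.update(doc.lower().split())
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial counts (associative and commutative)."""
    return a + b


def term_frequencies(docs, n_shards=4, mapper=map):
    # `mapper` defaults to the built-in, sequential map; any parallel or
    # distributed map with the same contract could be passed in instead.
    partial = mapper(map_shard, partition(docs, n_shards))
    return reduce(reduce_counts, partial, Counter())


print(term_frequencies(["big data", "data scales", "Big ideas"], n_shards=2))
```

Because each shard is processed on its own and the merge is order-independent, the result does not depend on the number of shards. Passing `mapper=pool.map` from a `concurrent.futures.ProcessPoolExecutor` would spread the map step over local processes; a cluster framework applies the same idea across machines.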

Projects

  • Hyperwave

    The ongoing project with our business partner Hyperwave is dedicated to developing an enterprise search solution for content management systems. Many of the search technologies we have developed are being integrated into the partner's existing solution. Ultimately, all Hyperwave customers benefit from this by receiving a content management system with a search solution tailored to their respective domain.

  • Lexis-Nexis

    For Lexis-Nexis, we developed a solution for the automatic assignment and grouping of documents in various legal areas.
    In all areas of our work, one concept is especially important: Big Data. All of the above-mentioned methods can be applied to both small and massive numbers of documents, which can be analyzed using multiple distributed machines. Over the last years, the Knowledge Discovery team has acquired the necessary engineering skills and has developed methods for dealing with constantly growing data sets.

  • Mendeley

    An important part of our work is to provide our project partners with technologies that promote innovation. One example is our joint project with Mendeley, a provider of software for managing scientific publications. Within this project, existing approaches for extracting information from unstructured documents were developed further and new methods were created, e.g., automated table extraction and reconstruction of the table of contents. We reached and exceeded the state of the art in recognizing entities in the biomedical and computer science domains. These techniques will ultimately help Mendeley users, for example by speeding up navigation within publications.
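The table-of-contents reconstruction mentioned for the Mendeley project can be illustrated with a simple heuristic. The sketch assumes that text lines have already been extracted from the PDF and that headings are numbered ("2.1 Entity Recognition"); the section depth is then read from the numbering. The regular expression and the length threshold are assumptions for illustration; the method actually developed in the project is not described here and likely also uses layout cues such as font size.

```python
import re

# Numbered headings such as "2 Methods", "2. Methods" or "3.1.2 Results";
# the title must start with an upper-case letter (an assumption).
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+([A-Z].*)$")


def reconstruct_toc(lines, max_len=80):
    """Return (level, number, title) triples for lines that look like headings.

    Heuristic only: lines longer than max_len characters are treated as
    body text, even if they start with a number.
    """
    toc = []
    for line in lines:
        line = line.strip()
        match = HEADING.match(line)
        if match and len(line) <= max_len:
            number, title = match.groups()
            level = number.count(".") + 1  # "2.1" -> level 2
            toc.append((level, number, title))
    return toc


page_lines = [
    "1 Introduction",
    "Some body text.",
    "2. Related Work",
    "2.1 Entity Recognition",
    "10 patients were enrolled",
    "3 Conclusion",
]
for level, number, title in reconstruct_toc(page_lines):
    print("  " * (level - 1) + number, title)
```

The line "10 patients were enrolled" is rejected because its text does not start with a capital letter, which shows both the value and the fragility of such hand-written rules.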

