Publications

Here you will find scientific publications authored by Know-Center staff.

2016

Pimas Oliver, Klampfl Stefan, Kohl Thomas, Kern Roman, Kröll Mark

Generating Tailored Classification Schemas for German Patents

21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016, Springer-Verlag, Salford, UK, 2016

Conference
Patents and patent applications are important parts of a company’s intellectual property. Thus, companies put a lot of effort in designing and maintaining an internal structure for organizing their own patent portfolios, but also in keeping track of competitor’s patent portfolios. Yet, official classification schemas offered by patent offices (i) are often too coarse and (ii) are not mappable, for instance, to a company’s functions, applications, or divisions. In this work, we present a first step towards generating tailored classification. To automate the generation process, we apply key term extraction and topic modelling algorithms to 2,131 publications of German patent applications. To infer categories, we apply topic modelling to the patent collection. We evaluate the mapping of the topics found via the Latent Dirichlet Allocation method to the classes present in the patent collection as assigned by the domain expert.
2016

Klampfl Stefan, Kern Roman

Reconstructing the Logical Structure of a Scientific Publication using Machine Learning

Semantic Web Challenges, Communications in Computer and Information Science, Springer Link, Springer-Verlag, 2016

Conference
Semantic enrichment of scientific publications has an increasing impact on scholarly communication. This document describes our contribution to Semantic Publishing Challenge 2016, which aims at investigating novel approaches for improving scholarly publishing through semantic technologies. We participated in Task 2 of this challenge, which requires the extraction of information from the content of a paper given as PDF. The extracted information allows answering queries about the paper’s internal organisation and the context in which it was written. We build upon our contribution to the previous edition of the challenge, where we categorised meta-data, such as authors and affiliations, and extracted funding information. Here we use unsupervised machine learning techniques in order to extend the analysis of the logical structure of the document as to identify section titles and captions of figures and tables. Furthermore, we employ clustering techniques to create the hierarchical table of contents of the article. Our system is modular in nature and allows a separate training of different stages on different training sets.
2016

Kern Roman, Klampfl Stefan, Rexha Andi

Identifying Referenced Text in Scientific Publications by Summarisation and Classification Techniques

BIRNDL 2016 Joint Workshop on Bibliometric-enhanced Information Retrieval and NLP for Digital Libraries, G. Cabanac, Muthu Kumar Chandrasekaran, Ingo Frommholz, Kokil Jaidka, Min-Yen Kan, Philipp Mayr, Dietmar Wolfram, ACM, New Jersey, USA, 2016

Conference
This report describes our contribution to the 2nd Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2016), which asked to identify the relevant text span in a reference paper that corresponds to a citation in another document that cites this paper. We developed three different approaches based on summarisation and classification techniques. First, we applied a modified version of an unsupervised summarisation technique, TextSentenceRank, to the reference document, which incorporates the similarity of sentences to the citation on a textual level. Second, we employed classification to select from candidates previously extracted through the original TextSentenceRank algorithm. Third, we used unsupervised summarisation of the relevant sub-part of the document that was previously selected in a supervised manner.
2016

Rexha Andi, Klampfl Stefan, Kröll Mark, Kern Roman

Towards a more fine grained analysis of scientific authorship: Predicting the number of authors using stylometric features

BIR 2016 Workshop on Bibliometric-enhanced Information Retrieval, Atanassova, I.; Bertin, M.; Mayr, P., Springer, Padova, Italy, 2016

Conference
To bring bibliometrics and information retrieval closer together, we propose to add the concept of author attribution into the pre-processing of scientific publications. Presently, common bibliographic metrics often attribute the entire article to all the authors affecting author-specific retrieval processes. We envision a more fine-grained analysis of scientific authorship by attributing particular segments to authors. To realize this vision, we propose a new feature representation of scientific publications that captures the distribution of stylometric features. In a classification setting, we then seek to predict the number of authors of a scientific article. We evaluate our approach on a data set of ~6100 PubMed articles and achieve best results by applying random forests, i.e., 0.76 precision and 0.76 recall averaged over all classes.
2015

Klampfl Stefan, Kern Roman

Machine Learning Techniques for Automatically Extracting Contextual Information from Scientific Publications

Semantic Web Evaluation Challenges. SemWebEval 2015 at ESWC 2015, Portorož, Slovenia, May 31 – June 4, 2015, Revised Selected Papers, Gandon, F.; Cabrio, E.; Stankovic, M.; Zimmermann, A., Springer International Publishing, 2015

Conference
Scholarly publishing increasingly requires automated systems that semantically enrich documents in order to support management and quality assessment of scientific output. However, contextual information, such as the authors' affiliations, references, and funding agencies, is typically hidden within PDF files. To access this information we have developed a processing pipeline that analyses the structure of a PDF document incorporating a diverse set of machine learning techniques. First, unsupervised learning is used to extract contiguous text blocks from the raw character stream as the basic logical units of the article. Next, supervised learning is employed to classify blocks into different meta-data categories, including authors and affiliations. Then, a set of heuristics are applied to detect the reference section at the end of the paper and segment it into individual reference strings. Sequence classification is then utilised to categorise the tokens of individual references to obtain information such as the journal and the year of the reference. Finally, we make use of named entity recognition techniques to extract references to research grants, funding agencies, and EU projects. Our system is modular in nature. Some parts rely on models learnt on training data, and the overall performance scales with the quality of these data sets.
2015

Rexha Andi, Klampfl Stefan, Kröll Mark, Kern Roman

Towards Authorship Attribution for Bibliometrics using Stylometric Features

Proc. of the Workshop Mining Scientific Papers: Computational Linguistics and Bibliometrics, Atanassova, I.; Bertin, M.; Mayr, P., ACL Anthology, Istanbul, Turkey, 2015

Conference
The overwhelming majority of scientific publications are authored by multiple persons; yet, bibliographic metrics are only assigned to individual articles as single entities. In this paper, we aim at a more fine-grained analysis of scientific authorship. We therefore adapt a text segmentation algorithm to identify potential author changes within the main text of a scientific article, which we obtain by using existing PDF extraction techniques. To capture stylistic changes in the text, we employ a number of stylometric features. We evaluate our approach on a small subset of PubMed articles consisting of an approximately equal number of research articles written by a varying number of authors. Our results indicate that the more authors an article has the more potential author changes are identified. These results can be considered as an initial step towards a more detailed analysis of scientific authorship, thereby extending the repertoire of bibliometrics.