Publications

Here you will find scientific publications written by Know-Center staff.

2010

Koerner C., Kern Roman, Grahsl H. P., Strohmaier M.

Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation

21st ACM SIGWEB Conference on Hypertext and Hypermedia (HT2010), 2010

Conference
2010

Koerner C., Kern Roman, Strohmaier M.

Why do Users Tag? Detecting Users' Motivation for Tagging in Social Tagging Systems

4th International AAAI Conference on Weblogs and Social Media (ICWSM2010), 2010

Conference
2010

Kern Roman, Granitzer Michael, Muhr M.

Analysis of Structural Relationships for Hierarchical Cluster Labeling

Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2010

Conference
Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence the labeling accuracy. However, most labeling algorithms ignore such structural properties and therefore the impact of hierarchical structures on the labeling accuracy is yet unclear. In our work we integrate hierarchical information, i.e. sibling and parent-child relations, in the cluster labeling process. We adapt standard labeling approaches, namely Maximum Term Frequency, Jensen-Shannon Divergence, χ² Test, and Information Gain, to make use of those relationships and evaluate their impact on 4 different datasets, namely the Open Directory Project, Wikipedia, TREC Ohsumed and the CLEF-IP European Patent dataset. We show that hierarchical relationships can be exploited to increase labeling accuracy, especially on high-level nodes.
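
To illustrate one of the labeling approaches named in the abstract, here is a minimal Python sketch of Jensen-Shannon divergence based label scoring, where candidate terms of a cluster are scored against a context distribution such as the parent node or the union of the sibling clusters. The toy counts, the function name and the over-representation filter are assumptions for illustration, not the paper's implementation.

import math
from collections import Counter

def jsd_term_scores(cluster_counts, context_counts):
    """Score candidate label terms by their contribution to the Jensen-Shannon
    divergence between the cluster's term distribution and a context
    distribution (e.g. the parent node or the union of the sibling clusters)."""
    total_c = sum(cluster_counts.values())
    total_x = sum(context_counts.values())
    scores = {}
    for term, count in cluster_counts.items():
        p = count / total_c                          # P(term | cluster)
        q = context_counts.get(term, 0) / total_x    # P(term | context)
        m = (p + q) / 2
        contribution = 0.5 * p * math.log2(p / m)
        if q > 0:
            contribution += 0.5 * q * math.log2(q / m)
        # only keep terms that are over-represented in the cluster itself
        scores[term] = contribution if p > q else 0.0
    return scores

# Toy usage: rank label candidates for a child cluster against its parent.
cluster = Counter({"patent": 12, "claim": 9, "invention": 7, "the": 40})
parent = Counter({"patent": 15, "claim": 10, "invention": 8, "the": 200,
                  "article": 60, "history": 45})
scores = jsd_term_scores(cluster, parent)
print(sorted(scores, key=scores.get, reverse=True)[:3])
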
2010

Kern Roman, Koerner C., Strohmaier M.

Exploring the Influence of Tagging Motivation on Tagging Behavior

European Conference on Research and Advanced Technology for Digital Libraries, 2010

Conference
2010

Kern Roman, Granitzer Michael, Muhr M.

KCDC: Word Sense Induction by Using Grammatical Dependencies and Sentence Phrase Structure

Proceedings of SemEval-2, 2010

Conference
Word sense induction and discrimination (WSID) identifies the senses of an ambiguous word and assigns instances of this word to one of these senses. We have built a WSID system that exploits syntactic and semantic features based on the results of a natural language parser component. To achieve high robustness and good generalization capabilities, we designed our system to work on a restricted, but grammatically rich set of features. Based on the results of the evaluations, our system provides a promising performance and robustness.
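
As a rough illustration of the induction step, a hypothetical Python sketch (not the KCDC system): occurrences of an ambiguous word are represented by grammatical-dependency features and grouped into senses by a greedy clustering pass. The feature tuples, the Jaccard measure and the threshold are all assumptions.

def jaccard(a, b):
    """Overlap of two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def induce_senses(instances, threshold=0.2):
    """Greedy single-pass clustering: an instance joins the first cluster whose
    accumulated feature set is similar enough, otherwise it starts a new sense."""
    clusters = []                      # list of (feature_set, [instance_ids])
    for inst_id, feats in instances.items():
        for centroid, members in clusters:
            if jaccard(feats, centroid) >= threshold:
                centroid |= feats
                members.append(inst_id)
                break
        else:
            clusters.append((set(feats), [inst_id]))
    return [members for _, members in clusters]

# Toy instances of the word "bank", each with (dependency relation, lemma) features.
instances = {
    1: {("prep_of", "river"), ("amod", "muddy")},
    2: {("prep_of", "river"), ("nsubj", "erode")},
    3: {("nn", "account"), ("prep_at", "deposit")},
    4: {("nn", "account"), ("amod", "central")},
}
print(induce_senses(instances))        # two induced senses, roughly [[1, 2], [3, 4]]
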
2010

Kern Roman, Granitzer Michael

German Encyclopedia Alignment Based on Information Retrieval Techniques

ECDL 2010: Research and Advanced Technology for Digital Libraries, 2010

Conference
Collaboratively created online encyclopedias have become increasingly popular. Especially in terms of completeness they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and decided to merge their corpora to create a single, more complete encyclopedia. The crucial step in this merge process is the alignment of articles. We have developed a system to identify corresponding entries from different encyclopedic corpora. The base of our system is the alignment algorithm, which incorporates various techniques developed in the field of information retrieval. We have evaluated the system on four real-world encyclopedias with a ground truth provided by domain experts. A combination of weighting and ranking techniques has been found to deliver a satisfying performance.
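
As an illustration of the kind of information-retrieval techniques the abstract refers to, a minimal Python sketch (an assumption, not the authors' system): articles from both encyclopedias are represented as TF-IDF vectors and each article is aligned with its most similar counterpart by cosine similarity.

import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: dict id -> list of tokens. Returns dict id -> {term: weight}."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    n = len(docs)
    return {doc_id: {t: (1 + math.log(c)) * math.log(n / df[t])
                     for t, c in Counter(tokens).items()}
            for doc_id, tokens in docs.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def align(encyclopedia_a, encyclopedia_b):
    """For every article in A, return the best-matching article in B and its score.
    Article ids are assumed to be globally unique across both corpora."""
    vecs = tfidf_vectors({**encyclopedia_a, **encyclopedia_b})
    return {a: max(((b, cosine(vecs[a], vecs[b])) for b in encyclopedia_b),
                   key=lambda pair: pair[1])
            for a in encyclopedia_a}

# Toy usage with pre-tokenized articles.
enc_a = {"a:Mozart": "wolfgang amadeus mozart composer salzburg opera".split()}
enc_b = {"b:Mozart": "composer opera salzburg wolfgang amadeus".split(),
         "b:Salzburg": "city austria salzach river festival".split()}
print(align(enc_a, enc_b))
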
2010

Kern Roman, Zechner Mario, Granitzer Michael, Muhr M.

External and Intrinsic Plagiarism Detection using a Cross-Lingual Retrieval and Segmentation System: Lab Report for PAN at CLEF 2010

2nd International Competition on Plagiarism Detection, 2010

Conference
We present our hybrid system for the PAN challenge at CLEF 2010. Our system performs plagiarism detection for translated and non-translated, externally as well as intrinsically plagiarized document passages. Our external plagiarism detection approach is formulated as an information retrieval problem, using heuristic post-processing to arrive at the final detection results. For the retrieval step, source documents are split into overlapping blocks which are indexed via a Lucene instance. Suspicious documents are similarly split into consecutive overlapping boolean queries which are performed on the Lucene index to retrieve an initial set of potentially plagiarized passages. For performance reasons, queries might get rejected via a heuristic before actually being executed. Candidate hits gathered via the retrieval step are further post-processed by performing sequence analysis on the passages retrieved from the index with respect to the passages used for querying the index. By applying several merge heuristics, bigger blocks are formed from matching sequences. German and Spanish source documents are first translated using word alignment on the Europarl corpus before entering the above detection process. For each word in a translated document, several translations are produced. Intrinsic plagiarism detection is done by finding major changes in style, measured via word suffixes, after the documents have been partitioned by a linear text segmentation algorithm. Our approach led us to the third overall rank with an overall score of 0.6948.
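
A minimal Python stand-in for the retrieval step described above, under stated assumptions: a plain in-memory inverted index replaces Lucene, and block size, overlap and the match threshold are arbitrary choices; the translation step, sequence analysis and merge heuristics are omitted.

from collections import Counter, defaultdict

def blocks(tokens, size=50, step=25):
    """Overlapping token blocks; trailing blocks may be shorter than `size`."""
    for start in range(0, len(tokens), step):
        yield start, tokens[start:start + size]

def build_index(source_docs, size=50, step=25):
    """Index every source block under its distinct terms."""
    index = defaultdict(set)                     # term -> {(doc_id, offset)}
    for doc_id, tokens in source_docs.items():
        for offset, block in blocks(tokens, size, step):
            for term in set(block):
                index[term].add((doc_id, offset))
    return index

def candidate_passages(index, suspicious_tokens, size=50, step=25, min_shared=10):
    """Source blocks sharing at least `min_shared` distinct terms with any block
    of the suspicious document become candidate plagiarized passages."""
    hits = set()
    for _, query_block in blocks(suspicious_tokens, size, step):
        votes = Counter()
        for term in set(query_block):
            votes.update(index.get(term, ()))
        hits.update(block_id for block_id, shared in votes.items()
                    if shared >= min_shared)
    return hits
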
2010

Kern Roman, Seifert Christin, Granitzer Michael

A Hybrid System for German Encyclopedia Alignment

International Journal on Digital Libraries, Springer, 2010

Journal
Collaboratively created online encyclopedias have become increasingly popular. Especially in terms of completeness they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and started an initiative to merge their corpora to create a single, more complete encyclopedia. The crucial step in this merging process is the alignment of articles. We have developed a two-step hybrid system to provide highly accurate alignments with low manual effort. First, we apply an information retrieval based, automatic alignment algorithm. Second, the articles with a low confidence score are revised using a manual alignment scheme carefully designed for quality assurance. Our evaluation shows that a combination of weighting and ranking techniques utilizing different facets of the encyclopedia articles allows us to effectively reduce the number of necessary manual alignments. Further, the setup of the manual alignment turned out to be robust against inter-indexer inconsistencies. As a result, the developed system empowered us to align four encyclopedias with high accuracy and low effort.
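
The two-step idea can be pictured with a small, hypothetical Python helper (the threshold and the score format are assumptions, not the published system): automatic alignments with a sufficiently high confidence score are accepted, the remainder is queued for the manual alignment scheme.

def split_by_confidence(alignments, threshold=0.35):
    """alignments: dict mapping an article id to a (candidate_id, score) pair,
    e.g. the output of an automatic aligner such as the one sketched above."""
    accepted = {article: candidate
                for article, (candidate, score) in alignments.items()
                if score >= threshold}
    manual_queue = [article
                    for article, (candidate, score) in alignments.items()
                    if score < threshold]
    return accepted, manual_queue
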