Kern Roman, Granitzer Michael, Muhr M.
2010
Word sense induction and discrimination (WSID) identifies the senses of an ambiguous word and assigns instances of this word to one of these senses. We have built a WSID system that exploits syntactic and semantic features based on the results of a natural language parser component. To achieve high robustness and good generalization capabilities, we designed our system to work on a restricted, but grammatically rich set of features. Based on the results of the evaluations, our system provides promising performance and robustness.
Kern Roman, Granitzer Michael, Muhr M.
2010
Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence the labeling accuracy. However, most labeling algorithms ignore such structural properties, and therefore the impact of hierarchical structures on the labeling accuracy is as yet unclear. In our work we integrate hierarchical information, i.e. sibling and parent-child relations, into the cluster labeling process. We adapt standard labeling approaches, namely Maximum Term Frequency, Jensen-Shannon Divergence, χ² Test, and Information Gain, to make use of those relationships and evaluate their impact on 4 different datasets, namely the Open Directory Project, Wikipedia, TREC Ohsumed and the CLEF-IP European Patent dataset. We show that hierarchical relationships can be exploited to increase labeling accuracy, especially on high-level nodes.