Kern Roman, Zechner Mario, Granitzer Michael
2011
Model Selection Strategies for Author Disambiguation
IEEE Computer Society: 8th International Workshop on Text-based Information Retrieval, in Proceedings of the 22nd International Conference on Database and Expert Systems Applications (DEXA 11), IEEE
Author disambiguation is a prerequisite for utilizing bibliographic metadata in citation analysis. Automatic disambiguation algorithms mostly rely on cluster-based strategies for identifying unique authors given their names and publications. However, most approaches require knowing the correct number of unique authors a priori, which is rarely the case in real-world settings. In this publication we analyse cluster-based disambiguation strategies and develop a model selection method to estimate the number of distinct authors based on co-authorship networks. We show that, given clean textual features, the developed model selection method provides accurate estimates of the number of unique authors.