Kern Roman, Zechner Mario, Granitzer Michael
2011
Author disambiguation is a prerequisite for utilizing bibliographic metadata in citation analysis. Automatic disambiguation algorithms mostly rely on cluster-based disambiguation strategies for identifying unique authors given their names and publications. However, most approaches rely on knowing the correct number of unique authors a priori, which is rarely the case in real-world settings. In this publication we analyse cluster-based disambiguation strategies and develop a model selection method to estimate the number of distinct authors based on co-authorship networks. We show that, given clean textual features, the developed model selection method provides accurate guesses of the number of unique authors.