Seifert Christin, Ulbrich Eva Pauline, Granitzer Michael
2011
In text classification, the amount and quality of training data are crucial for the performance of the classifier. Training data is generated by human labelers, which is tedious and time-consuming work. We propose to use condensed representations of text documents instead of the full-text document to reduce the labeling time for single documents. These condensed representations are key sentences and key phrases, and they can be generated in a fully unsupervised way. The key phrases are presented in a layout similar to a tag cloud. In a user study with 37 participants, we evaluated whether human labelers can label documents faster and equally accurately with these condensed representations. Our evaluation shows that the users labeled word clouds twice as fast as, and as accurately as, full-text documents. While further investigations for different classification tasks are necessary, this insight could potentially reduce the cost of the labeling process for text documents.
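To make the idea of an unsupervised condensed representation concrete, the following is a minimal sketch of how key terms for a word cloud might be selected without any labels. The abstract does not state which extraction method was used, so TF-IDF scoring is substituted here purely as an assumption; the function names (`tokenize`, `key_terms`, `cloud_sizes`), the stop-word list, and the font-size range are all hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list would be much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def key_terms(doc, corpus, top_k=5):
    """Rank the terms of `doc` by a smoothed TF-IDF score computed over `corpus`.

    No labels are needed, so the procedure is unsupervised. TF-IDF is an
    assumed stand-in, not necessarily the method used in the paper.
    """
    doc_tokens = tokenize(doc)
    if not doc_tokens:
        return []
    tf = Counter(doc_tokens)
    token_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for s in token_sets if term in s)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed inverse document frequency
        scores[term] = (count / len(doc_tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def cloud_sizes(ranked, min_pt=10, max_pt=32):
    """Map scores linearly to font sizes, as a tag-cloud layout might do."""
    if not ranked:
        return {}
    hi, lo = ranked[0][1], ranked[-1][1]
    span = hi - lo
    sizes = {}
    for term, score in ranked:
        if span == 0:
            sizes[term] = max_pt
        else:
            sizes[term] = round(min_pt + (score - lo) / span * (max_pt - min_pt))
    return sizes
```

Calling `cloud_sizes(key_terms(doc, corpus))` yields a term-to-font-size mapping that a labeler could be shown in place of the full text. Extracting multi-word key phrases and key sentences, as the abstract describes, would need additional steps that this sketch leaves out.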