di Sciascio Maria Cecilia, Strohmaier David, Errecalde Marcelo Luis, Veas Eduardo Enrique
Interactive Quality Analytics of User-generated Content: An Integrated Toolkit for the Case of Wikipedia
Digital libraries and services enable users to access large amounts of data on demand. Yet, quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success but also a hindrance to good quality. Although Wikipedia has established guidelines for the “perfect article,” authors find it difficult to assert whether their contributions comply with them and reviewers cannot cope with the ever-growing amount of articles pending review. Great efforts have been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. Instead, our contribution is an interactive tool that combines automatic classification methods and human interaction in a toolkit, whereby experts can experiment with new quality metrics and share them with authors that need to identify weaknesses to improve a particular article. A design study shows that experts are able to effectively create complex quality metrics in a visual analytics environment. In turn, a user study evidences that regular users can identify flaws, as well as high-quality content based on the inspection of automatic quality scores.