Stanisavljevic Darko, Hasani-Mavriqi Ilire, Lex Elisabeth, Strohmaier M., Helic Denis
2016
In this paper, we assess the semantic stability of Wikipedia by investigating the dynamics of Wikipedia articles' revisions over time. In a semantically stable system, articles are infrequently edited, whereas in unstable systems, article content changes more frequently. In other words, in a stable system, the Wikipedia community has reached consensus on the majority of articles. In our work, we measure semantic stability using the Rank Biased Overlap method. To that end, we preprocess Wikipedia dumps to obtain a sequence of plain-text article revisions, where each revision is represented as a TF-IDF vector. To measure the similarity between consecutive article revisions, we calculate Rank Biased Overlap on the term vectors of successive revisions. We evaluate our approach on 10 Wikipedia language editions, including the five largest language editions as well as five randomly selected small language editions. Our experimental results reveal that even in policy-driven collaboration networks such as Wikipedia, semantic stability can be achieved. However, the speed of the semantic stability process differs between small and large Wikipedia editions. Small editions reach semantic stability faster, and reach higher levels of it, than large ones. In particular, in large Wikipedia editions, a higher number of successive revisions is needed to reach a given semantic stability level, whereas in small Wikipedia editions, the number of successive revisions needed for the same level is much lower.
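
The measurement described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: whitespace tokenization, a smoothed IDF computed over the revisions of a single article, the persistence parameter p = 0.9, and the function names `tfidf_rankings` and `rbo`. The score is the truncated prefix form of Rank Biased Overlap, (1 - p) * sum over depths d of p^(d-1) * A_d, where A_d is the fraction of shared terms among the top d terms of both rankings. For two identical rankings of length k this truncated form yields 1 - p^k rather than exactly 1.

```python
import math
from collections import Counter


def tfidf_rankings(revisions):
    """Return, for each revision text, its terms ranked by descending TF-IDF.

    Assumption: each revision is one document and IDF is computed over the
    given revision sequence, using a smoothed IDF so no weight is zero.
    """
    docs = [Counter(text.lower().split()) for text in revisions]
    n = len(docs)
    # Iterating a Counter yields each distinct term once per revision,
    # so this counts document frequency.
    df = Counter(term for doc in docs for term in doc)
    rankings = []
    for doc in docs:
        total = sum(doc.values())
        weights = {
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in doc.items()
        }
        # Ties are broken alphabetically to keep the ranking deterministic.
        rankings.append(sorted(weights, key=lambda t: (-weights[t], t)))
    return rankings


def rbo(ranking_a, ranking_b, p=0.9):
    """Truncated Rank Biased Overlap of two rankings without duplicate items."""
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    overlap = 0
    score = 0.0
    for d in range(1, depth + 1):
        a, b = ranking_a[d - 1], ranking_b[d - 1]
        if a == b:
            overlap += 1
        else:
            # A new item counts once it has appeared in the other prefix.
            overlap += (a in seen_b) + (b in seen_a)
        seen_a.add(a)
        seen_b.add(b)
        score += p ** (d - 1) * overlap / d
    return (1 - p) * score


# Hypothetical revision texts, for illustration only.
revisions = ["cat cat dog", "cat dog dog", "cat dog dog"]
ranked = tfidf_rankings(revisions)
stability = [rbo(x, y) for x, y in zip(ranked, ranked[1:])]
```

Here `stability` holds one score per pair of consecutive revisions; under the reading in the abstract, scores that stay high across successive revisions would indicate that the article's top-weighted terms have stopped changing.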