Granitzer Michael, Rath Andreas S., Kröll Mark, Ipsmiller D., Devaurs Didier, Weber Nicolas, Lindstaedt Stefanie, Seifert C.
2009
Increasing the productivity of a knowledge worker via intelligent applications requires the identification of a user’s current work task, i.e. the current work context a user resides in. In this work we present and evaluate machine-learning-based work task detection methods. By viewing a work task as a sequence of digital interaction patterns of mouse clicks and key strokes, we present (i) a methodology for recording those user interactions and (ii) an in-depth analysis of supervised classification models for classifying work tasks in two different scenarios: a task-centric scenario and a user-centric scenario. We analyze different supervised classification models, feature types and feature selection methods on a laboratory as well as a real-world data set. Results show satisfactory accuracy and high user acceptance using relatively simple types of features.
Lex Elisabeth, Granitzer Michael, Juffinger A., Seifert C.
2009
Cross-Domain Classification: Trade-Off between Complexity and Accuracy
Proceedings of the 4th International Conference for Internet Technology and Secured Transactions (ICITST) 2009
Text classification is one of the core applications in data
mining due to the huge amount of uncategorized digital
data available. Training a text classifier generates a model
that reflects the characteristics of the domain. However, if
no training data is available, labeled data from a related
but different domain might be exploited to perform cross-domain
classification. In our work, we aim to accurately
classify unlabeled blogs into commonly agreed newspaper
categories using labeled data from the news domain. The
labeled news and the unlabeled blog corpus are highly dynamic
and grow hourly with topic drift, so a trade-off
between accuracy and performance is required. Our approach
is to apply a fast novel centroid-based algorithm, the
Class-Feature-Centroid Classifier (CFC), to perform efficient
cross-domain classification. Experiments showed that
this algorithm achieves accuracy comparable to k-NN
and is slightly better than Support Vector Machines (SVM),
yet at linear time cost for training and classification. The
benefit of this approach is that the linear time complexity enables
us to efficiently generate an accurate classifier, reflecting
the topic drift, several times per day on a huge dataset.
Granitzer Michael, Lex Elisabeth, Juffinger A.
2009
Blog Credibility Ranking by Exploiting Verified Content
Proceedings of the 3rd Workshop on Information Credibility on the Web at 18th World Wide Web Conference
People use weblogs to express thoughts, present ideas and
share knowledge. However, weblogs can also be misused to
influence and manipulate the readers. Therefore the credibility
of a blog has to be validated before the available information
is used for analysis. The credibility of a blog entry
is derived from its content, the credibility of the author or
of the blog itself, and the external references or trackbacks.
In this work we introduce an additional dimension
to assess the credibility, namely the quantity structure. For
our blog analysis system, we therefore derive the credibility
from two dimensions. First, the quantity structure of a set
of blogs is compared with that of a reference corpus; second, we
analyse the content of each blog separately and examine its similarity
with a verified news corpus. From the content similarity
values we derive a ranking function. Our evaluation showed
that one can sort out non-credible blogs by quantity structure
without deeper analysis. In addition, the content-based ranking
function sorts the blogs by credibility with high accuracy.
Our blog analysis system is therefore capable of providing
credibility levels per blog.
Neidhart T., Granitzer Michael, Kern Roman, Weichselbraun A., Wohlgenannt G., Scharl A., Juffinger A.
2009