Publications

Scientific publications authored by Know-Center staff are listed here.

2010

Lex Elisabeth, Granitzer Michael, Juffinger A.

A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs

IEEE Computer Society: 7th International Workshop on Text-based Information Retrieval in Proceedings of the 21st International Conference on Database and Expert Systems Applications (DEXA 10), IEEE, 2010

Conference
In the blogosphere, the amount of digital content is expanding and for search engines, new challenges have been imposed. Due to the changing information need, automatic methods are needed to support blog search users to filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus rest. Also, we assess the emotionality facet in news related blogs to enable users to identify people’s feelings towards specific events. Our approach is to evaluate the performance of text classifiers with lexical and stylometric features to determine the best performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.
2010

Lex Elisabeth, Granitzer Michael, Juffinger A.

Objectivity Classification in Online Media

21st ACM SIGWEB Conference on Hypertext and Hypermedia (HT2010), ACM, 2010

Conference
In this work, we assess objectivity in online news media. We propose to use topic-independent features and we show in a cross-domain experiment that with standard bag-of-word models, classifiers implicitly learn topics. Our experiments revealed that our methodology can be applied across different topics with consistent classification performance.
2010

Lex Elisabeth, Granitzer Michael, Juffinger A., Muhr M.

Stylometric Features for Emotion Level Classification in News Related Blogs

Proceedings of the 9th ACM RIAO Conference, LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE, 2010

Conference
Breaking news and events are often posted in the blogosphere before they are published by any media agency. Therefore, the blogosphere is a valuable resource for news-related blog analysis. However, it is crucial to first sort out news-unrelated content like personal diaries or advertising blogs. Besides, there are different levels of emotionality or involvement which bias the news information to a certain extent. In our work, we evaluate topic-independent stylometric features to classify blogs into news versus rest and to assess the emotionality in these blogs. We apply several text classifiers to determine the best performing combination of features and algorithms. Our experiments revealed that with simple style features, blogs can be classified into news versus rest and their emotionality can be assessed with accuracy values of almost 80%.
2010

Lex Elisabeth, Granitzer Michael, Juffinger A., Seifert C.

Efficient Cross-Domain Classification of Weblogs

International Journal of Intelligent Computing Research (IJICR), Vol.1, Issue 2, Infonomics Society, 2010

Journal
Text classification is one of the core applications in data mining due to the huge amount of uncategorized textual data available. Training a text classifier results in a classification model that reflects the characteristics of the domain it was learned on. However, if no training data is available, labeled data from a related but different domain might be exploited to perform cross-domain classification. In our work, we aim to accurately classify unlabeled weblogs into commonly agreed upon newspaper categories using labeled data from the news domain. The labeled news and the unlabeled blog corpus are highly dynamic and hourly growing with a topic drift, so the classification needs to be efficient. Our approach is to apply a fast novel centroid-based text classification algorithm, the Class-Feature-Centroid Classifier (CFC), to perform efficient cross-domain classification. Experiments showed that this algorithm achieves an accuracy comparable to k-Nearest Neighbour (k-NN) and Support Vector Machines (SVM), yet at linear time cost for training and classification. We investigate the classifier performance and generalization ability using a special visualization of classifiers. The benefit of our approach is that the linear time complexity enables us to efficiently generate an accurate classifier, reflecting the topic drift, several times per day on a huge dataset.
2010

Lex Elisabeth, Granitzer Michael, Juffinger A.

Facet Classification of Blogs: Know-Center at the TREC 2009 Blog Distillation Task

Proceedings of the 18th Text REtrieval Conference, 2010

Conference
In this paper, we outline our experiments carried out at the TREC 2009 Blog Distillation Task. Our system is based on a plain text index extracted from the XML feeds of the TREC Blogs08 dataset. This index was used to retrieve candidate blogs for the given topics. The resulting blogs were classified using a Support Vector Machine that was trained on a manually labelled subset of the TREC Blogs08 dataset. Our experiments included three runs on different features: firstly on nouns, secondly on stylometric properties, and thirdly on punctuation statistics. The facet identification based on our approach was successful, although a significant number of candidate blogs were not retrieved at all.
2009

Neidhart T., Granitzer Michael, Kern Roman, Weichselbraun A., Wohlgenannt G., Scharl A., Juffinger A.

Distributed Web2.0 Crawling for Ontology Evolution

Journal of Digital Information Management, 2009

Journal
2009

Lex Elisabeth, Juffinger A.

Crosslanguage Blog Mining and Trend Visualisation

Proceedings of the 18th World Wide Web Conference, 2009

Conference
People use weblogs to express thoughts, present ideas and share knowledge; weblogs are therefore extraordinarily valuable resources, among others, for trend analysis. Trends are derived from the chronological sequence of blog post count per topic. The comparison with a reference corpus allows qualitative statements over identified trends. We propose a crosslanguage blog mining and trend visualisation system to analyse blogs across languages and topics. The trend visualisation facilitates the identification of trends and the comparison with the reference news article corpus. To prove the correctness of our system we computed the correlation between trends in blogs and news articles for a subset of blogs and topics. The evaluation corroborated our hypothesis of a high correlation coefficient for these subsets and therefore the correctness of our system for different languages and topics is proven.
2009

Granitzer Michael, Lex Elisabeth, Juffinger A.

Blog Credibility Ranking by Exploiting Verified Content

Proceedings of the 3rd Workshop on Information Credibility on the Web at 18th World Wide Web Conference, 2009

Conference
People use weblogs to express thoughts, present ideas and share knowledge. However, weblogs can also be misused to influence and manipulate the readers. Therefore the credibility of a blog has to be validated before the available information is used for analysis. The credibility of a blog entry is derived from the content, the credibility of the author or blog itself, respectively, and the external references or trackbacks. In this work we introduce an additional dimension to assess the credibility, namely the quantity structure. For our blog analysis system we therefore derive the credibility from two dimensions. Firstly, the quantity structure of a set of blogs and a reference corpus is compared and secondly, we analyse each separate blog content and examine the similarity with a verified news corpus. From the content similarity values we derive a ranking function. Our evaluation showed that one can sort out non-credible blogs by quantity structure without deeper analysis. Besides, the content based ranking function sorts the blogs by credibility with high accuracy. Our blog analysis system is therefore capable of providing credibility levels per blog.
2009

Lex Elisabeth, Granitzer Michael, Juffinger A., Seifert C.

Cross-Domain Classification: Trade-Off between Complexity and Accuracy

Proceedings of the 4th International Conference for Internet Technology and Secured Transactions (ICITST) 2009, 2009

Text classification is one of the core applications in data mining due to the huge amount of uncategorized digital data available. Training a text classifier generates a model that reflects the characteristics of the domain. However, if no training data is available, labeled data from a related but different domain might be exploited to perform cross-domain classification. In our work, we aim to accurately classify unlabeled blogs into commonly agreed newspaper categories using labeled data from the news domain. The labeled news and the unlabeled blog corpus are highly dynamic and hourly growing with a topic drift, so a trade-off between accuracy and performance is required. Our approach is to apply a fast novel centroid-based algorithm, the Class-Feature-Centroid Classifier (CFC), to perform efficient cross-domain classification. Experiments showed that this algorithm achieves an accuracy comparable to k-NN and slightly better than Support Vector Machines (SVM), yet at linear time cost for training and classification. The benefit of this approach is that the linear time complexity enables us to efficiently generate an accurate classifier, reflecting the topic drift, several times per day on a huge dataset.
2009

Willfort R., Lex Elisabeth, Granitzer Michael, Juffinger A.

Spectral Web Content Trend Analysis

Proc. of IADIS International Conference WWW/Internet, 2009

Conference
2009

Lex Elisabeth, Granitzer Michael, Juffinger A., Seifert C.

Automated Blog Classification: A Cross Domain Approach

Proc. of IADIS International Conference WWW/Internet, 2009

Conference
2009

Kern Roman, Juffinger A., Granitzer Michael

Application of Axiomatic Approaches to Crosslanguage Retrieval

Working Notes for the CLEF 2009 Workshop, 2009

Conference
2009

Lex Elisabeth, Granitzer Michael, Juffinger A.

Know-Center at TREC 2009 Blog Distillation Task: A Notebook Paper

Notebook of TREC 2009, 2009

Conference
2008

Juffinger A., Kern Roman, Granitzer Michael

Exploiting Cooccurrence on Corpus and Document Level for Fair Crosslanguage Retrieval

Working Notes for the CLEF 2008 Workshop, 17-19 September, Aarhus, Denmark, 2008

Conference
2008

Weichselbraun A., Wohlgenannt G., Scharl A., Granitzer Michael, Neidhart T., Juffinger A.

Discovery and evaluation of non-taxonomic relations in domain ontologies

International Journal of Metadata, Semantics and Ontologies, 2008

Conference
The identification and labelling of non-hierarchical relations are among the most challenging tasks in ontology learning. This paper describes a bottom-up approach for automatically suggesting ontology link types. The presented method extracts verb vectors from semantic relations identified in the domain corpus, aggregates them by computing centroids for known relation types and stores the centroids in a central Knowledge Base (KB). Comparing verb vectors extracted from unknown relations with the stored centroids yields link-type suggestions. Domain experts evaluate these suggestions, refining the KB and constantly improving the component's accuracy. Using four sample ontologies on 'energy sources', this paper demonstrates how link-type suggestion aids the ontology design process. It also provides a statistical analysis on the accuracy and average ranking performance of Batch Learning (BL) vs. Online Learning (OL).
2008

Juffinger A., Kern Roman, Granitzer Michael

Crosslanguage Retrieval based on Wikipedia Statistics

Proc. of 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, 17-19 September, Aarhus, Denmark, 2008

Conference
2007

Weichselbraun A., Wohlgenannt G., Scharl A., Granitzer Michael, Neidhart T., Juffinger A.

Applying Vector Space Models to Ontology Link Type Suggestion

Proceedings of 4th IEEE International Conference on Innovations in Information Technology, Dubai, 2007, 2007

Conference
2007

Sabol Vedran, Gütl Christian, Neidhart T., Juffinger A., Klieber Hans-Werner, Granitzer Michael

Visualization Metaphors for Multi-modal Meeting

Workshop Multimedia Semantics - The Role of Metadata (WMSRM 07), proceedings volume "Aachener Informatik Berichte", Aachen, 2007

Conference
The MISTRAL system, a service oriented architecture for semantic extraction of multimedia data from meeting recordings, is described briefly. It improves on other similar systems by extracting a variety of semantic metadata from one media type and integrating it with concepts derived from other media types, as well as by adding inference capabilities to resolve ambiguities and further enrich extracted data. On top of this state-of-the-art extraction functionality a number of semantic-based, cross-modal visual applications for exploration and retrieval of extraction results were developed. Three selected applications, implemented upon MISTRAL's semantic application architecture, are presented and described in detail in this paper.
2007

Juffinger A., Neidhart T., Granitzer Michael, Kern Roman, Weichselbraun A., Wohlgenannt G., Scharl A.

Distributed Web2.0 Crawling for Ontology Evolution

Proc. of 2nd IEEE International Conference on Digital Information Management, Lyon, 2007, 2007

Conference