About

Some posts, written and/or collected by the staff of the Know-Center's knowledge discovery division

Easy Blogging with Firefox & Scribefire

June 2nd, 2008 by kcuser

I just stumbled over ScribeFire,
a Firefox plugin for blogging. Hitting F8 gives you a split
screen into which you can type or drag content as you like. It supports
multiple accounts, categories and tags. Very nice for fast blogging ;)

Posted in Tools, plugin, firefox, blogging | No Comments »

Paper Review: The structure and function of complex networks

September 18th, 2007 by grani

I am currently reading the paper “The structure and function of complex networks” by M. E. J. Newman for our three-weekly AWSR reading group. The paper is a survey on complex networks and their underlying theory. This post writes down the most important statements of the paper. Since it is more or less a summary for myself, it may be of lesser use to others (so I apologize in advance ;) ).

  • Focus of the survey: (i) survey statistical properties for networked systems, (ii) create network models to understand the statistical properties and (iii) predict the behavior depending on the type of model
  • The degree of a vertex in a random graph follows a binomial distribution, which approaches a Poisson distribution for large n.
  • Small World Effect (defined by Milgram’s famous experiment)
    • Most pairs of vertices are connected by a short path in the network (e.g. Bacon number, Erdős number). Thus, those networks have a low mean geodesic distance L (i.e. the average shortest path length between any two vertices)
    • Networks show the small-world effect if the value of L scales logarithmically or slower with network size for a fixed mean degree
    • Networks with power-law degree distributions have values of L that increase no faster than log n / log log n
  • Transitivity or Clustering
    • Definition: if (A,B) and (B,C) exist, then there is a heightened probability that (A,C) exists too (with (X,Y) indicating an edge between vertex X and vertex Y), or in other words: “The friend of your friend is most likely your friend too”.
    • Measured by the clustering coefficient C
    • Values of the clustering coefficient are generally considerably higher in real networks than in random graphs (C = O(1) for n → ∞, in contrast to random graphs, where C = O(n^-1))
    • Reciprocity is the analogous indicator in directed graphs: how often two vertices point to each other
  • Degree Distributions
    • p_k is defined as the probability that a randomly drawn vertex has degree (resp. in-/out-degree) k
    • Degree distribution = histogram over all k in a graph. It is often problematic to calculate because of noisy or missing data. Therefore one usually uses the cumulative distribution (the probability that a vertex has a degree larger than or equal to k)
    • Depending on the graph (e.g. bipartite, directed), different distributions (and their combinations) have to be examined
    • Scale-free networks are networks having a power-law degree distribution
  • Network resilience
    • Changes in the degree distribution (and thereby in the function of a network) upon vertex removal
    • Taking part of the WWW as a network and removing vertices at random does not change the mean distance significantly. Thus, the network is highly resilient to random failures. This is not the case if vertices with high degrees are removed.
  • Mixing Patterns
    • Assortative mixing, which is some kind of co-occurrence analysis of vertices of different types (e.g. herbivores, carnivores and plants in a food network, or races co-occurring in social contacts etc.)
  • Degree Correlation: assortative mixing of vertices with similar or different degree.
  • Community Structure
    • groups of vertices which have a high density of edges within them and a lower density of edges to other groups
    • usually detected by cluster analysis (e.g. hierarchical agglomerative clustering, HAC)
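
To make a few of the quantities above more tangible, here is a minimal Python sketch that computes the degree distribution p_k, its cumulative form, the clustering coefficient and the mean geodesic distance L for a tiny undirected graph. The toy graph and all function names are my own assumptions for illustration and are not taken from the survey. The clustering coefficient uses the "3 x triangles / connected triples" form, and L is averaged over connected pairs of distinct vertices only.

```python
from collections import deque


def degree_distribution(adj):
    """Return p_k: the fraction of vertices that have degree exactly k."""
    n = len(adj)
    counts = {}
    for neighbours in adj.values():
        k = len(neighbours)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / n for k, c in sorted(counts.items())}


def cumulative_distribution(p_k):
    """Return P_k: the fraction of vertices that have degree >= k."""
    return {k: sum(p for j, p in p_k.items() if j >= k) for k in p_k}


def clustering_coefficient(adj):
    """C = 3 * triangles / connected triples, for an undirected graph."""
    closed = 0   # triples whose two end points are adjacent (3 per triangle)
    triples = 0  # paths of length two, counted once per centre vertex
    for neighbours in adj.values():
        nb = sorted(neighbours)
        for i in range(len(nb)):
            for j in range(i + 1, len(nb)):
                triples += 1
                if nb[j] in adj[nb[i]]:
                    closed += 1
    return closed / triples if triples else 0.0


def mean_geodesic_distance(adj):
    """Average BFS shortest-path length over connected pairs of vertices."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Hypothetical toy graph: a triangle A-B-C with a pendant vertex D on C.
example = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

p_k = degree_distribution(example)
print(p_k)                              # {1: 0.25, 2: 0.5, 3: 0.25}
print(cumulative_distribution(p_k))     # {1: 1.0, 2: 0.75, 3: 0.25}
print(clustering_coefficient(example))  # 0.6 (3 closed out of 5 triples)
print(mean_geodesic_distance(example))  # 4/3, about 1.33
```

For real networks one would of course use a graph library and sparse data structures; the all-pairs BFS here is only sensible for small graphs.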

These properties are discussed in the first part of the survey, and I will end my blog entry here. So far the survey is worth reading, since it gives a comprehensive overview. I will continue writing on this survey (after I have read it completely ;) ).

Posted in Uncategorized, paper review, awsr, network theory, survey, graph theorie | No Comments »

PIG - Relational Algebra for Hadoop

September 18th, 2007 by grani

I recently stumbled over PIG (not a pig ;) ), a Yahoo! Research project in its 0.1* release which provides relational algebra on top of Hadoop's distributed computing technology. With PIG, SQL-like queries and data analysis using a large number of commodity machines should become feasible.

Rumor has it that PIG is to enter the Apache Incubator, which would make it a viable approach for very large relational data processing. I look forward to it ;)

Posted in Uncategorized, hadoop, distributed processing, relational algebra, research project, yahoo! | No Comments »

Paper Review SWISH: semantic analysis of window titles and switching history

September 11th, 2007 by grani

After not posting for a while (largely because of having lots of other things to do) I stumbled over an interesting paper from some guys at Microsoft who are using machine learning techniques for automatically detecting user tasks.

As an overview, they record the interaction of users on Windows PCs and store window titles and time stamps. By applying clustering (i.e. Probabilistic LSI, fitted via an EM algorithm) to the window titles and time stamps, user tasks are automatically extracted from the log file.
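
To illustrate what "PLSI fitted via EM" means, here is a minimal sketch in Python. It is not the authors' implementation: it only uses the words of a handful of made-up window titles (no time stamps or switching history), the function name `plsi` and its parameters are my own, and it fits the asymmetric model P(w|d) = sum_z P(w|z) P(z|d) with plain EM from a random start.

```python
import math
import random
from collections import Counter


def plsi(counts, n_topics, n_iter=20, seed=0):
    """Fit P(w|d) = sum_z P(w|z) P(z|d) by EM.

    counts: one dict per document (window title), mapping word -> count.
    Returns (p_w_z, p_z_d, log_likelihood_per_iteration).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in counts for w in doc})

    def normalised(values):
        total = sum(values)
        if total == 0:  # guard against a topic that lost all its mass
            return [1.0 / len(values)] * len(values)
        return [v / total for v in values]

    # Random, strictly positive initialisation of P(w|z) and P(z|d).
    p_w_z = [dict(zip(vocab, normalised([rng.random() + 0.1 for _ in vocab])))
             for _ in range(n_topics)]
    p_z_d = [normalised([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in counts]

    history = []
    for _ in range(n_iter):
        new_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in counts]
        log_lik = 0.0
        for d, doc in enumerate(counts):
            for w, n in doc.items():
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                p_w_d = sum(joint)
                log_lik += n * math.log(p_w_d)
                for z in range(n_topics):
                    expected = n * joint[z] / p_w_d
                    new_w_z[z][w] += expected
                    new_z_d[d][z] += expected
        # M-step: renormalise the expected counts into probabilities.
        p_w_z = [dict(zip(vocab, normalised([t[w] for w in vocab])))
                 for t in new_w_z]
        p_z_d = [normalised(row) for row in new_z_d]
        history.append(log_lik)  # likelihood of the parameters before update
    return p_w_z, p_z_d, history


# Hypothetical window titles, invented for this example.
titles = [
    "Inbox Outlook mail",
    "Reply mail Outlook",
    "Eclipse Java debug",
    "Java build Eclipse",
]
docs = [Counter(title.lower().split()) for title in titles]

p_w_z, p_z_d, history = plsi(docs, n_topics=2)
for title, topic_mix in zip(titles, p_z_d):
    print(title, [round(p, 2) for p in topic_mix])
```

Each row printed at the end is the topic mixture P(z|d) of one title; titles that share a dominant topic would be candidates for the same task. EM only finds a local optimum, so the result depends on the random seed.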

What I find interesting about this paper is the approach of detecting the user context via an unsupervised ML method. The paper motivates this very well and introduces the basics of PLSI and the EM algorithm very comprehensibly.

However, the evaluation could be more detailed, especially regarding the user session. They claim to reach a recall on user tasks of around 70% (with a precision of around 50%) by comparing the clusters obtained from a 4-hour user session to a set of manually labeled tasks from the same session. I think these numbers should be handled with care, since 4 hours is not the world's largest data set and a single user may not be the most representative sample ;) . Nevertheless, it shows how far one can get in detecting the context/task of a user with very simple means.

To summarize:

Why read it: Nice idea and very well motivated, including links to background information; comprehensible introduction to PLSI and the EM algorithm

Take with care: The evaluation, and how far the given numbers generalize to arbitrary tasks
So, that's it.
Cheers

Posted in Uncategorized, HCI, clustering, task recognition, paper review | No Comments »

Does software design really pay off?

August 8th, 2007 by kcuser

Martin Fowler’s opinion on putting effort into software design

Posted in Uncategorized | No Comments »

Creating a digital earth from digital, collectively gathered photos

July 13th, 2007 by kcuser

A colleague sent me a link to a very cool talk given by Blaise Aguera y Arcas from Microsoft Live Labs at the TED conference. He showed a tool called Seadragon for browsing very large collections of digital data, mostly images. While it looks cool, it is nothing new. Then he came up with Photosynth, and that is, to put it short, pretty amazing. From a set of digital images, a 3D model is reconstructed, which can then be used for browsing the photos.

Microsoft released a live demo of Photosynth. It is worth a look.

Posted in Uncategorized, Microsoft, Browsing, images, photo, cool, photosynth | No Comments »

Linux goes Semantic Desktop

July 11th, 2007 by kcuser

An article on linuxlookup.com announced the integration of semantic capabilities into KDE 4. NEPOMUK, an EU research project, provides its results via the NEPOMUK-KDE library.

It is good to see that things are moving forward (especially on the Linux side), but the video shown at linuxlookup.com does not really convince me. It shows tagging and commenting facilities, which are nice but not necessarily semantic. I think I have to take a look under the hood to get the details.

Posted in Uncategorized, semantic desktop, linux, news | No Comments »

Social Semantic Web

July 10th, 2007 by kcuser

Tom Gruber, well known for his definition of ontologies, gave a famous keynote at the 5th International Semantic Web Conference in 2006 on combining social web technology with semantic web technology. He outlines the need for this and gives examples from tagging and from his current project, www.realtravel.com.
The talk is very informative and enjoyable, so if you have 60 minutes to spare, look at it here

Posted in Uncategorized, Semantic Web, social web, gruber, key note, semantic web conference, video lecture | No Comments »

Video Annotation Tools

June 29th, 2007 by kcuser

I just stumbled over a (small) list of video annotation tools and thought our blog is a good place to archive them ;)

VideoAnnEx - IBM MPEG-7 Annotation Tool
http://www.alphaworks.ibm.com/tech/videoannex

iFinder by Fraunhofer
http://www.imk.fhg.de/de/ifinder

Muvino - An MPEG-7 Video Annotation Tool
http://vitooki.sourceforge.net/components/muvino/code/

and of course our partner JOANNEUM Research
http://mpeg-7.joanneum.at

Posted in Tools, video annotation, mpeg | No Comments »

Why we tag?

June 14th, 2007 by kcuser

I really like this Yahoo! Research Berkeley blog. It has very nice and informative postings. One of these postings is on a recent CHI paper by this group on tagging.

Posted in Uncategorized, Web 2.0, tagging, paper, chi 2007 | No Comments »

ShiftHappens

June 14th, 2007 by kcuser

Nice slideshow with some facts, for example that in 2049 you will be able to buy a computer for $1000 which is as intelligent as the whole human species ;)

Posted in Uncategorized, information explosion, information growth, trends, innovation, future trends | No Comments »

Using Java to Crack Office 2007

June 6th, 2007 by kcuser

Historically, opening Office files from within Java has always been something of a problem because Office documents (principally Word, Excel, and PowerPoint) were stored in a binary format known to COM developers worldwide as the structured storage format. Office 2003 introduced new XML formats unique to itself (such as WordML), which Java developers could use to read or write Office documents, but the formats were not well documented, and Java developers frequently found themselves learning the WordML format through trial-and-error development. Various open-source projects stepped in to try to mitigate the situation, such as the POI framework from Apache for reading and writing Excel documents, or various Java-COM solutions. With Office 2007, Microsoft has made a significant part of these problems “go away”. Java apps can now read and write any Office 2007 document with nothing more complicated than the native JDK itself, because Office 2007 documents are now nothing more than ZIP files of XML documents.

Link to Article
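
The "ZIP file of XML documents" idea can be tried out with nothing but a standard library. The sketch below uses Python rather than Java (in Java one would reach for `java.util.zip` and an XML parser in the same way). To stay self-contained it first builds a toy archive in memory with a single `word/document.xml` part. That toy archive is an assumption for illustration and is not a complete, valid .docx (content types and relationships are omitted), and the helper names are my own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace used by WordprocessingML elements such as <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_toy_docx(text):
    """Return bytes of a ZIP holding one minimal word/document.xml part.

    Illustration only: Word would need further parts to open this file.
    """
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f"<w:t>{escape(text)}</w:t>"
        "</w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


def extract_text(docx_bytes):
    """Unzip the main document part and collect the text of all <w:t> runs."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [node.text or "" for node in root.iter(f"{{{W_NS}}}t")]


toy = build_toy_docx("Hello Office 2007")
print(extract_text(toy))  # ['Hello Office 2007']
```

For a real file, `extract_text` could be fed the bytes of a .docx read from disk; the part name `word/document.xml` is where Word stores the main document body.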

Posted in Uncategorized, retrieval, Java | No Comments »

Tuning search at Google

June 4th, 2007 by kcuser

Saul Hansell from the NY Times wrote an article on Google’s search quality tuning, based on an interview with Singhal and a day spent with Udi Manber, Matt Cutts and the search quality team at Google. Good stuff to read for all those retrievers out there ;) .

Posted in Uncategorized, Google, search quality, ranking algorithm | No Comments »

Virtual Headphones

June 2nd, 2007 by kcuser

Hot stuff! Microsoft has developed an audio system which can project sound into a locally bounded region. So one user can, for example, hear music from the PC while another user, standing 2 meters away, does not hear anything. Cool stuff for large offices ;) Details are available here.

Posted in Uncategorized, Microsoft, innovation, hardware, new technology, research | No Comments »

Google Gears - using Google Apps offline

June 1st, 2007 by kcuser

Google released a browser extension with a JavaScript API, named Google Gears, which makes it possible to store browser content in a local database. This allows users, for example, to use Gmail, Google Reader, spreadsheets etc. offline. So the race for the best offline office suite has begun ;)

Posted in Uncategorized, Google, office, JavaScript API, Tools | No Comments »
