TAGME technology
TAGME is a “topic annotator” that identifies meaningful sequences of words in a short text and links them to a pertinent Wikipedia page. This stunning contextualization has implications that go far beyond enriching the text with explanatory links, because it concerns, in some way, the understanding of the topics the text itself deals with. These links add a new topical dimension to the text that lets you relate, classify or cluster short texts using a new representation that we call the graph of topics. We believe that this new representation is much more powerful than the classic TF-IDF and, indeed, proves very useful when dealing with short texts for which you cannot gather significant statistics.
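To make the contrast with word-based representations concrete, here is a minimal sketch. The annotations below are written by hand for illustration and are not actual TAGME output; the helper names and the choice of Jaccard overlap as the comparison are our assumptions, not part of TAGME.

```python
import re


def word_set(text):
    """Lowercased set of alphabetic words: a crude bag-of-words view."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


text_a = "NYC subway delays after storm"
text_b = "New York City underground hit by hurricane"

# Hypothetical annotations: Wikipedia page titles that a topic annotator
# might attach to each text. They are invented for this example.
topics_a = {"New York City Subway", "Storm"}
topics_b = {"New York City Subway", "Tropical cyclone"}

# The two texts share no words at all, so any purely lexical
# representation sees them as unrelated.
word_overlap = jaccard(word_set(text_a), word_set(text_b))

# At the topic level they share one page out of three distinct ones.
topic_overlap = jaccard(topics_a, topics_b)

print(f"word overlap:  {word_overlap:.2f}")
print(f"topic overlap: {topic_overlap:.2f}")
```

With so few words per text, lexical statistics have almost nothing to work with, while the linked pages still expose what the two texts have in common.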
A demo of TAGME is available here, and a RESTful-like API is also available. For algorithmic and experimental details, please have a look at [Ferragina-Scaiella, ACM CIKM 2010], at [Ferragina-Scaiella, IEEE Software 29(1): 2012] and at the extended report available on arXiv. Part of the work on TAGME has been supported by a Google Research Award 2010 and by an Italian MIUR-FIRB project on a “Web service and search engine to support a semantic and pluri-lingual access to Italian Culture on the Web”.
We have experimented with TAGME in several challenging scenarios.
- Labeled clustering of search results: TagMySearch is a meta-search engine that gathers results from different commodity web-search engines, clusters the short textual fragments (aka snippets) they return and offers a sort of “topic map” of the results of a user query. For details on the tool, please have a look at [Scaiella et al., WSDM 2012].
- Classification of short news: TagMyNews is a simple tool that classifies, with high accuracy, a very short text into a fixed set of categories drawn from well-known online newspapers (nytimes.com, usatoday.com, news.google.com, etc.). TagMyNews is specialized on short texts, yields very high accuracy even with a very small training set, and is robust to “time-flow”, i.e. it yields the same accuracy even when the training set is composed of news stories published a long time ago. For details on the tool, please have a look at [Vitale et al., ECIR 2012].
- Computing the similarity between short texts: this tool computes a relatedness measure between two short texts by exploiting the innovative “graph of topics” representation returned by TAGME. The tool is currently under development, so its performance is not yet tuned. Help us improve it.
All these tools are currently under development and may often be unavailable. Contact us with any questions.
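As a rough illustration of how a relatedness measure over a “graph of topics” could work, the sketch below scores two texts whose topics do not coincide but are connected to each other. It is only a toy under stated assumptions: the topic-to-topic weights are invented, the annotations are hand-written, and the averaging scheme is our own choice, not the measure implemented by the tool described above.

```python
# Hypothetical topic-to-topic relatedness in [0, 1]. In a real system these
# weights could be derived from the Wikipedia link graph; here they are
# invented purely for illustration.
TOPIC_RELATEDNESS = {
    frozenset({"Barack Obama", "President of the United States"}): 0.9,
    frozenset({"Angela Merkel", "Chancellor of Germany"}): 0.9,
    frozenset({"Berlin", "Chancellor of Germany"}): 0.5,
}


def topic_rel(t1, t2):
    """Relatedness of two topics: 1.0 if identical, else a table lookup."""
    if t1 == t2:
        return 1.0
    return TOPIC_RELATEDNESS.get(frozenset({t1, t2}), 0.0)


def directed(src, dst):
    """Average, over the topics in src, of the best match found in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(topic_rel(s, d) for d in dst) for s in src) / len(src)


def text_relatedness(topics_a, topics_b):
    """Symmetric score: mean of the two directed averages."""
    return (directed(topics_a, topics_b) + directed(topics_b, topics_a)) / 2


# Hand-written annotations for two short texts (not actual TAGME output):
#   "Obama meets Merkel in Berlin"
#   "The US president visits the German chancellor"
topics_a = {"Barack Obama", "Angela Merkel", "Berlin"}
topics_b = {"President of the United States", "Chancellor of Germany"}

score = text_relatedness(topics_a, topics_b)
print(f"relatedness: {score:.3f}")
```

The two topic sets are disjoint, so a plain set overlap would be zero; the score is positive only because the edges between topics are taken into account.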


