TAGME is a “topic annotator” that identifies meaningful sequences of words in a short text and links each of them to a pertinent Wikipedia page. This contextualization has implications that go far beyond enriching the text with explanatory links, because it concerns, in some way, the understanding of the topics dealt with in the text itself. These links add a new topical dimension to the text that lets you relate, classify, or cluster short texts using a new representation that we call the graph of topics. We believe this representation is much more powerful than the classic TF-IDF model, and it proves very useful for short texts, from which significant term statistics cannot be gathered.
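The page does not spell out how the graph of topics is built. As a minimal sketch, assume its nodes are the Wikipedia pages annotated in a text and its edges are weighted by the Milne-Witten relatedness, a standard measure computed from the sets of pages that link to each of two Wikipedia pages. This is an illustrative assumption, not necessarily TAGME's exact construction; the page names, in-link sets, and the `threshold` parameter are hypothetical.

```python
import math
from itertools import combinations


def mw_relatedness(in_a, in_b, total_pages):
    """Milne-Witten relatedness between two Wikipedia pages, computed from
    the sets of pages that link to each of them (in-links)."""
    common = len(in_a & in_b)
    if common == 0:
        return 0.0
    big = max(len(in_a), len(in_b))
    small = min(len(in_a), len(in_b))
    distance = (math.log(big) - math.log(common)) / (
        math.log(total_pages) - math.log(small)
    )
    return max(0.0, 1.0 - distance)


def topic_graph(in_links, total_pages, threshold=0.0):
    """Hypothetical graph-of-topics builder: one weighted edge for every pair
    of annotated topics whose relatedness exceeds `threshold`."""
    edges = {}
    for a, b in combinations(sorted(in_links), 2):
        weight = mw_relatedness(in_links[a], in_links[b], total_pages)
        if weight > threshold:
            edges[(a, b)] = weight
    return edges


# Invented in-link sets (IDs of the pages linking to each topic).
in_links = {
    "Pisa": {1, 2, 3, 4},
    "Leaning_Tower_of_Pisa": {2, 3, 4, 5},
    "Jazz": {9, 10},
}
edges = topic_graph(in_links, total_pages=1000)
print(edges)
```

With these toy sets, only the two Pisa-related topics share in-links, so they get the single edge (weight about 0.95), while "Jazz" stays isolated.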
A demo of TAGME and a RESTful API are available here. For algorithmic and experimental details, please see [Ferragina-Scaiella, ACM CIKM 2010], [Ferragina-Scaiella, IEEE Software 29(1): 2012], and the extended report available on arXiv. Part of the work on TAGME has been supported by two Google Research Awards (one in 2010 and another in 2013) and by two Italian MIUR projects: a FIRB project on a “Web service and search engine to support a semantic and pluri-lingual access to Italian Culture on the Web” and a PRIN project on “ARS-Technomedia: Algorithms for Techno-Mediated Social Networks”. Moreover, the TAGME technology won a fellowship at the Working Capital Award (2010), and its annotation technology is at the core of the SMAPH system, which won first place in the Short Track of the ERD Challenge 2014, held at the 2014 ACM SIGIR Conference (see also the Google post).
We have experimented with TAGME in several challenging scenarios:
- Labeled clustering of search results (2012): TagMySearch is a meta-search engine that gathers results from several commodity web search engines, clusters the short textual fragments (aka snippets) they return, and offers a sort of “topic map” of the results of a user query. For details on the tool, see [Scaiella et al., WSDM 2012].
- Classification of short news (2012): TagMyNews is a simple tool that classifies very short texts with high accuracy into a fixed set of categories drawn from well-known online newspapers (nytimes.com, usatoday.com, news.google.com, etc.). TagMyNews is specialized for short texts, achieves very high accuracy even with a very small training set, and is robust to “time flow”: its accuracy stays the same even when the training set consists of news stories published long ago. For details on the tool, see [Vitale et al., ECIR 2012].
- Annotation of search-engine queries (2014): the SMAPH system is an innovative tool that efficiently and effectively annotates search-engine queries with entities drawn from Wikipedia that explain the “main concepts” occurring in the user query. This is a difficult task because queries consist of few words and thus lack the context needed to disambiguate them; annotation can be seen as a “semantic” step towards understanding query intent. SMAPH ranked first in the Short Track of the ERD Challenge 2014; the challenge was introduced in 2014 at the prestigious ACM SIGIR conference and attracted more than 15 teams from academia and large companies worldwide (e.g., Google, Microsoft, Seznam). SMAPH was designed in collaboration with research groups at Google (Zurich) and the University of Munich.
- Computing the similarity between short texts: this tool computes a relatedness measure between two short texts using the innovative “graph of topics” representation produced by TAGME. The tool is still under development, so its performance is not yet tuned. Help us improve it.
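The page does not say which formula the similarity tool uses. As a minimal sketch, assume each text has already been annotated into a dictionary of weighted Wikipedia topics and that pairwise topic relatedness can be read off a graph of topics (the edge weight below is made up). The example swaps in the soft cosine measure, a standard technique that generalizes cosine similarity by letting related but distinct topics contribute; it is an illustration, not the tool's actual method, and `graph_rel`, `exact_rel`, and `EDGES` are hypothetical names.

```python
import math


def soft_cosine(a, b, rel):
    """Soft cosine similarity between two texts, each given as a
    {wikipedia_page: weight} dict; rel(x, y) is topic relatedness in [0, 1]."""

    def inner(u, v):
        # Generalized dot product: every pair of topics contributes,
        # scaled by how related the two topics are.
        return sum(wu * wv * rel(x, y) for x, wu in u.items() for y, wv in v.items())

    denom = math.sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom > 0 else 0.0


# Hypothetical edge weights of a graph of topics (unordered topic pairs).
EDGES = {frozenset({"Pisa", "Leaning_Tower_of_Pisa"}): 0.9}


def graph_rel(x, y):
    """Relatedness read off the topic graph; a topic is fully related to itself."""
    return 1.0 if x == y else EDGES.get(frozenset({x, y}), 0.0)


def exact_rel(x, y):
    """Exact-match relatedness: reduces soft cosine to plain cosine."""
    return 1.0 if x == y else 0.0


# Two short texts that share no topic, e.g. "the leaning tower" vs "a trip to Pisa".
text_a = {"Leaning_Tower_of_Pisa": 0.8}
text_b = {"Pisa": 0.5}

print(soft_cosine(text_a, text_b, graph_rel))  # related topics are credited
print(soft_cosine(text_a, text_b, exact_rel))  # exact matching finds no overlap
```

With exact matching the two texts look unrelated (similarity 0), whereas the topic graph scores them at about 0.9, which is the kind of gap a topical representation is meant to close for short texts.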
As of June 2014, the TAGME API had received about 170 million queries from research groups and companies all around the world. Given this success, in February 2014 we made TAGME’s source code available under the terms of the Apache License, Version 2.0. If you wish to get a copy of the package, please contact us at tagme [at] di.unipi.it. Whenever you use it in your projects and publications, please cite the paper: Paolo Ferragina, Ugo Scaiella: Fast and Accurate Annotation of Short Texts with Wikipedia Pages. IEEE Software 29(1): 70-75 (2012).
Finally, if you include it in your software, please acknowledge the use of TAGME by displaying the following logo: