Jul 17, 2014

We are happy to announce that our system, SMAPH, co-developed by Marco Cornolti and Paolo Ferragina (University of Pisa), Massimiliano Ciaramita (Google), Hinrich Schütze and Stefan Rüd (University of Munich), achieved the best result in the ERD Challenge hosted by SIGIR 2014. Teams participating in the challenge (around 20) had to build a working system to do Entity Recognition and Disambiguation on search-engine queries, i.e. given a query, find the entities associated with it.

The problem of NER in queries is somewhat harder than in long texts. Queries are often malformed, ambiguous and, most of all, lack context. A searcher who issues a query like "glasses" may be interested in either drinkware or eyeglasses, while a searcher who issues a query like "google glasses" has yet another need. The query "armstrong moon landing" should point to Neil Armstrong, while "armstrong trumpet" should point to Louis Armstrong.

SMAPH disambiguates queries. It piggybacks on a search engine to normalize the keywords of the query, then disambiguates them and prunes away bad entities. On the ERD Challenge (short track) it scored the best result (68.5% F1). The system will shortly be available to be queried through a web service. Details on the system implementation are given in a paper.
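
To give a rough feel for the piggyback idea, here is a toy sketch in Python. It is not the SMAPH implementation: the canned `FAKE_SEARCH_INDEX`, the vote counting, the `min_share` threshold and all function names are assumptions made up for illustration, and a real system would query an actual search engine and use a trained pruning step.

```python
# Toy illustration of "piggyback on a search engine, then prune".
# Everything here is hypothetical: the index contents, the voting scheme
# and the threshold are NOT taken from SMAPH.

# Stand-in for search-engine results: query -> list of (result title, entity).
FAKE_SEARCH_INDEX = {
    "armstrong moon landing": [
        ("Neil Armstrong - Wikipedia", "Neil_Armstrong"),
        ("Apollo 11 Moon landing", "Apollo_11"),
        ("Neil Armstrong biography", "Neil_Armstrong"),
    ],
    "armstrong trumpet": [
        ("Louis Armstrong - Wikipedia", "Louis_Armstrong"),
        ("Louis Armstrong trumpet solos", "Louis_Armstrong"),
        ("Trumpet - Wikipedia", "Trumpet"),
    ],
}


def candidate_entities(query):
    """Collect candidate entities from the (fake) results and count votes."""
    votes = {}
    for _title, entity in FAKE_SEARCH_INDEX.get(query.lower().strip(), []):
        votes[entity] = votes.get(entity, 0) + 1
    return votes


def prune(votes, min_share=0.5):
    """Keep only entities backed by at least `min_share` of the results."""
    total = sum(votes.values())
    if total == 0:
        return set()
    return {entity for entity, count in votes.items() if count / total >= min_share}


def annotate(query):
    """Full toy pipeline: gather candidates for the query, then prune."""
    return prune(candidate_entities(query))


if __name__ == "__main__":
    for q in ("armstrong moon landing", "armstrong trumpet"):
        print(q, "->", sorted(annotate(q)))
```

In this toy setup the surrounding results supply the context the query itself lacks, which is why the two "armstrong" queries end up pointing at different people.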

We also participated as Acube Lab with WAT, a new version of TagMe, in the long-track competition (i.e. disambiguation of long texts), achieving a nice result (though unbalanced towards precision ;) ).

Further reading: Research at Google blog, LMU, ERD Challenge papers.