Natural Language Understanding

Scientific coordinator: Paolo Ferragina

The classical bag-of-words paradigm represents an input text merely as the set of words that compose it. Since this paradigm totally ignores the meaning of the words, as well as the synonymy, polysemy and similarity between them, the performance of downstream applications can be dramatically penalized.
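The limitation can be illustrated with a minimal Python sketch. The sentences, the tokenizer and the cosine-similarity measure below are illustrative assumptions chosen for this example, not part of the A³ Lab software: two texts with the same meaning but different words get a score of zero, while two texts that share an ambiguous word look similar.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its words, ignoring order and meaning."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Synonymy: same meaning, but no word in common.
synonym_score = cosine(
    bag_of_words("physician prescribes medicine"),
    bag_of_words("doctor recommends drugs"),
)

# Polysemy: "jaguar" is an animal in one text and a car in the other,
# yet the shared surface words make the two texts look alike.
polysemy_score = cosine(
    bag_of_words("jaguar speed in the jungle"),
    bag_of_words("jaguar speed on the highway"),
)

print(synonym_score, polysemy_score)
```

The first score is zero although the two sentences are paraphrases, and the second is well above zero although the two texts are about different things.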

In order to overcome these limitations, A³ Lab developed a suite of algorithmic and artificial intelligence software tools for the efficient and effective semantic annotation (also referred to in the literature as entity linking) of natural language text with Wikipedia entities (pages). A number of results have shown that this annotation is extremely powerful: not only does it provide a deeper contextualization of the input text, but it also enables machines to interpret the natural language text as a small piece of the whole of human knowledge.

Figure 1. Example of the automatic annotation of an input text with two Wikipedia entities (i.e., Leonardo da Vinci and Mona Lisa). Transparent elements (e.g., Italy, Science, …) are the nodes of the underlying Knowledge Graph that can be used to infer information starting from the annotated entities.

Another key to the success of A³ Lab is the introduction of a new representation that enhances the bag-of-words representation with a graph of entities (concepts) derived from the semantic annotation of the input text. Thanks to this representation, machines can exploit the interconnections present in the underlying Knowledge Graph to infer information that is not explicitly stated in the input text and to enrich the text with it.
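A toy Python sketch of this idea follows, reusing the entities of Figure 1. Everything in it is a hypothetical stand-in: the hand-written knowledge graph (including the Louvre, Painting and Renaissance nodes), the mention dictionary and the function names are assumptions made for illustration and do not reflect the actual A³ Lab tools, which disambiguate mentions by context over the full Wikipedia graph.

```python
import re

# Hypothetical toy knowledge graph: entity -> directly related entities.
KNOWLEDGE_GRAPH = {
    "Leonardo da Vinci": {"Italy", "Science", "Mona Lisa", "Renaissance"},
    "Mona Lisa": {"Leonardo da Vinci", "Louvre", "Painting", "Renaissance"},
}

# Hypothetical mention dictionary: surface form -> entity.
# A real entity linker would choose the entity from the context.
MENTIONS = {"leonardo": "Leonardo da Vinci", "mona lisa": "Mona Lisa"}


def annotate(text):
    """Return the entities whose surface form occurs in the text."""
    lowered = text.lower()
    return sorted({entity for spot, entity in MENTIONS.items() if spot in lowered})


def represent(text):
    """Combine the bag of words with a small graph of annotated entities."""
    words = sorted(set(re.findall(r"[a-z]+", text.lower())))
    entities = annotate(text)
    # Edges between annotated entities that are linked in the knowledge graph.
    edges = [
        (a, b)
        for i, a in enumerate(entities)
        for b in entities[i + 1:]
        if b in KNOWLEDGE_GRAPH.get(a, set())
    ]
    # Neighbouring entities never mentioned in the text: the inferred context.
    neighbours = set().union(*(KNOWLEDGE_GRAPH.get(e, set()) for e in entities))
    inferred = sorted(neighbours - set(entities))
    return {"words": words, "entities": entities, "edges": edges, "inferred": inferred}


representation = represent("Leonardo painted the Mona Lisa.")
print(representation)
```

In this sketch the text never mentions Italy or Science, yet both appear in the inferred list because they are neighbours of an annotated entity in the toy graph.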

The A³ Lab software suite is publicly available here and, since its official launch in 2015, it has already served more than 3 million textual queries.

Software & Datasets

SWAT

Salient entity linking of news and Web pages

RESTful API

SMAPH

Entity linking of Web queries with Wikipedia pages

GitHub · RESTful API

Two-Stage Framework

Two-Stage Framework for Wikipedia relatedness computation

GitHub

WiRe & WikiSim Datasets

Datasets for entity relatedness benchmarking

Download

WAT

Efficient entity linking of news and Web pages

RESTful API

TagMe

Entity linking of short and long text with Wikipedia pages

GitHub · RESTful API

WISER

Expert profiling and finding

Website

Twittomatic

Distributed Twitter crawler in Python

GitHub

Selected Publications