Dec 20, 2016

In an effort supported by the SoBigData European Research Infrastructure, we have published SMAPH, our entity linking system for web queries and very short texts, as a web service (see the paper). Happy entity linking!

May 09, 2016

The dataset contains four collections of files: three collections of genomes, each from a distinct species, and one set of three 32-bit integer arrays. In particular:

  • Cere: collection of 39 strains of the yeast Saccharomyces cerevisiae;
  • E. Coli: collection of 33 strains of the bacterium Escherichia coli;
  • Para: collection of 36 strains of the yeast Saccharomyces paradoxus;
  • DLCP: Differential Longest Common Prefix arrays computed by the Relative-FM data structure from a set of three human genomes.

These files are formatted as follows:

  • Cere, E. Coli, Para: text files (ASCII), each a sequence of characters drawn from the alphabet ACTGN.
  • DLCP: binary files, each a sequence of signed 32-bit integers in little-endian byte order (as obtained by dumping an array of int32_t into a file with a single fwrite on a little-endian machine, which covers most modern hardware).
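As a minimal sketch of how the two formats above could be read, the following Python snippet uses only the standard library. The function names and any file paths passed to them are hypothetical and are not part of the dataset; the only things taken from the description above are the DLCP encoding (signed 32-bit, little-endian) and the ASCII encoding of the genome files.

```python
import struct
from collections import Counter


def iter_dlcp(path, chunk_values=1 << 16):
    """Yield the signed 32-bit little-endian integers stored in `path`.

    The file is read in chunks so that large arrays need not fit in memory.
    """
    with open(path, "rb") as f:
        while True:
            buf = f.read(4 * chunk_values)
            if not buf:
                break
            if len(buf) % 4:
                raise ValueError("file size is not a multiple of 4 bytes")
            # "<" selects little-endian, "i" a signed 32-bit integer.
            for (value,) in struct.iter_unpack("<i", buf):
                yield value


def count_symbols(path, chunk_size=1 << 20):
    """Count how often each character occurs in an ASCII genome file."""
    counts = Counter()
    with open(path, "r", encoding="ascii") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            counts.update(chunk)
    return counts
```

For example, `max(iter_dlcp(p))` would scan a DLCP file `p` without loading it whole, and `count_symbols(q)` would show whether a genome file `q` contains any character outside ACTGN (such as line breaks). Because the byte order is given explicitly in the format string, the reader behaves the same on big-endian hosts.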

The dataset (gzipped tar file, ~7.5GB) can be downloaded here.

Apr 28, 2016

We are glad to announce that our paper “A Piggyback System for Joint Entity Mention Detection and Linking in Web Queries” has been accepted at the 25th International World Wide Web Conference (WWW 2016). It is now available for download.

May 23, 2015

We are glad to announce that our paper “On Analyzing Hashtags in Twitter” has been accepted at the 9th International AAAI Conference on Web and Social Media (ICWSM 2015).

In this paper we build a novel graph over hashtags and (Wikipedia) entities, the HE-Graph, and we exploit it to address two challenging problems regarding the “meaning of hashtags”: hashtag relatedness and hashtag classification.

We also constructed two datasets, one for hashtag relatedness and one for hashtag classification. We are happy to release them to the research community, together with the HE-Graph we constructed (Hashtag Datasets).

Jan 26, 2015

We are glad to announce that our papers “Compressed indexes for string-searching in labeled graphs” and “GERBIL – General Entity Annotator Benchmark” have been accepted at the 24th International World Wide Web Conference (WWW 2015).

See you in Florence ;)

Jul 18, 2014

We are glad to announce that Giuseppe Ottaviano and Rossano Venturini’s “Partitioned Elias-Fano Indexes” won the SIGIR 2014 Best Paper Award. Here’s a link to the article.