Jan 30, 2013

As a contribution to the scientific community working in the field of entity annotation, we developed a framework to compare text annotators: systems that, given a text document, aim to find the entities the text is about, identified as Wikipedia pages. The BAT-Framework, written in Java, comes with a formal framework that defines a set of problems, the way systems can be compared to each other, and a set of measures that, extending classic IR measures, fairly and fully compare the features of entity annotators. The formal framework, which must be understood in order to use the benchmark framework, is presented in this paper published at WWW’13.
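To give a rough idea of what "extending classic IR measures" with a match relation can look like, here is a minimal, self-contained sketch. It is not the BAT-Framework API: the class `AnnotationMetrics`, the `Annotation` type, and the two relations `EXACT` and `OVERLAP` are hypothetical names made up for this illustration, and the paper's formal definitions may differ. The sketch computes precision, recall, and F1 of an annotator's output against a gold standard, where "correct" is decided by a pluggable match relation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

public class AnnotationMetrics {

    /** Hypothetical annotation: a mention span plus the id of the Wikipedia page it points to. */
    static final class Annotation {
        final int start;       // offset of the mention in the text
        final int length;      // length of the mention in characters
        final int wikipediaId; // id of the annotated Wikipedia page

        Annotation(int start, int length, int wikipediaId) {
            this.start = start;
            this.length = length;
            this.wikipediaId = wikipediaId;
        }
    }

    /** Illustrative relation: same span and same entity. */
    static final BiPredicate<Annotation, Annotation> EXACT =
        (a, b) -> a.start == b.start && a.length == b.length
               && a.wikipediaId == b.wikipediaId;

    /** Illustrative relation: overlapping spans and same entity. */
    static final BiPredicate<Annotation, Annotation> OVERLAP =
        (a, b) -> a.wikipediaId == b.wikipediaId
               && a.start < b.start + b.length
               && b.start < a.start + a.length;

    /** Counts the items of 'from' that match at least one item of 'against'. */
    static long countMatched(List<Annotation> from, List<Annotation> against,
                             BiPredicate<Annotation, Annotation> match) {
        return from.stream()
                   .filter(a -> against.stream().anyMatch(b -> match.test(a, b)))
                   .count();
    }

    /** Fraction of output annotations that match some gold annotation. */
    static double precision(List<Annotation> output, List<Annotation> gold,
                            BiPredicate<Annotation, Annotation> match) {
        return output.isEmpty() ? 0.0
             : (double) countMatched(output, gold, match) / output.size();
    }

    /** Fraction of gold annotations that are matched by some output annotation. */
    static double recall(List<Annotation> output, List<Annotation> gold,
                         BiPredicate<Annotation, Annotation> match) {
        return gold.isEmpty() ? 0.0
             : (double) countMatched(gold, output, match) / gold.size();
    }

    /** Harmonic mean of precision and recall. */
    static double f1(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Made-up toy data, for illustration only.
        List<Annotation> gold = Arrays.asList(
            new Annotation(0, 5, 100), new Annotation(10, 6, 200));
        List<Annotation> output = Arrays.asList(
            new Annotation(0, 5, 100), new Annotation(11, 4, 200),
            new Annotation(20, 3, 300));

        double p = precision(output, gold, OVERLAP);
        double r = recall(output, gold, OVERLAP);
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", p, r, f1(p, r));
    }
}
```

Swapping `OVERLAP` for `EXACT` in the calls above changes which annotations count as correct without touching the measures themselves, which is the point of keeping the match relation separate from the measure.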

Main features:

  • Compares any entity-annotation system in a fair and complete way.
  • Provides an implementation for all defined measures and match relations.
  • Easily extensible with new systems, new datasets, and new similarity measures.
  • Performs extensive testing on any Entity Annotator and any dataset.
  • Performs runtime testing.
  • Generates gnuplot charts and LaTeX tables summarizing test results.
  • Completely open source, distributed under the GPLv3 license.

You can download the BAT-Framework Environment 0.1, fork the project on GitHub, and read the Quick Reference.