The ever growing need to efficiently store, retrieve and analyze massive datasets, originated by very different sources, is currently made more complex by the different requirements posed by users and applications. Such a new level of complexity cannot be handled properly by current data structures for Big Data problems.
To successfully meet these challenges, we propose a new generation of “Multicriteria Data Structures and Algorithms” that originate from some recent and preliminary results of the proponents. The “multicriteria” feature refers to the fact that we seamlessly integrate, via a “principled” optimization approach, modern compressed data structures with new, revolutionary, data structures “learned” from the input data by using proper machine-learning tools. The goal of the optimization is to select, among a family of properly designed data structures, the one that “best fits” the multiple constraints imposed by its context of use, thus eventually “dominating” the multitude of trade-offs currently offered by known solutions.
In this project, we will lay down the theoretical and algorithmic-engineering foundations of this novel research area, which has the potential of supporting innovative data-analysis tools and data-intensive applications.
Title: Multicriteria Data Structures and Algorithms: from compressed to learned indexes, and beyond.
Collaborations: University of Milan (PI: Dott. Marco Frasca), University of Palermo (PI: Prof. Raffaele Giancarlo), University of Piemonte Orientale (PI: Prof. Giovanni Manzini).
Grant: PRIN funded by MIUR Italy (call 2017).
Duration: 36 months.