This is a freely downloadable collection of datasets that can be used to conduct several hashtag-related experiments, including hashtag relatedness and hashtag classification, as we did in our paper “On Analyzing Hashtags in Twitter”.
We offer three distinct datasets:
- HE Graph Download (about 100MB)
- Hashtag relatedness dataset Download
- Hashtag classification dataset Download
HE Graph: the file contains the complete Hashtag-Entity Graph in TSV format. There are 5 columns: hashtag ID, entity ID, annotation scores, hashtag, entity title. The annotation scores column contains a colon-separated list of confidence scores returned by TagME.
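As a minimal sketch of how one row of the Hashtag-Entity Graph file could be read, the snippet below splits a line into its five columns and turns the colon-separated annotation scores into a list of floats. The names `HEEdge` and `parse_he_line` are hypothetical, and the code assumes tab-separated fields, no header row, and scores that parse as decimal numbers. None of this is confirmed by the dataset description, so check it against the actual file.

```python
from typing import List, NamedTuple


class HEEdge(NamedTuple):
    """One hashtag-entity edge (hypothetical container, not part of the dataset)."""
    hashtag_id: str      # kept as a string: the ID format is not specified
    entity_id: str       # kept as a string for the same reason
    scores: List[float]  # TagME confidence scores, one per annotation
    hashtag: str
    entity_title: str


def parse_he_line(line: str) -> HEEdge:
    """Parse one tab-separated row with the 5 columns described above."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    hashtag_id, entity_id, raw_scores, hashtag, entity_title = fields
    # Assumption: scores are decimal numbers separated by ':'.
    scores = [float(s) for s in raw_scores.split(":") if s]
    return HEEdge(hashtag_id, entity_id, scores, hashtag, entity_title)


# Illustrative row only; the values are made up, not taken from the dataset.
example = parse_he_line("12\t345\t0.25:0.5\t#nlp\tNatural language processing\n")
```

Because the file is about 100MB, it is probably more practical to iterate over it line by line than to load it all at once.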
Hashtag relatedness dataset: a 3-column TSV file. The first column contains the group identifier (details in the paper), while the other two columns contain the two hashtags forming a hashtag pair.
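A short sketch of how the hashtag pairs could be collected by group identifier follows. The function name `load_pairs` is hypothetical, and the code assumes there is no header row and that fields contain no quoting. What a group means is defined in the paper and is not interpreted here.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_pairs(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group hashtag pairs by the identifier in the first column."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    # QUOTE_NONE: treat quote characters inside hashtags as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed rows
        group_id, first, second = row
        groups[group_id].append((first, second))
    return dict(groups)


# Illustrative rows only; the values are made up, not taken from the dataset.
sample = ["1\t#nba\t#basketball\n", "1\t#nfl\t#football\n", "2\t#nba\t#oscars\n"]
pairs_by_group = load_pairs(sample)
```

In practice an open file handle can be passed in place of `sample`, for example `open(path, encoding="utf-8", newline="")`, where `path` is wherever the downloaded file was saved.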
Hashtag classification dataset: a 2-column TSV file. The first column contains the hashtag, while the second contains the category to which the hashtag belongs.
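The classification file could be loaded into a hashtag-to-category mapping along the lines of the sketch below. The name `load_categories` and the category labels in the sample rows are illustrative assumptions, as is the absence of a header row. If a hashtag appears more than once, this version keeps the last category seen.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def load_categories(lines: Iterable[str]) -> Dict[str, str]:
    """Map each hashtag (column 1) to its category (column 2)."""
    labels: Dict[str, str] = {}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed rows
        hashtag, category = row
        labels[hashtag] = category
    return labels


# Illustrative rows only; these category names are made up.
sample = ["#nba\tsports\n", "#oscars\tmovies\n", "#nfl\tsports\n"]
labels = load_categories(sample)
# Count how many hashtags fall into each category.
category_counts = Counter(labels.values())
```

Counting the labels this way gives a quick view of class balance before training a classifier.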
These datasets are available under the Creative Commons Attribution-ShareAlike License. We hope they will be useful!