Tag Recommender Framework - domkowald/tagrecommender GitHub Wiki

Description

The aim of this work is to provide the community with a simple-to-use, generic tag-recommender framework. It lets you evaluate novel tag-recommender algorithms with a set of well-known standard IR metrics such as MAP, MRR, P@k, R@k and F1@k on folksonomy datasets such as BibSonomy, CiteULike, LastFM or Delicious, and benchmark the developed approaches against state-of-the-art tag-recommender algorithms such as MP, MP_r, MP_u, MP_u,r, CF, APR, FR, GIRP, GIRPTM, etc.

The software already contains two novel tag-recommender approaches based on cognitive science models. The first one (3Layers) uses category information and is based on the ALCOVE theory. The second one (BLL+C) uses time information and is based on the ACT-R theory.

This software is free of charge for academic purposes. Please contact the author if you want to use this software for commercial purposes. You are not allowed to redistribute this software or its source code.

Please cite the papers if you use this software in one of your publications.

Download

The source code can be checked out directly from this repository. It contains an Eclipse project for editing and building the code, and an already deployed .jar file for direct execution. The folder structure provided in the repository is required: within the data folder, csv is the input directory and metrics is the output directory. Both of these directories contain subdirectories for the different datasets:

  • bib_core for BibSonomy
  • cul_core for CiteULike
  • flickr_core for Flickr
  • wiki_core for Wikipedia (based on bookmarks from Delicious)

How-to-use

The tagrecommender .jar takes three parameters. First, the algorithm:

  • bll_c for BLL and BLL+C (based on ACT-R theory)
  • 3layers for 3Layers (based on ALCOVE theory)
  • lda for Latent Dirichlet Allocation
  • cf for Collaborative Filtering
  • fr for Adapted PageRank and FolkRank
  • girptm for GIRP and GIRPTM
  • mp for MostPopular tags
  • mp_u_r for MostPopular tags by user and/or resource

Second, the dataset (directory):

  • bib for BibSonomy
  • cul for CiteULike
  • flickr for Flickr

Third, the filename (without file extension).

Example: java -jar tagrecommender.jar bll_c bib bib_sample
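
To benchmark several algorithms on the same dataset, the call above can be repeated once per algorithm key. The following is a minimal sketch, not part of the framework: the class name BatchCommands and the method forDataset are hypothetical, and it assumes that tagrecommender.jar lies in the working directory. It only builds the argument lists and prints them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds one command line per algorithm key listed above. */
final class BatchCommands {
    // Algorithm keys as listed in this how-to.
    static final List<String> ALGORITHMS = Arrays.asList(
            "bll_c", "3layers", "lda", "cf", "fr", "girptm", "mp", "mp_u_r");

    /** Returns the argument list of each call for the given dataset and file name. */
    static List<List<String>> forDataset(String dataset, String fileName) {
        List<List<String>> commands = new ArrayList<>();
        for (String algorithm : ALGORITHMS) {
            // Assumption: the jar is in the current working directory.
            commands.add(Arrays.asList(
                    "java", "-jar", "tagrecommender.jar", algorithm, dataset, fileName));
        }
        return commands;
    }

    public static void main(String[] args) {
        for (List<String> command : forDataset("bib", "bib_sample")) {
            System.out.println(String.join(" ", command));
        }
    }
}
```

Each returned list can be handed to java.lang.ProcessBuilder to actually start the run; whether all algorithms can be executed back to back on one dataset depends on the data and is not checked here.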

Input format

The input files must be placed in the corresponding subdirectory. They are in CSV format (file extension: .txt) with 5 semicolon-separated columns; quotation marks around each value are mandatory:

  • User
  • Resource
  • Timestamp in seconds
  • List of tags
  • List of categories (optional)

Example: "0";"13830";"986470059";"deri,web2.0,tutorial,www,conference";""
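
The line format can be illustrated with a small parser. This is a sketch under assumptions taken from the example line only: fields are separated by semicolons, every field is quoted, tags and categories are comma-separated, and the fifth column is present but may be empty. The class name Bookmark is hypothetical and is not the framework's own reader.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for one input row: user, resource, timestamp, tags, categories. */
final class Bookmark {
    final String user;
    final String resource;
    final long timestampSeconds;
    final List<String> tags;
    final List<String> categories;

    Bookmark(String user, String resource, long timestampSeconds,
             List<String> tags, List<String> categories) {
        this.user = user;
        this.resource = resource;
        this.timestampSeconds = timestampSeconds;
        this.tags = tags;
        this.categories = categories;
    }

    /** Parses one quoted, semicolon-separated line (layout assumed from the example). */
    static Bookmark parse(String line) {
        String trimmed = line.trim();
        if (trimmed.length() < 2 || !trimmed.startsWith("\"") || !trimmed.endsWith("\"")) {
            throw new IllegalArgumentException("Fields must be quoted: " + line);
        }
        // Drop the outer quotes, then split on the quote-semicolon-quote separator.
        // The limit -1 keeps a trailing empty field such as an empty category list.
        String[] fields = trimmed.substring(1, trimmed.length() - 1).split("\";\"", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 columns, got " + fields.length);
        }
        return new Bookmark(fields[0], fields[1], Long.parseLong(fields[2]),
                splitList(fields[3]), splitList(fields[4]));
    }

    /** Splits a comma-separated list; an empty field yields an empty list. */
    private static List<String> splitList(String field) {
        if (field.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(field.split(","));
    }
}
```

Calling Bookmark.parse on the example line yields user "0", resource "13830", timestamp 986470059, five tags, and an empty category list.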

There are three files needed:

  • one file for training (with _train suffix)
  • one file for testing (with _test suffix)
  • one file that contains the training set followed by the test set (no suffix; it is used for generating indices for the calculations)

Example: bib_sample_train.txt, bib_sample_test.txt, bib_sample.txt (combination of train and test file)
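
If only the train and test files exist, the combined file can be produced by concatenating them in that order. The sketch below is one way to do this; the class name CombinedFileBuilder is hypothetical, and UTF-8 encoding is an assumption, since the encoding of the input files is not specified here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: creates name.txt from name_train.txt followed by name_test.txt. */
final class CombinedFileBuilder {
    static Path combine(Path datasetDir, String name) throws IOException {
        Path train = datasetDir.resolve(name + "_train.txt");
        Path test = datasetDir.resolve(name + "_test.txt");
        Path combined = datasetDir.resolve(name + ".txt");

        // Training lines first, then test lines (assumption: files are UTF-8).
        List<String> lines = new ArrayList<>(Files.readAllLines(train, StandardCharsets.UTF_8));
        lines.addAll(Files.readAllLines(test, StandardCharsets.UTF_8));

        // Creates the combined file, or overwrites it if it already exists.
        Files.write(combined, lines, StandardCharsets.UTF_8);
        return combined;
    }
}
```

For the sample above, combine(dir, "bib_sample") would read bib_sample_train.txt and bib_sample_test.txt from dir and write bib_sample.txt next to them.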

Output format

The output file is generated in the corresponding subdirectory and is in CSV format with 5 semicolon-separated columns:

  • Recall
  • Precision
  • F1-score
  • Mean Reciprocal Rank
  • Mean Average Precision

for k = 1 to 10 (one line per value of k)

Example: 0,5212146123336273;0,16408544726301685;0,22663857529082376;0,26345775109372344;0,3242776089324113
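
Note that the example line uses a decimal comma, so the values cannot be passed to Double.parseDouble unchanged. A minimal sketch for reading one output line follows; it assumes the column order listed above and the semicolon separator seen in the example, and the class name MetricsLine is hypothetical.

```java
/** Hypothetical holder for one output row (one value of k). */
final class MetricsLine {
    final double recall;
    final double precision;
    final double f1;
    final double mrr;
    final double map;

    MetricsLine(double recall, double precision, double f1, double mrr, double map) {
        this.recall = recall;
        this.precision = precision;
        this.f1 = f1;
        this.mrr = mrr;
        this.map = map;
    }

    /** Parses a line such as 0,52;0,16;0,22;0,26;0,32 (decimal commas, semicolons). */
    static MetricsLine parse(String line) {
        String[] fields = line.trim().split(";");
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 columns, got " + fields.length);
        }
        double[] values = new double[5];
        for (int i = 0; i < fields.length; i++) {
            // Convert the decimal comma to a decimal point before parsing.
            values[i] = Double.parseDouble(fields[i].trim().replace(',', '.'));
        }
        return new MetricsLine(values[0], values[1], values[2], values[3], values[4]);
    }
}
```

Under the assumption that lines appear in ascending order of k, the n-th line of the file holds the metrics for k = n.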

References