GettingStarted - shark8me/lenskit GitHub Wiki
Most people who want to work with LensKit will want to use the LensKit libraries to create their own projects, with their own recommenders. The LensKit libraries make it easier to develop your recommenders and to compare them with world-class implementations of the best-known previously created recommenders, which are already built into LensKit. This document is focused on getting started building your own LensKit project.
Since LensKit is open source, you can always download the source code and work from there. However, most users of LensKit will be able to start from the version of LensKit installed in the Maven repository. With that version you can create your own recommenders and test them against the recommenders built in to LensKit. You can read about starting with the source code elsewhere in this manual. Here we'll discuss how to use LensKit without having to download the sources.
Most people will initially use LensKit to compare the performance of several different recommender algorithms, to choose the algorithm that will be best for their purposes. After choosing an algorithm some users will want to deploy it on a web site. You can read about how to do that in Web Integration. This chapter focuses on setting up and running offline evaluation experiments.
Many setup problems with LensKit arise from not having a Java Development Kit (JDK) installed, or from not having the JAVA_HOME environment variable set correctly. Please remember that a JDK is different from a Java Runtime Environment (JRE); at this time, you need a JDK to use LensKit.
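If you are unsure which kind of Java installation you have, a small check along the following lines may help. This is only a sketch: the class name JdkCheck is made up for illustration and is not part of LensKit. It relies on the standard javax.tools.ToolProvider API, which returns no compiler when only a JRE is present.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical helper (not part of LensKit) that reports whether a JDK is in use. */
class JdkCheck {
    /** Returns true when the running Java installation ships a compiler, i.e. is a JDK. */
    static boolean runningOnJdk() {
        // ToolProvider returns null when no system compiler is available (typical of a plain JRE).
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    /** Builds a one-line report from the two facts; kept free of side effects so it is easy to test. */
    static String report(boolean hasCompiler, String javaHome) {
        String kind = hasCompiler ? "JDK (compiler found)" : "JRE only (no compiler found)";
        String home = (javaHome == null || javaHome.isEmpty())
                ? "JAVA_HOME is not set"
                : "JAVA_HOME=" + javaHome;
        return kind + "; " + home;
    }

    public static void main(String[] args) {
        System.out.println(report(runningOnJdk(), System.getenv("JAVA_HOME")));
    }
}
```

Note that Maven picks its Java installation from JAVA_HOME, so the Java that runs this check is not necessarily the one Maven will use; compare the reported JAVA_HOME with the installation you expect.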
The best way to get started with LensKit is to use Apache Maven (or a compatible tool) to automatically locate and download all of the Java jars your project will need.
You can create a new Maven project from the command line or with most Java IDEs (including Eclipse, NetBeans, and IntelliJ IDEA). To help you get started quickly, we provide two Maven archetypes (project templates) to set up a new project that uses LensKit to do a comparative offline evaluation of recommender algorithms.
- lenskit-archetype-simple-analysis sets up a minimal evaluation project that puts everything in the same directory.
- lenskit-archetype-fancy-analysis creates a more sophisticated project that uses the Maven src and target directory structures appropriately, and can scale to multiple evaluation and analysis scripts.
To create the project from the command line, use:
mvn archetype:generate \
-DarchetypeGroupId=org.grouplens.lenskit \
-DarchetypeArtifactId=lenskit-archetype-simple-analysis \
-DarchetypeVersion=2.0 \
-DinteractiveMode=no \
-DgroupId=org.your.group \
-DartifactId=your-project-name \
-Dversion=0.1
The first three definitions specify the archetype you wish to use to create the new project. They should all be exactly as typed above, except possibly for archetypeArtifactId, which should be the simple or fancy archetype, as you prefer, and archetypeVersion, which should be the current version of the archetype. (Hint: this is the same as your current LensKit version.)
The last three definitions are the specification of the new project you are creating. In general they can be anything you want.
For cut-and-paste convenience, here's an unformatted version of the same command-line:
mvn archetype:generate -DarchetypeGroupId=org.grouplens.lenskit -DarchetypeArtifactId=lenskit-archetype-simple-analysis -DarchetypeVersion=2.0 -DinteractiveMode=no -DgroupId=org.your.group -DartifactId=your-project-name -Dversion=0.1
Maven will run for a while, after which the directory your-project-name will exist, populated with your new project!
In Eclipse, select ‘File → New → Project …’. In the New Project dialog make sure ‘Create a simple project’ is unchecked, so you can choose an archetype. On the next dialog check the box ‘Include snapshot archetypes’ if you want the latest LensKit archetypes, and filter for ‘lenskit’. Choose the LensKit archetype you want to use to create your project.
Your newly created project uses the LensKit Maven plugin to manage and run the evaluation. This plugin provides a Maven lifecycle for evaluating recommenders. (Warning: as in the standard Maven lifecycle, running a later phase also runs all earlier phases, so running lenskit-analyze reruns the evaluation and overwrites its previous output.) The lifecycle consists of the following phases:
- lenskit-pre-eval: Any setup that needs to be run before the evaluation goes here. In the archetype-generated projects, this phase retrieves the MovieLens 100K data set (using the Ant script in get-file.xml). This phase also triggers a Maven compile, so any Java code in your project is compiled and ready to use by this phase.
- lenskit-eval: Runs the evaluation itself, producing output (predicted values, RMSE over test runs, etc.) in CSV files. This consists of using the LensKit evaluator to run the evaluation defined in the eval.groovy script.
- lenskit-post-eval: This is for any postprocessing you need to do to the evaluation output prior to analyzing and plotting its results. By default, nothing happens here.
- lenskit-analyze: This phase is for the final analysis of your evaluation run. The archetype uses this to run R scripts against the results of the evaluation, producing statistical analyses or graphics files. Note that this part of the analysis requires the R statistics package to be installed, along with the ggplot2 package from CRAN. LensKit works fine without R: either just run through the lenskit-eval stage, or remove the section of the pom.xml file generated by the archetype that runs Rscript. You can then process the .csv files using Excel, SAS, SPSS, or your favorite analysis tool. If you do use R, you will need to make sure it is added to your path correctly. Most Unix users know how to do this; look here for Windows instructions. You should restart Eclipse after adding R to your path.
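If you skip R, the CSV output can also be summarized with a few lines of Java. The sketch below averages one metric per algorithm across the crossfold partitions. The class name EvalSummary and the column names "Algorithm" and "RMSE.ByUser" are assumptions made for illustration; check the header row of your own output file. It also assumes plain comma-separated fields with no quoted commas.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of LensKit) that averages one metric per algorithm. */
class EvalSummary {
    /**
     * Averages the given metric column for each distinct value of the algorithm column.
     * Assumes a header row and simple comma-separated fields without quoted commas.
     */
    static Map<String, Double> meanByAlgorithm(List<String> lines, String algoColumn, String metricColumn) {
        List<String> header = Arrays.asList(lines.get(0).split(","));
        int algoIdx = header.indexOf(algoColumn);
        int metricIdx = header.indexOf(metricColumn);
        if (algoIdx < 0 || metricIdx < 0) {
            throw new IllegalArgumentException("Missing column in header: " + header);
        }
        // Per algorithm: element 0 holds the running sum, element 1 the row count.
        Map<String, double[]> sums = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            double[] acc = sums.computeIfAbsent(fields[algoIdx], k -> new double[2]);
            acc[0] += Double.parseDouble(fields[metricIdx]);
            acc[1] += 1;
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }

    public static void main(String[] args) throws IOException {
        // The file path comes from the command line; the column names are assumptions.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        meanByAlgorithm(lines, "Algorithm", "RMSE.ByUser")
                .forEach((algo, mean) -> System.out.printf("%s: %.4f%n", algo, mean));
    }
}
```

Pass the path of the results file written by the lenskit-eval phase as the only argument.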
You can run a crossfold train-test evaluation of the configured recommenders on the MovieLens 100K data set by changing into the project directory and typing:
mvn lenskit-analyze -Dgrouplens.mldata.acknowledge=yes
This runs the mvn process through the lenskit-analyze stage of the lifecycle, which comes after the three evaluation stages of the project. The -Dgrouplens.mldata.acknowledge=yes flag indicates that you have acknowledged the terms accompanying the MovieLens 100K data set.
If you prefer, you can run the stages independently:
mvn lenskit-pre-eval -Dgrouplens.mldata.acknowledge=yes
will fetch the dataset,
mvn lenskit-eval -Dgrouplens.mldata.acknowledge=yes
will fetch the dataset and run the eval script,
mvn lenskit-post-eval -Dgrouplens.mldata.acknowledge=yes
will fetch the dataset, run the eval script, and do any post eval operations you wish.
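The cumulative behavior of these commands can be pictured with a tiny model. The sketch below is an illustration only, not the plugin's implementation: the class name EvalLifecycle is made up, and the phase list is simply the phase names used in the commands on this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustration only: models the phase ordering described on this page, not the plugin's code. */
class EvalLifecycle {
    // Phase names in lifecycle order, as used in the mvn commands on this page.
    static final List<String> PHASES = Collections.unmodifiableList(Arrays.asList(
            "lenskit-pre-eval", "lenskit-eval", "lenskit-post-eval", "lenskit-analyze"));

    /** Returns every phase that runs when the given phase is requested, in order. */
    static List<String> phasesRunFor(String requested) {
        int idx = PHASES.indexOf(requested);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown phase: " + requested);
        }
        // Requesting a phase also runs everything before it.
        return PHASES.subList(0, idx + 1);
    }

    public static void main(String[] args) {
        for (String phase : PHASES) {
            System.out.println("mvn " + phase + " runs " + phasesRunFor(phase));
        }
    }
}
```

This is why requesting a late phase repeats the evaluation: every earlier phase is part of the request.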
To avoid typing the -Dgrouplens.mldata.acknowledge=yes flag every time, you can add the following to your Maven settings file, ~/.m2/settings.xml:
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                              http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <profiles>
    <profile>
      <id>lenskit-data</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <properties>
        <grouplens.mldata.acknowledge>yes</grouplens.mldata.acknowledge>
      </properties>
    </profile>
  </profiles>
</settings>
In Eclipse, select the project you created using the archetype in the Package Explorer view. Right-click on it and select “Run Configurations …”. Create a new Maven run configuration, set the directory to the directory of the project, and set the goal to lenskit-analyze. Then run your new configuration. You can view the PDFs from the target directory, though you will have to refresh the directory to see the new files.
The fancy archetype (lenskit-archetype-fancy-analysis) is for building more sophisticated projects that analyze recommender algorithms. The key locations are:
- src/eval: the key scripts that will be run as part of the evaluation.
- target/data: the location of the MovieLens data files, and the crossfolds.
- target/analysis: the location of the data files output by the evaluation, and the .pdf charts generated by the R script.
This archetype is for projects that do all three phases, and that organize the input and output according to Maven best practices. All of the scripts for the evaluation are in the src/eval directory. Each script takes in some input data and produces some output data. The archetype is intended to be used with the following structure:
- lenskit-pre-eval: gets data into the target/data directory. The Ant script in src/eval/get-data.xml is used to fetch the data.
- lenskit-eval: takes data from the target/data directory and creates a set of crossfold datasets, also in target/data. Runs an evaluation script in src/eval/eval.groovy, which operates on the crossfold data in target/data and produces output in target/analysis.
- lenskit-analyze: runs an R script, producing more output in target/analysis.
The key user files that you are likely to want to edit are:
- pom.xml: to change the value of grouplens.mldata.acknowledge, or to change the dataset that is downloaded.
- src/eval/get-data.xml: to change the dataset that is downloaded. This may require changes in pom.xml as well.
- src/eval/eval.groovy: to change the LensKit evaluation that is run, perhaps by configuring different recommenders.
- src/eval/chart.R: to change the analysis of the output data in target/analysis, perhaps including the charts that are generated.
This structure fits the Maven model: all input files are in the src tree, and all generated files are in the target tree, where they may be cleaned by the clean target.
You can run the fancy archetype the same way as the simple archetype (above) -- though the result files will be put in slightly different places.
The fancy archetype comes with one other interesting feature: it shows you how to generate configuration graphs for your algorithms. Each graph shows how the various components were plugged together by the dependency injector (Grapht) to create a working recommender. The configuration graphs are produced as .dot files in the target/analysis directory. If you install the open source Graphviz software, you can generate viewable forms of these graphs. For instance:
dot -Tpdf SlopeOne.dot > SlopeOne.pdf
will create a PDF for the SlopeOne configuration. Graphviz is very powerful, and offers many other ways to view the graphs.
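With several algorithms configured there will be several .dot files, and converting them one by one gets tedious. The following sketch renders every .dot file in a directory to PDF. The class name RenderGraphs is hypothetical and not part of LensKit; the sketch assumes the Graphviz dot executable is on your PATH and uses dot's -o option to name the output file instead of shell redirection.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of LensKit): renders every .dot file in a directory to PDF. */
class RenderGraphs {
    /** Builds the Graphviz command for one file whose name ends in ".dot". */
    static List<String> dotCommand(Path dotFile) {
        String name = dotFile.getFileName().toString();
        String pdfName = name.substring(0, name.length() - ".dot".length()) + ".pdf";
        // Equivalent to: dot -Tpdf X.dot -o X.pdf, with the PDF written next to the input.
        return Arrays.asList("dot", "-Tpdf", dotFile.toString(),
                "-o", dotFile.resolveSibling(pdfName).toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Default directory taken from this page; pass another path as the first argument if yours differs.
        Path dir = Paths.get(args.length > 0 ? args[0] : "target/analysis");
        try (DirectoryStream<Path> dotFiles = Files.newDirectoryStream(dir, "*.dot")) {
            for (Path dotFile : dotFiles) {
                // Runs Graphviz and waits for it to finish before moving to the next file.
                int exit = new ProcessBuilder(dotCommand(dotFile)).inheritIO().start().waitFor();
                System.out.println(dotFile + " -> exit code " + exit);
            }
        }
    }
}
```

A non-zero exit code usually means Graphviz could not parse the file or is not installed.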
The recommender implementations in LensKit are highly modular, so that their behavior can be customized and individual pieces can be replaced. The LensKit recommender engine factory allows the various components of the desired recommender to be configured. The good news is that, despite the flexibility, setting up a basic recommender is straightforward. The core configuration point is to specify implementations of the necessary RatingPredictor and/or ItemRecommender interfaces. Other parameters and components have reasonable defaults that can be reconfigured via the factory if needed.
In addition to configuring the recommender components, you need to configure your data source. The recommendation framework accesses its data through a data access object (DAO) interface, such as the EventDAO bound in the following example.
EventDAO dataSource = new SimpleFileRatingDAO(new File("ratings.csv"), ",");
LenskitConfiguration config = new LenskitConfiguration();
config.bind(EventDAO.class)
.to(dataSource);
config.bind(ItemScorer.class).to(ItemItemScorer.class);
/* configure a normalizer and baseline predictor */
config.bind(UserVectorNormalizer.class)
.to(BaselineSubtractingUserVectorNormalizer.class);
config.bind(BaselineScorer.class, ItemScorer.class)
.to(ItemMeanRatingItemScorer.class);
/* build and use the recommender */
Recommender rec = LenskitRecommender.build(config);
ItemRecommender irec = rec.getItemRecommender();
/* get recommendations from irec, or use e.g. getRatingPredictor() */
Once the recommender engine is created, it can be shared across threads, each of which is free to open its own recommender and interact with the system.
Configuring LensKit has further documentation on how to configure recommenders.
The API documentation provides more information about how LensKit works and how to use it.
If you prefer to create your Maven projects a different way, the necessary dependencies to get started are:
<dependency>
  <groupId>org.grouplens.lenskit</groupId>
  <artifactId>lenskit-core</artifactId>
  <version>2.0</version>
</dependency>
<dependency>
  <!-- to get the k-NN recommenders -->
  <groupId>org.grouplens.lenskit</groupId>
  <artifactId>lenskit-knn</artifactId>
  <version>2.0</version>
</dependency>