ML100K - shark8me/lenskit GitHub Wiki
The LensKit integration tests and demos make use of the MovieLens 100K data set. This data set is freely available for non-commercial use.
The data set, along with the full README and license, is available here: http://grouplens.org/node/73
The integration tests will automatically download the data set. To acknowledge the license terms and enable the download, set the grouplens.mldata.acknowledge property to yes. You can do this permanently by putting the following in your ~/.m2/settings.xml file (on Windows 7, usually C:\Users\user\.m2\settings.xml):
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                              http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <profiles>
    <profile>
      <id>lenskit-data</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <properties>
        <!-- Acknowledges the MovieLens license terms and enables the automatic download -->
        <grouplens.mldata.acknowledge>yes</grouplens.mldata.acknowledge>
      </properties>
    </profile>
  </profiles>
</settings>
You can also set the lenskit.movielens.100k Maven property to point to a copy of the data set elsewhere on your hard drive.
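As a rough sketch, this property could sit in the properties element of the lenskit-data profile shown earlier. The path /path/to/ml-100k is a placeholder and not a location LensKit expects, and putting the property in this same profile is an assumption; any active profile should work the same way.

```xml
<properties>
  <!-- Existing license acknowledgement from the profile above -->
  <grouplens.mldata.acknowledge>yes</grouplens.mldata.acknowledge>
  <!-- Placeholder path (an assumption): replace with the directory holding your own unpacked copy -->
  <lenskit.movielens.100k>/path/to/ml-100k</lenskit.movielens.100k>
</properties>
```

For a one-off run, Maven's standard -D option can supply the same property on the command line, for example mvn verify -Dlenskit.movielens.100k=/path/to/ml-100k (again with a placeholder path).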
The JUnit tests in lenskit-integration-tests are general integration tests that use the MovieLens 100K data set to test LensKit. When you run these tests via Maven (mvn verify), the data set is automatically downloaded as described above. If you want to run these tests inside an IDE, you have two choices:
- Unpack the MovieLens data set into data/ml-100k in the directory from which you run the tests.
- Set the lenskit.movielens.100k system property to point to the location of the MovieLens 100K data set.
If the property is not set and the data directory is not present, the tests are ignored, so you can still debug other unit tests.