Run Apache Pig in local mode

This guide explains how to run Apache Pig in local mode. I used this setup to create the data models for the German language.

Hardware setup

I use a machine with 32 GB RAM and one processor with several cores.

The operating system is openSUSE 12.3.

The Java VM is OpenJDK 1.7.0_51.

Change index_db.sh

I added the following line to index_db.sh, before the Pig scripts are called:

export PIG_HEAPSIZE="8096"
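The pig launcher script reads PIG_HEAPSIZE as a number of megabytes, so the value above requests roughly 8 GB of heap. As a minimal sketch, the export could be followed by a sanity check against the machine's physical memory. The check is my own addition and not part of index_db.sh, and it assumes a Linux system that provides /proc/meminfo.

```shell
# PIG_HEAPSIZE is interpreted by the pig launcher as megabytes.
export PIG_HEAPSIZE="8096"

# Optional check (an assumption, not part of index_db.sh): warn when the
# requested heap is larger than the physical memory of the machine.
if [ -r /proc/meminfo ]; then
    # MemTotal is reported in kB; convert it to whole megabytes.
    total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    if [ "$PIG_HEAPSIZE" -gt "$total_mb" ]; then
        echo "warning: PIG_HEAPSIZE (${PIG_HEAPSIZE} MB) exceeds RAM (${total_mb} MB)" >&2
    fi
fi
```

On the 32 GB machine described above, the warning would not be printed.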

Change the pig script names_and_entities.pig

I added the following lines to names_and_entities.pig:

SET hadoop.tmp.dir '/data/tmp/hadoop-reinhard'
%default JAVA_XMX 4g
SET mapreduce.map.java.opts -Xmx$JAVA_XMX
SET mapreduce.reduce.java.opts -Xmx$JAVA_XMX
SET mapred.child.java.opts '-Xmx8096m';

I changed hadoop.tmp.dir because /tmp is on the root partition, which is too small for the temporary data that Hadoop writes there.
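Before pointing hadoop.tmp.dir at another location, it can help to compare the free space on both filesystems and to make sure the target directory exists. The following is only a sketch: free_mb is a hypothetical helper name, and the path is the one from my setup, so adjust it to your own machine.

```shell
# Hypothetical helper: print the free space, in megabytes, of the
# filesystem that holds the given path (POSIX df output, 1024-byte blocks).
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

echo "/tmp: $(free_mb /tmp) MB free"

# Directory used for hadoop.tmp.dir in this guide; change it as needed.
target=/data/tmp/hadoop-reinhard
if mkdir -p "$target" 2>/dev/null; then
    echo "$target: $(free_mb "$target") MB free"
else
    echo "cannot create $target; check the path and permissions" >&2
fi
```

If the second figure is not clearly larger than the first, moving hadoop.tmp.dir to that location will not help.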

Change the pig script token_counts.pig

I added the following line to token_counts.pig:

SET hadoop.tmp.dir '/data/tmp/hadoop-reinhard'

The reason is the same as for names_and_entities.pig: /tmp is on the root partition, which is too small.

Run the pig scripts

You can now run the Apache Pig scripts in local mode by calling index_db.sh with the -l option.