Run Apache Pig in local mode
This guide describes how to run Apache Pig in local mode. I used this setup to create the data models for the German language.
I use a machine with 32 GB RAM and one processor with several cores.
The operating system is openSUSE 12.3.
The Java VM is OpenJDK 1.7.0_51.
I added the following line to index_db.sh, before the Pig scripts are called:
export PIG_HEAPSIZE="8096"
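The `pig` launcher script reads PIG_HEAPSIZE as a number of megabytes. As a minimal sketch, assuming you would like to override the value from the calling shell without editing index_db.sh again, the export can fall back to the value used in this guide only when nothing is set yet:

```shell
# Keep a PIG_HEAPSIZE that is already set in the environment;
# otherwise fall back to the value used in this guide (megabytes).
export PIG_HEAPSIZE="${PIG_HEAPSIZE:-8096}"
echo "Pig heap size: ${PIG_HEAPSIZE} MB"
```

The fallback form is only a convenience; the plain export shown above behaves the same when the variable is not set beforehand.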
I added the following lines to names_and_entities.pig:
SET hadoop.tmp.dir '/data/tmp/hadoop-reinhard'
%default JAVA_XMX 4g
SET mapreduce.map.java.opts -Xmx$JAVA_XMX
SET mapreduce.reduce.java.opts -Xmx$JAVA_XMX
SET mapred.child.java.opts '-Xmx8096m';
I changed hadoop.tmp.dir because the /tmp folder is on the root partition, which is not large enough for the temporary files.
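Before pointing hadoop.tmp.dir at a new location, it can help to create the directory and compare its free space with /tmp. The following is a sketch only: the helper `avail_kb` and the variable `HADOOP_TMP` are names made up for this example, and the default path is a harmless scratch location. To check the setup from this guide, set `HADOOP_TMP=/data/tmp/hadoop-reinhard` before running it.

```shell
#!/bin/sh
# Hypothetical helper: print the free space (in KiB) of the
# filesystem that holds the given directory.
avail_kb() {
    # -P forces one line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR==2 { print $4 }'
}

# Assumed variable for this sketch; defaults to a scratch directory.
HADOOP_TMP="${HADOOP_TMP:-${TMPDIR:-/tmp}/hadoop-tmp-example}"
mkdir -p "$HADOOP_TMP"

echo "free on /tmp: $(avail_kb /tmp) KiB"
echo "free on $HADOOP_TMP: $(avail_kb "$HADOOP_TMP") KiB"
```

If both lines report the same number, the two directories are most likely on the same partition and moving hadoop.tmp.dir there gains nothing.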
I added the following line to token_counts.pig:
SET hadoop.tmp.dir '/data/tmp/hadoop-reinhard'
The reason is the same as for names_and_entities.pig: the root partition that holds /tmp is too small.
You can now run the Apache Pig scripts in local mode by calling index_db.sh with the -l option, which selects local mode.