How to Set Up, Build, and Use Giraffa - GiraffaFS/giraffa GitHub Wiki
wget https://services.gradle.org/distributions/gradle-2.5-all.zip
unzip gradle-2.5-all.zip
sudo mv gradle-2.5 /usr/local
sudo ln -s /usr/local/gradle-2.5/ /usr/local/gradle
Configure ~/.bashrc and make sure it contains the following section:
GRADLE_HOME=/usr/local/gradle
export PATH=$JAVA_HOME/bin:$GRADLE_HOME/bin:$PATH
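The export line works because the shell resolves commands left to right along PATH, so the directory prepended first wins. A throwaway check (the temp directory and fake binary below are purely illustrative) can confirm the mechanism:

```shell
# Illustrative only: drop a fake "gradle" into a temp dir and confirm that
# prepending its bin/ directory to PATH makes the shell resolve it first.
tmp=$(mktemp -d)
mkdir -p "$tmp/gradle/bin"
printf '#!/bin/sh\necho fake-gradle\n' > "$tmp/gradle/bin/gradle"
chmod +x "$tmp/gradle/bin/gradle"

GRADLE_HOME="$tmp/gradle"
export PATH="$GRADLE_HOME/bin:$PATH"

resolved=$(command -v gradle)   # should point into $tmp/gradle/bin
output=$(gradle)                # runs the fake binary
echo "$output"
```

The same resolution order applies to the real /usr/local/gradle install above.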
Check that Gradle is correctly set up:
gradle -version
> ------------------------------------------------------------
> Gradle 2.5
> ------------------------------------------------------------
> Build time: 2015-07-08 07:38:37 UTC
> Build number: none
> Revision: 093765bccd3ee722ed5310583e5ed140688a8c2b
> Groovy: 2.3.10
> Ant: Apache Ant(TM) version 1.9.3 compiled on December 23 2013
> JVM: 1.7.0_09 (Oracle Corporation 23.5-b02)
> OS: Mac OS X 10.8.5 x86_64
Get the file from:
https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
Refer to the installation guide in the [Protobuf repository](https://github.com/google/protobuf).
Note that the default install location may not work for Mac users. In that case, follow the
README's instructions to change the install location.
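As a sketch, the Protobuf 2.5.0 README describes the standard autotools flow; the `--prefix` below is only needed when the default `/usr/local` is not writable (the Mac case mentioned above), and the chosen prefix here is illustrative:

```shell
# Sketch of the usual autotools install flow for protobuf 2.5.0.
tar xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure --prefix="$HOME/local"   # omit --prefix to install under /usr/local
make
make install                         # prepend sudo if installing to /usr/local
export PATH="$HOME/local/bin:$PATH"  # so protoc is found when using a home prefix
```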
Check that the Protobuf compiler is correctly set up:
protoc --version
libprotoc 2.5.0
Using Git, clone our repository:
git clone https://github.com/GiraffaFS/giraffa.git
Check out trunk:
git checkout trunk
Giraffa is built using Gradle. The main build.gradle file is located in the giraffa directory. Here's a list of the different build options:
- Build Giraffa distro and run all the tests:
./gradlew clean build tar
Note: by default all test output is redirected to files under build/test-reports. If you want tests to output to the console, edit the build.gradle file and set redirectTestOutputToFile = false, or set it during your Gradle command execution.
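If build.gradle reads redirectTestOutputToFile as a Gradle project property (an assumption about this build script, not confirmed here), it could be overridden per invocation with Gradle's standard -P flag:

```shell
# Assumes build.gradle consults a project property of this name.
./gradlew clean build -PredirectTestOutputToFile=false
```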
Output: the build will generate the folder giraffa/giraffa-standalone/build/distributions. Inside there will be a tar.gz distribution of Giraffa as a standalone demo. Untar this file anywhere on your system and follow the Giraffa Setup instructions below to configure and run the client.
- Build Giraffa without tests
./gradlew clean build -x test
- Build Giraffa jars
./gradlew clean assemble -x test
- Generate Eclipse projects
./gradlew eclipse -x test
- Build Giraffa with code coverage
./gradlew clean build jacocoTestReport
- Build Giraffa for standalone demo
./gradlew clean assemble tar
- Run Giraffa WebUI demo (port 40010)
./gradlew -PmainClass=org.apache.giraffa.web.GiraffaWebDemoRunner execute
This section describes a single-node Giraffa cluster setup.
Requirements (see also the Requirements section):
- Java 7 or Java 8
- Hadoop 2.5.1 (Download here)
- HBase 1.0.1 (Download here)
- Giraffa 0.3.0 (Download here)
Giraffa runs on top of HBase and HDFS. Thus, you should first set up your cluster to run HBase with HDFS as you normally would. See below for instructions on doing this:
- Unpack Hadoop, HBase, and Giraffa into hadoop/, hbase/, and giraffa/ directories, respectively.
  These locations will be referred to as HADOOP_HOME, HBASE_HOME, and GIRAFFA_HOME.
- Set the variable
  export JAVA_HOME=/usr/java/default/
- Configure Hadoop:
  - In $HADOOP_HOME/etc/hadoop/core-site.xml set
    fs.defaultFS = hdfs://localhost:8020
  - In $HADOOP_HOME/etc/hadoop/hdfs-site.xml set
    dfs.namenode.name.dir = file:///.../data/hdfs/name
    dfs.datanode.data.dir = file:///.../data/hdfs/data
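In XML form, the core-site.xml setting above looks like the fragment below; the hdfs-site.xml entries follow the same property/name/value pattern (the elided data paths must be filled in for your machine):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```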
- Configure HBase:
  - In $HBASE_HOME/conf/hbase-site.xml set
    hbase.rootdir = hdfs://localhost:8020/hbase
    hbase.tmp.dir = data/hbase
    hbase.cluster.distributed = false
    hbase.coprocessor.master.classes = org.apache.giraffa.web.GiraffaWebObserver
    hbase.zookeeper.quorum = localhost
  - Copy $GIRAFFA_HOME/conf/giraffa-default.xml into $HBASE_HOME/conf/
  - Copy $GIRAFFA_HOME/conf/core-site.xml under a new name $HBASE_HOME/conf/giraffa-site.xml
    (giraffa-site.xml is used by the Giraffa WebUI, which is hosted by HBase)
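As a sketch, the hbase-site.xml settings above take the usual Hadoop-style property form; for example, the two entries most specific to this pseudo-distributed Giraffa setup:

```xml
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.giraffa.web.GiraffaWebObserver</value>
  </property>
</configuration>
```

The remaining key = value pairs listed above become properties of the same shape.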
- Configure Giraffa:
  - In $GIRAFFA_HOME/conf/giraffa-env.sh set the variables
    export HADOOP_HOME=".../hadoop/"
    export HBASE_HOME=".../hbase/"
    export GIRAFFA_ROOT_LOGGER=INFO,RFA
- Format the HDFS NameNode:
  bin/giraffa namenode -format
- Start Giraffa in pseudo-distributed mode:
  bin/start-giraffa.sh
  You should see the following processes running: NameNode, DataNode, HMaster.
- Format Giraffa:
  bin/giraffa format
- Run Giraffa CLI commands:
  bin/giraffa fs -ls /
  Giraffa CLI commands are identical to the Hadoop CLI commands hadoop fs -[op], and are used to access data in the Giraffa file system.
- Check the Giraffa WebUI at http://localhost:40010
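Because the CLI mirrors hadoop fs, the familiar file operations apply; a sketch of a typical session against a running cluster (paths are illustrative, and whether every hadoop fs operation is supported in Giraffa 0.3.0 is not confirmed here):

```shell
bin/giraffa fs -mkdir /user           # create a directory
bin/giraffa fs -put notes.txt /user/  # upload a local file
bin/giraffa fs -cat /user/notes.txt   # read it back
bin/giraffa fs -rm /user/notes.txt    # remove it
```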
yarn-giraffa-daemon.sh launches yarn-daemon.sh, and yarn-giraffa launches yarn.
- Launch the ResourceManager:
  bin/yarn-giraffa-daemon.sh start resourcemanager
- Launch the NodeManager:
  bin/yarn-giraffa-daemon.sh start nodemanager
  You should have the following processes running: NameNode, DataNode, HQuorumPeer, HMaster, HRegionServer, ResourceManager, NodeManager.
- Run TeraGen:
bin/yarn-giraffa jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar teragen 10000000 /teragen
- Run TeraSort:
bin/yarn-giraffa jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar terasort /teragen /terasort
- Run TeraValidate:
bin/yarn-giraffa jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar teravalidate /terasort /teravalidate
- In conf/mapred-site.xml, mapreduce.terasort.simplepartitioner is set to true. This is a configuration specific to the examples jar that ensures the distributed cache is not used. You should make sure that your jobs do not use the distributed cache as it requires currently unsupported features from Giraffa.
- In conf/yarn-site.xml, yarn.application.classpath is set to the default value, with the addition of $GIRAFFA_CLASSPATH. This ensures that Yarn jobs run with a class path that includes Giraffa. Do not remove $GIRAFFA_CLASSPATH from here.
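As a sketch, that yarn-site.xml entry looks like the fragment below, using the default Hadoop 2.x classpath entries plus the Giraffa addition (the exact default list is version-dependent; consult your Hadoop release's yarn-default.xml):

```xml
<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,
    $GIRAFFA_CLASSPATH
  </value>
</property>
```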
- Navigate to your HDFS installation directory. Below, replace NAMENODE_HOST with the hostname of the NameNode and replace DATANODEX_HOST with the hostname of DataNode X.
- On all nodes, modify etc/hadoop/core-site.xml:
<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://NAMENODE_HOST:9000</value>
</property>
</configuration>
- On NameNode, modify etc/hadoop/slaves:
DATANODE1_HOST
DATANODE2_HOST
DATANODE3_HOST
...
- On NameNode, format NameNode:
bin/hdfs namenode -format
- On NameNode, start HDFS services:
sbin/start-dfs.sh
- Navigate to your HBase installation directory. Below, replace NAMENODE_HOST with the hostname of the NameNode, MASTER_HOST with the hostname of HBase Master, and REGIONSERVERX_HOST with the hostname of RegionServer X.
- On all nodes, modify conf/hbase-site.xml:
<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://NAMENODE_HOST:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value> <!-- set to 'false' to run in pseudo-distributed mode, i.e. with only one host -->
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>NAMENODE_HOST</value>
</property>
</configuration>
- On all nodes, create conf/core-site.xml:
<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>
- On all nodes, create conf/hdfs-site.xml:
<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>
- On Master, modify conf/regionservers:
REGIONSERVER1_HOST
REGIONSERVER2_HOST
REGIONSERVER3_HOST
...
- On Master, start HBase services:
bin/start-hbase.sh
- Running the command
bin/start-giraffa.sh
will create a one-node Giraffa cluster. It starts up the NameNode, the DataNode, and then HBase, which starts a RegionServer, Master, and ZooKeeper. Run bin/giraffa namenode -format
first to format the NameNode.