How to Set Up, Build, and Use Giraffa

How to set up the Giraffa build environment

1. (Optional) Download and install Gradle 2.5 or later:

wget https://services.gradle.org/distributions/gradle-2.5-all.zip
unzip gradle-2.5-all.zip
sudo mv gradle-2.5 /usr/local
sudo ln -s /usr/local/gradle-2.5/ /usr/local/gradle

Configure ~/.bashrc and make sure it contains the following section:

GRADLE_HOME=/usr/local/gradle
export PATH=$JAVA_HOME/bin:$GRADLE_HOME/bin:$PATH

Check that Gradle is set up correctly:

gradle -version
> ------------------------------------------------------------
> Gradle 2.5
> ------------------------------------------------------------

> Build time:   2015-07-08 07:38:37 UTC
> Build number: none
> Revision:     093765bccd3ee722ed5310583e5ed140688a8c2b

> Groovy:       2.3.10
> Ant:          Apache Ant(TM) version 1.9.3 compiled on December 23 2013
> JVM:          1.7.0_09 (Oracle Corporation 23.5-b02)
> OS:           Mac OS X 10.8.5 x86_64

2. Download and install Google Protocol Buffers 2.5.0

Get the file from:
https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
Refer to the installation guide in the [Protobuf repository](https://github.com/google/protobuf).
Note that the default install location may not work for Mac users; in that case, follow the instructions in the Protobuf README to change the install location (a typical sequence is sketched below).
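
Roughly, on a Unix-like system the build-from-source steps look like the following; the --prefix value is only an example and is mainly useful when the default install location does not work on your system:

tar -xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure               # or: ./configure --prefix=/usr/local/protobuf
make
sudo make install
sudo ldconfig             # Linux: refresh the shared library cache if protoc cannot find libprotoc
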
Check that the Protobuf compiler is set up correctly:

protoc --version
libprotoc 2.5.0

3. Get the code from the Git repository

Using Git, clone our repository:

git clone https://github.com/GiraffaFS/giraffa.git

Check out trunk:

git checkout trunk

Giraffa build options

Giraffa uses Gradle as its build tool. The main build.gradle file is located in the giraffa directory. Here is a list of common build options:

  • Build Giraffa distro and run all the tests:
    ./gradlew clean build tar

    Note: by default, all test output is redirected to files under build/test-reports. If you want test output on the console, edit the build.gradle file and set redirectTestOutputToFile=false, or override it when invoking Gradle.
    Output: The build generates the directory giraffa/giraffa-standalone/build/distributions containing a tar.gz distribution of Giraffa as a standalone demo. Untar this file anywhere on your system and follow the Giraffa Setup instructions below to configure and run the client (see the sketch after this list).
  • Build Giraffa without tests
    ./gradlew clean build -x test
  • Build Giraffa jars
    ./gradlew clean assemble -x test
  • Generate Eclipse projects
    ./gradlew eclipse -x test
  • Build Giraffa with code coverage
    ./gradlew clean build jacocoTestReport
  • Build Giraffa for standalone demo
    ./gradlew clean assemble tar
  • Run Giraffa WebUI demo (port 40010)
    ./gradlew -PmainClass=org.apache.giraffa.web.GiraffaWebDemoRunner execute
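
For example, building the distribution and unpacking it might look like the following; the archive name glob and the /opt target directory are illustrative and depend on your checkout and layout:

./gradlew clean build tar
# the generated archive lands under giraffa-standalone/build/distributions/
tar -xzf giraffa-standalone/build/distributions/giraffa-*.tar.gz -C /opt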

Giraffa Setup

This section describes single-node Giraffa cluster setup.
For prerequisites, see also the Requirements section.

Giraffa runs on top of HBase and HDFS. Thus, you should first set up your cluster to run HBase with HDFS as you normally would. See below for instructions on doing this:

  1. Unpack Hadoop, HBase, and Giraffa into hadoop/, hbase/, and giraffa/ directories, respectively.
    These locations will be referred to as HADOOP_HOME, HBASE_HOME, and GIRAFFA_HOME.
    • Set the JAVA_HOME variable:
      export JAVA_HOME=/usr/java/default
  2. Configure Hadoop:
    • In $HADOOP_HOME/etc/hadoop/core-site.xml set
      fs.defaultFS = hdfs://localhost:8020
    • In $HADOOP_HOME/etc/hadoop/hdfs-site.xml set
      dfs.namenode.name.dir = file:///.../data/hdfs/name
      dfs.datanode.data.dir = file:///.../data/hdfs/data
  3. Configure HBase (an XML sketch of these settings appears after this list):
    • In hbase/conf/hbase-site.xml set
      hbase.rootdir = hdfs://localhost:8020/hbase
      hbase.tmp.dir = data/hbase
      hbase.cluster.distributed = false
      hbase.coprocessor.master.classes = org.apache.giraffa.web.GiraffaWebObserver
      hbase.zookeeper.quorum = localhost
    • Copy $GIRAFFA_HOME/conf/giraffa-default.xml into $HBASE_HOME/conf/
    • Copy $GIRAFFA_HOME/conf/core-site.xml to $HBASE_HOME/conf/giraffa-site.xml
      giraffa-site.xml is used by the Giraffa WebUI, which is hosted by HBase
  4. Configure Giraffa:
    • In $GIRAFFA_HOME/conf/giraffa-env.sh set the variables
      export HADOOP_HOME=".../hadoop/"
      export HBASE_HOME=".../hbase/"
      export GIRAFFA_ROOT_LOGGER=INFO,RFA
  5. Format HDFS NameNode: bin/giraffa namenode -format
  6. Start Giraffa in pseudo-distributed mode: bin/start-giraffa.sh. You should see the following processes running: NameNode, DataNode, HMaster.
  7. Format Giraffa: bin/giraffa format.
  8. Run Giraffa CLI commands: bin/giraffa fs -ls /
    Giraffa CLI commands run identically to the Hadoop CLI commands (hadoop fs -[op]) and are used to access data in the Giraffa file system; see the usage example below this list.
  9. Check Giraffa WebUI at http://localhost:40010
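
As a reference for step 3, the listed HBase settings might look roughly like this in $HBASE_HOME/conf/hbase-site.xml; the hbase.tmp.dir path is illustrative and should be adjusted to your layout:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>data/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
  <property>
    <!-- Giraffa-specific: master observer backing the Giraffa WebUI -->
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.giraffa.web.GiraffaWebObserver</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>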

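As a quick smoke test for step 8, any standard hadoop fs operation should work through the Giraffa wrapper; the paths and file names below are only examples:

bin/giraffa fs -mkdir /user                       # create a directory
bin/giraffa fs -put README.txt /user/README.txt   # upload a local file
bin/giraffa fs -ls /user                          # list it
bin/giraffa fs -cat /user/README.txt              # read it back
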
YARN Setup

YARN is configured and run in the usual way, except that the configuration files are placed in the Giraffa configuration directory and wrapper scripts are used to launch the daemons and the client: yarn-giraffa-daemon.sh wraps yarn-daemon.sh and yarn-giraffa wraps yarn.
  • Use bin/yarn-giraffa-daemon.sh to start the YARN daemons:
    1. Launch the ResourceManager: bin/yarn-giraffa-daemon.sh start resourcemanager
    2. Launch the NodeManager: bin/yarn-giraffa-daemon.sh start nodemanager. You should have the following processes running: NameNode, DataNode, HQuorumPeer, HMaster, HRegionServer, ResourceManager, NodeManager.
  • Use bin/yarn-giraffa to run the YARN client:
    1. Run TeraGen:
      bin/yarn-giraffa jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar teragen 10000000 /teragen
    2. Run TeraSort:
      bin/yarn-giraffa jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar terasort /teragen /terasort
    3. Run TeraValidate:
      bin/yarn-giraffa jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar teravalidate /terasort /teravalidate
  • Additional Notes:
    • In conf/mapred-site.xml, mapreduce.terasort.simplepartitioner is set to true (see the snippet after these notes). This setting is specific to the examples jar and ensures the distributed cache is not used. Make sure your own jobs do not use the distributed cache, as it relies on features Giraffa does not currently support.
    • In conf/yarn-site.xml, yarn.application.classpath is set to the default value with $GIRAFFA_CLASSPATH appended. This ensures that YARN jobs run with a classpath that includes Giraffa. Do not remove $GIRAFFA_CLASSPATH from this value.
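
    The TeraSort note above corresponds to a property entry like the following in conf/mapred-site.xml:

      <property>
        <!-- keep the TeraSort example off the distributed cache -->
        <name>mapreduce.terasort.simplepartitioner</name>
        <value>true</value>
      </property>
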
    HDFS Setup

    1. Navigate to your HDFS installation directory. Below, replace NAMENODE_HOST with the hostname of the NameNode and replace DATANODEX_HOST with the hostname of DataNode X.
    2. On all nodes, modify etc/hadoop/core-site.xml:
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://NAMENODE_HOST:9000</value>
        </property>
      </configuration>
    3. On NameNode, modify etc/hadoop/slaves:
      DATANODE1_HOST
      DATANODE2_HOST
      DATANODE3_HOST
      ...
    4. On NameNode, format NameNode:
      bin/hdfs namenode -format
    5. On NameNode, start HDFS services:
      sbin/start-dfs.sh
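
    Optionally, verify that HDFS is up and that the DataNodes have registered, for example:

      bin/hdfs dfsadmin -report    # lists live DataNodes
      bin/hdfs dfs -ls /           # the root directory should be accessible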

    HBase Setup

    1. Navigate to your HBase installation directory. Below, replace NAMENODE_HOST with the hostname of the NameNode, MASTER_HOST with the hostname of HBase Master, and REGIONSERVERX_HOST with the hostname of RegionServer X.
    2. On all nodes, modify conf/hbase-site.xml:
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://NAMENODE_HOST:9000/hbase</value>
        </property>
        <property>
          <name>hbase.cluster.distributed</name>
          <!-- set to 'false' to run in pseudo-distributed mode, i.e. with only one host -->
          <value>true</value>
        </property>
        <property>
          <name>hbase.zookeeper.quorum</name>
          <value>NAMENODE_HOST</value>
        </property>
      </configuration>
    3. On all nodes, create conf/core-site.xml:
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
      </configuration>
    4. On all nodes, create conf/hdfs-site.xml:
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
      </configuration>
    5. On Master, modify conf/regionservers:
      REGIONSERVER1_HOST
      REGIONSERVER2_HOST
      REGIONSERVER3_HOST
      ...
    6. On Master, start HBase services:
      bin/start-hbase.sh
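
    Optionally, confirm that the Master and RegionServers are up, for example from the HBase shell:

      bin/hbase shell
      status    # run inside the shell; should report the expected number of live region servers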

    Misc. Notes

    • Running bin/start-giraffa.sh creates a single-node Giraffa cluster. It starts the NameNode and DataNode, and then HBase, which in turn starts a RegionServer, a Master, and ZooKeeper. Run bin/giraffa namenode -format first to format the NameNode. A minimal start sequence is sketched below.
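
      A minimal sketch combining the steps from Giraffa Setup above (jps is the standard JDK process lister):

      bin/giraffa namenode -format    # one-time HDFS NameNode format
      bin/start-giraffa.sh            # starts NameNode, DataNode, and HBase
      jps                             # check that the expected daemons are running
      bin/giraffa format              # one-time Giraffa format
      bin/giraffa fs -ls /            # basic sanity check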