How to Setup, Build, and Use Giraffa 0.0.1 (legacy) - GiraffaFS/giraffa GitHub Wiki
- Download and install Maven:
wget http://apache.mesi.com.ar/maven/maven-3/3.0.4/binaries/apache-maven-3.0.4-bin.tar.gz
tar -zxvf apache-maven-3.0.4-bin.tar.gz
sudo mv apache-maven-3.0.4 /usr/local
sudo ln -s /usr/local/apache-maven-3.0.4/ /usr/local/maven
- Configure ~/.bashrc so that it contains the following section:
export M2_HOME=/usr/local/maven
export M2=$M2_HOME/bin
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512m"
export PATH=$JAVA_HOME/bin:$M2_HOME/bin:$PATH
- Check that Maven is set up correctly:
mvn -version
Apache Maven 3.0.4 (r1232337; 2012-01-17 00:44:56-0800)
- For further instructions refer to "Installation Instructions" on http://maven.apache.org/download.cgi
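Before opening a new shell, the layout produced by the steps above can be checked with a small helper like the one below. This is only a sketch: maven_home_ok is a hypothetical name, and the default path assumes the /usr/local/maven symlink created earlier.

```shell
# Sketch: confirm that a Maven home directory contains an executable launcher.
# maven_home_ok is a hypothetical helper, not part of Maven itself.
maven_home_ok() {
    # Default matches the symlink created in the install steps.
    m2_home="${1:-/usr/local/maven}"
    if [ ! -x "$m2_home/bin/mvn" ]; then
        echo "mvn not found or not executable under $m2_home/bin" >&2
        return 1
    fi
    echo "Maven launcher found: $m2_home/bin/mvn"
}
```

For example, `maven_home_ok "$M2_HOME" && mvn -version` runs the version check only when the launcher is in place.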
- Using Git, clone our repository: git clone https://code.google.com/a/apache-extras.org/p/giraffa/
- Check out trunk: git checkout trunk
Giraffa uses Maven as its build tool. The main pom.xml file is located in the giraffa directory. Here is a list of the available build options:
- Build Giraffa and run all the tests:
 mvn clean install
 Note: by default, all test output is redirected to files under target/surefire-reports. To send test output to the console instead, edit pom.xml and set redirectTestOutputToFile=false, or set the property on the Maven command line.
 
- Build Giraffa without tests
 mvn clean install -DskipTests
 
- Build Giraffa Project site:
 mvn clean site
 When the build is complete, you can access the site at ${basedir}/target/site/index.html
 
- Build Giraffa Site with Clover report:
 mvn -Pclover site
 When the build is complete, you can access the site at ${basedir}/target/site/index.html.
 Note: You will need to place your clover.license file in ${user.home}/.m2/clover.license
 WARNING! The Clover plugin instruments source files, so it should not be used for production!
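Because test output goes to files under target/surefire-reports by default, a quick scan of that directory can show which test classes failed. The helper below is a sketch under one assumption: that the plain-text reports flag a failing summary line with `<<< FAILURE!` or `<<< ERROR!`. The function name failed_reports is hypothetical.

```shell
# Sketch: list Surefire text reports that contain a failure or error marker.
# Assumes failing summary lines end with "<<< FAILURE!" or "<<< ERROR!".
failed_reports() {
    report_dir="${1:-target/surefire-reports}"
    [ -d "$report_dir" ] || { echo "no report directory: $report_dir" >&2; return 1; }
    # -l prints only the names of files that contain a match.
    grep -l -E '<<< (FAILURE|ERROR)!' "$report_dir"/*.txt 2>/dev/null
    # grep exits non-zero when nothing matches; that is not an error here.
    return 0
}
```

Run it from the giraffa directory after `mvn clean install`; no output means no report carried a failure marker.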
 
In demo mode, Giraffa starts an embedded Hadoop MiniCluster, Hive, and the Web UI. You can then perform all supported operations through the Giraffa Web Interface:
mvn -Pwebdemo
- Navigate to http://localhost:40010.
 
- Type "stop" in maven console to stop the demo server.
- Copy hadoop-0.22.0 directory from unarchived download of Hadoop 0.22.0 to giraffa-standalone/target/giraffa-standalone/ directory and rename it to just "hadoop".
 
- Copy hbase-0.94.1 directory from unarchived download of HBase 0.94.1 to giraffa-standalone/target/giraffa-standalone/ directory and rename it to just "hbase".
 
- (The rest of the instructions assume the current directory is now giraffa-standalone/target/giraffa-standalone/)
 
- Remove hadoop-core-*.jar from hbase/lib and copy hadoop/hadoop-*.jar files into hbase/lib.
 
- Copy giraffa/lib/giraffa-standalone-VERSION-SNAPSHOT.jar to hbase/lib.
 
- In hbase/conf, create an empty hdfs-site.xml and core-site.xml:
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
  </configuration>
- In hadoop/conf, modify hdfs-site.xml:
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
    </property>
  </configuration>
- In hbase/conf, modify hbase-site.xml:
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://localhost:9000/hbase</value>
    </property>
    <property>
      <name>hbase.coprocessor.master.classes</name>
      <value>org.apache.giraffa.web.GiraffaWebObserver</value>
    </property>
  </configuration>
- Make sure environment variables HADOOP_HOME, HADOOP_COMMON_HOME, and HBASE_HOME are not set.
 
- Run giraffa/bin/giraffa namenode -format first, so that the NameNode and DataNode start up properly. If this is a re-attempt, delete all your /tmp/hadoop and /tmp/hbase directories and files beforehand.
 
- Run giraffa/bin/start-giraffa.sh.
 
- Run giraffa/bin/giraffa format to format Giraffa.
 
- Run any giraffa/bin/giraffa fs -[op] command to create and access files in Giraffa, the same way the hadoop fs -[op] command is used to access HDFS data.
 
- (Optional) Run TestBlockManagement from Eclipse, which executes TestBlockManagement.main(). This will write and read file(s).
 
- Use giraffa/bin/stop-giraffa.sh to stop the Giraffa cluster.
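The three configuration steps above can also be scripted. The following is a minimal sketch, assuming it is run from giraffa-standalone/target/giraffa-standalone after the hadoop and hbase directories have been copied in. The function name write_standalone_conf and its optional root argument are illustrative, and the function overwrites any existing copies of the files it writes.

```shell
# Sketch: write the standalone configuration files described in the steps above.
# write_standalone_conf and its ROOT argument are hypothetical names.
XML_HEADER='<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'

write_standalone_conf() {
    root="${1:-.}"
    mkdir -p "$root/hbase/conf" "$root/hadoop/conf" || return 1

    # Empty hdfs-site.xml and core-site.xml for HBase.
    for f in hdfs-site.xml core-site.xml; do
        printf '%s\n<configuration></configuration>\n' "$XML_HEADER" \
            > "$root/hbase/conf/$f"
    done

    # HDFS: point the default file system at the local NameNode.
    cat > "$root/hadoop/conf/hdfs-site.xml" <<EOF
$XML_HEADER
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>
</configuration>
EOF

    # HBase: store data in HDFS and load the Giraffa web observer.
    cat > "$root/hbase/conf/hbase-site.xml" <<EOF
$XML_HEADER
<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://localhost:9000/hbase</value></property>
  <property><name>hbase.coprocessor.master.classes</name><value>org.apache.giraffa.web.GiraffaWebObserver</value></property>
</configuration>
EOF
}
```

Calling `write_standalone_conf` with no argument writes into the current directory; back up any hand-edited configuration first.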
- NOTES: This will set up a multinode Giraffa cluster by configuring the HDFS servers (NameNode and DataNodes), HBase servers (Master and RegionServers), and Giraffa Clients. You must know the hostnames of the nodes hosting these components, although they do not necessarily have to be unique. For example, in the Standalone Cluster, every component is hosted on the same node and therefore has the same hostname. However, there are restrictions: every component must be on the same LAN, there may be only one NameNode on the cluster, and there may be only one DataNode, RegionServer, and Master on a single node. In the following instructions, replace NAMENODE with the hostname of the node hosting the NameNode.
 
- PREREQUISITES: Follow steps 1 through 9 in "How to run Giraffa Standalone Cluster (Single-Node)" for every HDFS server, HBase server, and Giraffa Client. Ensure that giraffa is installed at the same location on each server. The rest of the instructions assume the current directory on each node is giraffa-standalone/target/giraffa-standalone.
 
- CONFIGURATION: The instructions below specify configuration files and map property names to values. These should be added or changed inside the <configuration></configuration> block of the files using the format: <property><name>NAME</name><value>VALUE</value></property>
 - HDFS Configuration: On every NameNode and DataNode:
  - hadoop/conf/core-site.xml: fs.defaultFS => hdfs://NAMENODE:9000
  - hadoop/conf/hdfs-site.xml: fs.defaultFS => hdfs://NAMENODE:9000
 - HDFS Configuration: On NameNode only:
  - hadoop/conf/masters: (should contain just one line that says: NAMENODE)
  - hadoop/conf/slaves: (list each DataNode hostname, one per line)
 - HBase Configuration: On every Master and RegionServer:
  - hbase/conf/hbase-site.xml:
   - hbase.rootdir => hdfs://NAMENODE:9000/hbase
   - hbase.cluster.distributed => true
   - hbase.zookeeper.quorum => NAMENODE
 - HBase Configuration: On Master only:
  - hbase/conf/regionservers: (list each RegionServer hostname, one per line)
 - Giraffa Configuration: On every Giraffa Client:
  - giraffa/conf/core-site.xml:
   - hbase.rootdir => hdfs://NAMENODE:9000/hbase
   - hbase.coprocessor.master.classes => org.apache.giraffa.web.GiraffaWebObserver
   - hbase.cluster.distributed => true
   - hbase.zookeeper.quorum => NAMENODE
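To avoid typing each <property> element by hand, the name-to-value mappings above can be rendered with a small helper. This is a sketch only: property_xml and hbase_properties are hypothetical names, and the output is meant to be merged into the existing <configuration></configuration> block rather than to replace the file. If a property such as hbase.rootdir is already present from the standalone steps, change its value instead of adding a second copy.

```shell
# Sketch: render NAME => VALUE mappings as <property> elements.
# property_xml and hbase_properties are hypothetical helper names.

# property_xml NAME VALUE: print one <property> element.
property_xml() {
    printf '<property><name>%s</name><value>%s</value></property>\n' "$1" "$2"
}

# hbase_properties NAMENODE: print the three HBase properties listed above
# for every Master and RegionServer.
hbase_properties() {
    namenode="$1"
    property_xml hbase.rootdir "hdfs://${namenode}:9000/hbase"
    property_xml hbase.cluster.distributed true
    property_xml hbase.zookeeper.quorum "${namenode}"
}
```

For example, `hbase_properties NAMENODE` (with NAMENODE replaced by the real hostname) prints three lines ready to paste into hbase/conf/hbase-site.xml.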
- STARTING:
 - Start HDFS. Complete the following on the NameNode:
  - Run giraffa/bin/giraffa namenode -format. If this is a re-attempt, delete /tmp/hadoop and /tmp/hbase files first.
  - Run hadoop/bin/start-dfs.sh
 - Start HBase. Complete the following on the Master:
  - Run hbase/bin/start-hbase.sh
 - Format Giraffa. Complete the following on the NameNode:
  - Run giraffa/bin/giraffa format
 - Verify: To check that start-up has completed successfully, run jps on each HDFS and HBase server. The NameNode should have the processes NameNode and SecondaryNameNode. The Master should have the process HMaster. Each DataNode should have the process DataNode. Each RegionServer should have the process HRegionServer. The SecondaryNameNode process is not necessary for Giraffa and may be killed manually.
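The verification step can be partly automated by filtering jps output. The sketch below assumes that jps prints one "PID NAME" pair per line; has_process and check_node are hypothetical helper names.

```shell
# Sketch: check jps output for the processes expected on a node.
# has_process and check_node are hypothetical helper names.

# has_process NAME: succeed if the jps-style listing on stdin contains NAME.
# The name is compared exactly, so NameNode does not match SecondaryNameNode.
has_process() {
    awk -v want="$1" '$2 == want { found = 1 } END { exit !found }'
}

# check_node NAME...: read a jps listing on stdin and report each expected name.
check_node() {
    listing="$(cat)"
    status=0
    for name in "$@"; do
        if printf '%s\n' "$listing" | has_process "$name"; then
            echo "ok: $name"
        else
            echo "MISSING: $name"
            status=1
        fi
    done
    return $status
}
```

On the NameNode, `jps | check_node NameNode` should report ok; on the Master use HMaster, on a DataNode use DataNode, and on a RegionServer use HRegionServer.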
- RUNNING: Complete the following on a Giraffa Client:
 - Run any giraffa/bin/giraffa fs -[op] command to create and access files in Giraffa, the same way the hadoop fs -[op] command is used to access HDFS data.
- STOPPING:
 - Run hbase/bin/stop-hbase.sh on the Master
 - Run hadoop/bin/stop-dfs.sh on the NameNode
- WEB UI: Type HOSTNAME:PORT into the browser of any machine on the LAN to access the web UI of the following components (if this does not work, replace the hostname with the IP address, or alternatively, add the hostname/IP address pairs to your hosts file):
 - NameNode: Port 50070
 - DataNode: Port 50075
 - Master: Port 60010
 - RegionServer: Port 60030
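The port table above can be wrapped in a small lookup, which is convenient when probing several hosts from a script. This is a sketch: web_ui_url is a hypothetical helper name, and the component keywords are chosen here for illustration.

```shell
# Sketch: build the web UI address for a component from the port list above.
# web_ui_url and its component keywords are hypothetical.
web_ui_url() {
    host="$1"
    case "$2" in
        namenode)     port=50070 ;;
        datanode)     port=50075 ;;
        master)       port=60010 ;;
        regionserver) port=60030 ;;
        *) echo "unknown component: $2" >&2; return 1 ;;
    esac
    echo "http://${host}:${port}"
}
```

For example, `curl -sf -o /dev/null "$(web_ui_url NAMENODE namenode)" && echo up` checks that the NameNode UI answers, with NAMENODE replaced by the real hostname.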
YARN setup in Giraffa is identical to YARN setup in HDFS, with the exception that configuration files and executables are in a different location. In giraffa/conf, notice the following files:
yarn-env.sh
yarn-site.xml
mapred-env.sh
mapred-site.xml
Edit these files as you normally would. They have been pre-configured to run Tera jobs from the mapreduce examples jar. A couple of notes:
mapreduce.terasort.simplepartitioner is set to true. This is a configuration specific to the examples jar that ensures the distributed cache is not used. You should make sure that your jobs do not use the distributed cache as it requires currently unsupported features from Giraffa.
yarn.application.classpath is set to the default value, with the addition of $GIRAFFA_CLASSPATH. This ensures that Yarn jobs run with a class path that includes Giraffa. Do not remove $GIRAFFA_CLASSPATH from here.
Also notice the following files:
capacity-scheduler.xml
configuration.xsl
container-executor.cfg
These files are identical to the ones normally found in the hadoop configuration directory. If it turns out you need any additional configuration files, drop them in this directory.
In giraffa/bin, notice the following files:
yarn-giraffa
yarn-giraffa-daemon.sh
These are the Giraffa equivalents of the yarn and yarn-daemon.sh scripts you normally use to start jobs. For example, to start the resource manager and node manager, run:
yarn-giraffa-daemon.sh start resourcemanager
yarn-giraffa-daemon.sh start nodemanager
Then, to run a teragen job from the examples jar that generates 10,000,000 rows in the directory “input”:
yarn-giraffa jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar teragen 10000000 input
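Since the configuration is pre-set for Tera jobs, a natural follow-up is to sort the generated rows. The sketch below only prints the command lines so they can be reviewed first. It rests on two assumptions: that yarn-giraffa forwards its arguments the same way yarn does, and that terasort takes an input and an output directory as it does in the stock Hadoop examples jar. The function name tera_commands and the output directory are hypothetical.

```shell
# Sketch: print a teragen command followed by a terasort command.
# tera_commands is a hypothetical helper; nothing is executed here.
tera_commands() {
    rows="$1"; in_dir="$2"; out_dir="$3"
    # Keep $HADOOP_HOME unexpanded so it is resolved when the commands run.
    jar="\$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar"
    echo "yarn-giraffa jar $jar teragen $rows $in_dir"
    echo "yarn-giraffa jar $jar terasort $in_dir $out_dir"
}
```

For example, `tera_commands 10000000 input output` prints both commands; piping the result to `sh` runs them in order.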
- Running giraffa/bin/start-giraffa.sh creates a one-node Giraffa cluster. It starts the NameNode and DataNode, and then HBase, which starts a RegionServer, Master, and ZooKeeper.
 
- Run hadoop/bin/hadoop-daemon.sh start namenode or hadoop/bin/hadoop-daemon.sh start datanode to manually start a NameNode or DataNode on a server. Likewise, run hbase/bin/hbase-daemon.sh start master or hbase/bin/hbase-daemon.sh start regionserver to manually start an HBase Master or RegionServer.