Alluxio - animeshtrivedi/notes GitHub Wiki

Setting it up and working

Download the Alluxio 1.5 build for Hadoop 2.7 here: http://www.alluxio.org/download

Configuration

Extract it. Here are the contents of the files in the conf directory:

$ cat alluxio-env.sh
[...]
ALLUXIO_MASTER_HOSTNAME=${ALLUXIO_MASTER_HOSTNAME:-"flex11-40g0"}
ALLUXIO_WORKER_MEMORY_SIZE=${ALLUXIO_WORKER_MEMORY_SIZE:-"128GB"}
ALLUXIO_RAM_FOLDER=${ALLUXIO_RAM_FOLDER:-"/mnt/tmpfs/alluxio"}
ALLUXIO_UNDERFS_ADDRESS="/mnt/tmpfs/alluxio/underFSStorage"

$ cat alluxio-site.properties
# Common properties
alluxio.master.hostname=flex11-40g0
alluxio.underfs.address=/mnt/tmpfs/alluxio/underFSStorage

# Security properties
# alluxio.security.authorization.permission.enabled=true
# alluxio.security.authentication.type=SIMPLE

# Worker properties
alluxio.worker.memory.size=128GB
alluxio.worker.tieredstore.levels=1
alluxio.worker.tieredstore.level0.alias=MEM
alluxio.worker.tieredstore.level0.dirs.path=/mnt/tmpfs/alluxio/

# User properties
# alluxio.user.file.readtype.default=CACHE_PROMOTE
# alluxio.user.file.writetype.default=MUST_CACHE

$ cat core-site.xml
<configuration>
<!--
  <property>
    <name>fs.defaultFS</name>
    <value>alluxio://flex11-40g0:19998</value>
  </property>
-->
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
    <description>The Alluxio FileSystem (Hadoop 1.x and 2.x)</description>
  </property>
  <property>
    <name>fs.alluxio-ft.impl</name>
    <value>alluxio.hadoop.FaultTolerantFileSystem</value>
    <description>The Alluxio FileSystem (Hadoop 1.x and 2.x) with fault tolerant support</description>
  </property>
  <property>
    <name>fs.AbstractFileSystem.alluxio.impl</name>
    <value>alluxio.hadoop.AlluxioFileSystem</value>
    <description>The Alluxio AbstractFileSystem (Hadoop 2.x)</description>
  </property>
</configuration>
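
Because the master hostname and the under-storage address appear in both alluxio-env.sh and alluxio-site.properties, it can help to confirm that the value in the properties file is the one you expect. The following is a minimal sketch, not part of Alluxio: `get_prop` and `check_master_hostname` are hypothetical helper names, and the parser assumes plain `key=value` lines with no spaces around `=`.

```shell
# Print the value of one key from a Java-style .properties file.
# Lines starting with '#' are skipped; if a key repeats, the last one wins.
# Assumption: lines look like key=value with no spaces around '='.
get_prop() {
  # $1 = properties file, $2 = key
  awk -F= -v key="$2" '
    /^[[:space:]]*#/ { next }
    $1 == key { sub(/^[^=]*=/, ""); val = $0 }
    END { if (val != "") print val }
  ' "$1"
}

# Compare alluxio.master.hostname in the file against an expected hostname.
check_master_hostname() {
  # $1 = properties file, $2 = expected hostname
  actual=$(get_prop "$1" alluxio.master.hostname)
  if [ "$actual" = "$2" ]; then
    echo "ok: alluxio.master.hostname=$actual"
  else
    echo "mismatch: expected $2, found ${actual:-nothing}" >&2
    return 1
  fi
}

# Example (run from the Alluxio directory):
#   check_master_hostname conf/alluxio-site.properties flex11-40g0
```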

The masters file holds the fault-tolerant configuration; I left it as localhost. The workers file contains the hostnames of the workers.
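
As a small sketch of filling in the workers file, assuming it takes one hostname per line: `write_workers_file` is a hypothetical helper name, and `worker-1` and `worker-2` are placeholder hostnames, not machines from this setup.

```shell
# Write one worker hostname per line into the given workers file.
# The file is overwritten, so pass the complete list every time.
write_workers_file() {
  # $1 = path to the workers file, remaining arguments = worker hostnames
  if [ "$#" -lt 2 ]; then
    echo "usage: write_workers_file FILE HOST [HOST...]" >&2
    return 1
  fi
  out=$1
  shift
  printf '%s\n' "$@" > "$out"
}

# Example (run from the Alluxio directory; hostnames are placeholders):
#   write_workers_file conf/workers worker-1 worker-2
```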

Starting by hand

To start the master (-w waits for the process to end before returning):

./bin/alluxio-start.sh -w master

Then log in to a worker and start it by hand:

./bin/alluxio-start.sh -w worker NoMount

I use NoMount because /mnt/tmpfs is already mounted. Once these processes are up (that is, they did not quit unexpectedly), check the logs and copy a local file into Alluxio as a sanity test.
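
The sanity test might look like the following sketch. It assumes the `fs copyFromLocal`, `fs ls`, and `fs cat` subcommands of the `bin/alluxio` launcher, and that `fs cat` prints only the file contents. `alluxio_roundtrip` and the destination path `/sanity.txt` are hypothetical names chosen for illustration.

```shell
# Copy a local file into Alluxio, list it, read it back, and compare.
# Returns 0 only if the bytes read back match the local file.
alluxio_roundtrip() {
  # $1 = alluxio launcher (for example ./bin/alluxio)
  # $2 = local file to copy in
  # $3 = destination path inside Alluxio
  "$1" fs copyFromLocal "$2" "$3" || return 1
  "$1" fs ls "$3" || return 1
  # cmp reads the Alluxio copy from stdin and compares it to the local file
  "$1" fs cat "$3" | cmp -s - "$2"
}

# Example (run from the Alluxio directory):
#   alluxio_roundtrip ./bin/alluxio /etc/hostname /sanity.txt && echo "round trip ok"
```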

Running the cluster

When everything looks normal, start the whole cluster:

./bin/alluxio-start.sh master 
./bin/alluxio-start.sh workers NoMount

You can browse the current file system state at: http://your_host:19999/home

Error: I see the following message in the browser:

Inconsistent Files on Startup (run fs checkConsistency for details):	
[...]
On Startup, 1 inconsistent files were found. This check is only checked once at startup, and you can restart the Alluxio Master for the latest information. 
The following files may be corrupted:
\

As far as I can tell, everything works fine, so I am ignoring this error for now.
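
The message itself points at `fs checkConsistency`. A small sketch of running it against the root path, assuming the `bin/alluxio` launcher from the extracted directory; `check_consistency` is a hypothetical wrapper name, and I have not recorded what the command reports for this setup.

```shell
# Ask Alluxio which files under a path are inconsistent with the under storage.
check_consistency() {
  # $1 = alluxio launcher (for example ./bin/alluxio)
  # $2 = Alluxio path to check; defaults to the root
  "$1" fs checkConsistency "${2:-/}"
}

# Example (run from the Alluxio directory):
#   check_consistency ./bin/alluxio /
```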

Using it in Spark

Using Alluxio with Spark takes a few changes. First, Hadoop's core-site.xml needs to know about Alluxio. My Hadoop core-site.xml now contains both the Crail and the Alluxio details:

$ cat core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://flex11-40g0:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>1048576</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/tmpfs/tmp</value>
  </property>

  <property>
   <name>fs.crail.impl</name>
   <value>com.ibm.crail.hdfs.CrailHadoopFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.crail.impl</name>
    <value>com.ibm.crail.hdfs.CrailHDFS</value>
  </property>

  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
    <description>The Alluxio FileSystem (Hadoop 1.x and 2.x)</description>
  </property>
  <property>
    <name>fs.AbstractFileSystem.alluxio.impl</name>
    <value>alluxio.hadoop.AlluxioFileSystem</value>
    <description>The Alluxio AbstractFileSystem (Hadoop 2.x)</description>
  </property>
</configuration>

Then copy the Alluxio client jar into Spark's classpath. I have an extra-jars path set, so I copied the file there:

cp ~/alluxio/client/default/alluxio-1.5.0-default-client.jar ./extra-jars/

For more details, follow these instructions: http://www.alluxio.org/docs/master/en/Debugging-Guide.html#usage-faq

That should be all.
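
If you do not already have an extra-jars path configured, one option is to put the client jar on the driver and executor classpaths through Spark's `spark.driver.extraClassPath` and `spark.executor.extraClassPath` properties. The sketch below only prints the lines and builds the URI. The helper names `spark_classpath_conf` and `alluxio_uri` are hypothetical, the jar location is an assumption, and port 19998 is taken from the commented-out fs.defaultFS entry in the Alluxio core-site.xml.

```shell
# Print the two spark-defaults.conf lines that add a jar to the
# driver and executor classpaths.
spark_classpath_conf() {
  # $1 = absolute path to the Alluxio client jar
  printf 'spark.driver.extraClassPath %s\n' "$1"
  printf 'spark.executor.extraClassPath %s\n' "$1"
}

# Build an alluxio:// URI from master host, port, and an absolute path.
alluxio_uri() {
  # $1 = master hostname, $2 = master port, $3 = path starting with '/'
  printf 'alluxio://%s:%s%s\n' "$1" "$2" "$3"
}

# Example (paths are assumptions; adjust to your layout):
#   spark_classpath_conf "$PWD/extra-jars/alluxio-1.5.0-default-client.jar" >> conf/spark-defaults.conf
#   alluxio_uri flex11-40g0 19998 /sanity.txt
```

The resulting URI is what you would pass to Spark as an input or output path, for example as the argument of `sc.textFile` in spark-shell.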
