Hadoop Workload - absalon-james/cloud-workloads GitHub Wiki

Runs teragen and terasort on a Hadoop cluster.

Software

Roles

| Role | States | Anti-States | Description |
| --- | --- | --- | --- |
| hadoop_master | hadoop.hdfs, hadoop.mapred | hadoop.antihadoop | Runs the namenode and jobtracker. Required. |
| hadoop_slave | hadoop.hdfs, hadoop.mapred | hadoop.antihadoop | Runs a datanode and tasktracker. At least one is required. |

Configuration

| Property | Default | Description |
| --- | --- | --- |
| terasort_size | 5000000 | Number of 100-byte rows to sort |
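
Because each row is 100 bytes, terasort_size translates directly into a data volume. The short Python sketch below works through that arithmetic for the default value. The function name terasort_bytes is made up for illustration and is not part of cloud-workloads.

```python
ROW_BYTES = 100  # each row is 100 bytes, per the property description


def terasort_bytes(terasort_size: int) -> int:
    """Return the total size in bytes of terasort_size rows."""
    return terasort_size * ROW_BYTES


default_bytes = terasort_bytes(5_000_000)
# Report the size in decimal megabytes (1 MB = 10**6 bytes).
print(f"{default_bytes} bytes = {default_bytes / 10**6:.0f} MB")
# prints "500000000 bytes = 500 MB"
```

With the default setting the workload therefore sorts roughly 500 MB of data. Scaling terasort_size by ten gives about 5 GB.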

Example Configuration

standard_hadoop:
  workload: hadoop
  terasort_size: 5000000
  instances:
    - roles:
        - hadoop_master
    - roles:
        - hadoop_slave
    - roles:
        - hadoop_slave
    - roles:
        - hadoop_slave
    - roles:
        - hadoop_slave
    - roles:
        - hadoop_slave
    - roles:
        - hadoop_slave

The above configuration defines one master and six slaves.

Requirements

  • 1 hadoop_master instance
  • At least 1 hadoop_slave instance
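
The requirements above can be checked before launching a workload. The following is a minimal sketch, not code from cloud-workloads. It assumes the YAML configuration has already been loaded into a plain Python dict and that "1 hadoop_master instance" means exactly one. The helper name check_roles is hypothetical.

```python
from collections import Counter


def check_roles(workload: dict) -> list:
    """Return a list of problems; an empty list means the requirements hold."""
    # Count how many times each role appears across all instances.
    counts = Counter(
        role
        for instance in workload.get("instances", [])
        for role in instance.get("roles", [])
    )
    problems = []
    # Assumption: the master requirement means exactly one master.
    if counts["hadoop_master"] != 1:
        problems.append(
            f"expected 1 hadoop_master, found {counts['hadoop_master']}"
        )
    if counts["hadoop_slave"] < 1:
        problems.append("expected at least 1 hadoop_slave, found 0")
    return problems


# Dict form of the example configuration: one master and six slaves.
standard_hadoop = {
    "workload": "hadoop",
    "terasort_size": 5000000,
    "instances": [{"roles": ["hadoop_master"]}]
    + [{"roles": ["hadoop_slave"]}] * 6,
}

print(check_roles(standard_hadoop))  # prints []
```

A configuration with a master but no slaves would instead return a one-item list describing the missing hadoop_slave.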