Hadoop Workload - absalon-james/cloud-workloads GitHub Wiki
Runs teragen and terasort on a Hadoop cluster.
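As a rough illustration of the two steps, the sketch below builds the command lines for a teragen run followed by a terasort run. The source does not say how the workload invokes them, so the jar name `hadoop-examples.jar` and the HDFS paths are hypothetical placeholders. Only the general `hadoop jar <jar> teragen <rows> <output>` and `hadoop jar <jar> terasort <input> <output>` usage is assumed.

```python
# Hypothetical jar name; the real examples jar varies by Hadoop version and install.
EXAMPLES_JAR = "hadoop-examples.jar"


def teragen_cmd(rows, output_dir, examples_jar=EXAMPLES_JAR):
    """Build the argv list that generates `rows` rows into `output_dir`."""
    return ["hadoop", "jar", examples_jar, "teragen", str(rows), output_dir]


def terasort_cmd(input_dir, output_dir, examples_jar=EXAMPLES_JAR):
    """Build the argv list that sorts the data in `input_dir` into `output_dir`."""
    return ["hadoop", "jar", examples_jar, "terasort", input_dir, output_dir]


# Placeholder HDFS paths, chosen only for this example.
gen = teragen_cmd(5000000, "/benchmarks/tera-in")
sort = terasort_cmd("/benchmarks/tera-in", "/benchmarks/tera-out")

print(" ".join(gen))
print(" ".join(sort))
```

The commands are only built and printed here, not executed. On a cluster they could be passed to `subprocess.run` once the jar path is known.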
Software
- Hadoop
- Teragen
- Terasort
- Modified version of the Saltstack Hadoop formula
- Modified version of the Saltstack Hostsfile formula
- Saltstack NTP formula
- OpenJDK 7
Roles
Roles | States | Anti-States | Description |
---|---|---|---|
hadoop_master | hadoop.hdfs, hadoop.mapred | hadoop.antihadoop | Runs the namenode and jobtracker. Required. |
hadoop_slave | hadoop.hdfs, hadoop.mapred | hadoop.antihadoop | Runs a datanode and tasktracker. At least one is required. |
Configuration
Property | Default | Description |
---|---|---|
terasort_size | 5000000 | Number of 100-byte rows to generate with teragen and sort with terasort |
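Because each row is 100 bytes, `terasort_size` translates directly into a dataset size. The sketch below works through that arithmetic for the default value. The helper name `terasort_bytes` is made up for this example and is not part of the workload.

```python
ROW_BYTES = 100  # each row is 100 bytes, per the table above


def terasort_bytes(terasort_size):
    """Return the approximate input size in bytes for a given row count."""
    if terasort_size < 0:
        raise ValueError("terasort_size must be non-negative")
    return terasort_size * ROW_BYTES


default_rows = 5000000
size = terasort_bytes(default_rows)

print(size)           # 500000000 bytes
print(size / 10**6)   # 500.0 (decimal megabytes)
```

The default therefore sorts about 500 MB of data. Multiplying `terasort_size` by ten gives roughly 5 GB.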
Example Configuration
standard_hadoop:
  workload: hadoop
  terasort_size: 5000000
  instances:
    - roles:
        - hadoop_master
    - roles:
        - hadoop_slave
    - roles:
        - hadoop_slave
    - roles:
        - hadoop_slave
    - roles:
        - hadoop_slave
    - roles:
        - hadoop_slave
    - roles:
        - hadoop_slave
The configuration above defines one master and six slaves.
Requirements
- 1 hadoop_master instance
- At least 1 hadoop_slave instance
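These requirements can be checked before a run. The sketch below counts roles across the `instances` list of a configuration that has already been loaded into a plain dictionary. It reads "1 hadoop_master instance" as exactly one, which is an assumption. The helper names `count_roles` and `check_requirements` are hypothetical and not part of the workload.

```python
from collections import Counter


def count_roles(config):
    """Count how many instances carry each role."""
    counts = Counter()
    for instance in config.get("instances", []):
        counts.update(instance.get("roles", []))
    return counts


def check_requirements(config):
    """Return a list of problems; an empty list means the config is acceptable."""
    counts = count_roles(config)
    problems = []
    # Assumption: exactly one master is allowed.
    if counts["hadoop_master"] != 1:
        problems.append("expected exactly 1 hadoop_master, found %d" % counts["hadoop_master"])
    if counts["hadoop_slave"] < 1:
        problems.append("expected at least 1 hadoop_slave, found 0")
    return problems


# The example configuration from this page, written as a Python dict.
standard_hadoop = {
    "workload": "hadoop",
    "terasort_size": 5000000,
    "instances": [{"roles": ["hadoop_master"]}]
    + [{"roles": ["hadoop_slave"]} for _ in range(6)],
}

print(count_roles(standard_hadoop))
print(check_requirements(standard_hadoop))
```

For the example configuration the check returns no problems. A configuration with an empty `instances` list would report both a missing master and a missing slave.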