Create Index - mitdbg/amoeba GitHub Wiki
Scenario 1: Input files are partitioned and distributed across the different machines. Ensure that the files are in the same directory on every machine. Check scripts/fabfile.py and adapt the code to point to the right directories. Then run the following 3 commands:
fab bulk_sample_gen
fab create_robust_tree
fab write_partitions
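The three commands above are meant to be run in order. As a rough sketch, they could be driven from a small Python helper that stops at the first failure. The helper names (`build_commands`, `run_all`) are hypothetical and not part of the repository, and the default `cwd="scripts"` assumes that `fab` is invoked from the directory containing fabfile.py.

```python
import subprocess

# Task names are taken from the commands listed above; order matters.
FAB_TASKS = ["bulk_sample_gen", "create_robust_tree", "write_partitions"]


def build_commands(tasks=FAB_TASKS):
    """Return one `fab <task>` argument list per task, preserving order."""
    return [["fab", task] for task in tasks]


def run_all(commands, cwd="scripts", runner=subprocess.check_call):
    """Run each command in turn.

    subprocess.check_call raises CalledProcessError on a non-zero exit
    status, so a failed step stops the sequence before the next one starts.
    The cwd default is an assumption about where fabfile.py is picked up.
    """
    for cmd in commands:
        runner(cmd, cwd=cwd)
```

Calling `run_all(build_commands())` from the repository root would then run the three tasks back to back. The `runner` parameter exists only so that the sequence can be checked without actually invoking `fab`.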
Scenario 2: Input files are in HDFS. In this case, use the Spark shell to sample the data and write the result to a file named sample. Then run:
fab create_robust_tree
Writing out partitions by reading files from HDFS is currently unimplemented.
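The sampling step in Scenario 2 is not spelled out above. One possible way to do it from the PySpark shell is sketched below. Everything here is an assumption: the helper name `write_sample`, the sampling fraction, the seed, and the idea that the sample is small enough to collect on the driver and write as a local text file. The format and location that `fab create_robust_tree` expects for the sample file are not stated on this page, so check scripts/fabfile.py before relying on this.

```python
def write_sample(lines, fraction, out_path="sample", seed=42):
    """Sample `lines` without replacement and write the result to a local file.

    `lines` is expected to be a Spark RDD of text lines, for example
    sc.textFile("hdfs:///path/to/input") in the pyspark shell
    (the HDFS path is a hypothetical placeholder).
    Returns the number of sampled lines written.
    """
    # RDD.sample(withReplacement, fraction, seed) draws an approximate
    # fraction of the records; collect() brings them to the driver.
    sampled = lines.sample(False, fraction, seed).collect()
    with open(out_path, "w") as f:
        for line in sampled:
            f.write(line + "\n")
    return len(sampled)
```

In the PySpark shell, where `sc` is already defined, this might be called as `write_sample(sc.textFile("hdfs:///path/to/input"), 0.01)`. Collecting to the driver is only reasonable when the sampled data fits in memory.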