Sync Autotuner for Apache Spark User Guide

This guide walks you through getting a set of predictions from the Sync Autotuner for Apache Spark. It assumes you have registered as a user and been granted free trial access.

Select the Spark on EMR tile from the "Start" tab. This will bring you to the screen where you can provide input data to the Autotuner.

Add your cluster information in JSON format. If you aren't sure how to gather the cluster info, guidance is provided at: https://github.com/synccomputingcode/client_tools
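Before pasting the cluster information, it can help to confirm locally that the text is well-formed JSON. The following is a minimal sketch using only the Python standard library. The function name `is_valid_json` is hypothetical, and the check covers syntax only; it does not know which fields the Autotuner expects.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise.

    Syntax check only: it does not validate which fields are present.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

For example, `is_valid_json(open("cluster.json").read())` would check a file saved locally, where `cluster.json` is a placeholder file name.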

Upload your Spark log file. The Autotuner supports compressed (.tar.gz, .gz, .zip) and uncompressed (.log) logs. If you aren't sure how to access the correct Spark Event Logs, see the guidance here: https://github.com/synccomputingcode/client_tools
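If the event log is large, compressing it to .gz before upload may shorten the transfer. This is a minimal sketch using the Python standard library; the function name `gzip_log` and the example path are hypothetical, and nothing here is specific to the Autotuner.

```python
import gzip
import shutil
from pathlib import Path


def gzip_log(src: Path) -> Path:
    """Write a gzip-compressed copy of src next to it and return the new path."""
    dst = src.with_name(src.name + ".gz")
    # Stream the file in chunks so large logs are not loaded into memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling `gzip_log(Path("eventlog.log"))` would produce `eventlog.log.gz` in the same directory and leave the original file untouched.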

Once the cluster information is added and the log file is uploaded, the Autotuner begins creating cost and runtime predictions across various Spark configurations and AWS infrastructure. You can close your browser at this point; you will receive an email when the predictions are complete. If you receive an error at this stage, please email [email protected] for assistance.

When processing is complete, you will be redirected to your Autotuner prediction results. To quickly select a new configuration for your job, pick one of the Performance, Balanced, or Economy options. If you want a custom configuration, each dot on the graph is a possible configuration with an associated cost and runtime. You can compare these to your current configuration, which is represented by the black dot on the graph.

Once you have selected your preferred configuration, scroll down to see the configuration details. You can copy them to your clipboard and import them into Terraform, or your tool of choice, to update your Spark and cluster configuration. Then re-run your job.
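If your tool of choice is `spark-submit`, Spark properties can be passed as repeated `--conf key=value` arguments. The sketch below assumes you have already copied the Spark properties into a Python dictionary; the exact format of the Autotuner's configuration output is not shown here, and the helper name `to_spark_submit_args` and the property values in the usage note are placeholders, not Autotuner recommendations.

```python
from typing import Dict, List


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn Spark properties into spark-submit '--conf key=value' arguments.

    Keys are sorted so the resulting command line is stable between runs.
    """
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args
```

For instance, `to_spark_submit_args({"spark.executor.memory": "4g", "spark.executor.cores": "2"})` yields a list that can be appended to a `spark-submit` command.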

You can always access your previous predictions on the history page.

We would appreciate any and all feedback, in particular new log files from re-running your job with one of our predictions. Please feel free to email us at [email protected], or open a GitHub issue here: https://github.com/synccomputingcode/user_documentation/issues