Example Use Cases
Example of Titanic Survival Prediction Using RapidMiner GO with Tableau Products
After cloning the rapidminer-go-tabpy-integration project from GitHub, you will find three folders:
- Tableau Prep
- Tableau Studio
- Tableau Template
As the names suggest, the Tableau Prep and Tableau Studio folders contain examples of integrating Tableau Prep and Tableau Studio with RapidMiner GO, while the Tableau Template folder holds the basic structure that you can modify and use as per your requirements.
Tableau Prep
A Titanic dataset along with a survival prediction script is provided. There are two parts to the prediction process:
- Training the dataset
- Scoring the dataset (Predicting)
Start your tabpy server in order to perform the prediction process.
Training
Example_Titanic_Training.tfl is a Tableau Prep file that holds the flow of the training process. The flow contains three steps: DataFile -> Script -> Output.
Load the TitanicTraining.csv provided in the same folder as DataFile.
Load the TitanicTrainController.py provided in the same folder as ScriptFile.
Provide the path and name of the output folder as you desire.
TitanicTrainController.py is a controller file that directs the Tableau Prep flow to call the training function deployed on the tabpy server, as shown below:
import json
from tabpy.tabpy_tools.client import Client  # import path may differ by TabPy version
# connect to the running tabpy server
tabclient = Client(tabpy_serverurl)
# convert the input DataFrame to a list of JSON records
input_data = json.loads(data.to_json(orient='records'))
# call the training function deployed on the tabpy server
returnResult = tabclient.query('RapidMiner_Train', go_url, go_username, go_password, analysis_name, input_data, label, cost_matrix, high_value, low_value, selection_criteria, max_min_crietria_selector, platform)
The 'RapidMiner_Train' mentioned in the Python script above is a function deployed on the tabpy server; it is defined in the TableauDeploy.py file in the Tableau Template folder.
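The positional arguments passed to tabclient.query after the function key are forwarded to the deployed function in the same order, so the deployed function needs a matching parameter list. Below is a purely illustrative sketch of such a signature; the function name is hypothetical, and the real definition is in TableauDeploy.py.
# Illustrative sketch only - see TableauDeploy.py for the actual definition.
# The parameters mirror the positional arguments of the tabclient.query call above.
def rapidminer_train(go_url, gouser, gopassword, analysis_name, input_data, label,
                     cost_matrix, high_value, low_value, selection_criteria,
                     max_min_crietria_selector, platform):
    ...  # train a model on the RapidMiner GO server and return the result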
Training can also be done as shown in the controller file TitanicQuickTraining.py, where the Example_Titanic_Quick_Training.tfl file holds the flow:
# convert the training DataFrame to a list of JSON records
responseJSON = training_data.to_json(orient='records')
input_data = json.loads(responseJSON)
# call the quick-training function deployed on the tabpy server
returnResult = tabclient.query('Rapidminer_Quick_Training', go_url, go_username, go_password, analysis_name, input_data, label, selection_criteria, max_min_crietria_selector, 'tabprep')
The 'Rapidminer_Quick_Training' mentioned in the Python script above is a function deployed on the tabpy server; it is defined in the TableauDeploy.py file in the Tableau Template folder.
The output from the controller script is passed back to Tableau Prep as a DataFrame so that the result can be displayed in table format in Tableau Prep.
The output schema has to be defined in the controller script so that Tableau Prep knows how to display the result, as shown below:
import pandas as pd

# MODELING_ID, STATUS, DEPLOYMENT_ID, MODEL and URL are column-name constants defined elsewhere in the controller script
def get_output_schema():
    return pd.DataFrame({
        MODELING_ID: prep_string(),
        STATUS: prep_string(),
        DEPLOYMENT_ID: prep_string(),
        MODEL: prep_string(),
        URL: prep_string()
    })
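To illustrate how the pieces fit together, the sketch below shows a possible entry function that Tableau Prep could be pointed at: it takes the input DataFrame, calls the deployed training function, and wraps the response into a single-row DataFrame matching the schema above. Function and key names are hypothetical; the actual code is in TitanicTrainController.py.
import json
import pandas as pd

# Hypothetical sketch only - the real entry function lives in TitanicTrainController.py.
# tabclient, go_url, go_username, etc. are assumed to be defined at the top of the controller script.
def titanic_train(data):
    # convert the input DataFrame to JSON records and call the deployed training function
    input_data = json.loads(data.to_json(orient='records'))
    returnResult = tabclient.query('RapidMiner_Train', go_url, go_username, go_password,
                                   analysis_name, input_data, label, cost_matrix, high_value,
                                   low_value, selection_criteria, max_min_crietria_selector,
                                   platform)
    # tabclient.query returns a dict whose 'response' field holds the deployed function's result;
    # its exact keys depend on what RapidMiner_Train returns
    response = returnResult['response']
    # wrap the result in a single-row DataFrame matching get_output_schema()
    return pd.DataFrame([{
        MODELING_ID: str(response.get(MODELING_ID, '')),
        STATUS: str(response.get(STATUS, '')),
        DEPLOYMENT_ID: str(response.get(DEPLOYMENT_ID, '')),
        MODEL: str(response.get(MODEL, '')),
        URL: str(response.get(URL, ''))
    }])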
Scoring
Example_Titanic_Scoring.tfl is a Tableau Prep file that holds the flow of the scoring process. The flow contains three steps: DataFile -> Script -> Output.
Load the data file and the script, and provide the output path, as in the training flow.
TitanicScoreController.py is a controller file that directs the Tableau Prep flow to call the scoring function deployed on the tabpy server, as shown below:
# convert the test DataFrame to a list of JSON records
input_data = json.loads(test.to_json(orient='records'))
# call the scoring function deployed on the tabpy server, referencing the deployment created during training
returnResult = tabclient.query('RapidMiner_Score', go_url, go_username, go_password, input_data, label, deployment_ID)
The 'RapidMiner_Score' mentioned in the Python script above is a function deployed on the tabpy server; it is defined in the TableauDeploy.py file in the Tableau Template folder.
The output schema has to be defined in the TitanicScoreController.py script so that Tableau Prep knows how to display the result, as shown below:
def get_output_schema():
    return pd.DataFrame({
        'Row No.': prep_decimal(),
        'Age': prep_decimal(),
        'Passenger': prep_string(),
        'Sex': prep_string(),
        'Siblings': prep_decimal(),
        'Parents': prep_decimal(),
        'Fair': prep_decimal(),
        PREDICTION: prep_string()
    })
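In the same spirit, a hypothetical scoring entry function could attach the predictions returned by 'RapidMiner_Score' to the input rows and return a DataFrame matching the schema above. Names below are illustrative; the real code is in TitanicScoreController.py.
import json

# Hypothetical sketch only - the real entry function lives in TitanicScoreController.py.
def titanic_score(test):
    # convert the test DataFrame to JSON records and call the deployed scoring function
    input_data = json.loads(test.to_json(orient='records'))
    returnResult = tabclient.query('RapidMiner_Score', go_url, go_username, go_password,
                                   input_data, label, deployment_ID)
    scored = test.copy()
    # attach the predictions from the response; the exact shape of the response
    # depends on what RapidMiner_Score returns
    scored[PREDICTION] = [str(p) for p in returnResult['response']]
    # the column names of 'scored' must match the schema defined in get_output_schema()
    return scored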
Tableau Studio
Like Tableau Prep, Tableau Studio can also be integrated with the RapidMiner GO server, and the prediction results returned from the RapidMiner GO server can be plotted as different graphs. There are two parts to the prediction process:
- Training the dataset
- Scoring the dataset (Predicting)
The Titanic Example.twb file holds calculated fields as examples for Training and Survival Prediction of the Titanic Dataset.
Start your tabpy server and connect your Tableau Studio with the tabpy server in order to perform the Prediction process.
Training
TrainDeployed is a calculated field that contains the script to call the training function deployed on the tabpy server, as shown below:
# convert the training DataFrame to a list of JSON records
trainJsonConversion = traindataframe.to_json(orient='records')
trainJsonValue = json.loads(trainJsonConversion)
# call the training function deployed on the tabpy server
trainLableList = tabclient.query('RapidMiner_Train', go_url, go_username, go_password, analysis_name, trainJsonValue, label, cost_matrix, high_value, low_value, selection_criteria, max_min_crietria_selector, platform)
The 'RapidMiner_Train' mentioned in the Python script above is a function deployed on the tabpy server; it is defined in the TableauDeploy.py file in the Tableau Template folder.
Training can also be done in a quicker way, where the RapidMiner GO server takes care of the machine learning steps and deploys the best model:
responseJSON = training_data.to_json(orient='records')
input_data = json.loads(responseJSON)
returnResult = tabclient.query('Rapidminer_Quick_Training', go_url, go_username, go_password, analysis_name, input_data, label,selection_criteria,max_min_crietria_selector,'tabprep')
The 'Rapidminer_Quick_Training' mentioned in the Python script above is a function deployed on the tabpy server; it is defined in the TableauDeploy.py file in the Tableau Template folder.
Scoring
ScoreDeploy is a calculated field that contains the script to call the scoring function deployed on the tabpy server, as shown below:
# convert the test DataFrame to a list of JSON records
jd = test.to_json(orient='records')
jsonDataOfInputTest = json.loads(jd)
# call the scoring function deployed on the tabpy server
predictedList = tabclient.query('RapidMiner_Score', go_url, go_username, go_password, jsonDataOfInputTest, label, deployment_ID)
The 'RapidMiner_Score' mentioned in the Python script above is a function deployed on the tabpy server; it is defined in the TableauDeploy.py file in the Tableau Template folder.
Tableau Template
TableauDeploy.py is the script that holds the functions to train and score by making API calls to the RapidMiner GO server using the rapidminer-go-python package.
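Before any function can be deployed, TableauDeploy.py needs a connection to the running tabpy server. A minimal sketch of that setup is shown below; the URL is only an example, and the Client import path can differ between TabPy versions.
from tabpy.tabpy_tools.client import Client  # import path may differ by TabPy version

# connect to the running tabpy server (example URL; replace with your own server address)
tabclient = Client('http://localhost:9004/')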
Quick Train function in TableauDeploy.py
def rapidminer_quick_training(go_url, gouser, gopassword, analysis_name, input_data, label, selection_criteria, max_min_crietria_selector, platform):
    from rapidminer_go_python import rapidminergoclient as amw
    # get the RapidMiner GO (AMW) client instance
    client = amw.RapidMinerGoClient(go_url, gouser, gopassword)
    # convert the incoming JSON records back into a DataFrame
    data = client.convert_json_to_dataframe(input_data)
    # run quick AutoModel training; AUTODEPLOY controls whether the best model is deployed automatically
    trainingResult = client.quick_automodel(input_data, label, AUTODEPLOY, selection_criteria, max_min_crietria_selector, analysis_name)
Deploy script in TableauDeploy.py
tabclient.deploy('Rapidminer_Quick_Training', rapidminer_quick_training, 'Quickly Trains a model for predictions', override=True)
The above script deploys the rapidminer_quick_training function to the tabpy server, where it can be accessed from the Tableau Prep controller scripts and from Tableau Studio using the 'Rapidminer_Quick_Training' key.
To stop the RapidMiner GO server from auto-deploying the best model, set AUTODEPLOY = False.
Similar functions for Training and Scoring are added and deployed to the tabpy server in order to be accessed from Tableau Studio and Tableau Prep.
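They are registered the same way as the quick-training function. The sketch below assumes the training and scoring functions are named rapidminer_train and rapidminer_score (hypothetical names; check TableauDeploy.py for the actual ones).
# Hypothetical sketch - function names and descriptions may differ in TableauDeploy.py.
tabclient.deploy('RapidMiner_Train', rapidminer_train,
                 'Trains a model on the RapidMiner GO server', override=True)
tabclient.deploy('RapidMiner_Score', rapidminer_score,
                 'Scores data against a deployed RapidMiner GO model', override=True)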
Please note that the tabpy server has to be kept up and running while you are deploying the functions and calling them from the Tableau products.