Detailed Installation and Usage Guide - rapidminer-labs/rapidminer-go-tabpy-integration GitHub Wiki
Using RapidMiner GO with Tableau Products
This guide gives detailed steps for installing and using RapidMiner's GO engine for AutoModel with Tableau Prep, Tableau Studio, and Tableau Server. The steps common to all products come first, followed by examples of integration with Tableau Desktop and Tableau Prep.
1. Prerequisite
- Install the rapidminer-go-python package by running the following command in conda or your preferred Python environment:
pip install rapidminer-go-python
- Install TabPy by following the instructions at https://github.com/tableau/tabpy
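Before moving on, it can help to confirm that both packages are visible in the Python environment you plan to use. The sketch below is one possible check, not part of the project: it looks up the distribution names used in the install commands (`rapidminer-go-python` and `tabpy`) with the standard library's `importlib.metadata`. The helper names are hypothetical.

```python
from importlib import metadata

# Distribution names as they appear in the install commands above.
REQUIRED_PACKAGES = ("rapidminer-go-python", "tabpy")


def installed_versions(packages=REQUIRED_PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the distribution names that are not installed in this environment."""
    found = installed_versions(packages)
    return [name for name, version in found.items() if version is None]
```

Calling `missing_packages()` in the same environment that runs TabPy should return an empty list once both installs have succeeded.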
2. Setup functions on TabPy Server
-
Download the TableauDeploy.py file
This file contains the functions that will be deployed on the TabPy server. Typically no changes to this file are required, but understanding its structure is useful if you want to modify its behaviour.
-
Run the TableauDeploy.py file using the following command
python <path/to/TableauDeploy.py>
If this succeeds, you can start using the functions from Tableau Studio and Tableau Prep.
Now that the necessary functions are installed on the TabPy server, you can use them in an analysis or a flow by following these steps.
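To check that the deployment actually reached the server, you can ask TabPy which functions it currently exposes. The sketch below assumes a TabPy server without authentication at the default address used in this guide. It calls TabPy's `GET /endpoints` REST route using only the standard library. The helper names are hypothetical, and which endpoint names appear depends on what TableauDeploy.py deploys.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def tabpy_url(server_url, path):
    """Join the TabPy base URL and a REST path, tolerating a missing trailing slash."""
    base = server_url if server_url.endswith("/") else server_url + "/"
    return urljoin(base, path.lstrip("/"))


def list_endpoints(server_url="http://localhost:9004/", timeout=5):
    """Return the sorted names of the functions deployed on the TabPy server."""
    with urlopen(tabpy_url(server_url, "endpoints"), timeout=timeout) as response:
        # The response is a JSON object keyed by endpoint name.
        return sorted(json.load(response))
```

With the server running, `list_endpoints()` returns the deployed function names. A connection error usually means TabPy is not running at that URL.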
Tableau Prep
From this GitHub project, either clone the project or download a complete zip file, or download the individual files mentioned here.
-
Download the following controller template files from this project. Depending on what you are trying to do, you may not need all of them.
a. QuickTrainingTemplate.py Helpful when building predictive models from Tableau Prep and Tableau Desktop with minimal settings
b. ScoreControllerTemplate.py Used when getting predictions from a model that is deployed on RapidMiner GO. These predictions are then consumed in Tableau Prep, Tableau Desktop and Server
c. TrainControllerTemplate.py Used when building predictive models from Tableau Prep and Tableau Desktop with more configuration and control
-
Create a new folder for a project
For example, if you are building models for CreditRisk, Churn, or Maintenance, create a separate folder for each of them. Copy the template files from the step above into the newly created folder.
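If you set up several projects, the folder creation and copying can be scripted. The following is a minimal sketch, assuming the three template files sit together in one local directory. The function name and directory arguments are hypothetical. Templates that are not present are skipped, since not every project needs all three.

```python
import shutil
from pathlib import Path

# Template names as listed in the download step above.
TEMPLATES = (
    "QuickTrainingTemplate.py",
    "ScoreControllerTemplate.py",
    "TrainControllerTemplate.py",
)


def create_project(templates_dir, projects_dir, project_name):
    """Create <projects_dir>/<project_name> and copy the available templates into it."""
    target = Path(projects_dir) / project_name
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in TEMPLATES:
        source = Path(templates_dir) / name
        if source.is_file():  # not every project needs all three templates
            shutil.copy2(source, target / name)
            copied.append(name)
    return target, copied
```

For example, `create_project("downloads", "projects", "Churn")` would create `projects/Churn` and copy whichever templates exist in `downloads`. Renaming the copies to match the project is still done by hand.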
-
Modify QuickTrainingTemplate.py
Modify this file only if you plan to train a model using the GO engine with the default system settings rather than advanced settings.
a. Rename the file according to the project, e.g. QuickTrainingChurn.py or QuickTrainingPredictiveMaintenance.py
b. Change the following variables in the code based on your setup
tabpy_serverurl = 'http://localhost:9004/'
go_url = 'https://go.rapidminer.com'
go_username = 'myusername'
go_password = 'mypassword'
# Values to change based on your target label
label = 'Survived'
analysis_name = 'Titanic'
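The templates hold the RapidMiner GO username and password as plain strings. If you would rather not save credentials in a file, one option is to read them from environment variables. This sketch is not something the templates do themselves. The environment variable names are hypothetical, and it assumes those variables are set for the process that executes the script.

```python
import os


def load_go_settings(env=os.environ):
    """Read connection settings from environment variables, with the guide's defaults."""
    return {
        "tabpy_serverurl": env.get("TABPY_SERVER_URL", "http://localhost:9004/"),
        "go_url": env.get("RAPIDMINER_GO_URL", "https://go.rapidminer.com"),
        "go_username": env.get("RAPIDMINER_GO_USERNAME", ""),
        "go_password": env.get("RAPIDMINER_GO_PASSWORD", ""),
    }


def missing_credentials(settings):
    """Return the names of credential settings that are still empty."""
    return [key for key in ("go_username", "go_password") if not settings[key]]
```

The returned values can then be assigned to `tabpy_serverurl`, `go_url`, `go_username`, and `go_password` in place of the hard-coded strings.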
-
Modify ScoreControllerTemplate.py
Modify this file only if you plan to get predictions from GO for a model that is already deployed on the GO engine.
a. Rename the file according to the project, e.g. ScoreControllerChurn.py or ScoreControllerPredMaintenance.py
b. Change the following variables in the code based on your setup
# Change the URL to where your TabPy Server is running
tabclient = Client('http://localhost:9004/')
# URL of the RapidMiner GO instance
go_url = 'https://go.rapidminer.com'
# Provide your RapidMiner GO username and password below
go_username = 'myusername'
go_password = 'mypassword'
# Deployment ID of the model deployed on RapidMiner, typically a GUID-like string
depID = 'abcdef-ge-g-g-g-'
# Column name of the target variable
label = ''
c. Modify the get_output_schema() function in the file to match the columns you expect back.
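A placeholder deployment ID or an empty label that was never replaced is easy to overlook. The sketch below is a hypothetical pre-flight check that is not part of the template. It uses the standard library's `uuid.UUID` parser to test whether `depID` looks like a GUID. Because the guide only says the ID is typically GUID-like, a failed check is a hint to look again, not proof that the ID is wrong.

```python
import uuid


def looks_like_guid(value):
    """Return True if value parses as a UUID/GUID, False otherwise."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def check_score_settings(dep_id, label):
    """Collect human-readable problems with the scoring settings."""
    problems = []
    if not looks_like_guid(dep_id):
        problems.append("depID does not look like a GUID")
    if not label:
        problems.append("label is empty")
    return problems
```

Running `check_score_settings(depID, label)` against the unmodified template values reports both problems.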
-
Modify TrainControllerTemplate.py
Modify this file only if you plan to train a model using the RapidMiner GO engine and want to provide advanced settings instead of the default system settings.
a. Rename the file according to the project, e.g. TrainingChurn.py or TrainingPredictiveMaintenance.py
b. Change the following variables in the code based on your setup
# Change the following values
tabpy_serverurl = 'http://localhost:9004/'
go_url = 'https://go.rapidminer.com'
go_username = ''
go_password = ''
#values to be changed based on data
label = 'Survived'
file_name = 'Titanic'
cost_matrix = [[1,-1],[-1,1]]
high_value = 'Yes'
low_value = 'No'
#### possible values
selection_criteria = 'performance_accuracy'
max_min_crietria_selector = 'max' #or 'min'
should_deploy = True
platform = 'tabprep'
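The `selection_criteria` and `max_min_crietria_selector` settings suggest that one model is chosen by taking the largest or smallest value of a performance measure. The sketch below only illustrates that idea. The result structure, model names, and the `runtime_seconds` criterion are hypothetical, and how RapidMiner GO actually performs the selection is not described here.

```python
def select_model(results, selection_criteria="performance_accuracy", max_min="max"):
    """Pick the model whose selection criterion is largest ('max') or smallest ('min')."""
    if max_min not in ("max", "min"):
        raise ValueError("max_min must be 'max' or 'min'")
    if not results:
        raise ValueError("results must not be empty")
    chooser = max if max_min == "max" else min
    return chooser(results, key=lambda model: model[selection_criteria])


# Hypothetical results; the real structure returned by RapidMiner GO may differ.
results = [
    {"name": "model_a", "performance_accuracy": 0.81, "runtime_seconds": 4.0},
    {"name": "model_b", "performance_accuracy": 0.84, "runtime_seconds": 9.5},
]
best = select_model(results)
```

For a measure where higher is better, such as accuracy, `'max'` is the natural choice. For an error or cost measure, `'min'` would be.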
-
Using in workflow
a. The Script nodes are now essentially configured to use the controller files above.
b. Point the Script node to the appropriate controller file with the updated values
c. Connect to the same TabPy Server where you deployed the functions.
d. Run the flow
You will find an example that trains on Survival in the Titanic dataset, then scores the same dataset and returns the output results.
Tableau Studio
From this GitHub project, either clone the project or download a complete zip file, or download the individual files mentioned here.
-
Download the following files from this project.
a. Titanic Example.twb This file holds example calculated fields whose scripts train on and score the provided data
-
Create a new folder for a project
For example, if you are building models for CreditRisk, Churn, or Maintenance, create a separate folder for each of them. Copy the files from the step above into the newly created folder.
-
Modifying the Data Source
Remove the existing connections and add your new training data as a connection.
-
Modify the QuickTrain calculated field provided in Titanic Example.twb
Modify this calculated field only if you plan to train a model using the GO engine with the default system settings rather than advanced settings.
a. Rename the calculated field and the file according to the project, e.g. QuickTrainingChurn.twb or QuickTrainingPredictiveMaintenance.twb
b. Change the following variables in the calculated field script
tabpy_serverurl = 'http://localhost:9004/'
go_url = 'https://go.rapidminer.com'
go_username = 'myusername'
go_password = 'mypassword'
# Values to change based on your target label
label = 'Survived'
analysis_name = 'Titanic'
c. Remove the existing attributes mentioned in the script and add the new attributes which are available in the training dataset you have just added
Survived = _arg1
Age = _arg2
Passenger = _arg3
Sex = _arg4
Siblings = _arg5
Parents = _arg6
Fair = _arg7
, ATTR([Survived]),ATTR([Age]),ATTR([Passenger]),ATTR([Sex]),ATTR([Siblings]),ATTR([Parents]),ATTR([Fair]))
d. Modify the dataframe's attributes with the attributes you have added in the above step
data = pd.DataFrame(
{'Survived': Survived, 'Age': Age, 'Passenger': Passenger, 'Sex': Sex, 'Siblings': Siblings, 'Parents': Parents,
'Fair': Fair})
data['Age'] = data['Age'].astype('float64')
data['Siblings'] = data['Siblings'].astype('float64')
data['Parents'] = data['Parents'].astype('float64')
data['Fair'] = data['Fair'].astype('float64')
data = data.dropna(subset=['Survived'])
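When you swap in your own attributes, the order matters: Tableau passes each field listed after the script to Python as a list, bound to `_arg1`, `_arg2`, and so on in the order the fields appear. The sketch below shows that pairing in plain Python. The helper and the Churn column names are hypothetical. It fails early if the number of names and arguments disagree, which is the most common slip when editing these scripts.

```python
def columns_from_args(names, *args):
    """Pair column names with the positional lists Tableau passes as _arg1, _arg2, ..."""
    if len(names) != len(args):
        raise ValueError(f"expected {len(names)} arguments, got {len(args)}")
    lengths = {len(values) for values in args}
    if len(lengths) > 1:
        raise ValueError("all arguments must have the same number of rows")
    return dict(zip(names, args))


# Hypothetical Churn project: the first name corresponds to _arg1, the second to _arg2.
columns = columns_from_args(["Churn", "Tenure"], ["Yes", None, "No"], [12, 3, 40])
```

The resulting dictionary has the same shape as the one passed to `pd.DataFrame` in step d.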
-
Modify ScoreDeploy
Modify this calculated field only if you plan to get predictions from GO for a model that is already deployed on the GO engine.
a. Rename the calculated field and the file according to the project, e.g. ScoreControllerChurn.twb or ScoreControllerPredMaintenance.twb
b. Change the following variables in the code based on your setup
#Change the url to where your TabPy Server is running
tabclient = Client('http://localhost:9004/')
#Url of RapidMiner go Instance
go_url = 'https://go.rapidminer.com'
#Provide your username and password to RapidMiner go below
go_username = 'myusername'
go_password = 'mypassword'
#Deployment ID of the model deployed on RapidMiner. typically a GUID like string
depID = 'abcdef-ge-g-g-g-'
#ColumnName of the target variable
label = ''
c. Remove the existing attributes mentioned in the script and add the new attributes which are available in the training dataset you have just added
Survived = _arg1
Age = _arg2
Passenger = _arg3
Sex = _arg4
Siblings = _arg5
Parents = _arg6
Fair = _arg7
, ATTR([Survived]),ATTR([Age]),ATTR([Passenger]),ATTR([Sex]),ATTR([Siblings]),ATTR([Parents]),ATTR([Fair]))
d. Modify the dataframe's attributes with the attributes you have added in the above step
test = pd.DataFrame(
{'Survived': Survived, 'Age': Age, 'Passenger': Passenger, 'Sex': Sex, 'Siblings': Siblings, 'Parents': Parents,
'Fair': Fair})
test['Age'] = test['Age'].astype('float64')
test['Siblings'] = test['Siblings'].astype('float64')
test['Parents'] = test['Parents'].astype('float64')
test['Fair'] = test['Fair'].astype('float64')
test = test[pd.isnull(test['Survived'])]
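The training and scoring scripts differ in their last line. Training keeps rows where the label is present (`dropna(subset=['Survived'])`), while scoring keeps rows where it is missing (`pd.isnull(...)`). The small sketch below uses made-up rows to show how one dataset splits into the two parts. It assumes pandas is installed, as the scripts themselves require.

```python
import pandas as pd

# Hypothetical rows: the third passenger has no known label yet.
frame = pd.DataFrame({
    "Survived": ["Yes", "No", None],
    "Age": [22, 38, 26],
})
frame["Age"] = frame["Age"].astype("float64")

# Rows with a label are used for training.
train = frame.dropna(subset=["Survived"])
# Rows without a label are the ones sent for scoring.
score = frame[pd.isnull(frame["Survived"])]
```

Here `train` holds the first two rows and `score` holds the third, so the same data source can feed both the training and the scoring calculated fields.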
-
Modify TrainControllerTemplate.py
Modify this script only if you plan to train a model using the RapidMiner GO engine and want to provide advanced settings instead of the default system settings.
a. Rename the calculated field and the file according to the project, e.g. TrainingChurn.twb or TrainingPredictiveMaintenance.twb
b. Change the following variables in the code based on your setup
# Change the following values
tabpy_serverurl = 'http://localhost:9004/'
go_url = 'https://go.rapidminer.com'
go_username = ''
go_password = ''
#values to be changed based on data
label = 'Survived'
file_name = 'Titanic'
cost_matrix = [[1,-1],[-1,1]]
high_value = 'Yes'
low_value = 'No'
#### possible values
selection_criteria = 'performance_accuracy'
max_min_crietria_selector = 'max' #or 'min'
should_deploy = True
platform = 'tabclient'
c. Remove the existing attributes mentioned in the script and add the new attributes which are available in the training dataset you have just added
Survived = _arg1
Age = _arg2
Passenger = _arg3
Sex = _arg4
Siblings = _arg5
Parents = _arg6
Fair = _arg7
, ATTR([Survived]),ATTR([Age]),ATTR([Passenger]),ATTR([Sex]),ATTR([Siblings]),ATTR([Parents]),ATTR([Fair]))
d. Modify the dataframe's attributes with the attributes you have added in the above step
data = pd.DataFrame(
{'Survived': Survived, 'Age': Age, 'Passenger': Passenger, 'Sex': Sex, 'Siblings': Siblings, 'Parents': Parents,
'Fair': Fair})
data['Age'] = data['Age'].astype('float64')
data['Siblings'] = data['Siblings'].astype('float64')
data['Parents'] = data['Parents'].astype('float64')
data['Fair'] = data['Fair'].astype('float64')
data = data.dropna(subset=['Survived'])
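The `cost_matrix` setting assigns a value to each combination of predicted and actual class. With `[[1,-1],[-1,1]]`, correct predictions on the diagonal score +1 and wrong ones score -1. The sketch below shows one common way such a matrix is applied, multiplying it cell by cell with a confusion matrix and summing. The confusion counts and the row/column orientation are assumptions made for illustration. How RapidMiner GO orders the classes (for example relative to `high_value` and `low_value`) is not specified in this guide.

```python
def total_gain(confusion, cost_matrix):
    """Sum cost_matrix[i][j] * confusion[i][j] over all cells."""
    return sum(
        cost * count
        for cost_row, count_row in zip(cost_matrix, confusion)
        for cost, count in zip(cost_row, count_row)
    )


cost_matrix = [[1, -1], [-1, 1]]
# Hypothetical confusion counts (rows: predicted class, columns: actual class).
confusion = [[50, 10], [5, 35]]
gain = total_gain(confusion, cost_matrix)  # 50 - 10 - 5 + 35 = 70
```

Raising the penalty in one off-diagonal cell, for example `[[1,-5],[-1,1]]`, would make that kind of error weigh more heavily in the total.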
-
Using in workflow
a. The scripts are now essentially configured through the files above.
b. Connect to the same TabPy Server where you deployed the functions.
c. Run the process
You will find an example that trains on Survival in the Titanic dataset, then scores the same dataset and returns the output results.