Detailed Installation and Usage Guide - rapidminer-labs/rapidminer-go-tabpy-integration GitHub Wiki

Using RapidMiner GO with Tableau Products

This guide describes how to install and use RapidMiner's Go engine for AutoModel with Tableau Prep, Tableau Studio, and Tableau Server. The common setup steps come first, followed by examples of integration with Tableau Desktop and Tableau Prep.

1. Prerequisites

Install the rapidminer-go-python package by running the following command in conda or your preferred Python environment:
pip install rapidminer-go-python
Install TabPy by following the instructions at https://github.com/tableau/tabpy
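
Before moving on, it can help to confirm that both packages are visible to the Python environment you intend to use. The following is a minimal sketch using only the standard library. The import names are assumptions: `tabpy` is the module installed by TabPy, while `rapidminer_go_python` is a guess at the module name behind the rapidminer-go-python package and may differ.

```python
import importlib.util

# Import names are assumptions: "tabpy" is the module installed by TabPy;
# "rapidminer_go_python" is a guess at the module name for rapidminer-go-python.
REQUIRED_MODULES = ["tabpy", "rapidminer_go_python"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a module is reported as missing, check that pip installed into the same environment that runs TabPy.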

2. Setup functions on TabPy Server

Download the TableauDeploy.py file

This file contains the functions that will be deployed on the Tableau TabPy server. Typically no changes to this file are required, but understanding its structure is useful if you want to modify its behaviour.
Run the TableauDeploy.py file using the following command:
```
 python <path/to/TableauDeploy.py>
```
If this succeeds, you can start using the functions from Tableau Studio and Tableau Prep.
Now that the necessary functions are installed on the TabPy server, you can use them in an analysis or a flow by following the steps below.
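
One way to confirm that the deployment worked is to ask the TabPy server which endpoints it currently exposes. The sketch below assumes a TabPy server without authentication and uses TabPy's REST route `GET /endpoints`, which returns a JSON object keyed by endpoint name. The helper names are illustrative, and the names of the functions that TableauDeploy.py deploys are not listed here because this guide does not state them.

```python
import json
import urllib.request
from urllib.parse import urljoin


def endpoints_url(base_url):
    """Build the URL of TabPy's endpoint listing from the server base URL."""
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, "endpoints")


def list_endpoints(base_url, timeout=5):
    """Return the sorted names of deployed endpoints, or None on any failure."""
    try:
        with urllib.request.urlopen(endpoints_url(base_url), timeout=timeout) as resp:
            return sorted(json.load(resp).keys())
    except (OSError, ValueError, AttributeError):
        # OSError covers connection and HTTP errors; ValueError covers bad JSON;
        # AttributeError covers a JSON payload that is not an object.
        return None
```

Calling `list_endpoints('http://localhost:9004/')` after running TableauDeploy.py should return a non-empty list if the functions were deployed to that server.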

Tableau Prep

From this GitHub project, either clone the project, download a complete zip file, or download the individual files listed below.

Download the following controller template files from this project. Depending on what you are trying to do, you may not need all of them.

 a. QuickTrainingTemplate.py
 Used when building predictive models from Tableau Prep and Tableau Desktop with minimal settings.

 b. ScoreControllerTemplate.py
 Used when getting predictions from a model that is deployed on RapidMiner GO.
 These predictions are then consumed in Tableau Prep, Tableau Desktop, and Tableau Server.

 c. TrainControllerTemplate.py
 Used when building predictive models from Tableau Prep and Tableau Desktop with more configuration and control.

Create a new folder for a project

For example, if you are building models for CreditRisk, Churn, or Maintenance, create a separate folder for each of them. Copy the template files from the step above into the newly created folder.

Modify QuickTrainingTemplate.py

Modify this file only if you plan to train a model using the GO engine with the default system settings rather than advanced settings.

a. Rename the file according to the project, e.g. QuickTrainingChurn.py or QuickTrainingPredictiveMaintenance.py

b. Change the following variables in the code based on your setup:


tabpy_serverurl = 'http://localhost:9004/'

go_url = 'https://go.rapidminer.com'
go_username = 'myusername'
go_password =  'mypassword'

# Values to change based on your data and target label
label = 'Survived'
analysis_name = 'Titanic'
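
If you would rather not keep credentials in the script itself, one option is to read them from environment variables and fall back to the defaults shown above. This is a sketch of that idea, not something the templates do. The environment variable names are hypothetical, and whether the variables are visible depends on how the process that runs the script is started.

```python
import os


def load_settings(env=os.environ):
    """Collect connection settings, preferring environment variables over defaults."""
    return {
        # The environment variable names below are hypothetical.
        "tabpy_serverurl": env.get("TABPY_SERVERURL", "http://localhost:9004/"),
        "go_url": env.get("GO_URL", "https://go.rapidminer.com"),
        "go_username": env.get("GO_USERNAME", ""),
        "go_password": env.get("GO_PASSWORD", ""),
    }


def missing_settings(settings):
    """Return the names of settings that are still empty."""
    return [key for key, value in settings.items() if not value]


if __name__ == "__main__":
    settings = load_settings()
    print("Missing settings:", missing_settings(settings))
```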

Modify ScoreControllerTemplate.py

Modify this file only if you plan to get predictions from GO for a model that is already deployed on the GO engine.

a. Rename the file according to the project, e.g. ScoreControllerChurn.py or ScoreControllerPredMaintenance.py

b. Change the following variables in the code based on your setup:


# Change the URL to where your TabPy Server is running
tabclient = Client('http://localhost:9004/')

# URL of the RapidMiner GO instance
go_url = 'https://go.rapidminer.com'

# Provide your RapidMiner GO username and password below
go_username = 'myusername'
go_password = 'mypassword'

# Deployment ID of the model deployed on RapidMiner GO, typically a GUID-like string
depID = 'abcdef-ge-g-g-g-'

# Column name of the target variable
label = ''

c. Modify the get_output_schema() function in the file to match the columns you expect back.
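
Step c asks for get_output_schema() to describe the columns returned to Tableau Prep. As a rough sketch of what such a function can look like: Tableau Prep supplies type helpers such as `prep_string()` and `prep_decimal()` at run time, and the function returns a pandas DataFrame whose columns name the output fields. The `prediction` and `confidence` column names are hypothetical, since the actual columns returned by scoring depend on your model, and the fallback helpers exist only so the sketch can run outside Prep.

```python
# prep_string and prep_decimal are supplied by Tableau Prep at run time.
# The fallbacks below are placeholders so this sketch can run outside Prep.
try:
    prep_string
except NameError:
    def prep_string():
        return [""]

    def prep_decimal():
        return [0.0]


def output_columns():
    """Map each expected output column to its Tableau Prep type."""
    return {
        "Age": prep_decimal(),
        "Sex": prep_string(),
        # Hypothetical names for the columns returned by scoring:
        "prediction": prep_string(),
        "confidence": prep_decimal(),
    }


def get_output_schema():
    """Return the schema as the DataFrame that Tableau Prep expects."""
    import pandas as pd  # imported here so output_columns() works without pandas

    return pd.DataFrame(output_columns())
```

Columns that the script returns but the schema does not list are unlikely to reach the flow, so keep the two in sync when you change the model.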

Modify TrainControllerTemplate.py

Modify this file only if you plan to train a model using the RapidMiner GO engine and want to provide advanced settings instead of the default system settings.

a. Rename the file according to the project, e.g. TrainingChurn.py or TrainingPredictiveMaintenance.py

b. Change the following variables in the code based on your setup:


# Change the following values
tabpy_serverurl = 'http://localhost:9004/'
go_url = 'https://go.rapidminer.com'
go_username = ''
go_password = ''

#values to be changed based on data
label = 'Survived'
file_name = 'Titanic'
cost_matrix = [[1,-1],[-1,1]]
high_value = 'Yes'
low_value = 'No'
#### possible values 
selection_criteria = 'performance_accuracy'
max_min_crietria_selector = 'max' #or 'min'
should_deploy = True
platform = 'tabprep'
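
The cost_matrix setting assigns a gain or cost to each combination of predicted and actual class. The sketch below shows how such a matrix could be applied to a table of prediction counts to get a single total. It assumes that both tables share the same row and column layout, which this guide does not specify, and the counts are made up for illustration.

```python
def total_gain(cost_matrix, confusion_counts):
    """Sum cost_matrix[i][j] * confusion_counts[i][j] over all cells."""
    return sum(
        weight * count
        for weight_row, count_row in zip(cost_matrix, confusion_counts)
        for weight, count in zip(weight_row, count_row)
    )


cost_matrix = [[1, -1], [-1, 1]]

# Hypothetical counts in the same layout as cost_matrix:
# 80 and 90 correct predictions on the diagonal, 20 and 10 errors off it.
confusion_counts = [[80, 20], [10, 90]]

print(total_gain(cost_matrix, confusion_counts))  # 80 - 20 - 10 + 90 = 140
```

With the default matrix, every correct prediction counts as +1 and every error as -1. Raising one off-diagonal value penalises that kind of error more heavily.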

Using in workflow

a. The script nodes are now essentially configured through the controller files above.

b. Point the script node to the appropriate controller file with the updated values.

c. Connect to the same TabPy Server where you deployed the functions.

d. Run the flows

You will find an example that trains a model for survival on the Titanic dataset, then scores the same dataset and returns the output results.

Tableau Studio

From this GitHub project, either clone the project, download a complete zip file, or download the individual files listed below.

Download the following files from this project.

 a. Titanic Example.twb
 This file holds examples of calculated fields that contain scripts to train on and score the provided data.

Create a new folder for a project

For example, if you are building models for CreditRisk, Churn, or Maintenance, create a separate folder for each of them. Copy the files from the step above into the newly created folder.

Modifying the Data Source

Remove the existing connections and add your new training data as a connection.

Modify the QuickTrain calculated field provided in Titanic Example.twb

Modify this calculated field only if you plan to train a model using the GO engine with the default system settings rather than advanced settings.

a. Rename the calculated field and the file according to the project, e.g. QuickTrainingChurn.twb or QuickTrainingPredictiveMaintenance.twb

b. Change the following variables in the calculated field script:


tabpy_serverurl = 'http://localhost:9004/'

go_url = 'https://go.rapidminer.com'
go_username = 'myusername'
go_password =  'mypassword'

# Values to change based on your data and target label
label = 'Survived'
analysis_name = 'Titanic'

c. Remove the existing attributes from the script and add the attributes available in the training dataset you have just added:

Survived = _arg1
Age = _arg2
Passenger = _arg3
Sex = _arg4
Siblings = _arg5
Parents = _arg6
Fair = _arg7

, ATTR([Survived]),ATTR([Age]),ATTR([Passenger]),ATTR([Sex]),ATTR([Siblings]),ATTR([Parents]),ATTR([Fair]))

d. Update the DataFrame's columns to match the attributes you added in the step above:

data = pd.DataFrame(
    {'Survived': Survived, 'Age': Age, 'Passenger': Passenger, 'Sex': Sex, 'Siblings': Siblings, 'Parents': Parents,
     'Fair': Fair})
data['Age'] = data['Age'].astype('float64')
data['Siblings'] = data['Siblings'].astype('float64')
data['Parents'] = data['Parents'].astype('float64')
data['Fair'] = data['Fair'].astype('float64')
data = data.dropna(subset=['Survived'])
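
Steps c and d repeat the same column list in three places: the `_arg` assignments, the trailing `ATTR(...)` arguments, and the DataFrame dictionary. When adapting the script to a new dataset, a small helper can generate all three from one list so they stay in sync. This is only a convenience sketch with illustrative function names. It assumes every column name is also a valid Python identifier, which does not hold for names containing spaces.

```python
def arg_assignments(columns):
    """Return lines such as 'Survived = _arg1', one per column."""
    return [f"{name} = _arg{i}" for i, name in enumerate(columns, start=1)]


def attr_arguments(columns):
    """Return the trailing argument list that closes the calculated field."""
    return ", " + ",".join(f"ATTR([{name}])" for name in columns) + ")"


def frame_literal(columns):
    """Return the dict literal used to build the DataFrame."""
    return "{" + ", ".join(f"'{name}': {name}" for name in columns) + "}"


if __name__ == "__main__":
    columns = ["Survived", "Age", "Sex"]
    print("\n".join(arg_assignments(columns)))
    print(attr_arguments(columns))
    print(frame_literal(columns))
```

The order of the list matters: `_arg1` always receives the first field passed after the script text, so the assignments and the `ATTR(...)` arguments must use the same order.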

Modify ScoreDeploy

Modify this calculated field only if you plan to get predictions from GO for a model that is already deployed on the GO engine.

a. Rename the calculated field and the file according to the project, e.g. ScoreControllerChurn.twb or ScoreControllerPredMaintenance.twb

b. Change the following variables in the code based on your setup:


# Change the URL to where your TabPy Server is running
tabclient = Client('http://localhost:9004/')

# URL of the RapidMiner GO instance
go_url = 'https://go.rapidminer.com'

# Provide your RapidMiner GO username and password below
go_username = 'myusername'
go_password = 'mypassword'

# Deployment ID of the model deployed on RapidMiner GO, typically a GUID-like string
depID = 'abcdef-ge-g-g-g-'

# Column name of the target variable
label = ''

c. Remove the existing attributes from the script and add the attributes available in the dataset you have just added:

Survived = _arg1
Age = _arg2
Passenger = _arg3
Sex = _arg4
Siblings = _arg5
Parents = _arg6
Fair = _arg7

, ATTR([Survived]),ATTR([Age]),ATTR([Passenger]),ATTR([Sex]),ATTR([Siblings]),ATTR([Parents]),ATTR([Fair]))

d. Update the DataFrame's columns to match the attributes you added in the step above:

test = pd.DataFrame(
    {'Survived': Survived, 'Age': Age, 'Passenger': Passenger, 'Sex': Sex, 'Siblings': Siblings, 'Parents': Parents,
     'Fair': Fair})
test['Age'] = test['Age'].astype('float64')
test['Siblings'] = test['Siblings'].astype('float64')
test['Parents'] = test['Parents'].astype('float64')
test['Fair'] = test['Fair'].astype('float64')
test = test[pd.isnull(test['Survived'])]
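
The last line of the scoring snippet keeps only the rows whose label is missing, which is the opposite of the `dropna` call used for training: rows with a known label train the model, and rows without one are scored. The following dependency-free sketch illustrates the same split and the numeric conversion on plain Python lists. It is an illustration of the idea rather than a replacement for the pandas code, and the sample rows are invented.

```python
import math


def to_float(values):
    """Convert values to float, mapping None to NaN (similar in spirit to astype('float64'))."""
    return [math.nan if v is None else float(v) for v in values]


def split_by_label(rows, label):
    """Split rows into (labelled, unlabelled) by whether the label value is missing."""
    labelled = [row for row in rows if row.get(label) is not None]
    unlabelled = [row for row in rows if row.get(label) is None]
    return labelled, unlabelled


# Invented sample rows for illustration.
rows = [
    {"Survived": "Yes", "Age": 22},
    {"Survived": None, "Age": 38},
    {"Survived": "No", "Age": None},
]
train_rows, score_rows = split_by_label(rows, "Survived")
print(len(train_rows), len(score_rows))  # 2 1
```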

Modify TrainControllerTemplate.py

Modify this calculated field only if you plan to train a model using the RapidMiner GO engine and want to provide advanced settings instead of the default system settings.

a. Rename the calculated field and the file according to the project, e.g. TrainingChurn.twb or TrainingPredictiveMaintenance.twb

b. Change the following variables in the code based on your setup:


# Change the following values
tabpy_serverurl = 'http://localhost:9004/'
go_url = 'https://go.rapidminer.com'
go_username = ''
go_password = ''

#values to be changed based on data
label = 'Survived'
file_name = 'Titanic'
cost_matrix = [[1,-1],[-1,1]]
high_value = 'Yes'
low_value = 'No'
#### possible values 
selection_criteria = 'performance_accuracy'
max_min_crietria_selector = 'max' #or 'min'
should_deploy = True
platform = 'tabclient'

c. Remove the existing attributes from the script and add the attributes available in the training dataset you have just added:

Survived = _arg1
Age = _arg2
Passenger = _arg3
Sex = _arg4
Siblings = _arg5
Parents = _arg6
Fair = _arg7

, ATTR([Survived]),ATTR([Age]),ATTR([Passenger]),ATTR([Sex]),ATTR([Siblings]),ATTR([Parents]),ATTR([Fair]))

d. Update the DataFrame's columns to match the attributes you added in the step above:

data = pd.DataFrame(
    {'Survived': Survived, 'Age': Age, 'Passenger': Passenger, 'Sex': Sex, 'Siblings': Siblings, 'Parents': Parents,
     'Fair': Fair})
data['Age'] = data['Age'].astype('float64')
data['Siblings'] = data['Siblings'].astype('float64')
data['Parents'] = data['Parents'].astype('float64')
data['Fair'] = data['Fair'].astype('float64')
data = data.dropna(subset=['Survived'])
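
The selection_criteria and max_min_crietria_selector settings read as a pair: the first names a performance metric and the second says whether a higher or lower value is better. How RapidMiner GO applies them internally is not described in this guide, so the sketch below is only one plausible reading, with invented model names and scores.

```python
def pick_model(candidates, selection_criteria, selector="max"):
    """Return the candidate with the highest or lowest value of the chosen metric."""
    if selector not in ("max", "min"):
        raise ValueError("selector must be 'max' or 'min'")
    choose = max if selector == "max" else min
    return choose(candidates, key=lambda model: model[selection_criteria])


# Hypothetical model results; the names and numbers are illustrative only.
candidates = [
    {"name": "model_a", "performance_accuracy": 0.81},
    {"name": "model_b", "performance_accuracy": 0.86},
]
best = pick_model(candidates, "performance_accuracy", "max")
print(best["name"])  # model_b
```

A metric such as accuracy would normally be paired with 'max', while an error-style metric would be paired with 'min'.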

Using in workflow

a. The scripts are now essentially configured through the files above.

b. Connect to the same TabPy Server where you deployed the functions.

c. Run the process

You will find an example that trains a model for survival on the Titanic dataset, then scores the same dataset and returns the output results.