Bug bash instructions - RevolutionAnalytics/AzureML GitHub Wiki

Getting started

1. Installing the package (and a zip program)

Installing from GitHub

To install the package directly from GitHub, use:

# Install devtools
if(!require("devtools")) install.packages("devtools")
devtools::install_github("RevolutionAnalytics/AzureML")

Installing a zip program (on Windows)

To publish web services, you need an external zip utility installed and available on your PATH. See ?zip for more details.

On Windows, it's sufficient to install Rtools.

Note: the utility must be called zip, because the R function zip() looks for a program called zip on the PATH (unless the environment variable R_ZIPCMD points elsewhere). As a result, publishWebService() may fail even if you have a program such as 7-Zip installed.
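To check from R whether a suitable zip program can be found, you can use a sketch like the one below. It relies on base R's Sys.which() and on the fact that utils::zip() uses the R_ZIPCMD environment variable when it is set and falls back to zip otherwise. The helper name find_zip_program() is hypothetical and is not part of the AzureML package.

```r
# Hypothetical helper: report which zip program utils::zip() would call.
# utils::zip() defaults to Sys.getenv("R_ZIPCMD", "zip") as its command.
find_zip_program <- function() {
  cmd <- Sys.getenv("R_ZIPCMD", "zip")
  # R_ZIPCMD may be set but empty; fall back to plain "zip" in that case
  if (!nzchar(cmd)) cmd <- "zip"
  # Sys.which() returns "" when the command cannot be found on the PATH
  location <- unname(Sys.which(cmd))
  if (!nzchar(location)) {
    message("No '", cmd, "' program found on the PATH; ",
            "publishWebService() may fail.")
    return(invisible(NA_character_))
  }
  location
}

find_zip_program()
```

A result of NA means R cannot find a zip program, so it is worth fixing the PATH (or R_ZIPCMD) before trying to publish.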

2. Find your AzureML credentials

To use any of the functions, you need your AzureML credentials. To find these, read the vignette Getting Started with the AzureML Package.

You will need these credentials to use the function workspace(). This function sets up your credentials in R and allows you to use all of the other functions in the package.

3. Create a JSON file with your credentials

The easiest way to use the workspace() function is to create a JSON file at ~/.azureml/settings.json.

Copy the following and modify with your own credentials:

{"workspace":{
"id"                  : "Add your id here",
"authorization_token" : "Add your authorisation token here"
}}

Then save the file at ~/.azureml/settings.json. On Windows, ~ usually expands to your Documents folder, so save the file as C:\Users\<yourname>\Documents\.azureml\settings.json.
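If you prefer to create the file from R, this minimal sketch writes the same template. The helper write_azureml_settings() is hypothetical (not part of the AzureML package) and uses base R only. It does no JSON escaping, so it assumes your id and token contain no quote or backslash characters.

```r
# Hypothetical helper: write an AzureML settings file in the format shown above.
write_azureml_settings <- function(id, token, path = "~/.azureml/settings.json") {
  path <- path.expand(path)
  # Create the .azureml directory if it does not exist yet
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  json <- sprintf(
    '{"workspace":{\n"id"                  : "%s",\n"authorization_token" : "%s"\n}}',
    id, token
  )
  writeLines(json, path)
  invisible(path)
}

# Usage (replace the placeholders with your own credentials):
# write_azureml_settings("your-workspace-id", "your-authorization-token")
```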

If you are unsure where `~/` points, try:

> path.expand("~/")
[1] "C:/Users/adevries/Documents/"
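Once the file is in place, a quick sanity check before calling workspace() can save you a confusing error later. The hypothetical helper below uses plain string matching to look for the two keys from the template and for leftover placeholder text. It does not fully parse the JSON.

```r
# Hypothetical helper: sanity-check the settings file before calling workspace().
check_azureml_settings <- function(path = "~/.azureml/settings.json") {
  path <- path.expand(path)
  if (!file.exists(path)) {
    message("No settings file at ", path)
    return(FALSE)
  }
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  # Both keys from the template must be present
  keys <- c('"id"', '"authorization_token"')
  found <- vapply(keys, function(k) grepl(k, txt, fixed = TRUE), logical(1))
  if (!all(found)) {
    message("Missing key(s): ", paste(keys[!found], collapse = ", "))
    return(FALSE)
  }
  # The template placeholders should have been replaced
  if (grepl("Add your", txt, fixed = TRUE)) {
    message("The settings file still contains the template placeholders")
    return(FALSE)
  }
  TRUE
}

check_azureml_settings()
```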

Work with AzureML datasets

4. Download an AzureML dataset to your workspace

Try:

  • Read the help for ?workspace, ?datasets and ?download.datasets
  • Create a workspace object
  • Get a listing of the datasets available in your workspace
  • Download a specific dataset from AzureML as a data frame
library(AzureML)
ws <- workspace()
d <- datasets(ws)
dat <- download.datasets(d, "Movie Ratings")
head(dat)

Publish an R script as an AzureML Web Service

5. Publish a Web Service

You can publish almost any R function as a web service in AzureML, subject to some input/output constraints.

  • Read the help for ?publishWebService
  • Try some of the examples in ?publishWebService or ?consume

Here is a more complicated example showing how to create a function that takes ordered factors as input:

# Train a model using diamonds in ggplot2

library(AzureML)
# ggplot2 must be installed to provide the diamonds data
data(diamonds, package = "ggplot2")
set.seed(1)
train_idx <- sample.int(nrow(diamonds), 30000)
test_idx  <- sample(setdiff(seq_len(nrow(diamonds)), train_idx), 500)
train <- diamonds[train_idx, ]
test  <- diamonds[test_idx, ]

model <- glm(price ~ carat + clarity + color + cut - 1, data = train, 
             family = Gamma(link = "log"))

# Keep a single row so the published function can recover the factor levels
diamondLevels <- diamonds[1, ]

# The model works reasonably well, except for some outliers

plot(exp(predict(model, test)) ~ test$price)

# Create the function to publish. It converts character input back to
# ordered factors with the same levels as the training data

predictDiamonds <- function(x){
  x$cut     <- factor(x$cut,     
                      levels = levels(diamondLevels$cut), ordered = TRUE)
  x$clarity <- factor(x$clarity, 
                      levels = levels(diamondLevels$clarity), ordered = TRUE)
  x$color   <- factor(x$color,   
                      levels = levels(diamondLevels$color), ordered = TRUE)
  predict(model, newdata = x, type="response")
}

# Publish the service

ws <- workspace()
ep <- publishWebService(ws, fun = predictDiamonds, name = "diamonds",
                        inputSchema = test)
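Before publishing, you can check locally that the conversion in predictDiamonds() gives the same predictions for character input as for the original factors. The sketch below uses a small simulated data set instead of diamonds, so it needs only base R. The data and the names toy, toyModel and predictToy are made up for illustration.

```r
# Simulated data with an ordered factor predictor, analogous to diamonds$cut
set.seed(1)
gradeLevels <- c("Fair", "Good", "Ideal")
n <- 60
toy <- data.frame(
  size  = runif(n, 1, 3),
  grade = factor(sample(gradeLevels, n, replace = TRUE),
                 levels = gradeLevels, ordered = TRUE)
)
# Positive, right-skewed response, suitable for a Gamma GLM with log link
toy$price <- exp(1 + 0.5 * toy$size + as.integer(toy$grade) / 4) *
  rgamma(n, shape = 50, rate = 50)

toyModel <- glm(price ~ size + grade - 1, data = toy,
                family = Gamma(link = "log"))

# Same pattern as predictDiamonds(): rebuild the ordered factor from text
predictToy <- function(x) {
  x$grade <- factor(x$grade, levels = gradeLevels, ordered = TRUE)
  predict(toyModel, newdata = x, type = "response")
}

# Simulate input that arrives as text rather than as a factor
asText <- toy[1:5, c("size", "grade")]
asText$grade <- as.character(asText$grade)

# TRUE if the conversion reproduces the factor-based predictions
isTRUE(all.equal(predictToy(asText),
                 predict(toyModel, newdata = toy[1:5, ], type = "response")))
```

You can apply the same check to predictDiamonds() by converting the cut, clarity and color columns of test to character before calling it.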

6. Consume the model from R

Now that you've published an API, you can send data for scoring by using the function consume().

  • Read the help for ?consume
  • Try some of the examples

To consume the model you published in the previous section, try:

results <- consume(ep, test)$ans

# A summary of the relative prediction errors:
summary((results - test$price) / test$price)

# Compare the AzureML results with locally computed ones:
crossprod(predictDiamonds(test) - results)
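You can wrap the two checks above in a small helper. The function compare_predictions() is hypothetical. It strips names with as.numeric(), because predictDiamonds() returns a named vector and the service results may not be named, and it treats local and remote predictions as agreeing when all.equal() passes at the given tolerance.

```r
# Hypothetical helper summarising the checks above for any pair of
# prediction vectors and the observed values.
compare_predictions <- function(remote, local, observed, tolerance = 1e-6) {
  remote <- as.numeric(remote)
  local  <- as.numeric(local)
  list(
    relative_error = summary((remote - observed) / observed),
    # Same quantity as crossprod(local - remote), returned as a plain number
    sum_sq_diff    = sum((local - remote)^2),
    agree          = isTRUE(all.equal(local, remote, tolerance = tolerance))
  )
}

# Usage with the objects from this section:
# compare_predictions(results, predictDiamonds(test), test$price)
```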

When you're done, you can delete the example web service:

deleteWebService(ws, "diamonds")

Reporting issues and problems

To report issues or problems, use the issue log or send me a direct message.
