Home - N4L/PredictionIO GitHub Wiki

Detailed documentation can be found at http://predictionio.incubator.apache.org/index.html

Community Git: https://github.com/actionml

Community Group: https://groups.google.com/forum/#!forum/actionml-user

PredictionIO is a platform that provides machine learning as a service. It contains several components: an Event Store, a Training Platform, Model Storage and a Prediction Platform.

PredictionIO Architecture

In the Pond environment, we use PostgreSQL as the event store, because HBase does not support a request that gets the latest record for a given kind of event. This request is very useful for ETL jobs.

A PredictionIO server typically runs on a Linux server. For the local office development environment, we need to deploy it manually onto the server 192.168.50.118. Follow the steps below.

Install HBase, Elasticsearch and Spark

  1. SSH into 192.168.50.118
  2. Execute bash -c "$(curl -s https://raw.githubusercontent.com/actionml/PredictionIO/develop/bin/install.sh)"
  3. This command creates the folder /var/PredictionIO and downloads all the required components into the /var/PredictionIO/vendors folder.

Install PostgreSQL

  1. Follow the instructions at https://wiki.postgresql.org/wiki/YUM_Installation
  2. Download the JDBC41 PostgreSQL driver from https://jdbc.postgresql.org/download.html
  3. Save the driver under folder /var/PredictionIO/lib
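
The JDBC settings in the configuration section of this page point at a database named pio, accessed by user pio with password pio, but this page does not show how they are created. The following is a minimal sketch of one way to do it, not a confirmed part of the original procedure. The helper name pio_db_sql is hypothetical, and it assumes a local PostgreSQL superuser is available to run the statements.

```shell
# Hypothetical helper: prints the SQL for the role and database that the
# JDBC settings (jdbc:postgresql://localhost/pio, user pio, password pio) expect.
pio_db_sql() {
  cat <<'SQL'
CREATE USER pio WITH PASSWORD 'pio';
CREATE DATABASE pio OWNER pio;
SQL
}
```

On a typical install where the postgres OS account is the superuser (an assumption), the output could be piped into psql, for example: pio_db_sql | sudo -u postgres psql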

Build PredictionIO Assembly Jar from Pond Repository

  1. Connect to 192.168.50.118 using WinSCP
  2. Copy all the files in the PredictionIO repository to the folder /var/PredictionIO.src/ on the remote server.
  3. SSH into 192.168.50.118
  4. Navigate to /var/PredictionIO.src/
  5. Execute bash make-distribution.sh
  6. Copy everything under /var/PredictionIO.src/dist into /var/PredictionIO
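
Steps 4 to 6 above can be sketched as a small shell function. This is only an illustration: the function name build_and_install is hypothetical, and it assumes make-distribution.sh writes its output to a dist folder inside the source directory, as step 6 implies.

```shell
# Hypothetical helper: build the distribution in src_dir and copy the
# result into dest_dir.
build_and_install() {
  src_dir="$1"    # e.g. /var/PredictionIO.src
  dest_dir="$2"   # e.g. /var/PredictionIO

  # Run the build from inside the source folder; stop if it fails.
  (cd "$src_dir" && bash make-distribution.sh) || return 1

  # Copy everything under dist (including hidden files) into the target.
  mkdir -p "$dest_dir"
  cp -R "$src_dir/dist/." "$dest_dir/"
}
```

On the server it would be called as: build_and_install /var/PredictionIO.src /var/PredictionIO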

Configure PredictionIO Server

  1. Execute vi /var/PredictionIO/conf/pio-env.sh
  2. Set SPARK_HOME to /var/PredictionIO/vendors/spark-1.6.0
  3. Set POSTGRES_JDBC_DRIVER to $PIO_HOME/lib/{Jar File}
  4. Set HBASE_CONF_DIR to /var/PredictionIO/vendors/hbase-1.1.2/conf
  5. Set PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE to PGSQL
  6. Set PIO_STORAGE_SOURCES_PGSQL_TYPE to jdbc
  7. Set PIO_STORAGE_SOURCES_PGSQL_URL to jdbc:postgresql://localhost/pio
  8. Set PIO_STORAGE_SOURCES_PGSQL_USERNAME to pio
  9. Set PIO_STORAGE_SOURCES_PGSQL_PASSWORD to pio
  10. Set PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE to elasticsearch
  11. Set PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS to localhost
  12. Set PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME to /var/PredictionIO/vendors/elasticsearch-1.7.3
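
Taken together, the settings above would look roughly like the following fragment of pio-env.sh. It is a sketch that uses only the values listed in the steps. The lines are written as plain assignments on the assumption that the file follows the style of the shipped template, and {Jar File} is left as a placeholder for the actual driver file name.

```shell
# Sketch of the relevant lines in /var/PredictionIO/conf/pio-env.sh.
SPARK_HOME=/var/PredictionIO/vendors/spark-1.6.0
# {Jar File} is a placeholder for the driver jar saved under /var/PredictionIO/lib.
POSTGRES_JDBC_DRIVER="$PIO_HOME/lib/{Jar File}"
HBASE_CONF_DIR=/var/PredictionIO/vendors/hbase-1.1.2/conf

# Event data is stored in the PostgreSQL source named PGSQL.
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL

# PostgreSQL source, accessed over JDBC.
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio

# Elasticsearch source.
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/var/PredictionIO/vendors/elasticsearch-1.7.3
```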

Set Up PredictionIO Server (run only once)

  1. Execute pio app new pond
  2. Copy the Access Key and save it to a safe place. This access key will be used for accessing the data of the application.
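
As an illustration of how the access key is used, the sketch below posts one event to the Event Server REST API. It assumes the Event Server is running on localhost at its default port 7070. The ACCESS_KEY placeholder, the send_event function name, and the sample event (event name and IDs) are all made up for this example.

```shell
# Assumption: ACCESS_KEY holds the key printed by `pio app new pond`.
# The fallback value is a placeholder, not a real key.
ACCESS_KEY="${ACCESS_KEY:-YOUR_ACCESS_KEY}"
EVENT_URL="http://localhost:7070/events.json?accessKey=${ACCESS_KEY}"

# Hypothetical sample event; the event name and entity IDs are invented.
EVENT_JSON='{"event":"view","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"i1"}'

# Posts the sample event to the Event Server.
send_event() {
  curl -s -X POST "$EVENT_URL" \
    -H "Content-Type: application/json" \
    -d "$EVENT_JSON"
}
```

Calling send_event with a valid key should return a JSON body containing the ID of the stored event. A rejected key results in an error response instead.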

PredictionIO Useful Commands

  • pio-start-all: starts all the services.
  • pio-stop-all: stops all the services.
  • pio status: checks the status of the underlying services.

AWS Environment

  • The TeamCity build for the PredictionIO server is under Recommendation.
  • The cookbook is located at Deployment/chef-repo/cookbooks/app_n4l_portal_recommendation.