Home - N4L/PredictionIO GitHub Wiki
Detailed documentation can be found at http://predictionio.incubator.apache.org/index.html
Community Git: https://github.com/actionml
Community Group: https://groups.google.com/forum/#!forum/actionml-user
PredictionIO is a platform that provides machine learning as a service. It contains several components, such as the Event Store, Training Platform, Model Storage and Prediction Platform.
In the Pond environment, we use PostgreSQL as the event storage, because HBase does not support a request that gets the latest record for a given kind of event. This request is very useful for ETL jobs.
A PredictionIO server typically runs on a Linux server. For the local office development environment, we need to deploy it manually onto the server 192.168.50.118. Please follow the steps below.
Install HBase, Elasticsearch and Spark
- SSH into 192.168.50.118
- Execute bash -c "$(curl -s https://raw.githubusercontent.com/actionml/PredictionIO/develop/bin/install.sh)"
- This command creates a folder /var/PredictionIO and downloads all the required components into the /var/PredictionIO/vendors folder
Install PostgreSQL
- Please follow the instructions at https://wiki.postgresql.org/wiki/YUM_Installation
- Download the JDBC41 PostgreSQL Driver from https://jdbc.postgresql.org/download.html
- Save the driver under the folder /var/PredictionIO/lib
Build PredictionIO Assembly Jar from Pond Repository
- Connect to 192.168.50.118 with WinSCP
- Copy all the files under the repository PredictionIO to the folder /var/PredictionIO.src/ on the remote server.
- SSH into 192.168.50.118
- Navigate to /var/PredictionIO.src/
- Execute bash make-distribution.sh
- Copy everything under /var/PredictionIO.src/dist into /var/PredictionIO
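The build and copy steps above can be sketched as a small shell function. This is a minimal sketch, not the team's actual script: the function name build_and_deploy and its two optional arguments are hypothetical, and the defaults are simply the paths used on this page.

```shell
# Sketch: build the distribution in the source tree, then copy it into the install folder.
# build_and_deploy is a hypothetical helper; its defaults are the paths from this page.
build_and_deploy() {
  local src_dir="${1:-/var/PredictionIO.src}"
  local dest_dir="${2:-/var/PredictionIO}"

  # Run the build from inside the source folder; stop if it fails.
  (cd "$src_dir" && bash make-distribution.sh) || return 1

  # Copy everything under dist/ (including hidden files) into the install folder.
  mkdir -p "$dest_dir"
  cp -R "$src_dir/dist/." "$dest_dir/"
}
```

Running build_and_deploy with no arguments on the remote server uses the default paths; passing two directories lets you try the same steps in a scratch location first.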
Configure PredictionIO Server
- Execute vi /var/PredictionIO/conf/pio-env.sh
- Set SPARK_HOME to /var/PredictionIO/vendors/spark-1.6.0
- Set POSTGRES_JDBC_DRIVER to $PIO_HOME/lib/{Jar File}
- Set HBASE_CONF_DIR to /var/PredictionIO/vendors/hbase-1.1.2/conf
- Set PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE to PGSQL
- Set PIO_STORAGE_SOURCES_PGSQL_TYPE to jdbc
- Set PIO_STORAGE_SOURCES_PGSQL_URL to jdbc:postgresql://localhost/pio
- Set PIO_STORAGE_SOURCES_PGSQL_USERNAME to pio
- Set PIO_STORAGE_SOURCES_PGSQL_PASSWORD to pio
- Set PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE to elasticsearch
- Set PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS to localhost
- Set PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME to /var/PredictionIO/vendors/elasticsearch-1.7.3
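For reference, the settings above would look roughly like the following fragment of pio-env.sh. This is a sketch that only restates the values listed on this page; {Jar File} is left as a placeholder for the name of the driver jar you downloaded, and other lines already present in the file are not shown.

```shell
# Fragment of /var/PredictionIO/conf/pio-env.sh (sketch; only the values listed on this page).
SPARK_HOME=/var/PredictionIO/vendors/spark-1.6.0

# Replace {Jar File} with the file name of the downloaded PostgreSQL JDBC driver.
POSTGRES_JDBC_DRIVER="$PIO_HOME/lib/{Jar File}"

HBASE_CONF_DIR=/var/PredictionIO/vendors/hbase-1.1.2/conf

# Store event data in PostgreSQL instead of HBase.
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL

# PostgreSQL storage source.
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio

# Elasticsearch storage source.
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/var/PredictionIO/vendors/elasticsearch-1.7.3
```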
Set Up PredictionIO Server (run only once)
- Execute pio app new pond
- Copy the Access Key and save it in a safe place. This access key is used to access the application's data.
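To illustrate how the access key is used, here is a hedged sketch that posts one test event to the PredictionIO event server's REST endpoint. It assumes the event server is running on its default port 7070 on the same host; the ACCESS_KEY value is a placeholder, the event and entity values are made up for illustration, and send_test_event is a hypothetical helper name.

```shell
# ACCESS_KEY is a placeholder; substitute the key printed by "pio app new pond".
ACCESS_KEY="${ACCESS_KEY:-YOUR_ACCESS_KEY}"

# Assumes the event server listens on its default port 7070 on this host.
EVENT_SERVER="${EVENT_SERVER:-http://localhost:7070}"

# Minimal event payload; the event name and entity ids are illustrative only.
EVENT_JSON='{"event":"view","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"i1"}'

# Hypothetical helper: POST the payload to the event server using the access key.
send_test_event() {
  curl -s -X POST "$EVENT_SERVER/events.json?accessKey=$ACCESS_KEY" \
    -H "Content-Type: application/json" \
    -d "$EVENT_JSON"
}
```

Calling send_test_event after the services are started should return a small JSON response from the event server; an authentication error usually means the access key does not match the application.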
PredictionIO Useful Commands
- pio-start-all: start all the services.
- pio-stop-all: stop all the services.
- pio status: check the status of the underlying services.
AWS Environment
- The TeamCity build for the PredictionIO server is under Recommendation.
- The cookbook is located at Deployment/chef-repo/cookbooks/app_n4l_portal_recommendation.