Guzzle archive deployment runbook - ja-guzzle/guzzle_docs GitHub Wiki

  • For HDP 3.1, set the following properties in Custom spark2-defaults through the Ambari UI, adjusting the values for your environment:
    spark.datasource.hive.warehouse.load.staging.dir=/tmp
    spark.datasource.hive.warehouse.metastoreUri=thrift://localhost:9083
    spark.hadoop.hive.llap.daemon.service.hosts=@llap0
    spark.security.credentials.hiveserver2.enabled=false
    spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;user=maria_dev

    Use the HiveServer2 Interactive JDBC URL rather than the traditional HiveServer2 JDBC URL.

  • Extract guzzle.tar.gz to the preferred location (this example extracts it to the / directory).

  • The /guzzle directory must be mounted at the same location on every node of the cluster.

  • Set the environment variable GUZZLE_HOME=/guzzle.

  • All users should have read access to the files in the /guzzle directory and write access to the /guzzle/logs directory.
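
The access rules above could be applied with a small helper like the one below. This is a minimal sketch, not part of the Guzzle distribution: the function name `set_guzzle_permissions` is hypothetical, and granting world-writable access to the logs directory is only one way to satisfy the requirement (a group-based scheme may suit a shared cluster better).

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the read/write rules to a Guzzle install directory.
# Usage: set_guzzle_permissions [dir]   (defaults to $GUZZLE_HOME)
set_guzzle_permissions() {
  local home="${1:-$GUZZLE_HOME}"
  [ -d "$home" ] || { echo "not a directory: $home" >&2; return 1; }
  mkdir -p "$home/logs"
  # a+rX: read for everyone, plus traverse permission on directories
  # (and on files that are already executable).
  chmod -R a+rX "$home"
  # Assumption: every user may create and write log files.
  chmod a+rwx "$home/logs"
}
```

For the layout used in this runbook the call would be `set_guzzle_permissions /guzzle`, run by the owner of the directory.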

  • Copy the Hive connector jar for your HDP version from the $GUZZLE_HOME/hive-connectors directory to the $GUZZLE_HOME/bin directory:

    HDP version   Hive connector jar
    2.6           spark-hive-connector.jar
    3.1           hortonworks-hive-connector.jar
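
The version-to-jar mapping can be scripted so that the wrong connector is never copied by hand. The sketch below assumes the jar names listed above and the directory layout under $GUZZLE_HOME; the function names `connector_jar_for` and `install_connector` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an HDP version to the connector jar named in this runbook.
connector_jar_for() {
  case "$1" in
    2.6) echo "spark-hive-connector.jar" ;;
    3.1) echo "hortonworks-hive-connector.jar" ;;
    *)   echo "unsupported HDP version: $1" >&2; return 1 ;;
  esac
}

# Copy the matching jar from hive-connectors/ into bin/ under GUZZLE_HOME.
install_connector() {
  local jar
  jar="$(connector_jar_for "$1")" || return 1
  cp "${GUZZLE_HOME:?GUZZLE_HOME is not set}/hive-connectors/$jar" "$GUZZLE_HOME/bin/"
}
```

On an HDP 3.1 cluster, for example, `install_connector 3.1` would copy hortonworks-hive-connector.jar into $GUZZLE_HOME/bin.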
  • Create the directory /guzzle/conf and add the configuration files to it, following the example configuration in /guzzle/sample_conf.

  • Update the database, context_columns, stages, spark, java and other settings in /guzzle/conf/guzzle.yml.

  • For HDP 2.6, the following Spark configuration must be set in /guzzle/conf/guzzle.yml:

spark:
  ...
  properties:
    ...
    additional_arguments: "--files /usr/hdp/current/spark2-client/conf/hive-site.xml ..."
  • For HDP 3.1, the following Hive configuration must be set in /guzzle/conf/guzzle.yml:
guzzle:
  hive:
    table_properties:
      escape.delim: "\\\\"
  • Create the file /guzzle/conf/passphrase containing the secret token used to encrypt and decrypt Guzzle configuration values.

  • Update the configuration in /guzzle/api/application.yml.

  • Update API_URL and API_DOMAIN in /guzzle/web/index.html to match the API server configuration.

  • Run the following commands on the server to download the Node.js and Elasticsearch archives into the ~/tools directory, then extract them there:

wget "https://nodejs.org/download/release/v6.14.4/node-v6.14.4-linux-x64.tar.gz"
wget "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz"
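
The two wget commands only download the archives; creating ~/tools and unpacking are left to the operator. One way to cover those steps is sketched below. The helper names `download_archive` and `extract_archive` and the `TOOLS_DIR` variable are hypothetical; the directory default matches the one used in this runbook.

```shell
#!/usr/bin/env bash
# Target directory for the downloaded tools (the runbook uses ~/tools).
TOOLS_DIR="${TOOLS_DIR:-$HOME/tools}"

# Download one archive into TOOLS_DIR unless it is already present.
# A temporary .part name avoids leaving a truncated file behind on failure.
download_archive() {
  local url="$1"
  local file="$TOOLS_DIR/${url##*/}"
  mkdir -p "$TOOLS_DIR"
  [ -f "$file" ] || { wget -O "$file.part" "$url" && mv "$file.part" "$file"; }
}

# Unpack a .tar.gz archive into TOOLS_DIR.
extract_archive() {
  local file="$1"
  mkdir -p "$TOOLS_DIR"
  tar -xzf "$file" -C "$TOOLS_DIR"
}

# Example (performs network downloads):
#   download_archive "https://nodejs.org/download/release/v6.14.4/node-v6.14.4-linux-x64.tar.gz"
#   extract_archive "$TOOLS_DIR/node-v6.14.4-linux-x64.tar.gz"
```

The same pair of calls applies to the Elasticsearch archive; after extraction the directories ~/tools/node-v6.14.4-linux-x64 and ~/tools/elasticsearch-6.2.4 referenced in the later steps should exist.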

Add the following path setting to ~/.bash_profile:

export PATH=~/tools/node-v6.14.4-linux-x64/bin:$PATH
  • Log in to the HDP server again to apply the changes in .bash_profile.

  • Install the serve Node.js application with the following command:

npm install -g serve@<version>
  • Start the Elasticsearch server (from the ~/tools/elasticsearch-6.2.4 directory):
nohup ./bin/elasticsearch &
  • Create the tables required by Guzzle with the following command:
java -cp "/guzzle/bin/*:/guzzle/libs/*" com.justanalytics.guzzle.common.DatabaseInitializer

The database initializer automatically creates the tables in the guzzle database.

  • Start the Guzzle API application (from the /guzzle/api directory):
nohup java -jar api-0.0.1-SNAPSHOT.jar &
  • Start the Guzzle web application (from the /guzzle/web directory):
nohup serve -p 8082 -s . &
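
The three start commands above can be chained so that the API and web applications are launched only once Elasticsearch answers. The sketch below is one possible arrangement, not a script shipped with Guzzle: `wait_until` and `start_guzzle_stack` are hypothetical names, curl is assumed to be installed, and Elasticsearch is assumed to listen on its default HTTP port 9200.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Start Elasticsearch, wait for it, then start the API and web applications.
# Paths and the web port are the ones used in this runbook.
start_guzzle_stack() {
  (cd "$HOME/tools/elasticsearch-6.2.4" && nohup ./bin/elasticsearch &)
  # Assumption: default Elasticsearch HTTP port, checked with curl.
  wait_until 30 2 curl -sf "http://localhost:9200" || return 1
  (cd /guzzle/api && nohup java -jar api-0.0.1-SNAPSHOT.jar &)
  (cd /guzzle/web && nohup serve -p 8082 -s . &)
}
```

Calling `start_guzzle_stack` returns a non-zero status if Elasticsearch does not respond within roughly a minute, in which case neither the API nor the web application is started.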