IBM BigSQL and Cloudera - stanislawbartkowski/wikis GitHub Wiki

IBM BigSQL 7.x is supported only on CDP; HDP is deprecated.

Source of truth: https://www.ibm.com/support/knowledgecenter/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.doc/doc/hdp_bigsql_versions.html

Several remarks.

HDFS superuser

During Head Node installation, the IBM BigSQL installer switches to the hdfs superuser to create the bigsql HDFS home directory /user/bigsql. Unfortunately, this does not work with the default Cloudera hdfs superuser; the only solution is to create an alternative superuser account. If Cloudera CDP is already Kerberized, follow the instructions at https://github.com/stanislawbartkowski/wikis/wiki/Cloudera-CDP-and-Kerberos#hdfs-superuser

If Cloudera CDP is not Kerberized, the alternative HDFS superuser account should be created at the Linux level.

Assume uhdfs as both the superuser group and the user name.

Create the uhdfs user on all Linux nodes in the cluster. Important: do not forget to create the account on the HDFS Name Node. HDFS user and group membership is taken from the HDFS Name Node, not from the node where the hdfs command is executed.
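Creating the account by hand on every node is easy to get wrong, so a loop over the nodes may help. The sketch below is only an illustration: it assumes root SSH access to every node and a hypothetical host file with one hostname per line (Name Node included), neither of which is part of the procedure above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: create the superuser group and user on every node in a host file.
# Assumptions (not from the original procedure): root SSH access to each
# node and a host file with one hostname per line, Name Node included.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

create_superuser_everywhere() {
    # $1 = host file, $2 = name used for both the group and the user
    while read -r host; do
        [ -n "$host" ] || continue      # skip empty lines
        # Create the group and the user only when they do not exist yet
        remote="getent group $2 >/dev/null || groupadd $2; id -u $2 >/dev/null 2>&1 || useradd -g $2 $2"
        if [ "$DRY_RUN" = 1 ]; then
            echo "ssh -n root@$host \"$remote\""
        else
            # -n keeps ssh from consuming the rest of the host file on stdin
            ssh -n "root@$host" "$remote"
        fi
    done < "$1"
}
```

Example: `create_superuser_everywhere /tmp/allHosts uhdfs` prints one ssh command per node; rerun with `DRY_RUN=0` to execute them. The file name /tmp/allHosts is hypothetical.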

Cloudera Console->Cluster->HDFS->Configuration->Security->Superuser Group. Enter uhdfs and restart all impacted services.

Make sure that the change takes effect.

id uhdfs
hdfs groups uhdfs
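The group check can also be scripted. The following is a minimal sketch, assuming that `hdfs groups <user>` prints a line of the form `<user> : <group1> <group2> ...`; the function name is hypothetical.

```shell
# Sketch: succeed only when the Name Node reports that user $1 is in group $2.
namenode_sees_group() {
    # Expected output format of `hdfs groups`: "<user> : <group1> <group2> ..."
    groups_line=$(hdfs groups "$1") || return 1
    # Drop everything up to the first colon and walk through the group names
    for g in ${groups_line#*:}; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}
```

Example: `namenode_sees_group uhdfs uhdfs && echo "superuser group is visible"`.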

Run a simple test to verify (use su - uhdfs, not sudo -u uhdfs):

su - uhdfs sh -c "hdfs dfs -mkdir /test"
su - uhdfs sh -c "hdfs dfs -rmdir /test"

Prerequisites

HDFS Gateway role is installed on IBM BigSQL Head node
Passwordless connection exists between Head node and BigSQL Worker nodes and Cloudera Master Nodes.
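The passwordless connection can be verified before the installation starts. The sketch below assumes that "passwordless connection" means key-based SSH for the current user; the function name and the host file are hypothetical.

```shell
# Sketch: list the hosts that cannot be reached without a password prompt.
check_passwordless() {
    # $1 = file with one hostname per line; returns 1 if any host fails
    failed=0
    while read -r host; do
        [ -n "$host" ] || continue      # skip empty lines
        # BatchMode=yes makes ssh fail instead of asking for a password;
        # -n keeps ssh from reading the host file from stdin
        if ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "no passwordless access: $host"
            failed=1
        fi
    done < "$1"
    return "$failed"
}
```

Run it on the Head node with a file that lists the Worker nodes and the Cloudera Master Nodes.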

Install Cloudera Manager Python client (IBM BigSQL Head node only)

https://www.ibm.com/docs/en/db2-big-sql/7.1?topic=sql-installing-cloudera-manager-python-client

curl https://bootstrap.pypa.io/pip/3.4/get-pip.py -o get-pip.py
python get-pip.py
python /usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/python/setup_cloudera_api.py

BigSQL yum repository

Set up the repository on the future BigSQL Head node.

https://www.ibm.com/support/knowledgecenter/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.doc/doc/hdp_valaddinst.html

If an offline installation is configured, modify the internal IBM BigSQL repo definition. During the installation, this file overwrites the IBM BigSQL repository definition in /etc/yum.repos.d.

vi /usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/repos/IBM-Big_SQL-7_1_0_0.repo

[IBM-Big_SQL-7_1_0_0]
name=IBM-Big_SQL-7_1_0_0
baseurl=http://{ local repo URL }
enabled=1
gpgcheck=0
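As an alternative to editing the file by hand, the definition can be generated. This is a sketch only; the function name is hypothetical and the base URL must be replaced with the real local repository URL.

```shell
# Sketch: write the internal repo definition for an offline installation.
write_bigsql_repo() {
    # $1 = target .repo file, $2 = base URL of the local repository
    cat > "$1" <<EOF
[IBM-Big_SQL-7_1_0_0]
name=IBM-Big_SQL-7_1_0_0
baseurl=$2
enabled=1
gpgcheck=0
EOF
}
```

Example, with a made-up URL: `write_bigsql_repo /usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/repos/IBM-Big_SQL-7_1_0_0.repo http://repo.example.com/bigsql`.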

Set installation parameters

As a minimum:

/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/bigsql-config -set "HDFS_USER=uhdfs"
/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/bigsql-config -set "CM_HOST=<Cloudera Manager hostname>"
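When more than a couple of parameters are set, a small wrapper keeps the calls in one place. The sketch below assumes that bigsql-config returns a non-zero exit code on failure, which is not confirmed here; the function and variable names are hypothetical.

```shell
# Sketch: apply several KEY=VALUE settings, stopping at the first failure.
# Assumption: bigsql-config exits with a non-zero status when -set fails.
BIGSQL_CONFIG=${BIGSQL_CONFIG:-/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/bigsql-config}

set_bigsql_params() {
    # Each argument is one KEY=VALUE pair
    for pair in "$@"; do
        "$BIGSQL_CONFIG" -set "$pair" || {
            echo "failed to set $pair" >&2
            return 1
        }
    done
}
```

Example: `set_bigsql_params "HDFS_USER=uhdfs" "CM_HOST=<Cloudera Manager hostname>"`.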

Prepare a list of BigSQL hostnames. The Head Node hostname must be the first on the list.

This example specifies velarize1 as the Head Node, followed by three Worker Nodes.

vi /tmp/bigsqlHostList

velarize1.fyre.ibm.com
pimiento2.fyre.ibm.com
pimiento3.fyre.ibm.com
pimiento4.fyre.ibm.com
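A quick sanity check of the host list can catch ordering mistakes before the installer runs. This is a minimal sketch with a hypothetical function name; the duplicate check is an extra precaution, not a documented requirement.

```shell
# Sketch: verify that the Head Node is listed first and no host is repeated.
check_host_list() {
    # $1 = host list file, $2 = expected Head Node hostname
    first=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    if [ "$first" != "$2" ]; then
        echo "Head Node $2 is not first (found: $first)" >&2
        return 1
    fi
    # uniq -d prints only the lines that occur more than once
    dups=$(grep -v '^[[:space:]]*$' "$1" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate hostnames: $dups" >&2
        return 1
    fi
}
```

Example: `check_host_list /tmp/bigsqlHostList velarize1.fyre.ibm.com`.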

Run pre-checker and installer

/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/bigsql-precheck -hostList /tmp/bigsqlHostList
/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/bigsql-install
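The two steps can be chained so that the installer starts only after a clean pre-check. The sketch below assumes that bigsql-precheck signals problems through a non-zero exit code, which should be verified against its actual behaviour; the function and variable names are hypothetical.

```shell
# Sketch: run the installer only when the pre-checker succeeds.
# Assumption: bigsql-precheck exits with a non-zero status on failure.
BIGSQL_CLI=${BIGSQL_CLI:-/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli}
BIGSQL_PRECHECK=${BIGSQL_PRECHECK:-$BIGSQL_CLI/bigsql-precheck}
BIGSQL_INSTALL=${BIGSQL_INSTALL:-$BIGSQL_CLI/bigsql-install}

precheck_then_install() {
    # $1 = host list file, Head Node first
    if "$BIGSQL_PRECHECK" -hostList "$1"; then
        "$BIGSQL_INSTALL"
    else
        echo "pre-check failed, installer not started" >&2
        return 1
    fi
}
```

Example: `precheck_then_install /tmp/bigsqlHostList`.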

Run a health check when the installation completes.

cd /usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli
./bigsql-admin -smoke
./bigsql-admin -smoke -l
./bigsql-admin -health
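To run all the checks in one go and see which of them failed, a loop such as the following may be useful. It is a sketch that assumes bigsql-admin reports failure through a non-zero exit code; the function and variable names are hypothetical.

```shell
# Sketch: run every post-install check and report all failures at the end.
# Assumption: bigsql-admin exits with a non-zero status when a check fails.
BIGSQL_ADMIN=${BIGSQL_ADMIN:-/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/bigsql-admin}

run_bigsql_checks() {
    failed=""
    for opts in "-smoke" "-smoke -l" "-health"; do
        echo "== bigsql-admin $opts"
        # $opts is left unquoted on purpose so "-smoke -l" becomes two arguments
        "$BIGSQL_ADMIN" $opts || failed="$failed [$opts]"
    done
    if [ -n "$failed" ]; then
        echo "failed checks:$failed" >&2
        return 1
    fi
    echo "all checks passed"
}
```

Unlike a chain of `&&`, this keeps going after a failure, so a single run shows the full picture.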

HBase

Make sure that the HBase component is installed on the IBM BigSQL Head Node. If the HBase Master or Region Server nodes do not overlap with the IBM BigSQL Head Node, install the HBase Gateway service.

Run a smoke test to check whether IBM BigSQL is integrated with HBase.

./bigsql-admin -smoke -b

https://www.ibm.com/support/knowledgecenter/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.doc/doc/trb_hbase_perm_err.html

After CDP is Kerberized, the hbase superuser can no longer be accessed with the su - hbase command. The only solution is to create an alternative HBase superuser to run the grant command.

Cloudera Console->Cluster->HBase->Configuration->Search->hbase.superuser

Enter the name; this example assumes uhbase. Only a user name, not a group name, can be used here. Restart HBase.

Create the uhbase account in Active Directory/Kerberos.

kinit [email protected]
hbase shell
grant 'bigsql', 'RWXCA'
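The grant can also be issued without opening an interactive session, because `hbase shell -n` reads commands from standard input. The helper below is a hypothetical sketch that only builds the statement and rejects permission letters outside RWXCA.

```shell
# Sketch: build an HBase grant statement for non-interactive use.
hbase_grant_stmt() {
    # $1 = user receiving the permissions, $2 = letters from the set RWXCA
    case "$2" in
        ""|*[!RWXCA]*)
            echo "invalid permissions: $2" >&2
            return 1
            ;;
    esac
    printf "grant '%s', '%s'\n" "$1" "$2"
}
```

After kinit as the alternative HBase superuser, run `hbase_grant_stmt bigsql RWXCA | hbase shell -n`.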

IBM BigSQL console

https://www.ibm.com/docs/en/db2-big-sql/7.1?topic=installing-optional-db2-big-sql-console

Do not install BigSQL Console from Cloudera Manager, although the option is available.

Command-line installation is straightforward. Run it on the BigSQL Head node.

cd /usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/

Review the configuration parameters; the default values can be kept.

vi config/ucconfig.ini

Install.

./uc-install

Configuration

The memory allocated to BigSQL

https://www.ibm.com/docs/en/db2-big-sql/7.1?topic=server-configuring-db2-big-sql

The default is 25%, which may not be enough for resource-demanding queries or in a limited environment.

bigsql-config -set "BIGSQL_MEM_PERCENT=50"

After increasing it, the memory allocated for YARN workload should be reduced accordingly.
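A rough calculation helps to decide how much to take away from YARN. The sketch below assumes that BIGSQL_MEM_PERCENT is a percentage of the physical memory of the node; the function names are hypothetical and the result is an estimate, not a sizing rule.

```shell
# Sketch: estimate the memory split on a node for a given BIGSQL_MEM_PERCENT.
# Assumption: the percentage applies to the total physical memory of the node.
bigsql_mem_mb() {
    # $1 = total node memory in MB, $2 = BIGSQL_MEM_PERCENT
    echo $(( $1 * $2 / 100 ))
}

remaining_mem_mb() {
    # Memory left for YARN, the operating system and other services
    echo $(( $1 - $1 * $2 / 100 ))
}
```

For a node with 65536 MB, `bigsql_mem_mb 65536 50` gives 32768 and `remaining_mem_mb 65536 50` gives 32768. On Linux the total can be read with `awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo`.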

Hive Metastore

Verify the memory allocated to Hive Metastore. If it is too small, the Hive Metastore will fail and BigSQL will not be able to operate. Go to Home->Hive->Configuration->Hive Metastore->Resource Management->hive_metastore_java_heapsize and increase the value to at least 0.5 GB if it is lower.