IBM BigSQL and Cloudera - stanislawbartkowski/wikis GitHub Wiki
IBM BigSQL 7.x is supported only on CDP; HDP is deprecated.
Source of truth: https://www.ibm.com/support/knowledgecenter/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.doc/doc/hdp_bigsql_versions.html
Several remarks.
HDFS superuser
During Head Node installation, the IBM BigSQL installer switches to the HDFS superuser to create the bigsql HDFS home directory /user/bigsql. Unfortunately, this does not work with the default Cloudera hdfs superuser. The only solution is to create an alternative superuser account. If Cloudera CDP is already Kerberized, follow the instructions at https://github.com/stanislawbartkowski/wikis/wiki/Cloudera-CDP-and-Kerberos#hdfs-superuser
If Cloudera CDP is not Kerberized, create the alternative HDFS superuser account at the Linux level.
This example assumes uhdfs as both the superuser group and the user name.
Create the uhdfs user on all Linux nodes in the cluster. Important: do not forget to create the account on the HDFS Name Node. HDFS user and group membership is resolved on the HDFS Name Node, not on the node where the hdfs command is executed.
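A minimal sketch of this step, run as root from one node. It assumes (not stated above) that root has passwordless SSH to every host and that a hypothetical file, for example /tmp/allClusterHosts, lists every cluster node, including the HDFS Name Node. Setting SSH=echo prints the remote commands instead of running them.

```shell
# Sketch: create the uhdfs superuser group and user on every cluster node.
# Assumptions: root can SSH to each host without a password, and the host
# file passed as $1 (hypothetical name) lists every node, Name Node included.
SSH=${SSH:-ssh}            # set SSH=echo for a dry run
SUPERUSER=${SUPERUSER:-uhdfs}

create_superuser() {
  while read -r host; do
    [ -z "$host" ] && continue   # skip blank lines
    # groupadd -f succeeds even if the group already exists;
    # useradd runs only if the user does not exist yet.
    # </dev/null stops ssh from consuming the rest of the host list.
    $SSH "root@$host" "groupadd -f $SUPERUSER && (id $SUPERUSER >/dev/null 2>&1 || useradd -g $SUPERUSER $SUPERUSER)" </dev/null
  done < "$1"
}
```

Usage: `create_superuser /tmp/allClusterHosts`, or `SSH=echo create_superuser /tmp/allClusterHosts` to review the commands first.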
Cloudera Console->Cluster->HDFS->Configuration->Security->Superuser Group: enter uhdfs, then restart all affected services.
Make sure that the change takes effect.
id uhdfs
hdfs groups uhdfs
Run a simple test to verify (use su - uhdfs, not sudo -u uhdfs)
su - uhdfs -c "hdfs dfs -mkdir /test"
su - uhdfs -c "hdfs dfs -rmdir /test"
Prerequisites
- HDFS Gateway role is installed on IBM BigSQL Head node
- Passwordless SSH connection exists from the Head node to the BigSQL Worker nodes and the Cloudera Master nodes.
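The passwordless SSH prerequisite can be verified before running the pre-checker. A minimal sketch, assuming a host file (hypothetical, passed as $1) listing the BigSQL Worker nodes and Cloudera Master nodes; BatchMode=yes makes ssh fail instead of prompting for a password, so any host that still needs one is reported as FAIL.

```shell
# Sketch: check passwordless SSH from the BigSQL Head node to each host.
SSH=${SSH:-ssh}   # overridable, e.g. for testing

check_passwordless_ssh() {
  failed=0
  while read -r host; do
    [ -z "$host" ] && continue
    # BatchMode=yes: never prompt; fail if key-based login is not set up.
    if $SSH -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done < "$1"
  return $failed
}
```

The function returns non-zero if any host fails, so it can gate a script: `check_passwordless_ssh /tmp/hostsToCheck || exit 1`.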
Install Cloudera Manager Python client (IBM BigSQL Head node only)
https://www.ibm.com/docs/en/db2-big-sql/7.1?topic=sql-installing-cloudera-manager-python-client
curl https://bootstrap.pypa.io/pip/3.4/get-pip.py -o get-pip.py
python get-pip.py
python /usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/python/setup_cloudera_api.py
BigSQL yum repository
Perform this step on the future BigSQL Head node.
If an offline installation is configured, modify the IBM BigSQL internal repository definition. During the installation, this file overwrites the IBM BigSQL repository definition in /etc/yum.repos.d.
vi /usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/repos/IBM-Big_SQL-7_1_0_0.repo
[IBM-Big_SQL-7_1_0_0]
name=IBM-Big_SQL-7_1_0_0
baseurl=http://{ local repo URL }
enabled=1
gpgcheck=0
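Before starting an offline installation, it can help to confirm that the local repository is reachable. A minimal sketch, assuming the local repository was built with createrepo so that it serves repodata/repomd.xml under its baseurl; repo_baseurl and check_repo are hypothetical helper names.

```shell
# Sketch: verify the local yum repository referenced by the BigSQL repo file.
REPO_FILE=${REPO_FILE:-/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/repos/IBM-Big_SQL-7_1_0_0.repo}

# Print the first baseurl value found in a .repo file.
repo_baseurl() {
  sed -n 's/^[[:space:]]*baseurl[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Return 0 if the repository metadata can be downloaded.
check_repo() {
  url=$(repo_baseurl "$1")
  [ -n "$url" ] || { echo "no baseurl in $1"; return 1; }
  # -f: fail on HTTP errors; -s: silent; ${url%/} drops one trailing slash.
  curl -s -f -o /dev/null "${url%/}/repodata/repomd.xml"
}
```

Usage: `check_repo "$REPO_FILE" && echo "repository reachable"`.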
Set installation parameters
At a minimum:
/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/bigsql-config -set "HDFS_USER=uhdfs"
/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/bigsql-config -set "CM_HOST=<Cloudera Manager hostname>"
Prepare a list of BigSQL hostnames. The Head Node hostname should be the first in the list.
This example specifies velarize1 as the Head Node and pimiento2, pimiento3 and pimiento4 as the three Worker Nodes.
vi /tmp/bigsqlHostList
velarize1.fyre.ibm.com
pimiento2.fyre.ibm.com
pimiento3.fyre.ibm.com
pimiento4.fyre.ibm.com
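The same host list can be written without an editor, which also makes it easy to check that the Head Node comes first. A minimal sketch using the example hostnames above; the duplicate check is an extra safeguard, not a documented installer requirement.

```shell
# Sketch: create the host list non-interactively and validate its order.
HOSTLIST=${HOSTLIST:-/tmp/bigsqlHostList}
HEAD_NODE=velarize1.fyre.ibm.com

cat > "$HOSTLIST" <<'EOF'
velarize1.fyre.ibm.com
pimiento2.fyre.ibm.com
pimiento3.fyre.ibm.com
pimiento4.fyre.ibm.com
EOF

# The Head Node hostname should be on the first line.
first=$(head -n 1 "$HOSTLIST")
if [ "$first" != "$HEAD_NODE" ]; then
  echo "Head Node $HEAD_NODE must be the first entry in $HOSTLIST" >&2
  exit 1
fi

# A hostname listed twice usually indicates a typo.
if [ -n "$(sort "$HOSTLIST" | uniq -d)" ]; then
  echo "duplicate hostnames in $HOSTLIST" >&2
  exit 1
fi
```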
Run pre-checker and installer
/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/bigsql-precheck -hostList /tmp/bigsqlHostList
/usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/bigsql-install
When the installation completes, run the health checks.
cd /usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli
./bigsql-admin -smoke
./bigsql-admin -smoke -l
./bigsql-admin -health
HBase
Make sure that the HBase component is installed on the IBM BigSQL Head Node. If no HBase Master or Region Server runs on the IBM BigSQL Head Node, install the HBase Gateway role there.
Smoke test whether IBM BigSQL is integrated with HBase.
./bigsql-admin -smoke -b
After CDP is Kerberized, the hbase superuser cannot be accessed with the su - hbase command. The only solution is to create an alternative HBase superuser to run the grant command.
Cloudera Console->Cluster->HBase->Configuration->Search->hbase.superuser
Enter the user name; this example assumes uhbase. Only a user name, not a group name, can be used here. Restart HBase.
Create the uhbase account in Active Directory/Kerberos.
kinit uhbase@<REALM>
hbase shell
grant 'bigsql', 'RWXCA'
RWXCA gives the bigsql user the Read, Write, eXecute, Create and Admin HBase permissions.
IBM BigSQL console
https://www.ibm.com/docs/en/db2-big-sql/7.1?topic=installing-optional-db2-big-sql-console
Do not install BigSQL Console from Cloudera Manager, even though the option is available.
Command-line installation is simple and straightforward. Run the following on the BigSQL Head node.
cd /usr/ibmpacks/IBM-Big_SQL/7.1.0.0/bigsql-cli/
Review the configuration parameters; you can keep the default values.
vi config/ucconfig.ini
Install.
./uc-install
Configuration
The memory allocated to BigSQL
https://www.ibm.com/docs/en/db2-big-sql/7.1?topic=server-configuring-db2-big-sql
The default is 25%. This may not be enough for resource-demanding queries or in a resource-limited environment.
bigsql-config -set "BIGSQL_MEM_PERCENT=50"
After increasing it, reduce the memory allocated to the YARN workload accordingly.
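The adjustment is simple arithmetic. A minimal sketch, assuming BIGSQL_MEM_PERCENT is a percentage of each BigSQL node's physical memory and that the YARN share is set through yarn.nodemanager.resource.memory-mb (NodeManager container memory); mem_shift_mb is a hypothetical helper and the 64 GB node is an illustrative size.

```shell
# Sketch: how much memory moves from YARN to BigSQL when
# BIGSQL_MEM_PERCENT changes. Assumes the percentage applies to the
# physical memory of each BigSQL node.
mem_shift_mb() {
  total_mb=$1; old_pct=$2; new_pct=$3
  echo $(( total_mb * (new_pct - old_pct) / 100 ))
}

# Example: a 64 GB (65536 MB) node going from the default 25% to 50%.
total=65536
echo "BigSQL at 25%: $(( total * 25 / 100 )) MB"
echo "BigSQL at 50%: $(( total * 50 / 100 )) MB"
echo "Reduce YARN NodeManager memory by about $(mem_shift_mb "$total" 25 50) MB"
```

On this example node, BigSQL grows from 16384 MB to 32768 MB, so the YARN container memory on that node should shrink by roughly 16384 MB to avoid overcommitting physical memory.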
Hive Metastore
Verify the memory allocated to the Hive Metastore. If it is too small, the Hive Metastore will fail and BigSQL will not be able to operate. Go to Home->Hive->Configuration->Hive Metastore->Resource Management->hive_metastore_java_heapsize and, if the value is less than 0.5 GB, increase it to at least 0.5 GB.
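A small check of the 0.5 GB threshold. This is a sketch assuming the value is expressed in bytes, as Cloudera Manager labels this setting ("Java Heap Size of Hive Metastore Server in Bytes"); heap_ok is a hypothetical helper and the sample values are illustrative.

```shell
# Sketch: compare hive_metastore_java_heapsize (in bytes) with 0.5 GB.
MIN_HEAP_BYTES=$(( 512 * 1024 * 1024 ))   # 0.5 GB = 536870912 bytes

heap_ok() {
  # $1: configured heap size in bytes
  [ "$1" -ge "$MIN_HEAP_BYTES" ]
}

for heap in 268435456 536870912 4294967296; do
  if heap_ok "$heap"; then
    echo "$heap bytes: OK"
  else
    echo "$heap bytes: too small, increase to at least $MIN_HEAP_BYTES"
  fi
done
```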