CP4D and IBM DB2 BigSQL

https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_3.5.0/svc-welcome/bigsql.html

Installation

There are two elements to install:

  • IBM DB2 BigSQL in CP4D
  • Provision an instance of DB2 BigSQL

Limitations

Compared to the standard IBM DB2 BigSQL, the CP4D DB2 BigSQL deployment comes with limitations.

https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_3.5.0/svc-bigsql/bigsql_known_issues.html

One is particularly important:

  • Transactional tables are not supported.

Transactional tables are enabled by default in Hive 3. Only external tables, not managed by Hive, can be synchronized with IBM DB2 BigSQL.

Prerequisites

https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_3.5.0/svc-bigsql/bigsql_setup_cluster.html

https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_3.5.0/svc-bigsql/bigsql_install_prereqs.html

Additional remarks:

Make sure that HDFS is accessible from the OpenShift cluster nodes. In some environments, the HDFS NameNode hostname resolves to an IP address in a private network and to a different IP address in the public network used by the OpenShift cluster. Assuming the HDFS NameNode is sawtooth1.fyre.ibm.com, run the following command from any OpenShift node:

nc -zv sawtooth1.fyre.ibm.com 8020

Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 9.30.210.213:8020.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.

If there is no connection, it means that HDFS is listening on the private network only. The solution is to make HDFS listen on all network interfaces. https://www.ibm.com/support/pages/ibm-biginsights-how-configure-hadoop-client-port-8020-bind-all-network-interfaces

  • Ambari console -> HDFS -> Configs -> Advanced -> Custom hdfs-site

  • Add: dfs.namenode.rpc-bind-host = 0.0.0.0

  • Restart HDFS and check the NameNode again, as shown below.
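A quick way to confirm that the property took effect after the restart: hdfs getconf, run on a Hadoop node, should print 0.0.0.0, and the nc test from an OpenShift node should now connect.

# on a Hadoop node: print the effective bind-host value
hdfs getconf -confKey dfs.namenode.rpc-bind-host

# from an OpenShift node: repeat the reachability test
nc -zv sawtooth1.fyre.ibm.com 8020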

If HDFS is protected by Kerberos, verify that the Kerberos server is reachable from the OpenShift cluster.

nc -zv verse1.fyre.ibm.com 88

Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 9.30.54.109:88.
Ncat: 0 bytes sent, 0 bytes received in 0.04 seconds.
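Reachability of port 88 alone does not prove that authentication works. Optionally, request a ticket from the KDC; a minimal sketch, assuming a bigsql principal already exists in the FYRE.IBM.COM realm (substitute a principal valid in your environment):

# prompts for the principal's password, then lists the obtained ticket
kinit [email protected]
klist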

Install IBM DB2 BigSQL

https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_3.5.0/svc-bigsql/bigsql_install.html

DB2 BigSQL is installed using the same method as all other CP4D services.

./cpd-cli adm --repo ./repo.yaml --assembly big-sql --namespace zen --apply
./cpd-cli install --repo ./repo.yaml --assembly big-sql --namespace zen --storageclass managed-zen-storage
./cpd-cli status --assembly big-sql --namespace zen

Displaying CR status for all assemblies and relevant modules

Status for assembly big-sql and relevant modules in project zen:

Assembly Name                           Status            Version          Arch    
big-sql                                 Ready             7.1.1            x86_64  

  SubAssembly Name                      Status            Version          Arch    
  lite                                  Ready             3.5.1            x86_64  

  Module Name                           Version           Arch             Storage Class     
  big-sql                               7.1.1             x86_64           managed-zen-storage

=========================================================================================

Remote Hadoop

https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_3.5.0/svc-bigsql/bigsql_install_prereqs.html

Make the necessary arrangements on the remote Hadoop cluster; in particular, make bigsql a valid user in Hadoop.

hdfs dfs -mkdir /user/bigsql
hdfs dfs -chmod 755 /user/bigsql
hdfs dfs -chown -R bigsql /user/bigsql
hdfs dfs -mkdir /user/bigsql/sync
hdfs dfs -chmod 777 /user/bigsql/sync
hdfs dfs -chown -R bigsql:hdfs /user/bigsql
hdfs dfs -chown -R bigsql:hdfs /user/bigsql/sync
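A quick check that the directories exist with the expected ownership:

hdfs dfs -ls /user/bigsql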

Also, give the bigsql user access to the HDFS Hive warehouse directory, using the hdfs command line or Ranger.
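For example, with HDFS ACLs; a sketch only, assuming HDFS ACLs are enabled and the Hive 3 default warehouse path, which may differ in your cluster:

# grant the bigsql user recursive access to the external warehouse directory
hdfs dfs -setfacl -R -m user:bigsql:rwx /warehouse/tablespace/external/hive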

If Ranger authorization is enabled, give the bigsql user access to the Hive databases and tables.

Provision DB2 BigSQL instance

Collect all necessary information.

Parameter                   Example
Ambari URL                  http://sawtooth1.fyre.ibm.com:8080
Ambari admin user           admin
Ambari admin password       secret
Kerberos admin principal    [email protected]
Kerberos admin password     secret

If Hadoop is kerberized, the Kerberos admin principal and password are required; use the same principal that was used for Hadoop kerberization.

In a small environment, instance provisioning can be blocked by a lack of resources. A quick-and-dirty workaround is to manually reduce the CPU and memory requests in the Deployment YAML definition. This is not recommended in a production environment.
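A minimal sketch using the oc client; the deployment name is taken from this example cluster and the request values are arbitrary, adjust both to your environment:

# lower the CPU and memory requests of the head deployment
oc -n zen set resources deployment bigsql-1611104274650644-head --requests=cpu=500m,memory=4Gi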

To deploy an instance, go to CP4D console -> Left menu -> Services -> Instances -> New Instance -> DB2 BigSQL tile -> New instance (again), and follow the wizard pages.

Review and, if necessary, apply the post-provisioning reconfiguration.
https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_3.5.0/svc-bigsql/bigsql_post_prov_tsks.html

Basic health-check

Check that the pods are up and running; in this example, only one worker node is used.

oc project zen
oc get pods | grep bigsql

bigsql-1611104274650644-head-7cb969fb7d-vnzsw                1/1     Running     0          9h
bigsql-1611104274650644-worker-0                             1/1     Running     0          9h
bigsql-addon-5f7f6b876f-4rmtm                                1/1     Running     0          9h
bigsql-service-provider-64695f97c9-wsc5w                     1/1     Running     0          9h

Run a command in the DB2 BigSQL Head Node pod.

oc exec -it bigsql-1611104274650644-head-7cb969fb7d-vnzsw -- bash
source $HOME/.profile
bigsql status

SERVICE              HOSTNAME                               NODE      PID STATUS

Big SQL Scheduler    head.bigsql-1611104274650644.zen.svc.cluster.local    -    35966 Available
Big SQL Worker       bigsql-1611104274650644-worker-0.bigsql-1611104274650644.zen.svc.cluster.local    1    33759 DB2 Running
Big SQL Master       head.bigsql-1611104274650644.zen.svc.cluster.local    0    42778 DB2 Running

bigsql-admin -health

...............
All nodes are registered OK.
------------------------------------------
Processing completed successfully
Total Processing Time (secs) : 26

Verify HDFS access.

hdfs dfs -ls /tmp

Found 10 items
drwx------   - ambari-qa hdfs          0 2021-01-10 12:56 /tmp/ambari-qa
drwxr-xr-x   - hdfs      hdfs          0 2021-01-08 14:30 /tmp/entity-file-history
drwx-wx-wx   - hive      hdfs          0 2021-01-20 01:15 /tmp/hive
-rw-r--r--   3 hdfs      hdfs       1024 2021-01-08 14:31 /tmp/id0b0a3020_date310821
...........

Verify the HiveServer2 connection.

Copy and paste the Hive connection URL from Ambari console -> Hive -> Summary -> HIVESERVER2 JDBC URL.

beeline -u "jdbc:hive2://banquets1.fyre.ibm.com:2181,sawtooth1.fyre.ibm.com:2181,sawtooth2.fyre.ibm.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n bigsql

Run several commands to make sure that bigsql is authorized in the HiveServer2 instance.

show databases;
create database sampledb;
use sampledb;
create table test (x int);
insert into test values(1);
select * from test;
drop table test;
drop database sampledb;

Verify the DB2 BigSQL Hive connection.

db2 connect to bigsql
db2 "create hadoop table test (x int)"
db2 "insert into test values(1)"
db2 "select * from test"

Verify that the test table just created is accessible in Hive.

beeline -u .... -n bigsql
use bigsql;
show tables;
select * from test;

INFO  : OK
+---------+
| test.x  |
+---------+
| 1       |
+---------+
1 row selected (0.374 seconds)

Hive and DB2 BigSQL synchronization

https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_3.5.0/svc-bigsql/bigsql_sync.html

If DB2 BigSQL is deployed directly on a Hadoop cluster, Hive to DB2 BigSQL metastore synchronization is done automatically: every change in the Hive metastore is immediately reflected in DB2 BigSQL. In CP4D DB2 BigSQL, automatic synchronization requires additional arrangements.

https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_3.5.0/svc-bigsql/bigsql_enable_AutoHcatSync.html
Another option is on-demand synchronization by calling an appropriate stored procedure.
The stored procedure can be called from the db2 command line in the DB2 BigSQL Head pod or through the IBM® Db2 Data Management Console service.

A basic test.

CP4D DB2 BigSQL can only be synchronized with non-transactional Hive tables. The table should be created as external, that is, not managed by Hive.

Create a database and a table in Hive.

beeline ......
create database syncdb;
use syncdb;
create external table synctest (x int);
insert into synctest values(1);
select * from synctest;

+-------------+
| synctest.x  |
+-------------+
| 1           |
+-------------+

Log on to the DB2 BigSQL Head pod.

db2 connect to bigsql

Run the HCAT_SYNC_OBJECTS stored procedure. This call brings all non-transactional tables from Hive into DB2 BigSQL. You can tailor the command to meet more specific needs.

https://www.ibm.com/support/knowledgecenter/en/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.commsql.doc/doc/biga_hadsyncobj.html

db2 "CALL SYSHADOOP.HCAT_SYNC_OBJECTS('.','.','a','REPLACE','CONTINUE')"

 Result set 1
  --------------

  OBJSCHEMA OBJNAME           OBJATTRIB TYPE STATUS DETAILS                                                     
  --------- ----------------- --------- ---- ------ ------------------------------------------------------------
  testdb    x                 -         ?    WARN   Transactional tables are disabled. This table is not synced.
  testdb    aaa               -         ?    WARN   Transactional tables are disabled. This table is not synced.
  testdb    account_holders   -         ?    WARN   Transactional tables are disabled. This table is not synced.
  testdb    bank_accounts     -         ?    WARN   Transactional tables are disabled. This table is not synced.
  testdb    bank_clients      -         ?    WARN   Transactional tables are disabled. This table is not synced.
  testdb    bank_customers    -         ?    WARN   Transactional tables are disabled. This table is not synced.
  testdb    bank_wrkex_salary -         ?    WARN   Transactional tables are disabled. This table is not synced.
  testdb    checking_accounts -         ?    WARN   Transactional tables are disabled. This table is not synced.
  testdb    savings_accounts  -         ?    WARN   Transactional tables are disabled. This table is not synced.
  SYNCDB    SYNCTEST          -         T    OK     -                                                           

Only the non-transactional SYNCTEST table is synchronized.

db2 "select * from syncdb.synctest"

X          
-----------
          1

  1 record(s) selected.
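HCAT_SYNC_OBJECTS accepts regular-expression patterns for the schema and object arguments, so the scope can be narrowed. A sketch that would synchronize only the syncdb schema (pattern values are examples):

db2 "CALL SYSHADOOP.HCAT_SYNC_OBJECTS('syncdb','.*','a','REPLACE','CONTINUE')"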

Configure external connection to DB2 BigSQL

DB2 BigSQL Services

Discover DB2 BigSQL services.

oc get svc | grep bigsql

bigsql-1611104274650644                    ClusterIP   None             <none>        2222/TCP                                15h
bigsql-1611104274650644-jdbc               NodePort    172.30.112.110   <none>        32051:31293/TCP,32052:31638/TCP         15h
bigsql-addon                               ClusterIP   172.30.223.142   <none>        80/TCP,443/TCP                          15h
bigsql-service-provider                    ClusterIP   172.30.188.91    <none>        3443/TCP                                15h

The non-secure DB2 BigSQL port 32051 is exposed as OpenShift NodePort 31293.
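The NodePort value can also be extracted directly from the service definition; a sketch, assuming the service name from the listing above:

oc -n zen get svc bigsql-1611104274650644-jdbc -o jsonpath='{.spec.ports[?(@.port==32051)].nodePort}'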

Give access

Modify the HAProxy configuration on the gateway node. Replace the IP addresses with the corresponding addresses of the OpenShift master nodes.

vi /etc/haproxy/haproxy.cfg

frontend bigsql-tcp
        bind *:31293
        default_backend bigsql-tcp
        mode tcp
        option tcplog

backend bigsql-tcp
        balance source
        mode tcp
        server master0 10.16.40.242:31293 check
        server master1 10.16.40.243:31293 check
        server master2 10.16.40.249:31293 check
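Optionally, validate the configuration syntax before restarting (a quick sanity check):

haproxy -c -f /etc/haproxy/haproxy.cfg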

Restart the HAProxy service.

systemctl restart haproxy

Verify that the port is responding, assuming shrieker-inf is the OpenShift gateway host.

nc -zv shrieker-inf 31293

Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 9.30.182.157:31293.
Ncat: 0 bytes sent, 0 bytes received in 0.24 seconds.

Configure DB2 client

Catalog the BigSQL Head node.

db2 catalog tcpip node BIGSQL remote shrieker-inf server 31293
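A quick check that the catalog entry was created:

db2 list node directory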

For a remote connection, the bigsql password is needed. The password is generated randomly and can be copied from the OpenShift console.

OpenShift console -> CP4D project -> Secrets -> search for bigsql-secret -> copy from bigsqlPassword
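The same value can be read from the command line; a sketch, assuming zen is the CP4D project and bigsql-secret is the exact secret name found by the search above:

oc -n zen get secret bigsql-secret -o jsonpath='{.data.bigsqlPassword}' | base64 -d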

Important: it is against good practice to use the bigsql credentials for querying data; use them only for administrative tasks.

Verify DB2 BigSQL Head node connectivity.

db2 attach to bigsql user bigsql

Enter current password for bigsql: 

   Instance Attachment Information

 Instance server        = DB2/LINUXX8664 11.5.4.0
 Authorization ID       = BIGSQL
 Local instance alias   = BIGSQL

Catalog the bigsql database.

db2 catalog database bigsql at node bigsql

Connect and run the query.

db2 connect to bigsql user bigsql
db2 "select * from syncdb.synctest"


X          
-----------
          1

  1 record(s) selected.
