CDP 7.1.5 installation - stanislawbartkowski/wikis GitHub Wiki

Steps to install CDP 7.1.5

Prerequisites

CDP repositories are protected. Before starting the installation, have your authorized Cloudera credentials (username and password) ready.

https://docs.cloudera.com/cdp-private-cloud-base/7.1.5/installation/topics/cdpdc-before-you-install.html

  • Passwordless connection between Cloudera Manager host and other hosts
  • Disable firewall
  • Disable SELinux
  • NTP service running
  • Reduce swappiness (otherwise the cluster will be reported as unhealthy)
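On a RHEL/CentOS 7 host, the prerequisites above come down to a handful of commands. The sketch below only prints them so they can be reviewed first. The `firewalld` and `chronyd` service names and the swappiness value of 1 are assumptions, not taken from these notes.

```shell
# Print (do not run) the commands for the prerequisites listed above.
# Assumptions: firewalld is the firewall, chronyd provides NTP,
# and the swappiness target defaults to 1.
prereq_commands() {
    swappiness="${1:-1}"
    echo "systemctl stop firewalld"
    echo "systemctl disable firewalld"
    echo "setenforce 0"
    echo "sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config"
    echo "systemctl start chronyd"
    echo "systemctl enable chronyd"
    echo "sysctl -w vm.swappiness=${swappiness}"
}

prereq_commands 1
```

After review, the output can be piped to a root shell, for example `prereq_commands 1 | sudo sh`. The `sysctl -w` setting does not survive a reboot; persisting it is left out of this sketch.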

Install Java

https://docs.cloudera.com/cdp-private-cloud-base/7.1.5/installation/topics/cdpdc-installing-open-jdk-using-cm.html

yum install java-1.8.0-openjdk-devel
yum install java-11-openjdk-devel

Ansible command

ansible all -m yum -a "name=java-1.8.0-openjdk-devel"
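`ansible all` runs against every host in the Ansible inventory, which these notes do not show. A minimal sketch that writes one, assuming a hypothetical group name `cluster` and placeholder hostnames:

```shell
# Write a minimal INI-style Ansible inventory.
# The group name "cluster" and the hostnames passed in are placeholders.
write_inventory() {
    inventory_file="$1"
    shift
    {
        echo "[cluster]"
        for host in "$@"; do
            echo "$host"
        done
    } > "$inventory_file"
}

# Example: write_inventory ./hosts.ini node1.example.com node2.example.com
```

The resulting file can be passed with `-i`, for example `ansible all -i ./hosts.ini -m yum -a "name=java-1.8.0-openjdk-devel"`.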

Repository

Use either the public repository or a private (local) repository.

https://docs.cloudera.com/cdp-private-cloud-base/7.1.5/installation/topics/cdpdc-setting-up-web-server.html

The repository directory structure should look like:

./cm7
./cm7/7.2.4
./cm7/7.2.4/redhat7
./cm7/7.2.4/redhat7/yum
./cm7/7.2.4/redhat7/yum/repodata
./cm7/7.2.4/redhat7/yum/repodata/72bf7c59d10240ee6faf8046ed7a51551063ad152e51191e8b438b5d3e5b17de-other.sqlite.bz2
./cm7/7.2.4/redhat7/yum/repodata/repomd.xml.asc
./cm7/7.2.4/redhat7/yum/repodata/46bf3ce3bc78a99301e4c3ff618df1f98a56685970aead80fed58822f03f80d1-filelists.sqlite.bz2
./cm7/7.2.4/redhat7/yum/repodata/7965d336ea8a9b7e7fced02e016739e6e5455ee6ea44add30e78674a63513eef-other.xml.gz
./cm7/7.2.4/redhat7/yum/repodata/837d4cdd49971be5cc95d2fc56080cd3ea5d98f0df2257e2efa6309772eeeeea-filelists.xml.gz
./cm7/7.2.4/redhat7/yum/repodata/99f5ce18c47a77fcc0b6f1ac424c56c92fe3ecea930b5a1bb410de4a35fa33c0-primary.xml.gz
./cm7/7.2.4/redhat7/yum/repodata/repomd.xml.key
./cm7/7.2.4/redhat7/yum/repodata/cdb631a1f9badc65d5e67c1a6b05a666ebd9ed787cdf904ad91bb0f05c16f83b-primary.sqlite.bz2
./cm7/7.2.4/redhat7/yum/repodata/repomd.xml
./cm7/7.2.4/redhat7/yum/cloudera-manager.repo
./cm7/7.2.4/redhat7/yum/RPM-GPG-KEY-cloudera
./cm7/7.2.4/redhat7/yum/cloudera-manager-installer.bin
./cm7/7.2.4/redhat7/yum/RPMS
./cm7/7.2.4/redhat7/yum/RPMS/x86_64
./cm7/7.2.4/redhat7/yum/RPMS/x86_64/cloudera-manager-agent-7.2.4-7594142.el7.x86_64.rpm
./cm7/7.2.4/redhat7/yum/RPMS/x86_64/cloudera-manager-server-db-2-7.2.4-7594142.el7.x86_64.rpm
./cm7/7.2.4/redhat7/yum/RPMS/x86_64/cloudera-manager-daemons-7.2.4-7594142.el7.x86_64.rpm
./cm7/7.2.4/redhat7/yum/RPMS/x86_64/cloudera-manager-server-7.2.4-7594142.el7.x86_64.rpm
./cm7/7.2.4/redhat7/yum/RPMS/x86_64/openjdk8-8.0+232_9-cloudera.x86_64.rpm
./cm7/7.2.4/redhat7/yum/RPMS/x86_64/enterprise-debuginfo-7.2.4-7594142.el7.x86_64.rpm
./cm7/7.2.4/redhat7/yum/RPMS/noarch
./cm7/7.2.4/redhat7/yum/allkeys.asc
./cm7/7.2.4/redhat7/yum/SRPMS
./cdh7
./cdh7/7.1.5.0
./cdh7/7.1.5.0/parcels
./cdh7/7.1.5.0/parcels/CDH-7.1.5-1.cdh7.1.5.p0.7431829-el7.parcel
./cdh7/7.1.5.0/parcels/manifest.json
./cdh7/7.1.5.0/parcels/CDH-7.1.5-1.cdh7.1.5.p0.7431829-el7.parcel.sha1
./cdh7/7.1.5.0/parcels/CDH-7.1.5-1.cdh7.1.5.p0.7431829-el7.parcel.sha256
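Before pointing hosts at a private repository, it can help to confirm that the key files from the listing above are in place. The function below is a sketch. The choice of which three files to check is an assumption, and the version directories are matched with globs so other releases also pass.

```shell
# Check that a local repository root contains a few key files from the listing above.
# Prints each missing pattern and returns non-zero if any is absent.
check_repo_layout() {
    root="$1"
    rc=0
    for pattern in \
        "cm7/*/redhat7/yum/repodata/repomd.xml" \
        "cm7/*/redhat7/yum/RPM-GPG-KEY-cloudera" \
        "cdh7/*/parcels/manifest.json"
    do
        found=no
        # $pattern is left unquoted on purpose so the shell expands the glob;
        # an unmatched glob stays literal and fails the -e test.
        for candidate in "$root"/$pattern; do
            [ -e "$candidate" ] && found=yes
        done
        if [ "$found" = no ]; then
            echo "missing: $pattern"
            rc=1
        fi
    done
    return "$rc"
}

# Example: check_repo_layout /var/www/html/cloudera-repos
```

The example path is a placeholder for wherever the web server document root is.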

Yum repository for Cloudera Manager

https://[username]:[password]@archive.cloudera.com/p/cm7/7.3.1/redhat7/yum/cloudera-manager.repo

The public repository is behind a paywall. Add your credentials to the downloaded cloudera-manager.repo file:

[cloudera-manager]
name=Cloudera Manager 7.3.1
baseurl=https://archive.cloudera.com/p/cm7/7.3.1/redhat7/yum/
gpgkey=https://archive.cloudera.com/p/cm7/7.3.1/redhat7/yum/RPM-GPG-KEY-cloudera
username=[username]
password=[password]
gpgcheck=1
enabled=1
autorefresh=0
type=rpm-md

yum repolist

cloudera-manager Cloudera Manager 7.3.1   6
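The credential edit described above can also be scripted. This is a sketch that assumes the repo file has a single section and ends with a newline, so appending the two keys places them in `[cloudera-manager]`. The function name is hypothetical.

```shell
# Append username/password keys to a downloaded cloudera-manager.repo file,
# unless a username line is already present.
add_repo_credentials() {
    repo_file="$1"
    repo_user="$2"
    repo_password="$3"
    if grep -q '^username=' "$repo_file"; then
        echo "credentials already present in $repo_file"
        return 0
    fi
    printf 'username=%s\npassword=%s\n' "$repo_user" "$repo_password" >> "$repo_file"
}

# Example: add_repo_credentials /etc/yum.repos.d/cloudera-manager.repo "$CLOUDERA_USER" "$CLOUDERA_PASSWORD"
```

`CLOUDERA_USER` and `CLOUDERA_PASSWORD` are placeholder variable names. Because the file then holds a password in clear text, restricting its permissions is worth considering.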

Install Cloudera Manager

https://docs.cloudera.com/cdp-private-cloud-base/7.1.5/installation/topics/cdpdc-install-cm-packages2.html

yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server

Prepare CDP database

https://docs.cloudera.com/cdp-private-cloud-base/7.1.5/installation/topics/cdpdc-configuring-starting-postgresql-server.html

For instance, run PostgreSQL as a Docker container.

docker run --name postgres -e POSTGRES_PASSWORD=secret --restart=always -p 5432:5432 -d postgres

As a test, connect to the PostgreSQL database from the Cloudera Manager host.

psql -h <postgres host> -U postgres

Password for user postgres: 
psql (13.3, server 13.1 (Debian 13.1-1.pgdg100+1))
Type "help" for help.

postgres=# 

Create database for Cloudera Manager.

CREATE ROLE scm LOGIN PASSWORD 'secret';
ALTER ROLE scm WITH LOGIN;
CREATE DATABASE scm OWNER scm ENCODING 'UTF8';

Verify that you can connect to the scm database as user scm and that this user can create and drop tables.

psql -h <postgres host> -U scm scm

Password for user scm: 
psql (13.3, server 13.1 (Debian 13.1-1.pgdg100+1))
Type "help" for help.

scm=> 

create table x (x int);

drop table x;

Other databases

CREATE ROLE rman LOGIN PASSWORD 'secret';
CREATE DATABASE rman OWNER rman ENCODING 'UTF8';

CREATE ROLE hive LOGIN PASSWORD 'secret';
ALTER ROLE hive WITH LOGIN;
CREATE DATABASE metastore OWNER hive ENCODING 'UTF8';
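The role/database pairs above all follow the same pattern, so the statements can be generated rather than typed. This is a sketch with a hypothetical function name. It does no escaping, so it assumes role names, database names and passwords contain no quotes.

```shell
# Emit the SQL for one service database, following the statements above.
# Arguments: role name, database name, password.
service_db_sql() {
    role="$1"
    database="$2"
    password="$3"
    printf "CREATE ROLE %s LOGIN PASSWORD '%s';\n" "$role" "$password"
    printf "CREATE DATABASE %s OWNER %s ENCODING 'UTF8';\n" "$database" "$role"
}

service_db_sql rman rman secret
service_db_sql hive metastore secret
```

The output can be reviewed and then piped to `psql -h <postgres host> -U postgres`.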

Create and verify Cloudera Manager database access

https://docs.cloudera.com/cdp-private-cloud-base/7.1.5/installation/topics/cdpdc-syntax-scm-prepare-database.html

/opt/cloudera/cm/schema/scm_prepare_database.sh postgresql scm scm secret -h <postgres host>

Enter database password: 
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing:  /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
[                          main] DbCommandExecutor              INFO  Successfully connected to database.
All done, your SCM database is configured correctly!

Start Cloudera Manager

https://docs.cloudera.com/cdp-private-cloud-base/7.1.5/installation/topics/cdpdc-install-runtime-other-software.html

systemctl start cloudera-scm-server
systemctl enable cloudera-scm-server


tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log

..................
2021-06-05 02:06:53,544 INFO WebServerImpl:org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@1cff36ae{HTTP/1.1,[http/1.1]}{0.0.0.0:7180}
2021-06-05 02:06:53,545 INFO WebServerImpl:org.eclipse.jetty.server.Server: Started @102983ms
2021-06-05 02:06:53,545 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
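Instead of watching `tail -f` by hand, a script can poll the log for the "Started Jetty server" line shown above. This is a sketch: the function name, retry count and interval are assumptions, and a log that still holds the message from an earlier start will match immediately.

```shell
# Return 0 once the server log contains the Jetty start-up message shown above.
# Polls up to $attempts times, sleeping $interval seconds between checks.
wait_for_cm_start() {
    log_file="$1"
    attempts="${2:-60}"
    interval="${3:-5}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if grep -q "Started Jetty server" "$log_file" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}

# Example: wait_for_cm_start /var/log/cloudera-scm-server/cloudera-scm-server.log && echo "Cloudera Manager is up"
```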

Launch Cloudera Manager

http://<Cloudera Manager host>:7180

Default credentials: admin/admin

Important: the first login takes some time because the Cloudera Manager database schema is created at that point.

Install CDP cluster

https://docs.cloudera.com/cdp-private-cloud-base/7.1.5/installation/topics/cdpdc-setup-cluster-using-wizard.html

Follow the GUI wizard. The confusing part is setting up the parcel repository: remove all the listed repository URLs except the one you need, and include your Cloudera credentials in it.

Install Cloudera Manager Agents manually

https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/installation/topics/cdpdc-manually-install-cm-agent-packages.html

Cloudera Manager Agents can be installed manually, which is particularly useful if the Installation Wizard is stuck on the "Install Agents" page. The Agent must be installed on every host in the cluster.

Copy /etc/yum.repos.d/cloudera-manager.repo to all nodes.

ansible all -m copy -a "src=/etc/yum.repos.d/cloudera-manager.repo dest=/etc/yum.repos.d/cloudera-manager.repo"

Install the required packages.

yum install cloudera-manager-agent cloudera-manager-daemons
ansible all -m yum -a "name=cloudera-manager-agent,cloudera-manager-daemons"

Modify the agent configuration file and enter the hostname of the Cloudera Manager server.

vi /etc/cloudera-scm-agent/config.ini

[General]
# Hostname of the CM server.
server_host=<hostname>

# Port that the CM server is listening on.
server_port=7182

## It should not normally be necessary to modify these.
# Port that the CM agent should listen on.
# listening_port=9000
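The `server_host` change can be made without opening an editor. This is a sketch with a hypothetical function name. It assumes the file already contains an uncommented `server_host=` line and that the hostname has no characters special to `sed`.

```shell
# Set server_host in an agent config.ini.
# The result is written to a temporary file first, then copied back,
# so the original keeps its ownership and permissions.
set_cm_server_host() {
    config_file="$1"
    cm_host="$2"
    tmp_file="${config_file}.tmp.$$"
    sed "s/^server_host=.*/server_host=${cm_host}/" "$config_file" > "$tmp_file" &&
        cat "$tmp_file" > "$config_file" &&
        rm -f "$tmp_file"
}

# Example: set_cm_server_host /etc/cloudera-scm-agent/config.ini cm.example.com
```

`cm.example.com` is a placeholder for the Cloudera Manager hostname.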

Distribute the configuration file across the cluster.

ansible all -m copy -a "src=/etc/cloudera-scm-agent/config.ini dest=/etc/cloudera-scm-agent/config.ini"

Start the Cloudera Manager Agent on all nodes and enable it at boot.

ansible all -a "systemctl start cloudera-scm-agent"
ansible all -a "systemctl enable cloudera-scm-agent"

Several tips

  • The network interface that the ResourceManager (RM) UI listens on cannot be changed. If the RM hostname resolves to a private network address, the UI cannot be reached from outside that private network.
  • To log in to the Knox Admin UI, use the credentials of a local Linux user on the host where Knox Gateway is deployed.
  • To install Hive, install these services: Tez, Hive Metastore (not Hive2), Hive on Tez.
  • Set up Kerberos and Ranger at the very beginning of the installation.
  • Increase the memory for HBase Master. The default is 50 MB, which causes OutOfMemory failures.
  • Increase the memory for YARN: Java Heap Size of ResourceManager in Bytes and Java Heap Size of NodeManager in Bytes. The default is 50 MB. (Look for "mx" in the YARN Configuration panel.)
  • After increasing the number of ZooKeeper servers, the Kafka broker will not start. The solution is to remove (here, rename) the broker's meta.properties file.

mv /var/local/kafka/data/meta.properties /var/local/kafka/data/meta.properties.old

  • To log in to the Atlas Admin UI, use the credentials of a local Linux user on the host where Atlas is deployed.
  • YARN Queue Manager UI: set YARN Configuration -> Queue Manager Service, then restart the service. The UI is then available under "Clusters".
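For the Kafka tip above, a broker may have more than one data directory. The sketch below renames `meta.properties` in each directory passed to it and keeps a timestamped copy. The function name and the timestamp suffix are assumptions, and it is meant to be run while the broker is stopped.

```shell
# Rename meta.properties in each given Kafka data directory, keeping a dated copy.
# Directories without the file are skipped.
retire_meta_properties() {
    stamp="$(date +%Y%m%d%H%M%S)"
    for data_dir in "$@"; do
        if [ -f "$data_dir/meta.properties" ]; then
            mv "$data_dir/meta.properties" "$data_dir/meta.properties.old.$stamp"
            echo "renamed $data_dir/meta.properties"
        fi
    done
}

# Example: retire_meta_properties /var/local/kafka/data
```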