Installing Hadoop using Cloudera distribution - dryshliak/hadoop GitHub Wiki

This tutorial shows how to install a Hadoop multi-node cluster using the automated installation of Cloudera Manager and CDH through the installation wizard.

Prerequisites

  • Three nodes
  • Disk space (min 40GB per node)
  • Ubuntu 16.04
  • Cloudera release (current 5.14.2)
  • Sudo or root access on all computers in the cluster
  • SSH server and client
  • SSH access configured between the nodes
  • NTP enabled
  • FQDN host names
  • IPTables disabled
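Some of the prerequisites above can be checked from the command line before starting. The following is only a minimal sketch: the helper names `is_fqdn` and `free_gb` are hypothetical, the FQDN test is a simple pattern check (at least two dot-separated labels) rather than a DNS lookup, and the 40 GB threshold comes from the list above.

```shell
# Hypothetical helper: succeeds if the name has at least two
# dot-separated labels, e.g. node1.example.com
is_fqdn() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$'
}

# Hypothetical helper: free space, in whole GB, on the filesystem
# that holds the given path (df -Pk reports 1 KB blocks)
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Example usage on each node:
#   is_fqdn "$(hostname -f)" || echo "hostname is not an FQDN"
#   [ "$(free_gb /)" -ge 40 ] || echo "less than 40 GB free on /"
```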
  1. Before installing any software, make sure the package lists from all repositories and PPAs are up to date and the installed packages are upgraded by running:
sudo apt-get update && sudo apt-get dist-upgrade -y
  1. Disable the firewall
sudo service ufw stop && sudo ufw disable
  1. Configure NTP clients
sudo apt-get install ntp -y
sudo /etc/init.d/ntp stop && sudo ntpdate pool.ntp.org && sudo /etc/init.d/ntp start
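To confirm that a node is actually synchronizing, the peer list from `ntpq -p` can be inspected: the peer that ntpd has selected as its time source is marked with a leading `*`. The helper below is a hypothetical sketch that extracts that peer from the `ntpq` output; it can take a few minutes after a restart before any peer is selected.

```shell
# Hypothetical helper: read `ntpq -p` output on stdin and print the peer
# marked with '*' (the currently selected sync source).
# Returns non-zero if no peer is selected yet.
synced_peer() {
  awk '/^\*/ { print substr($1, 2); found = 1 } END { exit !found }'
}

# Example usage:
#   ntpq -pn | synced_peer || echo "NTP is not synchronized yet"
```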
  1. Disable Transparent Huge Pages
sudo su
echo never > /sys/kernel/mm/transparent_hugepage/enabled
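A value written to `/sys` this way is lost on reboot, so it is worth checking the active mode after each restart. The kernel reports the active value in square brackets, for example `always madvise [never]`. The helper name `thp_mode` below is hypothetical; this is a sketch for verification only and does not make the setting persistent.

```shell
# Hypothetical helper: print the active mode (the bracketed word) from a
# transparent_hugepage settings file.
thp_mode() {
  sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$1"
}

THP_FILE=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$THP_FILE" ]; then
  echo "Transparent Huge Pages mode: $(thp_mode "$THP_FILE")"
fi
```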
  1. Set swappiness
sudo su
echo 'vm.swappiness = 10' >> /etc/sysctl.conf
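Appending to `/etc/sysctl.conf` only takes effect after a reboot or after the file is reloaded with `sysctl -p`, and running the `echo` command twice leaves a duplicate entry. A possible alternative, sketched below with a hypothetical helper name `set_swappiness`, updates an existing entry in place and appends one only when none exists. It assumes GNU `sed` (for `-i`), as shipped with Ubuntu.

```shell
# Hypothetical helper: set vm.swappiness to VALUE in sysctl FILE,
# replacing an existing entry instead of adding a duplicate.
set_swappiness() {
  value=$1
  file=$2
  if grep -q '^vm\.swappiness' "$file"; then
    sed -i "s/^vm\.swappiness.*/vm.swappiness = $value/" "$file"
  else
    echo "vm.swappiness = $value" >> "$file"
  fi
}

# Example usage (as root), followed by a reload of the file:
#   set_swappiness 10 /etc/sysctl.conf && sysctl -p
```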
  1. Select the installer binary for the Cloudera distribution from the following page

  2. Download the latest release

cd /opt
sudo wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
sudo chmod +x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin
  1. The installer screen appears

  2. Accept the two license agreements, then wait until the Cloudera Manager installation finishes and asks you to visit a web page on port 7180.
    Note: To follow the progress of the Cloudera Manager installation, run: sudo tail -f /var/log/cloudera-manager-installer/*

  3. Go to hostname:7180 and log in with the default login and password, admin/admin

  4. Choose the appropriate license for your purpose (Cloudera Enterprise is recommended)

  5. Enter the FQDNs of all nodes that you want to add to the cluster (including the instance running Cloudera Manager)

  6. By default, Cloudera Manager takes the latest parcel. Note: If you want to install an earlier version, click "More Options" and manually add a remote parcel repository; that version then becomes available for selection

  7. Select the "Install Oracle Java SE Development Kit (JDK)" option

  8. Skip the single user mode installation

  9. Choose the appropriate type of connection to the nodes

  10. Wait while the Cloudera agents are installed on the nodes

  11. Wait while the parcel is downloaded and installed on the nodes

  12. Inspect the hosts

  13. Choose Custom Services and select the following items:

  • HDFS
  • YARN
  • ZooKeeper
  • Spark
  • Hive
  • Oozie
  • Impala
  1. Distribute the services between the hosts, dividing the roles evenly

  2. Database setup: Cloudera Manager takes care of this for you; you only need to click Test Connection -> Continue

  3. Finish the cluster setup