Install Kafka - dryshliak/hadoop GitHub Wiki

Prerequisites

  • Three nodes (min 1 GB RAM per node)
  • Disk space (min 30GB per node)
  • Ubuntu 16.04
  • Kafka 2.6.2 (Scala 2.12 build)
  • Java 8
  • SSH access
  1. Install VirtualBox https://www.virtualbox.org/wiki/Downloads

  2. Prepare three instances from the appropriate Ubuntu server image, which you can find at the URL below:
    http://releases.ubuntu.com/16.04/ubuntu-16.04.6-server-amd64.iso

  3. While preparing each instance, add a second adapter of type “Host-only Adapter” in the Network settings. If the second adapter is not recognized, please read this article for a resolution.

  4. Choose “OpenSSH server” during installation to have SSH access to the instance.

  5. On all instances, set up the hosts file with FQDNs so that each node can resolve the others' names locally (as explained here), and also remove the line that maps 127.0.1.1 to the node name.
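For example, the hosts file on every node might look like the following (the host-only addresses and hostnames are illustrative; substitute your own):

```
# /etc/hosts, identical on all three nodes
127.0.0.1     localhost
192.168.56.3  node1.cluster  node1
192.168.56.4  node2.cluster  node2
192.168.56.5  node3.cluster  node3
```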

  6. Disable the firewall

sudo service ufw stop && sudo ufw disable
  7. Before installing any applications or software, make sure your list of packages from all repositories and PPAs is up to date, or update it using this command:
sudo apt-get update && sudo apt-get dist-upgrade -y
  8. Install Oracle Java
cd /opt
wget --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
mkdir /usr/lib/jvm
tar -xf /opt/jdk-8u131-linux-x64.tar.gz -C /usr/lib/jvm
ln -s /usr/lib/jvm/jdk1.8.0_131 /usr/lib/jvm/default-java
update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.8.0_131/bin/java 100
update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.8.0_131/bin/javac 100
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_131/

Add the Java path (in our case JAVA_HOME="/usr/lib/jvm/jdk1.8.0_131") to the system-wide environment file:

sudo vi /etc/environment
  9. Check the Java configuration
update-alternatives --display java
update-alternatives --display javac
java -version

Installing Zookeeper

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Kafka uses ZooKeeper to track the heartbeats of its nodes, maintain configuration, and, most importantly, elect leaders.

  1. Download and unpack Zookeeper package
wget https://downloads.apache.org/zookeeper/stable/apache-zookeeper-3.6.3-bin.tar.gz -P /opt
tar -xf /opt/apache-zookeeper-3.6.3-bin.tar.gz -C /opt
ln -s /opt/apache-zookeeper-3.6.3-bin /opt/zookeeper
  2. Create a new zookeeper user and group using the command:
adduser --disabled-password --gecos "" zookeeper
  3. Create a zookeeper directory under /var/lib for storing the state associated with the ZooKeeper server, and another zookeeper directory under /var/log for ZooKeeper logs. The ownership of both directories must be changed to zookeeper:
mkdir /var/{lib,log}/zookeeper
chown -R zookeeper:zookeeper /var/{lib,log}/zookeeper
  4. Create the server id for the ensemble. Each ZooKeeper server must have a unique number between 1 and 255 in its myid file. The command below derives the id from this node's host-only IP address (note that it keeps only the final character of the address, so it assumes single-digit last octets):
ip a | grep '192.168.56.' | grep -Po 'inet \K[\d.]+' | grep -o '.$' | sudo tee /var/lib/zookeeper/myid
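The id extraction can be checked against a sample line of `ip a` output; assuming the host-only network is 192.168.56.0/24, an address ending in .4 yields the id 4:

```shell
# Simulate one "inet" line of `ip a` output for the host-only interface
# and extract the last digit of the address, which becomes the myid value.
echo "    inet 192.168.56.4/24 brd 192.168.56.255 scope global enp0s8" \
  | grep '192.168.56.' \
  | grep -Po 'inet \K[\d.]+' \
  | grep -o '.$'
# prints: 4
```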
  5. Go to the conf folder under the ZooKeeper home directory (the location where the archive was extracted). By default, a sample configuration file named zoo_sample.cfg is present in the conf directory. Make a copy of it named zoo.cfg as shown below, and edit the new zoo.cfg as described, on all three Ubuntu machines.
cd /opt/zookeeper/conf
cp zoo_sample.cfg zoo.cfg

and change zoo.cfg like below

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.<id>=<node1 ip or dns name>:2888:3888
server.<id>=<node2 ip or dns name>:2888:3888
server.<id>=<node3 ip or dns name>:2888:3888
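For instance, with the three host-only addresses used in the test commands later in this guide (illustrative values), the server lines would be:

```
server.3=192.168.56.3:2888:3888
server.4=192.168.56.4:2888:3888
server.5=192.168.56.5:2888:3888
```

The id in each server.<id> entry must match the contents of /var/lib/zookeeper/myid on that node (3, 4, and 5 here, following the last-octet scheme above).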
  6. Set up logging in log4j.properties.
vi /opt/zookeeper/conf/log4j.properties
zookeeper.log.dir=/var/log/zookeeper
zookeeper.tracelog.dir=/var/log/zookeeper
log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE
  7. Add the following environment variables to the environment file.
sudo vi /etc/environment
ZOO_LOG_DIR="/var/log/zookeeper"
SERVER_JVMFLAGS="-Xms256m -Xmx256m -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/var/lib/zookeeper/zookeeper_gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=7 -XX:GCLogFileSize=10M"
  8. Start ZooKeeper on all three nodes, one by one, using the following commands:
chown -R zookeeper:zookeeper /var/{lib,log}/zookeeper #just to be sure
/opt/zookeeper/bin/zkServer.sh start

  9. Verify the ZooKeeper cluster and ensemble
    In a three-server ensemble, one server will be in leader mode and the other two in follower mode. You can check the status by running the following command:
/opt/zookeeper/bin/zkServer.sh status
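On a healthy ensemble, the status command ends by reporting the node's role; exactly one of the three nodes should show leader, and the others a final line like:

```
Mode: follower
```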

Installing Kafka

  1. Download and unpack Kafka package
wget https://downloads.apache.org/kafka/2.6.2/kafka_2.12-2.6.2.tgz -P /opt
tar -xf /opt/kafka_2.12-2.6.2.tgz -C /opt
ln -s /opt/kafka_2.12-2.6.2 /opt/kafka
  2. Create the kafka user and directories
useradd kafka
mkdir /var/{lib,log}/kafka
chown -R kafka:kafka /var/{lib,log}/kafka
  3. Launch Kafka as a service on startup. For this, create a unit file in the /etc/systemd/system directory with the following content:
sudo vi /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka
Requires=network.target
After=network.target

[Service]
Type=simple
EnvironmentFile=/opt/kafka/config/kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
User=kafka
Group=kafka
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target
  4. Set up the memory settings in the environment file referenced by the service unit, adding the following line:
sudo vi /opt/kafka/config/kafka
KAFKA_HEAP_OPTS="-Xms512m -Xmx512m"
  5. Create the server.properties file
#back up the existing properties file
mv /opt/kafka/config/server.properties /opt/kafka/config/server.properties.orig
#then create a new server.properties with the content below
#broker.id must be a unique number on each broker
broker.id=1
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/var/lib/kafka
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
#zookeeper.connect must list the addresses of all three ZooKeeper servers
zookeeper.connect=<ip address>:2181,<ip address>:2181,<ip address>:2181/kafka
#the IP address of the server where Kafka is installed
listeners=PLAINTEXT://<ip address>:9092
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
delete.topic.enable=true
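As a concrete illustration, on the node at 192.168.56.3 in the three-VM layout used by the test commands later in this guide (the addresses are illustrative), the placeholder lines would become:

```
broker.id=3
zookeeper.connect=192.168.56.3:2181,192.168.56.4:2181,192.168.56.5:2181/kafka
listeners=PLAINTEXT://192.168.56.3:9092
```

Using the last octet of the host-only address as broker.id mirrors the myid scheme used for ZooKeeper and keeps the ids unique across brokers.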
  6. Add the following environment variable to the environment file.
sudo vi /etc/environment
LOG_DIR="/var/log/kafka"
  7. Finish setting up the Kafka service
systemctl daemon-reload
systemctl enable kafka
  8. Ensure correct permissions on the directories
chown -R kafka:kafka /var/{lib,log}/kafka
  9. Start the Kafka service on each instance
systemctl start kafka
systemctl status kafka
  10. Test the installation
#Create topics
/opt/kafka/bin/kafka-topics.sh --create --zookeeper 192.168.56.3:2181,192.168.56.4:2181,192.168.56.5:2181/kafka --replication-factor 3 --partitions 3 --topic test

#Describe topics
/opt/kafka/bin/kafka-topics.sh --describe --zookeeper 192.168.56.3:2181,192.168.56.4:2181,192.168.56.5:2181/kafka --topic test

#Let’s start publishing messages on test topic on one Kafka instance
/opt/kafka/bin/kafka-console-producer.sh --broker-list 192.168.56.3:9092,192.168.56.4:9092,192.168.56.5:9092 --topic test

#We will now create a subscriber on test topic and listen from the beginning of the topic on another Kafka instance
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server 192.168.56.3:9092,192.168.56.4:9092,192.168.56.5:9092 --topic test --from-beginning
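Assuming the same ZooKeeper addresses as above, you can also list the topics the cluster knows about and, because delete.topic.enable=true is set in server.properties, remove the test topic when you are done:

```
#List all topics registered in the cluster
/opt/kafka/bin/kafka-topics.sh --list --zookeeper 192.168.56.3:2181,192.168.56.4:2181,192.168.56.5:2181/kafka

#Delete the test topic (only allowed while delete.topic.enable=true)
/opt/kafka/bin/kafka-topics.sh --delete --zookeeper 192.168.56.3:2181,192.168.56.4:2181,192.168.56.5:2181/kafka --topic test
```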
