Spark & Kaspacore
| Item | Value |
| --- | --- |
| Spark IP address | 172.16.2.50 |
| Hadoop IP address (network interface) | 172.16.2.50 (must be the same host as Spark) |
| Hadoop IP address (docker0 interface) | 172.17.0.1 |
| Kafka IP address | 172.16.2.40 |
| Hadoop user | ubuntu |
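The docker0 interface address (172.17.0.1 above) is used in the HDFS URLs later on this page. Once Docker is installed, you can confirm it on the Spark/Hadoop host, for example:
ip -4 addr show docker0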
✅ Ubuntu 20.04 LTS installed and updated with the following command.
sudo apt update && sudo apt -y upgrade
✅ Time Zone and NTP already set.
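For example, the time zone and NTP synchronization can be checked and, if needed, set with timedatectl. Asia/Jakarta below is only an example; use your own zone.
timedatectl status
sudo timedatectl set-timezone Asia/Jakarta
sudo timedatectl set-ntp true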
✅ Docker 20.10 or later installed with the following command.
sudo apt -y install docker.io
✅ Docker Compose 2.13 or later installed with the following command.
sudo curl -L "https://github.com/docker/compose/releases/download/v2.13.0/docker-compose-$(uname -s)-$(uname -m)" \
-o /usr/bin/docker-compose && sudo chmod +x /usr/bin/docker-compose
Clone the Spark asset repository.
git clone https://github.com/mata-elang-stable/spark-asset.git ~/spark
Rename and edit .env to set the environment variables.
mv ~/spark/.env.example ~/spark/.env
nano ~/spark/.env
Configuration
🔑 Change ubuntu in HADOOP_USER_NAME to your user account if necessary. (e.g. hadoop)
🔑 Change ubuntu in "/user/ubuntu" to your user account if necessary. (e.g. /user/hadoop)
HADOOP_USER_NAME=ubuntu
SPARK_EVENTLOG_DIR=hdfs://172.17.0.1:9000/user/ubuntu/spark/spark-events
SPARK_APP_JAR_PATH=hdfs://172.17.0.1:9000/user/ubuntu/kaspacore/files/kaspacore.jar
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://172.17.0.1:9000/user/ubuntu/spark/spark-events"
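As a sketch, the ubuntu occurrences can also be replaced in one step with sed. This assumes your account is hadoop; adjust the replacement values as needed.
sed -i 's|^HADOOP_USER_NAME=ubuntu|HADOOP_USER_NAME=hadoop|; s|/user/ubuntu|/user/hadoop|g' ~/spark/.env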
🔑 Change ubuntu in "/user/ubuntu" to your user account if necessary. (e.g. /user/hadoop)
Create the Spark event log directory on HDFS.
hdfs dfs -mkdir -p hdfs://localhost:9000/user/ubuntu/spark/spark-events
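To confirm the directory was created, list it on the Hadoop host as the Hadoop user; the path below assumes the ubuntu user as above.
hdfs dfs -ls hdfs://localhost:9000/user/ubuntu/spark/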
Rename and edit app.properties.
mv ~/spark/conf/app.properties.example ~/spark/conf/app.properties
nano ~/spark/conf/app.properties
Configuration
🔑 Change ubuntu in "/user/ubuntu" to your user account if necessary. (e.g. /user/hadoop)
🔑 Change TIMEZONE to match your time zone if necessary. (e.g. Asia/Jakarta)
🔑 Change KAFKA_BOOTSTRAP_SERVERS to the Kafka server IP address and port number. (e.g. 172.16.2.40:9093)
SPARK_MASTER=spark://spark-master:7077
SPARK_CHECKPOINT_PATH=hdfs://172.17.0.1:9000/user/ubuntu/kafka-checkpoint
TIMEZONE=UTC
KAFKA_BOOTSTRAP_SERVERS=172.17.0.1:9093
KAFKA_INPUT_STARTING_OFFSETS=latest
SENSOR_STREAM_INPUT_TOPIC=sensor_events
SENSOR_STREAM_OUTPUT_TOPIC=sensor_events_with_geoip
MAXMIND_DB_PATH=hdfs://172.17.0.1:9000/user/ubuntu/kaspacore/files/GeoLite2-City.mmdb
MAXMIND_DB_FILENAME=GeoLite2-City.mmdb
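Before starting the services, it may help to confirm that the files referenced above (kaspacore.jar and GeoLite2-City.mmdb) already exist in HDFS, and that the Kafka port is reachable. The HDFS path assumes the ubuntu user, and the port check assumes netcat is installed and that you pointed KAFKA_BOOTSTRAP_SERVERS at 172.16.2.40:9093.
hdfs dfs -ls hdfs://localhost:9000/user/ubuntu/kaspacore/files/
nc -vz 172.16.2.40 9093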
Rename spark-defaults.conf.
mv ~/spark/conf/spark-defaults.conf.example ~/spark/conf/spark-defaults.conf
Edit spark-defaults.conf only if you want to change the configuration.
nano ~/spark/conf/spark-defaults.conf
The contents of the configuration file are as follows:
# Worker
spark.worker.cleanup.enabled=true
spark.worker.cleanup.interval=1800
spark.worker.cleanup.appDataTtl=14400
# History Server
spark.history.ui.port=18080
spark.history.retainedApplications=10
spark.history.fs.update.interval=10s
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=7d
# App Configuration
spark.master=spark://spark-master:7077
spark.eventLog.enabled=true
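As an optional sketch, resource limits can be added to spark-defaults.conf in the same way. The values below are only placeholders; the defaults shipped with the repository are usually sufficient.
# App resource limits (optional example)
spark.executor.memory=2g
spark.executor.cores=2
spark.cores.max=4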
Rename log4j2.properties.
mv ~/spark/conf/log4j2.properties.example ~/spark/conf/log4j2.properties
Edit log4j2.properties only if you want to change the configuration.
nano ~/spark/conf/log4j2.properties
The contents of the configuration file are as follows:
log4j.rootLogger=ERROR, console
# set the log level for these components
log4j.logger.com.test=DEBUG
log4j.logger.org=ERROR
log4j.logger.org.apache.spark=ERROR
log4j.logger.org.spark-project=ERROR
log4j.logger.org.apache.hadoop=ERROR
log4j.logger.io.netty=ERROR
log4j.logger.org.apache.zookeeper=ERROR
# add a ConsoleAppender to the logger stdout to write to the console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
# use a simple message format
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
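If you need more detail from a specific component while troubleshooting, you can raise its level in the same file using the same syntax, for example for the Kafka client:
# example: more verbose Kafka client logging while debugging connectivity
log4j.logger.org.apache.kafka=INFO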
Edit docker-compose.yaml.
nano ~/spark/docker-compose.yaml
Configuration
🔑 Change services.spark-worker.deploy.replicas to increase the number of workers as needed (see the note after the snippet below).
services:
  spark-worker:
    environment:
      <<: *spark-worker-default-env
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 4G
    deploy:
      mode: replicated
      replicas: 2
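Editing replicas in the file is the documented way to scale. As a one-off alternative, docker-compose can also override the replica count at start-up, for example:
sudo docker-compose -f ~/spark/docker-compose.yaml up -d --scale spark-worker=3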
Start the services and check their status.
sudo docker-compose -f ~/spark/docker-compose.yaml up -d
sudo docker-compose -f ~/spark/docker-compose.yaml ps -a
Result: It takes about 30 seconds for the spark-submit-* services to successfully complete the registration process.
spark-spark-historyserver-1 "/opt/entrypoint.sh …" spark-historyserver running 0.0.0.0:18080->18080/tcp, :::18080->18080/tcp
spark-spark-master-1 "/opt/entrypoint.sh …" spark-master running 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
spark-spark-submit-aggr-1 "/opt/entrypoint.sh …" spark-submit-aggr exited (0)
spark-spark-submit-enrich-1 "/opt/entrypoint.sh …" spark-submit-enrich exited (0)
spark-spark-worker-1 "/opt/entrypoint.sh …" spark-worker running
spark-spark-worker-2 "/opt/entrypoint.sh …" spark-worker running
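If a spark-submit-* service shows a non-zero exit status instead of exited (0), inspect its logs, for example:
sudo docker-compose -f ~/spark/docker-compose.yaml logs spark-submit-enrich
sudo docker-compose -f ~/spark/docker-compose.yaml logs spark-submit-aggr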
Spark Master Web UI
- URL: http://<SPARK_SERVER_IP_OR_NAME (e.g. 172.16.2.50)>:8080/
Spark History Server Web UI
- URL: http://<SPARK_SERVER_IP_OR_NAME (e.g. 172.16.2.50)>:18080/
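Both UIs can also be checked quickly from the command line; 172.16.2.50 is the example address used throughout this page.
curl -sI http://172.16.2.50:8080/ | head -n 1
curl -sI http://172.16.2.50:18080/ | head -n 1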
Useful commands for operating and troubleshooting the services.
✅ Show service status
sudo docker-compose -f ~/spark/docker-compose.yaml ps -a
Result
spark-spark-historyserver-1 "/opt/entrypoint.sh …" spark-historyserver running 0.0.0.0:18080->18080/tcp, :::18080->18080/tcp
spark-spark-master-1 "/opt/entrypoint.sh …" spark-master running 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
spark-spark-submit-aggr-1 "/opt/entrypoint.sh …" spark-submit-aggr exited (0)
spark-spark-submit-enrich-1 "/opt/entrypoint.sh …" spark-submit-enrich exited (0)
spark-spark-worker-1 "/opt/entrypoint.sh …" spark-worker running
spark-spark-worker-2 "/opt/entrypoint.sh …" spark-worker running
✅ Start services
sudo docker-compose -f ~/spark/docker-compose.yaml up -d
✅ Stop services (and remove containers)
sudo docker-compose -f ~/spark/docker-compose.yaml down
✅ Stop services (and keep containers)
sudo docker-compose -f ~/spark/docker-compose.yaml stop
✅ Restart services
sudo docker-compose -f ~/spark/docker-compose.yaml restart
✅ Build Mata Elang Spark image.
- Please prepare another host to build the image.
# update packages and install docker
sudo apt update && sudo apt -y upgrade
sudo apt -y install docker.io
# download Spark
wget -P ~/ https://dlcdn.apache.org/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3-scala2.13.tgz
tar -xzf ~/spark-3.3.1-bin-hadoop3-scala2.13.tgz -C ~/
# build docker image
cd ~/spark-3.3.1-bin-hadoop3-scala2.13
sudo docker build -t <REPOSITORY>/<IMAGE>[:TAG] -f kubernetes/dockerfiles/spark/Dockerfile .
# push image to your Docker Hub
sudo docker login -u <USERNAME>
Password:
sudo docker push <REPOSITORY>/<IMAGE>[:TAG]
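# optionally verify the built image is present locally (replace the placeholders with the values used above)
sudo docker image ls <REPOSITORY>/<IMAGE>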
✅ Show environment variables
sudo docker inspect --format='{{range .Config.Env}}{{println .}}{{end}}' spark-spark-master-1
sudo docker inspect --format='{{range .Config.Env}}{{println .}}{{end}}' spark-spark-worker-1
sudo docker inspect --format='{{range .Config.Env}}{{println .}}{{end}}' spark-spark-submit-enrich-1
sudo docker inspect --format='{{range .Config.Env}}{{println .}}{{end}}' spark-spark-submit-aggr-1
sudo docker inspect --format='{{range .Config.Env}}{{println .}}{{end}}' spark-spark-historyserver-1
✅ Show the loaded configurations
sudo docker-compose -f ~/spark/docker-compose.yaml exec spark-master cat /opt/spark/conf/app.properties
sudo docker-compose -f ~/spark/docker-compose.yaml exec spark-master cat /opt/spark/conf/spark-defaults.conf
sudo docker-compose -f ~/spark/docker-compose.yaml exec spark-master cat /opt/spark/conf/log4j2.properties
✅ Show Spark log
sudo docker-compose -f ~/spark/docker-compose.yaml logs spark-master
sudo docker-compose -f ~/spark/docker-compose.yaml logs spark-worker
sudo docker-compose -f ~/spark/docker-compose.yaml logs spark-submit-aggr
sudo docker-compose -f ~/spark/docker-compose.yaml logs spark-submit-enrich
sudo docker-compose -f ~/spark/docker-compose.yaml logs spark-historyserver
✅ Show Spark version
sudo docker-compose -f ~/spark/docker-compose.yaml exec spark-master /opt/spark/bin/spark-shell --version
✅ Show Docker version
sudo docker version
✅ Show Docker Compose version
docker-compose version
✅ Show OS version
cat /etc/os-release