spark‐operator - yeardream-de-project-team4/k8s_project GitHub Wiki

install

helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator

helm install my-release spark-operator/spark-operator \
--namespace spark-operator \
--set webhook.enable=true \
--set image.repository=openlake/spark-operator \
--set image.tag=3.3.1 \
--create-namespace

Create Secret

<Your-MinIO-AccessKey>: MinIO login ID (admin)
<Your-MinIO-SecretKey>: MinIO login password (password)
<Your-MinIO-Endpoint>: [private ip]:[port] of the EC2 instance where MinIO is deployed
kubectl create secret generic minio-secret \
--from-literal=AWS_ACCESS_KEY_ID=<Your-MinIO-AccessKey> \
--from-literal=AWS_SECRET_ACCESS_KEY=<Your-MinIO-SecretKey> \
--from-literal=ENDPOINT=<Your-MinIO-Endpoint> \
--from-literal=AWS_REGION=us-east-1 \
--namespace spark-operator

Write the Spark application (src/main.py) and build & push the Docker image

Use main.py in the git repository as a reference and modify the MinIO connection settings.
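As a rough illustration of what the MinIO connection part of main.py could look like, the sketch below maps the keys stored in minio-secret to Hadoop S3A settings. It assumes those keys reach the driver and executors as environment variables with the same names, and that the endpoint is given as host:port over plain HTTP. The helper name `s3a_conf_from_env` is hypothetical and is not taken from the repository's main.py.

```python
import os


def s3a_conf_from_env(env=None):
    """Map the minio-secret keys to Spark's Hadoop S3A settings (hypothetical helper)."""
    env = os.environ if env is None else env
    endpoint = env["ENDPOINT"]
    # Assumption: a bare host:port endpoint is served over plain HTTP.
    if "://" not in endpoint:
        endpoint = "http://" + endpoint
    use_ssl = endpoint.startswith("https://")
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": env["AWS_ACCESS_KEY_ID"],
        "spark.hadoop.fs.s3a.secret.key": env["AWS_SECRET_ACCESS_KEY"],
        # MinIO is usually addressed path-style (host/bucket), not bucket.host.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


if __name__ == "__main__":
    # Sample values only; in the cluster the real ones would come from the secret.
    sample = {
        "ENDPOINT": "10.0.0.5:9000",
        "AWS_ACCESS_KEY_ID": "admin",
        "AWS_SECRET_ACCESS_KEY": "password",
    }
    for key, value in sorted(s3a_conf_from_env(sample).items()):
        print(key, "=", value)
```

In main.py, each pair could then be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`, after which paths of the form `s3a://bucket/path` should resolve against MinIO.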

FROM openlake/spark-py:3.3.1
USER root
WORKDIR /app
RUN pip3 install pyspark==3.3.1
RUN pip3 install numpy
COPY src/*.py ./
docker login
docker build . -t [docker hub id]/[image name]:[tag name]
docker push [docker hub id]/[image name]:[tag name]

submit job

Write sparkjob-minio.yaml, adjusting the MinIO connection settings, the image, and so on.
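The repository's sparkjob-minio.yaml is not reproduced on this page. As a hedged sketch of the shape such a manifest usually takes, the Python below builds a `SparkApplication` resource as a dict and prints it as JSON, which `kubectl apply -f` accepts in place of YAML. The application name, resource sizes, restart policy, and the service account placeholder are assumptions. The main file path follows from the Dockerfile's `WORKDIR /app`, and the secret keys are those created in the Create Secret step.

```python
import json

# Keys stored in minio-secret by the "Create Secret" step.
SECRET_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "ENDPOINT", "AWS_REGION")


def build_spark_application(image, name="sparkjob-minio",
                            service_account="<spark-service-account>"):
    """Return a SparkApplication manifest as a plain dict (illustrative values)."""
    # Expose each secret key as an environment variable of the same name.
    secret_env = {key: {"name": "minio-secret", "key": key} for key in SECRET_KEYS}
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": "spark-operator"},
        "spec": {
            "type": "Python",
            "pythonVersion": "3",
            "mode": "cluster",
            "image": image,
            "imagePullPolicy": "Always",
            # The Dockerfile copies src/*.py into WORKDIR /app.
            "mainApplicationFile": "local:///app/main.py",
            "sparkVersion": "3.3.1",
            "restartPolicy": {"type": "Never"},
            "driver": {
                "cores": 1,
                "memory": "1g",
                # Placeholder: use the service account your Helm release created.
                "serviceAccount": service_account,
                "envSecretKeyRefs": secret_env,
            },
            "executor": {
                "cores": 1,
                "instances": 1,
                "memory": "1g",
                "envSecretKeyRefs": secret_env,
            },
        },
    }


if __name__ == "__main__":
    manifest = build_spark_application("[docker hub id]/[image name]:[tag name]")
    print(json.dumps(manifest, indent=2))
```

Redirecting the output to a file such as sparkjob-minio.json would give something that can be applied the same way as the YAML version, once the image and service account placeholders are filled in.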

# Submit the job
kubectl apply -f sparkjob-minio.yaml

# Check the job status
kubectl get all,sparkapplications -n spark-operator

UI

kubectl edit svc -n spark-operator [service]
type: ClusterIP -> NodePort
# Until the job finishes, you can reach the UI on that port and watch its progress
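If editing the service interactively is inconvenient, the same type change can be expressed as a `kubectl patch`. The sketch below only assembles and prints the command rather than running it. The helper name `nodeport_patch_command` is hypothetical, and the service name is left as the same [service] placeholder used above.

```python
import json
import shlex


def nodeport_patch_command(service, namespace="spark-operator"):
    """Build a kubectl command that switches a Service to type NodePort."""
    patch = json.dumps({"spec": {"type": "NodePort"}})
    return ["kubectl", "patch", "svc", service, "-n", namespace, "-p", patch]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(nodeport_patch_command("[service]")))
```

Afterwards, `kubectl get svc -n spark-operator` shows the node port that was assigned to the service.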