spark‐operator - yeardream-de-project-team4/k8s_project GitHub Wiki
Install
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install my-release spark-operator/spark-operator \
--namespace spark-operator \
--set webhook.enable=true \
--set image.repository=openlake/spark-operator \
--set image.tag=3.3.1 \
--create-namespace
Create Secret
<Your-MinIO-AccessKey>: MinIO login ID (admin)
<Your-MinIO-SecretKey>: MinIO login password (password)
<Your-MinIO-Endpoint>: [private ip]:[port] of the EC2 instance where MinIO is deployed
kubectl create secret generic minio-secret \
--from-literal=AWS_ACCESS_KEY_ID=<Your-MinIO-AccessKey> \
--from-literal=AWS_SECRET_ACCESS_KEY=<Your-MinIO-SecretKey> \
--from-literal=ENDPOINT=<Your-MinIO-Endpoint> \
--from-literal=AWS_REGION=us-east-1 \
--namespace spark-operator
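For reference, the command above stores each literal base64-encoded under the Secret's data field. The following is a minimal sketch of an equivalent manifest built in Python; the helper name build_secret_manifest and the sample endpoint 10.0.0.10:9000 are hypothetical and only for illustration.

```python
import base64
import json


def build_secret_manifest(name: str, namespace: str, literals: dict) -> dict:
    """Build an Opaque Secret manifest from plain-text key/value pairs."""
    # Kubernetes stores Secret values base64-encoded under the `data` field.
    data = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in literals.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": data,
    }


# Placeholder values only; substitute your own MinIO credentials and endpoint.
manifest = build_secret_manifest(
    "minio-secret",
    "spark-operator",
    {
        "AWS_ACCESS_KEY_ID": "admin",
        "AWS_SECRET_ACCESS_KEY": "password",
        "ENDPOINT": "10.0.0.10:9000",  # hypothetical private IP and port
        "AWS_REGION": "us-east-1",
    },
)
print(json.dumps(manifest, indent=2))
```

Base64 is an encoding, not encryption, so anyone who can read the Secret can recover the credentials.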
Write the Spark application (src/main.py) and build & push the Docker image
Refer to main.py in the Git repository and modify the MinIO connection settings.
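The repository's main.py is not reproduced here, so the following is only a sketch of what the MinIO connection part might look like. It assumes the keys of minio-secret (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, ENDPOINT) are exposed to the pod as environment variables and that MinIO is served over plain HTTP; the function names build_s3a_conf and create_spark_session are hypothetical.

```python
import os


def build_s3a_conf(env=os.environ) -> dict:
    """Map the minio-secret environment variables to Spark S3A settings."""
    endpoint = env["ENDPOINT"]
    # Assumption: plain HTTP unless the endpoint already carries a scheme.
    if not endpoint.startswith(("http://", "https://")):
        endpoint = "http://" + endpoint
    use_ssl = endpoint.startswith("https://")
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": env["AWS_ACCESS_KEY_ID"],
        "spark.hadoop.fs.s3a.secret.key": env["AWS_SECRET_ACCESS_KEY"],
        # MinIO is typically addressed path-style rather than by virtual host.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def create_spark_session(app_name: str = "minio-example"):
    """Create a SparkSession with the S3A settings applied."""
    # Imported lazily so build_s3a_conf can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in build_s3a_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Reading the credentials from the environment keeps them out of the image; inside the job, buckets would then be addressed with s3a:// paths.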
FROM openlake/spark-py:3.3.1
USER root
WORKDIR /app
RUN pip3 install pyspark==3.3.1
RUN pip3 install numpy
COPY src/*.py ./
docker login
docker build . -t [docker hub id]/[image name]:[tag name]
docker push [docker hub id]/[image name]:[tag name]
Submit job
Write sparkjob-minio.yaml, modifying the MinIO connection settings, the image, and so on.
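The project's sparkjob-minio.yaml is not shown on this page, so the sketch below only illustrates the general shape of a SparkApplication manifest (apiVersion sparkoperator.k8s.io/v1beta2), generated from Python. Several things are assumptions: the application name spark-minio, the resource sizes, and the service account my-release-spark (guessed from the Helm release name; check with kubectl get sa -n spark-operator). The main file path local:///app/main.py follows from the WORKDIR /app in the Dockerfile.

```python
import json


def build_spark_application(name, image, main_file, service_account):
    """Build a SparkApplication manifest as a plain dict."""
    # Expose every key of minio-secret to the pods as environment variables.
    env_from = [{"secretRef": {"name": "minio-secret"}}]
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": "spark-operator"},
        "spec": {
            "type": "Python",
            "pythonVersion": "3",
            "mode": "cluster",
            "image": image,
            "imagePullPolicy": "Always",
            "mainApplicationFile": main_file,
            "sparkVersion": "3.3.1",
            "restartPolicy": {"type": "Never"},
            "driver": {
                "cores": 1,        # hypothetical sizing
                "memory": "1g",    # hypothetical sizing
                "serviceAccount": service_account,
                "envFrom": env_from,
            },
            "executor": {
                "cores": 1,        # hypothetical sizing
                "instances": 1,    # hypothetical sizing
                "memory": "1g",    # hypothetical sizing
                "envFrom": env_from,
            },
        },
    }


app = build_spark_application(
    name="spark-minio",                                # hypothetical name
    image="[docker hub id]/[image name]:[tag name]",   # fill in your image
    main_file="local:///app/main.py",
    service_account="my-release-spark",                # assumption, verify first
)
# JSON is also valid YAML, so this output can be saved and applied with kubectl.
print(json.dumps(app, indent=2))
```

Injecting the Secret through envFrom generally relies on the operator's mutating webhook, which is why webhook.enable=true was set at install time.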
# Submit the job
kubectl apply -f sparkjob-minio.yaml
# Check the job
kubectl get all,sparkapplications -n spark-operator
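To check a single job from a script rather than by eye, the operator-reported state can be read from the status.applicationState.state field of the SparkApplication object. The sketch below assumes kubectl is on the PATH and already configured for the cluster; the helper names and the sample object are hypothetical.

```python
import json
import subprocess


def application_state(app: dict) -> str:
    """Return the operator-reported state, or "UNKNOWN" if it is not set yet."""
    state = app.get("status", {}).get("applicationState", {}).get("state")
    return state or "UNKNOWN"


def fetch_application(name: str, namespace: str = "spark-operator") -> dict:
    """Fetch one SparkApplication as a dict via kubectl (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "sparkapplication", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Offline example using a hand-written sample, not real cluster output.
sample = {"status": {"applicationState": {"state": "RUNNING"}}}
print(application_state(sample))
```

Against a real cluster, application_state(fetch_application("[application name]")) would return the current state of that job.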
UI
kubectl edit svc -n spark-operator [service]
Change type: ClusterIP to type: NodePort
# Until the job finishes, you can access that port to monitor its progress
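Once the service type is NodePort, Kubernetes assigns a port on every node. A minimal sketch for finding that port in the output of kubectl get svc [service] -n spark-operator -o json follows; it assumes the UI service exposes Spark's default UI port 4040, and the helper names, sample service, and node IP are hypothetical.

```python
def node_port_for(service: dict, port: int = 4040):
    """Return the nodePort mapped to `port`, or None if the service is not NodePort."""
    spec = service.get("spec", {})
    if spec.get("type") != "NodePort":
        return None
    for entry in spec.get("ports", []):
        if entry.get("port") == port:
            return entry.get("nodePort")
    return None


def ui_url(node_ip: str, service: dict):
    """Build the Spark UI URL for a given node IP, or None if it is not exposed."""
    node_port = node_port_for(service)
    return f"http://{node_ip}:{node_port}" if node_port else None


# Hand-written sample shaped like `kubectl get svc ... -o json` output.
sample_service = {
    "spec": {
        "type": "NodePort",
        "ports": [{"port": 4040, "targetPort": 4040, "nodePort": 31234}],
    }
}
print(ui_url("10.0.0.10", sample_service))  # hypothetical node IP
```

The node's security group must also allow inbound traffic on the assigned port for the UI to be reachable.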