How to use Deployment
Using the deployment files requires having your server configured and either Podman or Kubernetes installed. For help with server configuration, Podman, or Kubernetes, see the other sections of the Wiki.
If you just want to use the files, follow the steps below to run them with Podman. An example section at the bottom walks through running and accessing k8s.yaml.
Steps:

1. Copy the `deployment` directory onto your server, then navigate into it using `cd <path to where deployment was copied>`
2. First run pvc.yaml with the command `podman play kube pvc.yaml`
3. Next deploy your pod with `podman play kube k8s.yaml`
4. Find the Jupyter token using `podman logs --tail 3 <container id for the Jupyter container>`
5. Access the VPS-hosted Jupyter session by entering `<Your VPS IP>:10000/lab?token=<token from step 4>` in your web browser
What the Deployment files do
These files let you launch a Jupyter Notebook instance with PySpark and TensorFlow from your VPS, accessible through a web browser. Multiple users can connect to the same instance, and notebooks created there are saved to persistent storage on the VPS that lasts between Pods. A read-only data mount makes data on the VPS accessible from within the notebooks. The Jupyter container is preconfigured with a connector for accessing a Google Cloud Storage data bucket.
How Deployment works
YAML files
pvc.yaml
pvc.yaml creates a 1Gi persistent volume named `notebook`. This is used to store Jupyter notebook data between sessions; if the pod crashes, the notebooks can be reloaded into a new instance of the pod. If you need to back up these notebooks, the default location is: /var/lib/containers/storage/volumes/notebook-pvc/_data
k8s.yaml
The k8s.yaml file loads two volume mounts and launches three containers.
- Volume Mount 1: `./data`
  The first volume mount requires a `data` directory located in the same directory as `k8s.yaml`. If the `deployment` directory was copied from the repository, the `placeholder_data` file can be replaced with any data you want available in the Jupyter instance. This mount is read-only, so its contents cannot be altered from within the notebook (see the sketch after this list).
- Volume Mount 2: `notebook`
  This is the volume created using pvc.yaml.
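To check that the data mount is visible from a notebook, here is a minimal sketch. The `/home/jovyan/data` path is an assumption (the Jupyter Project images use `/home/jovyan` as the notebook user's home directory); check the `volumeMounts` entry in k8s.yaml for the path actually used in this deployment.

```python
# Minimal sketch: inspect the read-only data mount from inside the notebook.
# NOTE: the mount path below is an assumption -- use the mountPath defined in k8s.yaml.
import os

DATA_DIR = "/home/jovyan/data"  # assumed in-container path for the ./data mount

# List whatever was placed in the host's ./data directory (e.g. placeholder_data)
print(os.listdir(DATA_DIR))

# Files can be read, but writing under DATA_DIR will fail because the volume is read-only.
with open(os.path.join(DATA_DIR, "placeholder_data")) as f:
    print(f.read())
```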
Containers
Container 1
This is a custom container which is a fork of the Jupyter Project notebooks. See the Dockerfile section for more details on customizing it.
This container provides a Jupyter Notebook instance, TensorFlow, PySpark, and a Google Cloud Storage (GCS) bucket connector. The container is configured to be accessed on port 10000.
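A quick way to confirm the container's toolchain is to run a small smoke test in a new notebook. This is only a sanity check, not part of the deployment itself:

```python
# Minimal smoke test: confirm TensorFlow and PySpark are importable and working.
import tensorflow as tf
from pyspark.sql import SparkSession

print("TensorFlow version:", tf.__version__)

# Start a local Spark session and run a trivial job
spark = SparkSession.builder.appName("smoke-test").getOrCreate()
df = spark.range(5)  # single 'id' column with values 0-4
print("Row count:", df.count())
spark.stop()
```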
Container 2
Container 2 runs Nginx on port 9000
Container 3
Container 3 contains Rust.
The Dockerfile
Documentation for the Jupyter Project can be found here.
The container is stored on Docker Hub, and the Dockerfile is available in the DevOps directory. The Dockerfile can be customized and the container replaced with a custom one by changing this line of the k8s.yaml file to point at an alternative image: `image: docker.io/jdknuds/jupyter_pyten:latest`.
If you wish to replace the GCS connector with a different one, update the connector argument `ARG connector="gcs-connector-latest-hadoop3.jar"` with the alternative connector jar. Then update the wget command's URL target: `RUN wget -P "${SPARK_HOME}/jars/" "https://storage.googleapis.com/hadoop-lib/gcs/${connector}"`
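Inside the notebook, the connector lets Spark read objects directly from a `gs://` path. The sketch below is only an illustration: the bucket name, object path, and service-account key location are placeholders, and the exact authentication settings depend on how credentials are provided on your VPS.

```python
# Hedged sketch of reading from a GCS bucket through the connector.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("gcs-test")
    # Typical Hadoop settings for the gcs-connector; adjust to your credential setup.
    .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
    .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
            "/path/to/service-account.json")  # placeholder key location
    .getOrCreate()
)

# Placeholder bucket and object path
df = spark.read.csv("gs://your-bucket-name/path/to/data.csv", header=True)
df.show()
```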
Delta Lake
Delta Lake can be added to the notebook and tested by entering these commands:
- Install Delta Lake
  `!pip install delta-spark==2.3.0`
- Import delta and configure Delta

  ```python
  import pyspark
  from delta import *

  builder = pyspark.sql.SparkSession.builder.appName("MyApp") \
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

  spark = configure_spark_with_delta_pip(builder).getOrCreate()
  ```
- Write to a delta table
  `data = spark.range(0, 5)`
  `data.write.format("delta").save("/tmp/delta-table")`
- Read from a delta table
  `df = spark.read.format("delta").load("/tmp/delta-table")`
  `df.show()`
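If you also want to confirm Delta's table versioning, the sketch below (assuming the same `/tmp/delta-table` created above) overwrites the table and then reads the original version back using time travel:

```python
# Minimal sketch, assuming the /tmp/delta-table created in the steps above.
new_data = spark.range(5, 10)
new_data.write.format("delta").mode("overwrite").save("/tmp/delta-table")

# The latest version now shows ids 5-9
spark.read.format("delta").load("/tmp/delta-table").show()

# Version 0 still shows the original ids 0-4
spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta-table").show()
```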
Examples
Run the k8s.yaml
Obtain the token
Click the link to access the browser:
Update the IP to your VPS IP and the port from 8888 to 10000
(Note: the following screenshots were taken from localhost; make sure to update the IP address for use on a VPS)