spank_oodproxy - BYUHPC/oodproxy GitHub Wiki
BYU's oodproxy is a system designed to provide secure port forwarding for jobs running on a Slurm cluster. It enables users to access network ports open on a compute node, which are typically isolated from direct user access.
This SPANK plugin:
- Creates certificates for use in mutual TLS (mTLS) authentication between the oodproxy server and a program, such as stunnel, that is launched inside the job
- Gathers a list of network ports opened inside the job. This ensures that users can only use the proxy to connect to their own processes.
This is only one piece of the puzzle. You need other components that will be documented soon.
The system consists of the following components:
- SPANK Plugin (spank_oodproxy.c): Integrates with Slurm to handle certificate generation and management during the job lifecycle
- Certificate Generation Script (oodproxy_gencerts.sh): Creates the necessary TLS certificates for mTLS
- Port Registration Daemon (oodproxy_regd.sh): Discovers and registers open listening ports within the job
- Proxy Server (separate project to be documented later): Uses the generated certificates to establish secure connections to the job
┌────────────────────┐
│ │
│ External Client │
│ │
└──────────┬─────────┘
│
▼
┌────────────────────────────────────────────────────────────────────────────┐
│ │
│ Proxy Server │
│ (Uses generated certificates to establish mTLS connections to the job) │
│ │
└────────────────────────────────────────────┬───────────────────────────────┘
│
|
┌─────────────────────────────────────────────────────────+───────────────────────────────┐
│ Compute Node | │
│ ▼ │
│ ┌─────────────────┐ ┌────────────────┐ ┌────────────┐ ┌───────────────┐ │
│ │ │ │ │ │ │ │ │ │
│ │ SPANK Plugin ├──────► Generate ├──────► stunnel ├──────► Process with │ │
│ │ │ │ TLS Certs │ │ │ │ Open Port │ │
│ └────────┬────────┘ └────────────────┘ └────────────┘ └───────────────┘ │
│ │ │
│ │ │
│ ┌────────▼────────┐ ┌────────────────┐ │
│ │ │ │ │ │
│ │ Registration ├──────► Allowed │ │
│ │ Daemon │ │ Destinations │ │
│ │ │ │ │ │
│ └─────────────────┘ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
The SPANK plugin integrates with Slurm and performs the following functions:
- Initialization:
  - Parses configuration parameters from plugstack.conf
  - Sets up the necessary environment variables
  - Creates the directory structure for TLS certificates
- Certificate Management:
  - Invokes the certificate generation script during job startup
  - Ensures certificates are cleaned up during job termination
- Port Registration:
  - Launches the registration daemon to discover and record listening ports
  - Creates and manages the allowed_destinations file, which lists accessible services
- Security:
  - Manages file permissions to ensure secure access to certificates
The certificate generation script (oodproxy_gencerts.sh) creates the TLS certificates needed for mTLS authentication:
- Generates a Certificate Authority (CA) key and certificate
- Creates server and client certificates signed by the CA
- Copies the certificates to locations accessible to the job and the proxy server
- Sets proper ownership and permissions on the certificate files
The script generates:
- ca.key: Certificate Authority private key
- ca.crt: Certificate Authority certificate
- server.key and server.crt: Server-side key and certificate
- client.key and client.crt: Client-side key and certificate
- ca+client.crt: Combined CA and client certificates
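The generation steps above can be sketched with the openssl command-line tool. This is a minimal illustration, not the contents of oodproxy_gencerts.sh: it assumes RSA 2048-bit keys, takes each CN from /proc/sys/kernel/random/uuid (Linux only), and skips the copying and ownership steps. The function names new_cn and gen_job_certs are hypothetical.

```bash
# Hypothetical sketch of the certificate steps; not the real oodproxy_gencerts.sh.
# Copying to proxy-visible locations and chown/chgrp are intentionally left out.

# new_cn: print a random UUID for use as a certificate CN (Linux-specific source).
new_cn() {
    cat /proc/sys/kernel/random/uuid
}

# gen_job_certs DIR [DAYS]: write ca.key/ca.crt, server.key/server.crt,
# client.key/client.crt and ca+client.crt into DIR (default lifetime 365 days).
# The body runs in a subshell so the umask change does not leak to the caller.
gen_job_certs() (
    dir=$1
    days=${2:-365}
    umask 077    # every file created below, keys included, is owner-only

    # Self-signed CA certificate and its private key
    openssl req -x509 -newkey rsa:2048 -nodes -days "$days" \
        -subj "/CN=$(new_cn)" \
        -keyout "$dir/ca.key" -out "$dir/ca.crt" || exit 1

    for name in server client; do
        # New private key plus a certificate signing request (CSR) for it
        openssl req -new -newkey rsa:2048 -nodes \
            -subj "/CN=$(new_cn)" \
            -keyout "$dir/$name.key" -out "$dir/$name.csr" || exit 1
        # Sign the CSR with the CA (-CAcreateserial leaves ca.srl in DIR)
        openssl x509 -req -in "$dir/$name.csr" -days "$days" \
            -CA "$dir/ca.crt" -CAkey "$dir/ca.key" -CAcreateserial \
            -out "$dir/$name.crt" || exit 1
        rm -f "$dir/$name.csr"
    done

    # Bundle for the TLS server's CAfile: the CA followed by the client cert
    cat "$dir/ca.crt" "$dir/client.crt" > "$dir/ca+client.crt"
)

# Example:
#   d=$(mktemp -d) && gen_job_certs "$d" 365 && ls "$d"
```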
The registration daemon is responsible for:
- Waiting for the job to signal that it has started its service(s)
- Discovering listening TCP and UDP ports within the job's process namespace
- Writing a list of host:port combinations to the allowed_destinations file
It uses lsof to detect open ports and generates entries for each host and port combination, making them available to the proxy server.
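To make the lsof step concrete, the filter below turns lsof's field output (-F n) into host:port lines. It is a hypothetical sketch, not oodproxy_regd.sh: it assumes a wildcard listener (*:PORT) should be expanded to a caller-supplied list of host names, and it skips IPv6 addresses in line with the IPv4 limitation noted later. The function name destinations_from_lsof is illustrative.

```bash
# destinations_from_lsof HOST...: read `lsof -F n` output on stdin and print
# unique host:port lines. A wildcard listener (*:PORT) is expanded to every
# HOST argument; IPv6 ([addr]:port) and connected sockets (a->b) are skipped.
destinations_from_lsof() {
    local line addr host port h
    while IFS= read -r line; do
        [[ $line == n* ]] || continue      # keep only the name field
        addr=${line#n}
        [[ $addr == *'->'* ]] && continue  # connected socket, not a listener
        host=${addr%:*}                    # everything before the last colon
        port=${addr##*:}                   # everything after the last colon
        [[ $host == '['* ]] && continue    # IPv6 is not handled
        [[ $port =~ ^[0-9]+$ ]] || continue
        if [[ $host == '*' ]]; then
            for h in "$@"; do
                printf '%s:%s\n' "$h" "$port"
            done
        else
            printf '%s:%s\n' "$host" "$port"
        fi
    done | LC_ALL=C sort -u
}

# Example (NOT job-scoped; the real daemon limits the survey to the job's processes):
#   lsof -a -u "$USER" -iTCP -sTCP:LISTEN -nP -F n |
#       destinations_from_lsof "$(hostname -s)" 127.0.0.1 > allowed_destinations
```

The -nP flags keep addresses and ports numeric, which the port check above relies on.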
Requirements:
- Slurm with SPANK plugin support
- Build tools, including gcc and the Slurm development headers
- The openssl command-line tool
- lsof
Run make, then copy the resulting .so to Slurm's plugin directory, or copy it wherever you like and point plugstack.conf at it.
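Slurm reports its plugin directory as PluginDir in scontrol show config. The helper below assumes that output contains a line of the form "PluginDir = /path" (possibly a colon-separated list, in which case the first entry is used); plugin_dir is a hypothetical name, and the install lines assume make leaves spank_oodproxy.so in the current directory. The plugin typically needs to be present on every host that reads plugstack.conf, including submit hosts, since sbatch must recognize the --oodproxy-register option.

```bash
# plugin_dir: read `scontrol show config` output on stdin and print the first
# PluginDir entry (the value may be a colon-separated list of directories).
plugin_dir() {
    awk '$1 == "PluginDir" && $2 == "=" { split($3, dirs, ":"); print dirs[1]; exit }'
}

# Example install (assumed build output name and location):
#   make
#   sudo install -m 0755 spank_oodproxy.so "$(scontrol show config | plugin_dir)/"
```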
Add the following to your Slurm plugstack.conf:
required /path/to/spank_oodproxy.so registration_daemon=/path/to/oodproxy_regd.sh oodproxy_root=/some/shared/fs/oodproxy/jobs gencerts=/path/to/oodproxy_gencerts.sh PATH=/usr/bin:/usr/sbin
Configuration parameters:
- registration_daemon: Path to the registration daemon script
- oodproxy_root: Root directory for storing certificates and job information
- gencerts: Path to the certificate generation script
- webserver_gid: GID of the webserver user (can be numeric or a group name)
- PATH: Environment PATH to use when executing scripts
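Because webserver_gid accepts either a number or a group name, a name has to be turned into a numeric GID at some point. The plugin itself is written in C; the shell analogue below, built on getent group, only illustrates the numeric-or-name rule, and resolve_gid is a hypothetical name.

```bash
# resolve_gid VALUE: print VALUE unchanged if it is numeric, otherwise look up
# the group name and print its GID. Returns non-zero if the group is unknown.
resolve_gid() {
    local value=$1 entry
    if [[ $value =~ ^[0-9]+$ ]]; then
        printf '%s\n' "$value"
        return 0
    fi
    entry=$(getent group "$value") || return 1
    # getent output is name:password:gid:members; the GID is field 3
    printf '%s\n' "$entry" | cut -d: -f3
}

# Example:
#   resolve_gid apache    # prints the numeric GID of the "apache" group, if it exists
```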
The plugin creates the following directory structure:
/some/shared/fs/oodproxy/ # oodproxy_root
└── <user_id>/ # Per-user directory
└── <job_id>/ # Per-job directory
└── allowed_destinations # List of accessible host:port combinations
Additionally, it creates a temporary directory in /tmp/.oodproxy-XXXXXX/ to store the TLS certificates.
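A shell sketch of that layout follows. It assumes mode 0750 on the per-user and per-job directories and an initially empty allowed_destinations file; both are guesses at the permission policy rather than what the plugin does, and the chown/chgrp steps the real plugin would perform as root are omitted. make_job_dirs is a hypothetical name.

```bash
# make_job_dirs ROOT UID JOBID: create ROOT/UID/JOBID with an empty
# allowed_destinations file, plus a private /tmp/.oodproxy-XXXXXX directory
# for the certificates. Prints the temporary directory's path.
make_job_dirs() {
    local root=$1 uid=$2 jobid=$3 jobdir tmpdir
    jobdir="$root/$uid/$jobid"
    mkdir -p "$jobdir" || return 1
    chmod 0750 "$root/$uid" "$jobdir" || return 1       # assumed policy
    : > "$jobdir/allowed_destinations"                   # filled in later by the daemon
    tmpdir=$(mktemp -d /tmp/.oodproxy-XXXXXX) || return 1  # created with mode 0700
    printf '%s\n' "$tmpdir"
}

# Example:
#   OODPROXY_DIR=$(make_job_dirs /some/shared/fs/oodproxy/jobs "$(id -u)" "$SLURM_JOB_ID")
```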
To enable OODProxy for a job, use the --oodproxy-register=1 option with sbatch:
sbatch --oodproxy-register=1 job_script.sh
Jobs need to signal when they are ready for port registration by writing to the file descriptor specified in the OODPROXY_REG_READY_FD environment variable:
# Start your service
python -m http.server 8888 &
# Signal that the service is ready for registration
if [[ -n "$OODPROXY_REG_READY_FD" ]]; then
echo >&$OODPROXY_REG_READY_FD
fi
This causes the registration daemon to survey the user's running processes in that job and see which ports are open. It then writes them to the allowed_destinations file. The proxy later consults that file to decide whether a destination is allowed when a user wants to connect to it, which ensures that users can only contact ports that they themselves opened.
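In the example above, the readiness signal may be sent before http.server has actually bound port 8888, in which case the daemon could find nothing to register. One way to close that race, assuming bash's /dev/tcp support is available, is to poll the port before signaling. wait_for_port is a hypothetical helper, not part of oodproxy; note that each successful probe opens and immediately closes one connection to the service.

```bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]: succeed once a TCP connection to
# HOST:PORT can be opened; fail after roughly TIMEOUT_SECONDS (default 30).
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-30} start=$SECONDS
    # The subshell exits non-zero if the /dev/tcp connection is refused
    until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
        (( SECONDS - start >= timeout )) && return 1
        sleep 0.2
    done
}

# Example job script fragment:
#   python -m http.server 8888 &
#   if wait_for_port 127.0.0.1 8888 60 && [[ -n "$OODPROXY_REG_READY_FD" ]]; then
#       echo >&"$OODPROXY_REG_READY_FD"
#   fi
```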
The job can access the TLS certificates in the directory specified by the OODPROXY_DIR environment variable:
# Access certificate paths
CA_CERT="${OODPROXY_DIR}/ca.crt"
SERVER_CERT="${OODPROXY_DIR}/server.crt"
SERVER_KEY="${OODPROXY_DIR}/server.key"
The system implements several security measures:
- mTLS Authentication: Both the client and server verify each other's identity
- Limited Access: Only registered ports are accessible through the proxy
- Unique Certificates: Each job gets its own unique set of certificates
- Cleanup on Exit: Certificates and registration information are removed when jobs end
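The proxy server is a separate project, but the Limited Access check can be pictured as an exact-line lookup in the job's allowed_destinations file. The hypothetical shell version below assumes one host:port entry per line; matching whole lines as fixed strings avoids accepting a prefix such as node1:88 when only node1:8888 was registered.

```bash
# is_allowed FILE HOST PORT: succeed only if "HOST:PORT" appears as a whole
# line in FILE. -x matches whole lines, -F treats the dots as literal text.
is_allowed() {
    local file=$1 host=$2 port=$3
    [[ -r $file ]] || return 1
    grep -qxF -- "$host:$port" "$file"
}

# Example:
#   is_allowed /some/shared/fs/oodproxy/jobs/1000/4242/allowed_destinations node1 8888
```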
Certificate lifecycle:
- Certificates are generated when the job starts
- They are valid for 365 days, which is intended to be at least as long as the longest job (configurable in oodproxy_gencerts.sh)
- They are destroyed when the job ends
- Each certificate has a unique UUID-based CN
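Two sanity checks follow from these properties and from the certificate paths the job receives in OODPROXY_DIR: that the server certificate chains to the job's CA and matches its private key, and that a certificate will outlive a given job length. The sketch below uses openssl verify, compares public keys, and relies on openssl x509 -checkend, which succeeds if the certificate is still valid that many seconds from now. The helper names are hypothetical.

```bash
# check_server_certs CA_CERT SERVER_CERT SERVER_KEY: succeed if the server
# certificate verifies against the CA and belongs to the given private key.
check_server_certs() {
    local ca=$1 crt=$2 key=$3
    openssl verify -CAfile "$ca" "$crt" >/dev/null || return 1
    # The public key inside the cert must equal the one derived from the key
    [[ "$(openssl x509 -in "$crt" -noout -pubkey)" == "$(openssl pkey -in "$key" -pubout)" ]]
}

# valid_for_days CERT DAYS: succeed if CERT will not expire within DAYS days.
valid_for_days() {
    local cert=$1 days=$2
    openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null
}

# Examples:
#   check_server_certs "$CA_CERT" "$SERVER_CERT" "$SERVER_KEY" || exit 1
#   valid_for_days "$SERVER_CERT" 14 || echo "certificate expires within 14 days"
#   openssl x509 -in "$SERVER_CERT" -noout -subject -enddate   # UUID CN and expiry
```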
While not provided in the code, OODProxy is designed to work with TLS termination programs like stunnel.
The following is an example of how to use the certificates with stunnel. Configurations vary widely, for example depending on whether sd_listen_fds is used.
[server]
cert = ${OODPROXY_DIR}/server.crt
key = ${OODPROXY_DIR}/server.key
CAfile = ${OODPROXY_DIR}/ca+client.crt
requireCert = yes
verifyChain = yes
verifyPeer = yes
accept = 0.0.0.0:8443
connect = 127.0.0.1:8888
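Rather than relying on stunnel to expand ${OODPROXY_DIR} inside the file, a job script can write the configuration with the paths already filled in by the shell. This sketch assumes the same ports as the example above and adds the stunnel global option foreground = yes so the daemon stays attached to the job; write_stunnel_conf is a hypothetical name.

```bash
# write_stunnel_conf OUTFILE [ACCEPT_PORT] [APP_PORT]: write an stunnel config
# whose certificate paths are expanded from $OODPROXY_DIR when this runs.
# Fails if OODPROXY_DIR is unset or empty.
write_stunnel_conf() {
    local out=$1 accept=${2:-8443} app=${3:-8888}
    [[ -n $OODPROXY_DIR ]] || return 1
    cat > "$out" <<EOF
foreground = yes

[server]
cert = $OODPROXY_DIR/server.crt
key = $OODPROXY_DIR/server.key
CAfile = $OODPROXY_DIR/ca+client.crt
requireCert = yes
verifyChain = yes
verifyPeer = yes
accept = 0.0.0.0:$accept
connect = 127.0.0.1:$app
EOF
}

# Example:
#   conf=$(mktemp) && write_stunnel_conf "$conf" && stunnel "$conf" &
```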
Troubleshooting:
- Certificate Generation Failures:
  - Check permissions on the oodproxy_root directory
  - Ensure the openssl command is available in the configured PATH
- Port Registration Issues:
  - Verify the job is correctly signaling readiness
- Directory Cleanup Failures:
  - NFS-related issues may prevent immediate directory removal
- GID Mismatch:
  - Make sure webserver_gid matches the GID of the web server that the user's browser talks to
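The certificate-generation and GID checks above can be partly scripted. This hypothetical helper, assuming GNU stat, reports whether openssl is found in the PATH configured in plugstack.conf, whether oodproxy_root is a writable directory, and whether a job directory's group matches the expected webserver_gid. Writability depends on who runs it, so run it as the user (or root) in question.

```bash
# oodproxy_diag PATH_VALUE ROOT [JOB_DIR EXPECTED_GID]: print one line per
# problem found; the exit status is the number of problems (0 means none found).
oodproxy_diag() {
    local path_value=$1 root=$2 jobdir=${3:-} want_gid=${4:-} problems=0 have_gid
    # Look up openssl using only the configured PATH (subshell keeps ours intact)
    if ! ( PATH=$path_value; command -v openssl ) >/dev/null; then
        echo "openssl not found in PATH=$path_value"
        problems=$((problems + 1))
    fi
    if [[ ! -d $root || ! -w $root ]]; then
        echo "oodproxy_root $root is missing or not writable"
        problems=$((problems + 1))
    fi
    if [[ -n $jobdir && -n $want_gid ]]; then
        have_gid=$(stat -c %g "$jobdir" 2>/dev/null)
        if [[ $have_gid != "$want_gid" ]]; then
            echo "$jobdir has GID ${have_gid:-unknown}, expected $want_gid"
            problems=$((problems + 1))
        fi
    fi
    return "$problems"
}

# Example:
#   oodproxy_diag /usr/bin:/usr/sbin /some/shared/fs/oodproxy/jobs \
#       /some/shared/fs/oodproxy/jobs/1000/4242 "$(getent group apache | cut -d: -f3)"
```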
Known limitations:
- IPv6 Support: Currently limited to IPv4 addresses
- Port Scanning: Relies on lsof for port discovery, which might miss some cases (none are known yet)
Rather than tying this implementation to Slurm, it would be quite feasible to write a registration daemon that accepts connections over a Unix socket. The daemon could use SO_PEERCRED to check who is on the other side, then perform the same registration. Why not do it that way now? Because! I started off using SPANK and this seemed easy enough. If someone else prefers a different approach, go for it. I'm not set on the SPANK plugin approach.
We don't have a great name for this proxy solution. "oodproxy" seems decent enough, but we don't want to confuse anyone and make them think this is an official OOD project so, for now, we're referring to it as BYU's oodproxy.