API Reference - skizzerz/python-sandbox GitHub Wiki
This page details the sandbox API; that is, the API used for the parent and child processes to communicate. For the reference PHP parent, please check the PHP API page.
The parent process is responsible for executing the sandbox binary as a child process. It should pass along two additional open file descriptors (3 and 4), which the sandbox uses to communicate with the parent. The sandbox program will send RPCs to the parent on fd 4 and look for their responses on fd 3. In addition to these file descriptors, the parent passes three command line arguments to the sandbox, in this order: path_to_python memory_limit_bytes cpu_limit_secs.
The path_to_python argument should be the string path to the python executable, according to the child. This path does not need to actually exist on the filesystem, and it is the parent's responsibility to somehow make this file path actually lead to a real python executable. Note that python itself bases its own paths around this path, so if you specify /usr/bin/python, then python will be looking for its libraries in either /usr/lib64 if python is 64-bit or /usr/lib if python is 32-bit (in other words, it looks in ../lib64 or ../lib). The parent should ensure that these directories lead to python's libraries so that it initializes properly.
The memory_limit_bytes and cpu_limit_secs arguments are used to curtail the resource usage of the sandbox. A value of 0 can be specified to use default values, which are 200MiB of memory and 5 seconds of CPU time. Note that the memory limit applies to the address space of the sandbox, not to the size of the resident set.
Example: ./sandbox /usr/bin/python 0 0
In addition to passing along fds 3 and 4 and specifying the command line arguments, the parent must also specify a small handful of environment variables:
- PYTHONPATH must include the sandbox's lib directory; in the reference PHP implementation this is /usr/lib/sandbox. These directories should be given as the sandbox sees them; they do not need to actually exist on the filesystem.
- LD_PRELOAD must be the real location of libsbpreload.so, which is generally in the same directory as the sandbox binary.
- It is recommended to define PYTHONDONTWRITEBYTECODE to prevent python from attempting to write .pyc files (if your filesystem abstraction is read-only), and PYTHONNOUSERSITE to prevent python from adding the current user's site-packages directory to PYTHONPATH.
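As a rough illustration of the launch parameters described above, the following Python sketch assembles the argument vector and environment for the sandbox. The helper names (build_argv, build_env) and the example paths are assumptions made for illustration; creating the pipes and mapping them onto fds 3 and 4 is deliberately left out.

```python
def build_argv(sandbox_bin, python_path="/usr/bin/python",
               memory_limit_bytes=0, cpu_limit_secs=0):
    """Build: sandbox path_to_python memory_limit_bytes cpu_limit_secs."""
    if memory_limit_bytes < 0 or cpu_limit_secs < 0:
        raise ValueError("limits must be 0 (use the defaults) or positive")
    # A limit of 0 asks the sandbox to use its defaults (200MiB, 5 seconds).
    return [sandbox_bin, python_path,
            str(memory_limit_bytes), str(cpu_limit_secs)]


def build_env(preload_path, sandbox_lib="/usr/lib/sandbox"):
    """Build the environment variables the sandbox expects."""
    return {
        # Path as the sandbox sees it; it need not exist on the real filesystem.
        "PYTHONPATH": sandbox_lib,
        # Real location of libsbpreload.so on the host.
        "LD_PRELOAD": preload_path,
        # Recommended: any non-empty value enables these.
        "PYTHONDONTWRITEBYTECODE": "1",
        "PYTHONNOUSERSITE": "1",
    }
```

For example, build_argv("./sandbox") reproduces the command line from the example above with both limits left at their defaults.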
The sandbox will attempt to load two files from the parent and execute them. These files should comprise any initialization routines the application needs to perform in the sandbox and, subsequently, the user code. Both files are opened via relative paths; because the sandbox does not change directories during init on its own, they can be provided in the current working directory.
- init.py contains application initialization routines. If this file does not exist, no such routines will be executed. This file is executed after the sandbox is started, but before the sandbox sends its complete_init() call.
- main.py contains the user code and is run after complete_init() is sent. The sandbox will error if this file does not exist.
After the sandbox has been started, it will start communicating with the parent process via fds 3 and 4 as described above. The sandbox is incapable of receiving unsolicited messages from the parent, and every message it sends must have exactly one reply. The format generally uses JSON; however, during initialization some non-JSON messages must be passed from the parent to the child. Each RPC message forms a single line and is terminated by the newline character ("\n"). The maximum line length for responses is 65535 bytes, including the terminating newline. There is no defined maximum line length for requests; the parent should choose some sane cutoff and also be prepared to receive maliciously-constructed requests that are larger than that cutoff.
The RPC request from the child to the parent is always a JSON document and contains the following keys:
- ns (int): The namespace of the request. System calls use namespace 0, internal sandbox calls use namespace 1, and namespace 2 and beyond are reserved for application-specific uses. The application may also opt to use other data such as strings instead of ints for this field, however any messages sent by the sandbox itself will use ints.
- name (string): The string name of the method to call. Internal sandbox RPCs will only ever use lowercase alphanumeric characters and underscores for this field, however application code is free to use whatever.
- args (list): This is a list of arguments to call the method with; a list will always be sent even if there are no arguments (in that case it will be an empty list). The arguments themselves can be any arbitrary data, it is up to the application to make sense of them. For internal calls, consult the documentation below for what the arguments are for any given call.
The request may additionally contain the following keys:
- raw (bool): If true, the response is expected to be a space-separated list of data rather than a JSON object. This key is specified during initialization of the JSON library on the sandbox, so failing to honor this parameter will result in the sandbox failing as it is unable to parse a JSON reply at this point. See the Raw Response section for more details on the formatting. If this key is not specified, it should be treated as if it were false. Only the namespace 0 system calls open, stat, read, and close have defined raw semantics. Because the sandbox is not equipped to handle raw responses for any other call, the parent may either treat any other call asking for a raw response as malicious and terminate the sandbox, or return a regular JSON response even though raw was specified.
Examples:
{"ns": 0, "name": "open", "args": ["/usr/lib64/python3.5/site.py", 524288]}
{"ns": 0, "name": "read", "args": [5, 8], "raw": true}
The examples above are spaced out for readability, however the sandbox will generate the most compact representation possible (e.g. without spaces). Depending on the version of json-c the sandbox was compiled with, forward slashes may or may not be escaped. The parent is expected to be able to handle both escaped and unescaped forward slashes (any parser that conforms to the JSON specification should do this).
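A minimal sketch of how a parent might read and validate one request line, assuming Python on the parent side. The names parse_request, BadRequest, and MAX_REQUEST_BYTES are hypothetical, and the 1 MiB cutoff is an arbitrary choice since the protocol defines none.

```python
import json

MAX_REQUEST_BYTES = 1 << 20  # assumed cutoff; the protocol does not define one


class BadRequest(Exception):
    """Raised when a request line does not match the documented shape."""


def parse_request(line: bytes) -> dict:
    """Parse one newline-terminated request line from the sandbox."""
    if len(line) > MAX_REQUEST_BYTES:
        raise BadRequest("request exceeds the parent's cutoff")
    if not line.endswith(b"\n"):
        raise BadRequest("request is not newline-terminated")
    try:
        doc = json.loads(line)  # accepts both "/" and "\/" in strings
    except ValueError as exc:
        raise BadRequest("request is not valid JSON") from exc
    if not isinstance(doc, dict):
        raise BadRequest("request is not a JSON object")
    for key in ("ns", "name", "args"):
        if key not in doc:
            raise BadRequest("missing key: " + key)
    if not isinstance(doc["name"], str) or not isinstance(doc["args"], list):
        raise BadRequest("name must be a string and args a list")
    # raw counts only when it is the boolean true; absent means false.
    return {"ns": doc["ns"], "name": doc["name"], "args": doc["args"],
            "raw": doc.get("raw") is True}
```

Treating a missing raw key the same as false follows the rule stated above; rejecting oversized or malformed lines outright is one possible policy, not something the protocol mandates.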
The RPC response from the parent to the child is generally a JSON document unless the child specified the "raw" key in the request and that key is boolean true. In that event, please see the Raw Response section. The remainder of this section assumes we are sending a regular (not raw) response.
The response document must contain the following keys:
- code (int): Success indicator or error code. For system calls (namespace 0), a negative code generally indicates an error whereas a positive code or 0 indicates success (depending on the syscall). For all other calls, 0 indicates success and any other value indicates an error: a positive value selects which type of python exception to raise, and a negative value indicates a fatal error that should abort the sandbox with the given exit code. Please see the Raising Exceptions section for information on what number corresponds to what python exception.
The response document may contain the following keys:
- errno (int): An error number to set as the errno variable for system calls (namespace 0) or when instructed to raise an OSError (all other namespaces). If instructed to raise an OSError, this key is required.
- data (any): The response data for the RPC. This field is optional for some system calls (namespace 0) and required for all other calls. If code indicates an error, this field should contain a descriptive error message as a string. Some system calls require this field as well, and the documentation for them describes what format the data should take.
- base64 (bool): If specified and has a value of true, this indicates that data is a base-64 encoded string. The data key is required and must be a string if this is specified and true. If not specified, a value of false should be assumed. The child will handle decoding the data before passing it off to any user code.
Examples:
{"code": -1, "errno": 2, "data": "No such file or directory"}
{"code": 0, "data": {"custom": ["data", "goes", "here"]}}
{"code": 20, "data": "MTIzNDU2Nzg5MDEyMzQ1Njc4OTA=", "base64": true}
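The response format above could be produced along these lines. This is a sketch, not the reference implementation: make_response and its binary flag are hypothetical names, and refusing to send an oversized line is just one way to respect the 65535-byte limit.

```python
import base64
import json

MAX_RESPONSE_BYTES = 65535  # includes the terminating newline


def make_response(code, data=None, errno=None, binary=False):
    """Serialize a regular (non-raw) JSON response as one line of bytes."""
    doc = {"code": code}
    if errno is not None:
        doc["errno"] = errno
    if binary:
        # Bytes payloads are base64-encoded and flagged so the child decodes them.
        doc["data"] = base64.b64encode(data).decode("ascii")
        doc["base64"] = True
    elif data is not None:
        doc["data"] = data
    line = json.dumps(doc, separators=(",", ":")).encode("utf-8") + b"\n"
    if len(line) > MAX_RESPONSE_BYTES:
        raise ValueError("response line exceeds 65535 bytes")
    return line
```

Calling make_response(20, b"12345678901234567890", binary=True) yields the compact form of the third example above.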
A raw response still occupies a single line and is terminated with a newline character. However, it uses a simple space-separated format instead of a JSON document, namely code errno data. The code and errno parameters have the same semantics as for regular responses, and the exact format of data is described in the below sections detailing individual calls. Only the namespace 0 system calls open, stat, read, and close have defined raw semantics. Because the sandbox is not equipped to handle raw responses for any other call, the parent may either treat any other call asking for a raw response as malicious and terminate the sandbox, or return a regular JSON response even though raw was specified.
Examples:
0 0 5 1033 8630 1 0 0 265 0 1463166355 1463166355 1463166355 4096 0
4 0 AuW8yg==
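Both examples above can be reproduced with a small formatter. The function names below are hypothetical and the sketch only covers the space-separated layout described in this section.

```python
import base64

MAX_RESPONSE_BYTES = 65535  # includes the terminating newline


def make_raw_response(code, errno=0, fields=()):
    """Format 'code errno data...' as a single newline-terminated line."""
    parts = [str(code), str(errno)]
    parts.extend(str(field) for field in fields)
    line = " ".join(parts).encode("ascii") + b"\n"
    if len(line) > MAX_RESPONSE_BYTES:
        raise ValueError("raw response line exceeds 65535 bytes")
    return line


def make_raw_read(chunk):
    """Raw read reply: byte count, errno 0, then the base64-encoded bytes."""
    encoded = base64.b64encode(chunk).decode("ascii")  # never contains spaces
    return make_raw_response(len(chunk), 0, [encoded])
```

Base64 output never contains spaces, which is what keeps the read payload safe to embed in this format.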
The parent can specify that a python exception be raised by specifying a positive number as the code in the response. Unless otherwise specified, the data parameter must be a string containing an error message. Not every exception may be defined depending on the version of python, and some exceptions may have different semantics. Check the python documentation for more information.
Code | Exception |
---|---|
1 | ImportError [1] |
2 | IndexError |
3 | KeyError |
4 | MemoryError |
5 | NotImplementedError |
6 | OSError [2] |
7 | OverflowError |
8 | RuntimeError |
9 | StopIteration |
10 | StopAsyncIteration |
11 | SyntaxError [3] |
12 | TypeError |
13 | ValueError |
14 | ZeroDivisionError |
[1] The data key may either be a string or an object with name, path, and message keys (whose values are strings).
[2] The errno key must be defined. The data key may either be a string or an object with strerror, filename, and filename2 keys (filename2 and strerror are optional in this case; all values are strings).
[3] The data key must be an object with the filename, lineno, offset, text, and message keys (all of which are strings except lineno and offset, which are ints).
The application can define additional exceptions to be raised by extending the sandbox.EXCEPTION_MAP dict. It is recommended that custom exceptions begin numbering at 100 or later to leave room for future expansion of official exceptions.
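On the parent side, the table above might be mirrored as a lookup used when building error responses. This is a sketch under stated assumptions: EXCEPTION_CODES and exception_response are hypothetical parent-side names, unrelated to the sandbox.EXCEPTION_MAP dict that lives in the child.

```python
# Parent-side mirror of the documented code table (hypothetical name).
EXCEPTION_CODES = {
    "ImportError": 1, "IndexError": 2, "KeyError": 3, "MemoryError": 4,
    "NotImplementedError": 5, "OSError": 6, "OverflowError": 7,
    "RuntimeError": 8, "StopIteration": 9, "StopAsyncIteration": 10,
    "SyntaxError": 11, "TypeError": 12, "ValueError": 13,
    "ZeroDivisionError": 14,
}


def exception_response(name, data, errno=None):
    """Build a response document asking the child to raise `name`."""
    doc = {"code": EXCEPTION_CODES[name], "data": data}
    if name == "OSError":
        # OSError is the one exception for which errno is mandatory.
        if errno is None:
            raise ValueError("OSError responses require errno")
        doc["errno"] = errno
    if name == "SyntaxError" and not isinstance(data, dict):
        # SyntaxError needs the structured form (filename, lineno, ...).
        raise ValueError("SyntaxError responses require an object for data")
    return doc
```

These responses only make sense outside namespace 0, since system calls interpret code as a syscall return value rather than an exception number.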
This section documents all system calls and internal sandbox calls that a parent process must implement. System calls use namespace 0 whereas internal sandbox calls use namespace 1.
For additional details on how system calls operate and what parameters they take, consult the man pages, e.g. man 2 open or man 2 fcntl.
open(string path, int flags, int mode = 0)
The return code should be a file descriptor number on success (5 or greater, as the child already has fds 0-4 opened) and -1 upon error (with errno set to the appropriate error number). data need not be defined. The flags and mode parameters operate the same as they do in open(2), in that they are bitfields of various flags (such as O_RDONLY or O_CLOEXEC) and modes (such as S_IRUSR and S_IRGRP).
This system call can be called with the raw parameter, in which case it should return code errno (errno is required even if code indicates success).
fcntl(int fd, int cmd, mixed arg = NULL)
The return code should be -1 on error (with errno set to the appropriate error number). Only certain commands need be implemented; these are detailed below. The application can implement more than these if it wishes. Unless otherwise specified in the command description, data is ignored and need not be defined. At the moment, no commands are implemented on the child side that require data. Further down, the various values arg can take are defined; if a command is not listed then it does not take an arg.
The application must implement the following:
- F_GETFD: Returns the flags of the file descriptor in code. This must be correct with regards to the close-on-exec flag matching what a previous open() or fcntl() call specified.
- F_GETFL: Returns the mode of the file descriptor in code.
- F_DUPFD_CLOEXEC: Returns the new file descriptor in code.
It is strongly recommended that the application additionally implement the following:
- F_SETFD: Sets the flags of the file descriptor, returning a code of 0 to indicate success.
- F_SETFL: Sets the mode of the file descriptor, returning a code of 0 to indicate success.
- F_DUPFD: Returns the new file descriptor in code.
Calls other than those described above may not be fully implemented in the sandbox, particularly commands that would require any data to be returned. As such, they need not be implemented on the parent side either, even though the sandbox will pass such calls along.
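One way a parent could satisfy the open and fcntl requirements above is to hand out virtual descriptors and track their flags itself. The sketch below assumes a Unix host whose flag and command values match the child's; FdTable and its methods are hypothetical, and F_SETFL is simplified to two status flags.

```python
import errno
import fcntl  # Unix-only; used here for the F_* constant values
import os

# 1030 is the Linux value; the fallback only matters if the constant is missing.
F_DUPFD_CLOEXEC = getattr(fcntl, "F_DUPFD_CLOEXEC", 1030)
SETTABLE_FLAGS = os.O_APPEND | os.O_NONBLOCK  # simplified subset for F_SETFL


class FdTable:
    """Hypothetical table of virtual descriptors handed out to the sandbox."""

    FIRST_FD = 5  # the child already has fds 0-4 open

    def __init__(self):
        self._entries = {}

    def allocate(self, flags, cloexec=False, min_fd=FIRST_FD):
        fd = max(min_fd, self.FIRST_FD)
        while fd in self._entries:
            fd += 1
        # Close-on-exec is a descriptor flag, so keep it out of the status flags.
        self._entries[fd] = {"flags": flags & ~os.O_CLOEXEC, "cloexec": cloexec}
        return fd

    def open(self, flags):
        return self.allocate(flags, cloexec=bool(flags & os.O_CLOEXEC))

    def fcntl(self, fd, cmd, arg=None):
        """Return a (code, errno) pair for the response."""
        entry = self._entries.get(fd)
        if entry is None:
            return -1, errno.EBADF
        if cmd == fcntl.F_GETFD:
            return (fcntl.FD_CLOEXEC if entry["cloexec"] else 0), 0
        if cmd == fcntl.F_SETFD:
            entry["cloexec"] = bool(int(arg or 0) & fcntl.FD_CLOEXEC)
            return 0, 0
        if cmd == fcntl.F_GETFL:
            return entry["flags"], 0
        if cmd == fcntl.F_SETFL:
            kept = entry["flags"] & ~SETTABLE_FLAGS
            entry["flags"] = kept | (int(arg or 0) & SETTABLE_FLAGS)
            return 0, 0
        if cmd in (fcntl.F_DUPFD, F_DUPFD_CLOEXEC):
            new_fd = self.allocate(entry["flags"],
                                   cloexec=(cmd == F_DUPFD_CLOEXEC),
                                   min_fd=int(arg or 0))
            return new_fd, 0
        return -1, errno.EINVAL  # command not implemented by this sketch
```

Keeping the close-on-exec bit separate from the status flags is what lets F_GETFD report what an earlier open() or F_SETFD call requested.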
The arg key is an int for the following commands:
F_DUPFD
F_DUPFD_CLOEXEC
F_SETFD
F_SETFL
F_SETOWN
F_SETSIG
F_SETLEASE
F_NOTIFY
F_SETPIPE_SZ
F_ADD_SEALS
The arg key is an object of the following form for the following commands:
{"l_type": int, "l_whence": int, "l_start": int, "l_len": int, "l_pid": int}
F_SETLK
F_SETLKW
F_GETLK
F_OFD_SETLK
F_OFD_SETLKW
F_OFD_GETLK
The arg key is an object of the following form for the following commands:
{"type": int, "pid": int}
F_GETOWN_EX
F_SETOWN_EX
close(int fd)
Closes the file descriptor, returning a code of 0 upon success and -1 on error (with errno set to an appropriate error number). data is ignored and need not be defined.
This system call can be called with the raw parameter, in which case it should return code errno (errno is required even if code indicates success).
read(int fd, int count)
Reads count bytes from the file descriptor, returning the number of bytes actually read in code upon success and -1 on error (with errno set to an appropriate error number). data should be a string containing the bytes read; if they are not valid JSON string characters, ensure they are encoded properly (either with escape codes or by base64-encoding data). If the file descriptor is at EOF, this should return 0 and data should be an empty string; it must not be an error to attempt to read an EOF fd. The application is free to return fewer than count bytes, as long as code reflects the actual number of bytes read.
This system call can be called with the raw parameter, in which case it should return code errno data. errno is required but ignored if code indicates success. data is required but ignored if code indicates an error. If the read is successful, data must be base64-encoded and cannot contain spaces. For a raw read, the data string after base64-encoding cannot be longer than 65535 bytes.
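Because base64 expands every 3 bytes into 4 characters, a parent has to cap how much it reads per call to stay under the line limit. The sketch below assumes the parent backs each sandbox fd with a real descriptor; handle_read, the 64-byte framing allowance, and the "-" placeholder sent as data on raw errors are all assumptions, not part of the protocol.

```python
import base64
import os

MAX_LINE = 65535
# Reserve an assumed 64 bytes for framing (code, errno, JSON keys, newline),
# then convert the remaining base64 budget back to raw bytes (4 chars per 3).
READ_MAX = ((MAX_LINE - 64) // 4) * 3


def handle_read(real_fd, count, raw=False):
    """Serve read(fd, count); returns a raw line (bytes) or a response dict."""
    count = max(0, min(count, READ_MAX))  # returning fewer bytes is allowed
    try:
        chunk = os.read(real_fd, count)
    except OSError as exc:
        if raw:
            # data is required but ignored on error; "-" is a placeholder.
            return ("-1 %d -\n" % exc.errno).encode("ascii")
        return {"code": -1, "errno": exc.errno, "data": exc.strerror}
    encoded = base64.b64encode(chunk).decode("ascii")
    if raw:
        return ("%d 0 %s\n" % (len(chunk), encoded)).encode("ascii")
    # At EOF chunk is empty, giving code 0 and an empty data string.
    return {"code": len(chunk), "data": encoded, "base64": True}
```

This page does not say how the sandbox parses a raw reply whose data is empty, so the raw EOF case in particular should be checked against the sandbox source before relying on it.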
stat(string path)
Returns stat info for the path. code should be 0 on success and -1 on error (with errno set appropriately). The data key should be an object with the following fields:
{"st_dev": int, "st_ino": int, "st_mode": int, "st_nlink": int, "st_uid": int, "st_gid": int, "st_rdev": int, "st_size": int, "st_atime": int, "st_mtime": int, "st_ctime": int, "st_blksize": int, "st_blocks": int}
The application is free to fill in the fields however it deems desirable (for example, it may choose to hardcode the returned uid/gid); however, the st_size field should reflect the actual size of the file for reading, as Python allocates buffers based on the returned size when preparing to read a file. See stat(2) for a description of these fields.
This system call can be called with the raw parameter, in which case it should return code errno st_dev st_ino st_mode st_nlink st_uid st_gid st_rdev st_size st_atime st_mtime st_ctime st_blksize st_blocks. All fields are required regardless of success or failure.
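A sketch of filling in the stat fields from a real os.stat() result on a Unix host, for both the JSON and raw forms. The helper names are hypothetical; hardcoding uid and gid to 0 and zero-filling the fields on failure are illustrative choices.

```python
import os

STAT_FIELDS = (
    "st_dev", "st_ino", "st_mode", "st_nlink", "st_uid", "st_gid", "st_rdev",
    "st_size", "st_atime", "st_mtime", "st_ctime", "st_blksize", "st_blocks",
)


def stat_data(st, uid=0, gid=0):
    """Convert an os.stat_result into the documented data object."""
    # Timestamps are floats on os.stat_result; the protocol wants ints.
    data = {name: int(getattr(st, name)) for name in STAT_FIELDS}
    data["st_uid"] = uid  # hide the real owner from the sandbox
    data["st_gid"] = gid
    return data


def stat_raw_line(code, err, data=None):
    """Format the raw form; every field is sent even when the call failed."""
    if data is None:
        values = [0] * len(STAT_FIELDS)
    else:
        values = [data[name] for name in STAT_FIELDS]
    return " ".join(str(v) for v in [code, err, *values]).encode("ascii") + b"\n"
```

st_size is passed through untouched because, as noted above, Python sizes its read buffers from it.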
lstat(string path)
Same return values as stat above; however, this cannot be called with the raw parameter.
fstat(int fd)
Same return values as stat above; however, this cannot be called with the raw parameter.
readlink(string path)
On success, code is the length of the data in bytes, and data is a string with the pathname the link points to. On failure, code is -1 and errno is appropriately set.
openat(int dirfd, string pathname, int flags, int mode = 0)
On success, code is the new file descriptor. On failure, code is -1 and errno is appropriately set. The data key is ignored.
getdents(int fd, int count, int direntsz)
Returns at most count bytes worth of directory entries from the directory fd. The direntsz parameter states how many bytes are taken up by each directory entry, not counting the path name itself, so that the application can correctly compute the number of entries it can actually return. If there are no more directory entries to list, code should be 0. On failure, code should be -1 and errno should be appropriately set. On success, code should be any positive value and data should be a list of objects which take the following form:
{"d_ino": int, "d_name": string, "d_type": int}
The number of results in the list should be such that the entries total count bytes or fewer (where each entry is calculated to contain direntsz + strlen(d_name) bytes; this is not the same as how many bytes each JSON object takes up in the response). If using a language where strings are terminated with a trailing null byte, that null byte should not be included in the length of d_name. See getdents(2) for more information on the values of the object fields.
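The size accounting described above can be sketched as follows. The names pack_dirents and getdents_response are hypothetical, as is returning EINVAL when not even one entry fits (this mirrors getdents(2), but this page does not specify it).

```python
import errno


def pack_dirents(entries, count, direntsz):
    """Take entries from the front while their computed size fits in count."""
    batch = []
    used = 0
    for entry in entries:
        # Per-entry cost is direntsz plus the name length in bytes (no NUL).
        size = direntsz + len(entry["d_name"].encode("utf-8"))
        if used + size > count:
            break
        batch.append(entry)
        used += size
    return batch


def getdents_response(remaining, count, direntsz):
    """Build the response for the entries not yet returned to the child."""
    if not remaining:
        return {"code": 0, "data": []}  # nothing left to list
    batch = pack_dirents(remaining, count, direntsz)
    if not batch:
        # Assumed behavior: the buffer cannot hold even the first entry.
        return {"code": -1, "errno": errno.EINVAL,
                "data": "Result buffer is too small"}
    return {"code": len(batch), "data": batch}
```

The caller is expected to drop the returned entries from its pending list so that a later call continues where this one stopped.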
lseek(int fd, int offset, int whence)
Returns 0 or a positive code on success per lseek(2) and -1 on failure (with errno set appropriately). The data key is ignored.
dup(int oldfd)
Returns the new file descriptor in code on success and -1 on failure (with errno set appropriately). The data key is ignored.
complete_init(void)
This call is meant to signal to the parent that all sandbox initialization has been completed. The parent application can use this as a signal to drop any privileges it may have needed during the initialization process but does not need when running actual user code. Even if the parent does not utilize this call, it must still be implemented. The complete_init call will be called with an empty args array. The data key is ignored but still must be specified. As this is not a system call, it is possible to raise a python exception by specifying the correct code.
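To tie the namespaces together, a parent might route parsed requests through a table keyed on (ns, name). This is only a sketch: HANDLERS, dispatch, and the fallback replies for unknown calls (ENOSYS for namespace 0, NotImplementedError elsewhere) are assumptions rather than documented behavior.

```python
import errno


def complete_init(args):
    """Namespace 1: the sandbox has finished initializing."""
    # A real parent could drop privileges here before user code runs.
    return {"code": 0, "data": None}  # data is ignored but must be present


HANDLERS = {
    (1, "complete_init"): complete_init,
    # (0, "open"): ..., (0, "read"): ..., and so on for the other calls.
}


def dispatch(request):
    """Route a parsed request dict (ns, name, args) to its handler."""
    handler = HANDLERS.get((request["ns"], request["name"]))
    if handler is not None:
        return handler(request["args"])
    if request["ns"] == 0:
        # Unknown system call: fail it the way the kernel would.
        return {"code": -1, "errno": errno.ENOSYS,
                "data": "Function not implemented"}
    # Outside namespace 0, code 5 asks the child to raise NotImplementedError.
    return {"code": 5, "data": "unknown call: " + str(request["name"])}
```

A table like this keeps the per-call handlers independent, so application-specific namespaces (2 and beyond) can be registered without touching the system call handlers.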