In computing, Common Gateway Interface (CGI) is an interface specification that enables web servers to execute an external program, typically to process user requests.

📖 Documentation

https://en.wikipedia.org/wiki/Common_Gateway_Interface
⭐ https://datatracker.ietf.org/doc/html/rfc3875
http://www.wijata.com/cgi/cgispec.html#4.0
https://www.ibm.com/docs/en/i/7.3?topic=information-environment-variables
⭐ https://www.jmarshall.com/easy/cgi/cgi_footnotes.html#otherenv
https://computer.howstuffworks.com/cgi.htm
https://www.tutorialspoint.com/python/python_cgi_programming.htm
https://perso.liris.cnrs.fr/lionel.medini/enseignement/M1IF03/Tutoriels/Tutoriel_CGI_SSI.pdf
https://docs.python.org/3/library/cgi.html
https://stuff.mit.edu/afs/sipb/machine/anxiety-closet/apachessl/apache_1.3.27/htdocs/manual/cgi_path.html.fr
http://www.cgi101.com/book/ch3/text.html
👍 https://help.superhosting.bg/en/cgi-common-gateway-interface-fastcgi.html

📝 Notes

The server communicates with the CGI script in two ways:

Environment variables
STDIN/STDOUT

Most of the HTTP request information the script needs is passed by the server as meta-variables in the script's environment. The rest, namely the request body, is sent to the script's STDIN, so the script process can read it from fd 0.

The whole response from the CGI script is written to STDOUT, so the server has to read the header fields, plus the optional body, from it.
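
Seen from the script's side, this exchange can be sketched as follows. It is a minimal illustration rather than a complete CGI program; the function name `handle` and the echo-style payload are assumptions made for the example.

```python
import os
import sys


def handle(environ, stdin, stdout):
    """Read a request the way a CGI script does, then write a response."""
    # Request information arrives as meta-variables in the environment.
    method = environ.get("REQUEST_METHOD", "GET")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # The request body, if any, arrives on stdin (fd 0).
    body = stdin.read(length) if length > 0 else b""
    payload = b"method=" + method.encode() + b" body=" + body + b"\n"
    # The response goes to stdout (fd 1): header fields, blank line, body.
    stdout.write(b"Content-Type: text/plain\r\n\r\n")
    stdout.write(payload)
    stdout.flush()


if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    # Only runs when a server actually launched this file as a CGI script.
    handle(os.environ, sys.stdin.buffer, sys.stdout.buffer)
```

Passing the environment and the streams as parameters keeps the logic testable without a running server.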

Here is the list of meta-variables which have to be implemented:

Meta-Variables table

Meta-Variable	Data origin	Notes
CONTENT_LENGTH	Request	The server MUST set this meta-variable if and only if the request is accompanied by a message-body. The CONTENT_LENGTH value must reflect the length of the message-body in octets.
CONTENT_TYPE	Request	The server MUST set this meta-variable if an HTTP Content-Type field is present in the client request header.
GATEWAY_INTERFACE	Static	It MUST be set to the dialect of CGI being used by the server to communicate with the script. Example: CGI/1.1
PATH_INFO	Request	Extra "path" information. It's possible to pass extra info to your script in the URL, after the filename of the CGI script. For example, calling the URL `http://www.myhost.com/mypath/myscript.cgi/path/info/here` will set PATH_INFO to "/path/info/here". Commonly used for path-like data, but you can use it for anything.
PATH_TRANSLATED	Req/Conf	The PATH_TRANSLATED variable is derived by taking the PATH_INFO value, parsing it as a local URI in its own right, and performing any virtual-to-physical translation appropriate to map it onto the server's document repository structure. More info here: RFC 3875, section 4.1.6
QUERY_STRING	Request	When information is sent using a method of GET, this variable contains the information in a query that follows the "?". The string is coded in the standard URL format of changing spaces to "+" and encoding special characters with "%xx" hexadecimal encoding. The CGI program must decode this information.
REQUEST_METHOD	Request	Contains the method (as specified with the METHOD attribute in an HTML form) that is used to send the request. Example: GET
REMOTE_ADDR	Core	Contains the IP address of the remote host (web browser) that is making the request, if available. Coming from accept() - Example: 10.10.2.3
SCRIPT_NAME	Request	The path part of the URL that points to the script being executed. It should include the leading slash. Example: /cgi-bin/hello.pgm
SERVER_NAME	Conf/Core	Contains the server host name or IP address of the server. Example: 10.9.8.7
SERVER_PORT	Core	Contains the port number to which the client request was sent.
SERVER_PROTOCOL	Static	HTTP/1.1
SERVER_SOFTWARE	Static	Webserv
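
As a rough sketch of how the table translates into code, the function below assembles the environment for one request. The function name and its parameters are hypothetical; a real server would take these values from its own request and configuration objects. PATH_TRANSLATED is left out here because it depends on the location configuration.

```python
def build_cgi_env(method, headers, body, script_name, path_info,
                  query_string, remote_addr, server_name, server_port):
    """Build the CGI meta-variables for one request (illustrative only)."""
    env = {
        # Static values
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "SERVER_SOFTWARE": "Webserv",
        # Values taken from the request, the socket or the configuration
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": script_name,
        "PATH_INFO": path_info,
        "QUERY_STRING": query_string,
        "REMOTE_ADDR": remote_addr,
        "SERVER_NAME": server_name,
        "SERVER_PORT": str(server_port),
    }
    # CONTENT_LENGTH is set if and only if the request carries a body.
    if body:
        env["CONTENT_LENGTH"] = str(len(body))
    # CONTENT_TYPE is set only if the client sent a Content-Type header.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "content-type" in lowered:
        env["CONTENT_TYPE"] = lowered["content-type"]
    return env
```

All values are strings because the dictionary is meant to be handed to execve() as the child's environment.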

script-URI / meta-variables mapping

a	<scheme>	"://"	<server-name>	":"	<server-port>	<script-path>	<extra-path>	"?"	<query-string>
b	<SERVER_PROTOCOL>	"://"	<SERVER_NAME>	":"	<SERVER_PORT>	<SCRIPT_NAME>	<PATH_INFO>	"?"	<QUERY_STRING>
c	<http>	"://"	<website.org>	":"	<80>	</cgi-bin/search.cgi>	<>	"?"	<searchTerm=HTML>
c	<http>	"://"	<website.org>	":"	<80>	</web>	<>	"?"	<q=test>
c	<http>	"://"	<website.org>	":"	<80>	</mypath/myscript.cgi>	</path/info/here>	"?"	<v=4hD5kl2n0>

a: script-URI
b: corresponding meta-variable
c: examples
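
The mapping can be checked with a small helper that rebuilds the script-URI from a set of meta-variables. This is a sketch under assumptions: the helper name is made up, the scheme is simply taken from the protocol name in SERVER_PROTOCOL (so it cannot tell http from https), and the "?" is only appended when QUERY_STRING is non-empty.

```python
from urllib.parse import quote


def script_uri(env):
    """Rebuild the script-URI from CGI meta-variables (illustrative only)."""
    # "HTTP/1.1" -> "http"
    scheme = env["SERVER_PROTOCOL"].split("/")[0].lower()
    # SCRIPT_NAME and PATH_INFO are stored decoded, so URL-encode them again.
    path = quote(env["SCRIPT_NAME"], safe="/") + quote(env.get("PATH_INFO", ""), safe="/")
    uri = f"{scheme}://{env['SERVER_NAME']}:{env['SERVER_PORT']}{path}"
    if env.get("QUERY_STRING"):
        uri += "?" + env["QUERY_STRING"]
    return uri
```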

CGI & fastCGI

A CGI script is either a script or a binary which uses the CGI meta-variables and stdin to interpret and process the request data. For example, if REQUEST_METHOD is GET, it would retrieve the targeted data from a database or write to it, using the parameters found in QUERY_STRING. It would be the same with POST, except that the parameters travel in the request body, which avoids the length limit servers put on the request URI (1024 bytes here; the actual limit is implementation-dependent).
Each request to a CGI script starts the executable and ends it once the response is produced, whereas FastCGI relies on a separate server whose program keeps running between requests.
Nginx natively supports FastCGI, and it can be configured to forward, for example, every request matching '.php' to the php-fpm FastCGI server.
Our webserv program only handles CGI and has to execute any script whose file extension matches a declared CGI extension. The syntax looks like this: cgi_ext .php /usr/local/bin/php-cgi, cgi_ext .py /usr/bin/python or even cgi_ext .pl /usr/bin/perl.
The cgi_ext directive maps a file extension to the absolute path of its interpreter, so there is no need to add a shebang to the script or to make it executable with chmod +x file.py.
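
The extension lookup could look roughly like this. The dictionary stands in for the parsed cgi_ext directives of one location, and the function name is hypothetical.

```python
import os.path


def build_argv(script_path, cgi_ext):
    """Return the argv used to launch a script, or None if it is not CGI.

    cgi_ext maps a file extension to the absolute path of its interpreter,
    mirroring the cgi_ext directives of one location.
    """
    extension = os.path.splitext(script_path)[1]
    interpreter = cgi_ext.get(extension)
    if interpreter is None:
        return None
    # The interpreter is the program to execute; the script is its argument,
    # which is why the script needs neither a shebang nor the execute bit.
    return [interpreter, script_path]
```

For example, with `{".py": "/usr/bin/python"}`, a request for `/var/www/website/test.py` would be launched as `/usr/bin/python /var/www/website/test.py`, while `index.html` would be served as a static file.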

Some examples

Given this configuration:

server {
  listen 0.0.0.0:8080;
  server_name website;

  location / {
    root /var/www/website;
    cgi_ext  .py /usr/bin/python;
  }

  location /foo {
    root /var/www/website/users/;
    cgi_ext  .php /usr/local/bin/php-cgi;
  }
}

Let's say we have the following URL: http://website.org:8080/foo/test.php/this%2eis%2epath%3binfo?query=string
The HTTP request target is then: /foo/test.php/this%2eis%2epath%3binfo?query=string.

PATH_INFO is: /this.is.path;info

An internal URI is constructed from the scheme, the server location and the
URL-encoded PATH_INFO: http://website.org:8080/this%2eis%2epath%3binfo
This is then translated to a location in the server's document repository, perhaps a filesystem path like this:

PATH_TRANSLATED is: /var/www/website/this.is.path;info

However, if PATH_INFO were /foo/path/info, then PATH_TRANSLATED would be mapped in this case to /var/www/website/users/path/info

SCRIPT_NAME is /foo/test.php rather than /var/www/website/cgi-bin/test.php: RFC 3875 defines it as a URI path identifying the script, not a filesystem path.
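
A possible way to derive these values from the request target is sketched below. Both helpers are hypothetical and deliberately naive: the script is located by searching for the first occurrence of the CGI extension, and the translation simply appends PATH_INFO to a root directory chosen by the caller.

```python
from urllib.parse import unquote


def split_cgi_target(target, ext):
    """Split a request target into (SCRIPT_NAME, PATH_INFO, QUERY_STRING).

    Returns None when the path does not contain a script with extension ext.
    """
    path, _, query = target.partition("?")
    marker = path.find(ext + "/")
    if marker != -1:
        end = marker + len(ext)       # extra path follows the script name
    elif path.endswith(ext):
        end = len(path)               # no extra path
    else:
        return None
    # PATH_INFO is URL-decoded; QUERY_STRING is passed on still encoded.
    return path[:end], unquote(path[end:]), query


def translate_path(root, path_info):
    """Map PATH_INFO onto the filesystem; unset when PATH_INFO is empty."""
    return root.rstrip("/") + path_info if path_info else None
```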

⚙️ Execution

When webserv has to run a Python, Perl or PHP script, it launches the corresponding interpreter with execve(), passing the script file as an argument and the CGI meta-variables as the environment. Communication between the parent process (webserv) and the child process (the CGI script) goes through 2 pipes, one per direction, since a pipe is unidirectional. As execve() replaces the image of its calling process, it has to be called in a child process. As the CGI script reads stdin and writes to stdout, these file descriptors must be redirected to our pipes.
Therefore the execution line looks like this: pipe() - fork() - dup2() - execve()
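
The sequence can be sketched with Python's os module, whose functions are thin wrappers over the same system calls. This is an illustration, not webserv's implementation: `run_cgi` is a made-up name, and the blocking write followed by a blocking read assumes a body small enough to fit in the pipe buffer, whereas a real server would multiplex both pipes with non-blocking I/O.

```python
import os


def run_cgi(argv, env, body=b""):
    """Run argv as a CGI child; return (raw_output, wait_status)."""
    to_child_r, to_child_w = os.pipe()      # parent -> child (child's stdin)
    from_child_r, from_child_w = os.pipe()  # child -> parent (child's stdout)
    pid = os.fork()
    if pid == 0:
        # Child: plug the pipes into fd 0 and fd 1, then replace the image.
        try:
            os.dup2(to_child_r, 0)
            os.dup2(from_child_w, 1)
            for fd in (to_child_r, to_child_w, from_child_r, from_child_w):
                os.close(fd)
            os.execve(argv[0], argv, env)
        finally:
            os._exit(127)  # only reached if something above failed
    # Parent: close the ends that belong to the child.
    os.close(to_child_r)
    os.close(from_child_w)
    os.write(to_child_w, body)
    os.close(to_child_w)  # the child sees EOF on its stdin
    chunks = []
    while True:
        chunk = os.read(from_child_r, 4096)
        if not chunk:  # EOF: the child closed its stdout
            break
        chunks.append(chunk)
    os.close(from_child_r)
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), status
```

Closing the unused pipe ends in the parent matters: as long as a write end stays open somewhere, the reader never sees EOF.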

📖 Documentation

http://www.man-linux-magique.net/man2/fork.html
http://pwet.fr/man/linux/appels_systemes/pipe/
http://www.man-linux-magique.net/man7/pipe.html
http://www.man-linux-magique.net/man2/dup2.html
https://www.man7.org/linux/man-pages/man2/execve.2.html
http://www.man-linux-magique.net/man2/wait.html
http://www.man-linux-magique.net/man2/waitpid.html
http://tzimmermann.org/2017/08/17/file-descriptors-during-fork-and-exec/
http://tzimmermann.org/2017/09/01/the-internals-of-unix-pipes-and-fifos/
http://www.rozmichelle.com/pipes-forks-dups/

📝 Notes

One important thing about fork():
- it copies all the parent's data into a new address space (this is why forking can be heavy). The file descriptor table is also copied, and each copied descriptor refers to the same open file description as in the parent. As a result, if the child reads the first 5 bytes of an open file and the parent then reads 5 bytes, the parent gets the next 5 bytes, not the same 5 bytes as the child. On the other hand, if the parent closes its file descriptor, it is not closed in the child process. To sum up: the child gets its own copy of the parent's file descriptors, which point to the same file descriptions.
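
The shared file offset is easy to observe. The sketch below (the function name is an assumption) lets the child read 5 bytes and close its descriptor, then reads from the parent.

```python
import os
import tempfile


def shared_offset_demo():
    """Return what the parent reads after the child consumed 5 bytes."""
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)
        pid = os.fork()
        if pid == 0:
            os.read(fd, 5)  # the child consumes b"01234"
            os.close(fd)    # closes the child's copy only
            os._exit(0)
        os.waitpid(pid, 0)
        # The offset lives in the shared file description, so the parent
        # continues where the child stopped, and its descriptor is still open.
        return os.read(fd, 5)
```

Calling `shared_offset_demo()` returns `b"56789"`.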
One important thing about execve():
- The callee program replaces the caller program within the same process. One might think nothing connects them, as the caller's image is gone, but the nice thing is that the open file descriptors remain: they are inherited by the callee. Thus, if 10 file descriptors are open in the caller before the call, the callee inherits them all. When this is not wanted, it is called a file descriptor leak. To avoid it, the O_CLOEXEC flag makes a descriptor close automatically at any exec*() call. But in our case this behaviour is our chance to communicate with the callee, as it inherits the file descriptors we want it to inherit, namely the pipe()d and dup2()ed ones.
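
The effect of the close-on-exec flag can be observed with a small experiment. In Python, `os.set_inheritable(fd, False)` sets that flag and `True` clears it; the probe program and the function name are assumptions made for the example.

```python
import os
import sys

# Tiny program run after exec: exits 0 if the given fd is open, 1 otherwise.
PROBE = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
)


def fd_survives_exec(inheritable):
    """Tell whether a pipe fd is still open in the program started by exec."""
    r, w = os.pipe()
    os.set_inheritable(w, inheritable)  # False means close-on-exec
    pid = os.fork()
    if pid == 0:
        try:
            argv = [sys.executable, "-c", PROBE, str(w)]
            os.execve(sys.executable, argv, dict(os.environ))
        finally:
            os._exit(127)  # only reached if execve failed
    _, status = os.waitpid(pid, 0)
    os.close(r)
    os.close(w)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

With the flag cleared the descriptor is visible in the new program; with the flag set it is gone, which is the protection against descriptor leaks.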
How to hijack the CGI script i/o :
- dup2() helps us here. Its prototype is: int dup2(int oldfd, int newfd);. This is file descriptor duplication: newfd becomes a copy of oldfd, and both point to the same file description. So if one of them is closed, the file description (and the corresponding file buffer) remains, as there is still an open FD pointing to it. An important behaviour of this function: if newfd is already open (it already points to a file description), it is closed before being made a copy of oldfd.
  So, before calling the CGI script, we make fd 0 and fd 1 point to the file descriptions we want: in the child, fd 0 becomes a copy of the read end of the parent-to-child pipe and fd 1 a copy of the write end of the child-to-parent pipe. There is no need to close stdin and stdout first, since dup2() does it. The calls look like this: dup2(prt_to_chd[0], STDIN_FILENO), dup2(chd_to_prt[1], STDOUT_FILENO).
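
The behaviour of dup2() can be tried within a single process. The sketch below (hypothetical function name, small payloads only) temporarily points fd 1 at a pipe, writes to fd 1, and reads the data back from the pipe.

```python
import os


def capture_fd1(data):
    """Write data to fd 1 while it is redirected to a pipe; return it."""
    r, w = os.pipe()
    saved = os.dup(1)   # keep a descriptor on the real stdout
    os.dup2(w, 1)       # fd 1 is closed, then made a copy of w
    os.close(w)         # the pipe's write end stays open through fd 1
    os.write(1, data)   # lands in the pipe, not on the terminal
    os.dup2(saved, 1)   # restore stdout; the pipe's last write end closes
    os.close(saved)
    out = os.read(r, len(data) or 1)
    os.close(r)
    return out
```

`capture_fd1(b"hi")` returns `b"hi"`, which shows that closing `w` did not close the underlying file description.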

📩 CGI Response

A script MUST always provide a non-empty response

📝 Notes

The response comprises a message-header and a message-body, separated by a blank line. The message-header contains one or more header fields. The body may be NULL.
The script MUST return one of either a document response, a local redirect response or a client redirect (with optional document) response.

document response

The CGI script can return a document to the user in a document response, with an optional error code indicating the success status of the response.
The script MUST return a Content-Type header field.
A Status header field is optional, and status 200 'OK' is assumed (added by the server) if it is omitted.

local redirect response

The CGI script can return a URI path and query-string ('local-pathquery') for a local resource in a Location header field. This indicates to the server that it should reprocess the request using the path specified. The script MUST NOT return any other header fields or a message-body.

client redirect response

The CGI script can return an absolute URI in a Location header field, to indicate to the client that it should reprocess the request using the URI specified. The script MUST NOT provide any other header fields.

client redirect response with document

Same as the client redirect response, but with an attached document. The Status header field MUST be supplied and MUST contain a status value of 302 'Found'.
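
On the server side, the four response types could be told apart roughly as follows. This is a simplified sketch: the function names are hypothetical, repeated header fields overwrite each other in the dictionary, and the rules about which extra fields are allowed are not enforced.

```python
def split_header_body(raw):
    """Split raw CGI output at the first blank line."""
    found = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(pos, sep) for pos, sep in found if pos != -1]
    if not found:
        raise ValueError("no blank line after the header fields")
    pos, sep = min(found)  # whichever separator comes first
    return raw[:pos], raw[pos + len(sep):]


def parse_cgi_response(raw):
    """Return (kind, headers, body) for the raw bytes written by a script."""
    head, body = split_header_body(raw)
    headers = {}
    for line in head.splitlines():
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    location = headers.get("location")
    if location is None:
        if "content-type" not in headers:
            raise ValueError("document response without Content-Type")
        kind = "document"
        headers.setdefault("status", "200 OK")  # assumed when omitted
    elif location.startswith("/"):
        kind = "local-redirect"  # the server reprocesses the request itself
    elif body:
        kind = "client-redirect-with-document"
    else:
        kind = "client-redirect"
    return kind, headers, body
```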

CGI - Jibus22/webserv GitHub Wiki
