CGI - Jibus22/webserv GitHub Wiki
In computing, Common Gateway Interface (CGI) is an interface specification that enables web servers to execute an external program, typically to process user requests.
https://en.wikipedia.org/wiki/Common_Gateway_Interface
⭐ https://datatracker.ietf.org/doc/html/rfc3875
http://www.wijata.com/cgi/cgispec.html#4.0
https://www.ibm.com/docs/en/i/7.3?topic=information-environment-variables
⭐ https://www.jmarshall.com/easy/cgi/cgi_footnotes.html#otherenv
https://computer.howstuffworks.com/cgi.htm
https://www.tutorialspoint.com/python/python_cgi_programming.htm
https://perso.liris.cnrs.fr/lionel.medini/enseignement/M1IF03/Tutoriels/Tutoriel_CGI_SSI.pdf
https://docs.python.org/3/library/cgi.html
https://stuff.mit.edu/afs/sipb/machine/anxiety-closet/apachessl/apache_1.3.27/htdocs/manual/cgi_path.html.fr
http://www.cgi101.com/book/ch3/text.html
👍 https://help.superhosting.bg/en/cgi-common-gateway-interface-fastcgi.html
The server communicates with the CGI script in two ways:
- Environment variables
- STDIN/STDOUT
The HTTP headers that matter most to the called script are passed by the server as meta-variables in the script's environment. The rest, such as the request body, is sent to the script's STDIN, so the script process can read it from fd 0.
The whole response from the CGI script is written to STDOUT, so the server has to retrieve the HTTP headers, plus the optional body, from it.
Here is the list of meta-variables that have to be implemented:
Meta-Variable | Data origin | Notes |
---|---|---|
CONTENT_LENGTH | Request | The server MUST set this meta-variable if and only if the request is accompanied by a message body entity. The CONTENT_LENGTH value must reflect the length of the message-body |
CONTENT_TYPE | Request | The server MUST set this meta-variable if an HTTP Content-Type field is present in the client request header. |
GATEWAY_INTERFACE | Static | It MUST be set to the dialect of CGI being used by the server to communicate with the script. Example: CGI/1.1 |
PATH_INFO | Request | Extra "path" information. It's possible to pass extra info to your script in the URL, after the filename of the CGI script. For example, calling the URL http://www.myhost.com/mypath/myscript.cgi/path/info/here will set PATH_INFO to "/path/info/here". Commonly used for path-like data, but you can use it for anything. |
PATH_TRANSLATED | Req/Conf | The PATH_TRANSLATED variable is derived by taking the PATH_INFO value, parsing it as a local URI in its own right, and performing any virtual-to-physical translation appropriate to map it onto the server's document repository structure. More info here: RFC 3875, section 4.1.6 |
QUERY_STRING | Request | When information is sent using a method of GET, this variable contains the information in a query that follows the "?". The string is coded in the standard URL format of changing spaces to "+" and encoding special characters with "%xx" hexadecimal encoding. The CGI program must decode this information. |
REQUEST_METHOD | Request | Contains the method (as specified with the METHOD attribute in an HTML form) that is used to send the request. Example: GET |
REMOTE_ADDR | Core | Contains the IP address of the remote host (web browser) that is making the request, if available. Coming from accept() - Example: 10.10.2.3 |
SCRIPT_NAME | Request | The path part of the URL that points to the script being executed. It should include the leading slash. Example: /cgi-bin/hello.pgm |
SERVER_NAME | Conf/Core | Contains the server host name or IP address of the server. Example: 10.9.8.7 |
SERVER_PORT | Core | Contains the port number to which the client request was sent. |
SERVER_PROTOCOL | Static | HTTP/1.1 |
SERVER_SOFTWARE | Static | Webserv |
a | <scheme> | "://" | <server-name> | ":" | <server-port> | <script-path> | <extra-path> | "?" | <query-string> |
---|---|---|---|---|---|---|---|---|---|
b | <SERVER_PROTOCOL> | "://" | <SERVER_NAME> | ":" | <SERVER_PORT> | <SCRIPT_NAME> | <PATH_INFO> | "?" | <QUERY_STRING> |
c | <http> | "://" | <website.org> | ":" | <80> | </cgi-bin/search.cgi> | <> | "?" | <searchTerm=HTML> |
c | <http> | "://" | <website.org> | ":" | <80> | </web> | <> | "?" | <q=test> |
c | <http> | "://" | <website.org> | ":" | <80> | </mypath/myscript.cgi> | </path/info/here> | "?" | <v=4hD5kl2n0> |
a: script-URI
b: corresponding meta-variable
c: examples
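A CGI script is either a script or a binary that uses the CGI meta-variables and stdin to interpret and/or process data the HTTP way. For example, if `REQUEST_METHOD` is `GET`, it would retrieve the targeted data from a database or write to it, using the parameters carried by the query string. It would be the same with `POST`, except that the parameters would be in the body, as the URI query string is limited in length (1024 bytes is the figure assumed here; the actual limit is set by the server implementation, not by the HTTP or CGI specifications).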
Each request to a CGI script starts the executable and ends it once the response has been produced, whereas FastCGI is a server whose program is 'always' running.
Nginx natively handles FastCGI and we can configure it to connect to, for example, the FastCGI php-fpm server for each '.php' match.
Our webserv program only handles CGI and has to execute any script that matches a declared CGI file extension. The syntax looks like this: `cgi_ext .php /usr/local/bin/php-cgi`, `cgi_ext .py /usr/bin/python` or even `cgi_ext .pl /usr/bin/perl`.
The `cgi_ext` directive maps a file extension to the absolute path of its launcher executable, so there is no need to add a shebang to the script nor to `chmod +x` it.
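
The lookup implied by `cgi_ext` could be sketched as below, assuming the directives of one location have already been loaded into a map. `CgiExtMap` and `findCgiLauncher` are hypothetical names used for illustration only.

```cpp
#include <map>
#include <string>

// Maps a file extension (".py") to the absolute path of its launcher,
// mirroring a set of cgi_ext directives.
typedef std::map<std::string, std::string> CgiExtMap;

// Returns the launcher for `path`, or an empty string if the file is not
// a CGI script according to `exts`.
std::string findCgiLauncher(const std::string& path, const CgiExtMap& exts) {
    std::string::size_type slash = path.rfind('/');
    std::string::size_type dot = path.rfind('.');
    // No dot, or the last dot belongs to a directory name: no extension.
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
        return std::string();
    CgiExtMap::const_iterator it = exts.find(path.substr(dot));
    return it != exts.end() ? it->second : std::string();
}
```

An empty result means the file is served as a regular static resource.
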
With this given configuration:
server {
    listen 0.0.0.0:8080;
    server_name website;
    location / {
        root /var/www/website;
        cgi_ext .py /usr/bin/python;
    }
    location /foo {
        root /var/www/website/users/;
        cgi_ext .php /usr/local/bin/php-cgi;
    }
}
Let's say we have the following URL: http://website.org:8080/foo/test.php/this%2eis%2epath%3binfo?query=string
And so this HTTP request target: `/foo/test.php/this%2eis%2epath%3binfo?query=string`.
- PATH_INFO is: `/this.is.path;info`
  An internal URI is constructed from the scheme, server location and the URL-encoded PATH_INFO: http://website.org:8080/this%2eis%2epath%3binfo
  This would then be translated to a location in the server's document repository, perhaps a filesystem path like the one below.
- PATH_TRANSLATED is: `/var/www/website/this.is.path;info`
  However, if PATH_INFO were `/foo/path/info`, then PATH_TRANSLATED would in this case be mapped to `/var/www/website/users/path/info`.
- SCRIPT_NAME is `/foo/test.php` rather than a filesystem path such as `/var/www/website/cgi-bin/test.php`: RFC 3875 defines SCRIPT_NAME as a URI path that identifies the script.
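
The split worked out above can be sketched in code. This is only an illustration under a simplifying assumption: the script is recognised by the first occurrence of its extension followed by `/` or by the end of the path. `CgiTarget`, `percentDecode` and `splitCgiTarget` are hypothetical names.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

struct CgiTarget {
    std::string scriptName;   // e.g. "/foo/test.php"
    std::string pathInfo;     // decoded, e.g. "/this.is.path;info"
    std::string queryString;  // left URL-encoded, e.g. "query=string"
};

// Decodes %xx escapes; malformed escapes are copied through unchanged.
std::string percentDecode(const std::string& s) {
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()
            && std::isxdigit(static_cast<unsigned char>(s[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(s[i + 2]))) {
            out += static_cast<char>(std::strtol(s.substr(i + 1, 2).c_str(), NULL, 16));
            i += 2;
        } else {
            out += s[i];
        }
    }
    return out;
}

// Splits a request target into SCRIPT_NAME, PATH_INFO and QUERY_STRING,
// given the CGI extension configured for the matching location.
CgiTarget splitCgiTarget(const std::string& target, const std::string& ext) {
    CgiTarget t;
    std::string path = target;
    std::string::size_type q = target.find('?');
    if (q != std::string::npos) {
        t.queryString = target.substr(q + 1);
        path = target.substr(0, q);
    }
    std::string::size_type pos = ext.empty() ? std::string::npos : path.find(ext);
    while (pos != std::string::npos) {
        std::string::size_type end = pos + ext.size();
        // The script name ends where the extension is followed by '/' or nothing.
        if (end == path.size() || path[end] == '/') {
            t.scriptName = path.substr(0, end);
            t.pathInfo = percentDecode(path.substr(end));
            return t;
        }
        pos = path.find(ext, pos + 1);
    }
    t.scriptName = path;  // extension not found: no extra path
    return t;
}
```

PATH_TRANSLATED would then be obtained by resolving `pathInfo` against the `root` of the location it matches, which is configuration-dependent and not shown here.
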
When webserv has to run a Python, Perl or PHP script, it calls the corresponding external program with `execve()`, passing the file to execute as an argument and the CGI meta-variables as the environment. Communication between the parent process (webserv) and the child one (the CGI script) then has to be established through 2 pipes (to get full-duplex communication, since each pipe is one-way). As `execve()` replaces the process that calls it, it has to be called in a child process. As the CGI script reads from stdin and writes to stdout, these file descriptors must be hijacked and redirected to our pipes.
Therefore the sequence of calls looks like this: `pipe()` - `fork()` - `dup2()` - `execve()`
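
A minimal sketch of this call sequence follows. It is deliberately simplified: it uses blocking reads and writes (so it only suits small request bodies, whereas a real server would multiplex the pipe descriptors), and it reports any failure as an empty string. `runCgi` and its parameters are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `launcher script` with `env` as environment, feeds `body` to its stdin
// and returns everything it wrote to stdout.
std::string runCgi(const std::string& launcher, const std::string& script,
                   const std::vector<std::string>& env, const std::string& body) {
    int prtToChd[2];  // parent writes to [1], child reads from [0]
    int chdToPrt[2];  // child writes to [1], parent reads from [0]
    if (pipe(prtToChd) == -1)
        return "";
    if (pipe(chdToPrt) == -1) {
        close(prtToChd[0]); close(prtToChd[1]);
        return "";
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(prtToChd[0]); close(prtToChd[1]);
        close(chdToPrt[0]); close(chdToPrt[1]);
        return "";
    }
    if (pid == 0) {
        // Child: stdin becomes the read end, stdout becomes the write end.
        dup2(prtToChd[0], STDIN_FILENO);
        dup2(chdToPrt[1], STDOUT_FILENO);
        close(prtToChd[0]); close(prtToChd[1]);
        close(chdToPrt[0]); close(chdToPrt[1]);

        std::vector<char*> argv;
        argv.push_back(const_cast<char*>(launcher.c_str()));
        argv.push_back(const_cast<char*>(script.c_str()));
        argv.push_back(NULL);
        std::vector<char*> envp;
        for (std::size_t i = 0; i < env.size(); ++i)
            envp.push_back(const_cast<char*>(env[i].c_str()));
        envp.push_back(NULL);
        execve(launcher.c_str(), &argv[0], &envp[0]);
        _exit(127);  // only reached if execve() failed
    }
    // Parent: keep only the ends it uses.
    close(prtToChd[0]);
    close(chdToPrt[1]);
    std::string::size_type sent = 0;
    while (sent < body.size()) {
        ssize_t w = write(prtToChd[1], body.data() + sent, body.size() - sent);
        if (w <= 0)
            break;
        sent += static_cast<std::string::size_type>(w);
    }
    close(prtToChd[1]);  // the script now sees EOF on its stdin

    std::string output;
    char buf[4096];
    ssize_t n;
    while ((n = read(chdToPrt[0], buf, sizeof(buf))) > 0)
        output.append(buf, static_cast<std::size_t>(n));
    close(chdToPrt[0]);
    waitpid(pid, NULL, 0);  // reap the child to avoid a zombie
    return output;
}
```

For example, `runCgi("/usr/bin/python", "/var/www/website/test.py", env, body)` would run the script through its launcher, as the `cgi_ext` directive intends.
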
http://www.man-linux-magique.net/man2/fork.html
http://pwet.fr/man/linux/appels_systemes/pipe/
http://www.man-linux-magique.net/man7/pipe.html
http://www.man-linux-magique.net/man2/dup2.html
https://www.man7.org/linux/man-pages/man2/execve.2.html
http://www.man-linux-magique.net/man2/wait.html
http://www.man-linux-magique.net/man2/waitpid.html
http://tzimmermann.org/2017/08/17/file-descriptors-during-fork-and-exec/
http://tzimmermann.org/2017/09/01/the-internals-of-unix-pipes-and-fifos/
http://www.rozmichelle.com/pipes-forks-dups/
- One important thing about `fork()`:
  - It copies all the parent's data to another memory map (this is why forking can be heavy). File descriptors and file descriptor tables are also copied, and the copies reference the same file descriptions. As a result, if the child reads the first 5 bytes of an open file and the parent then reads 5 bytes, the parent gets the next 5 bytes, not the same 5 bytes as the child. On the other hand, if the parent closes its file descriptor, it is not closed in the child process. To summarize: the child gets its own copy of the parent's file descriptors, which point to the same file descriptions.
- One important thing about `execve()`:
  - The callee program replaces the caller within the same process. We could imagine that nothing connects them, as the caller's program is gone. But the nice thing is that the file descriptors remain open: they are inherited by the callee. Thus, if 10 file descriptors are open in the caller before it calls `execve()`, the callee inherits all of them. When this is not wanted, it is called file descriptor leaking. To avoid it, the O_CLOEXEC flag makes them close automatically on any `exec*()` call. But in our case this behaviour is precisely what lets us communicate with the script, as the callee inherits the file descriptors we want it to inherit, namely the `pipe()`d and `dup2()`ed ones.
- How to hijack the CGI script I/O:
  - `dup2()` helps us here. Its prototype is `int dup2(int oldfd, int newfd);`. This is file descriptor duplication: `newfd` becomes a copy of `oldfd`, as both point to the same file description. So if one of them is closed, the file description (and the corresponding file buffer) remains, as there is still an open FD pointing to it. An important behaviour of this function: if `newfd` already exists (already points to a file description), it is closed before being reused as the duplicate of `oldfd`.
    So, in the child, before calling the CGI script, we can make fd 0 and fd 1 point to the file descriptions we want, in other words make them copies of, respectively, the read end of our parent-to-child pipe and the write end of our child-to-parent pipe (`pipe()` stores the read end at index 0 and the write end at index 1). There is no need to close stdin and stdout first, since `dup2()` does it. Thus, the calls would look like this: `dup2(prt_to_chd[0], STDIN_FILENO)`, `dup2(chd_to_prt[1], STDOUT_FILENO)`.
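
The point made above about `fork()` and shared file descriptions can be checked with a small experiment. The sketch below assumes a temporary file can be created with `tmpfile()`; `parentReadAfterChild` is a hypothetical name.

```cpp
#include <cstdio>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Writes "0123456789" to a temporary file, lets a child read the first
// 5 bytes, then returns what the parent reads next through its own copy
// of the file descriptor.
std::string parentReadAfterChild() {
    FILE* tmp = std::tmpfile();
    if (tmp == NULL)
        return "";
    int fd = fileno(tmp);
    if (write(fd, "0123456789", 10) != 10 || lseek(fd, 0, SEEK_SET) == -1) {
        std::fclose(tmp);
        return "";
    }
    pid_t pid = fork();
    if (pid == -1) {
        std::fclose(tmp);
        return "";
    }
    if (pid == 0) {
        char skipped[5];
        // This read advances the offset stored in the shared file description.
        ssize_t r = read(fd, skipped, sizeof(skipped));
        _exit(r == 5 ? 0 : 1);
    }
    waitpid(pid, NULL, 0);  // make sure the child has finished reading

    char buf[5];
    ssize_t n = read(fd, buf, sizeof(buf));
    std::fclose(tmp);
    return n > 0 ? std::string(buf, static_cast<std::size_t>(n)) : std::string();
}
```

The parent gets `56789` rather than `01234`, because both descriptors share one offset.
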
📩 CGI Response
A script MUST always provide a non-empty response.
The response comprises a message-header and a message-body, separated by a blank line. The message-header contains one or more header fields. The body may be NULL (empty).
The script MUST return one of either a document response, a local redirect response or a client redirect (with optional document) response.
- document response
  - The CGI script can return a document to the user in a document response, with an optional error code indicating the success status of the response. The script MUST return a `Content-Type` header field. A `Status` header field is optional, and status 200 'OK' is assumed (added by the server) if it is omitted.
- local redirect response
  - The CGI script can return a URI path and query-string ('local-pathquery') for a local resource in a `Location` header field. This indicates to the server that it should reprocess the request using the path specified. The script MUST NOT return any other header fields or a message-body.
- client redirect response
  - The CGI script can return an absolute URI in a `Location` header field, to indicate to the client that it should reprocess the request using the URI specified. The script MUST NOT provide any other header fields.
- client redirect response with document
  - Same as the client redirect response, but with an attached document; the `Status` header field MUST be supplied and MUST contain a status value of 302 'Found'.
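
On the server side, the rules above could be applied with a parser along these lines. This is a simplified sketch: header names are matched with exact case (a real server should compare them case-insensitively), lines without a colon are ignored, and `CgiResponse` and `parseCgiResponse` are hypothetical names.

```cpp
#include <cstdlib>
#include <map>
#include <string>

struct CgiResponse {
    int status;                                  // 200 unless told otherwise
    std::map<std::string, std::string> headers;  // field name -> value
    std::string body;
    bool localRedirect;  // Location holds a local path: reprocess internally
};

// Splits the raw script output into header fields and body, then derives
// the status according to the response type.
CgiResponse parseCgiResponse(const std::string& raw) {
    CgiResponse res;
    res.status = 200;
    res.localRedirect = false;

    std::string::size_type pos = 0;
    while (pos < raw.size()) {
        std::string::size_type eol = raw.find('\n', pos);
        if (eol == std::string::npos)
            eol = raw.size();
        std::string line = raw.substr(pos, eol - pos);
        pos = (eol < raw.size()) ? eol + 1 : eol;
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);  // accept CRLF as well as LF
        if (line.empty())
            break;  // blank line: end of the message-header
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;  // not a header field; ignored in this sketch
        std::string::size_type v = line.find_first_not_of(" \t", colon + 1);
        res.headers[line.substr(0, colon)] =
            (v == std::string::npos) ? std::string() : line.substr(v);
    }
    res.body = raw.substr(pos);

    std::map<std::string, std::string>::const_iterator it;
    it = res.headers.find("Location");
    if (it != res.headers.end()) {
        if (!it->second.empty() && it->second[0] == '/')
            res.localRedirect = true;  // local redirect response
        else
            res.status = 302;          // client redirect response
    }
    it = res.headers.find("Status");
    if (it != res.headers.end())
        res.status = std::atoi(it->second.c_str());  // "404 Not Found" -> 404
    return res;
}
```

The server would then build the HTTP status line from `status` and forward the remaining header fields and the body to the client.
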