Web Basics — HTTP - odigity/academy GitHub Wiki
HTTP stands for HyperText Transfer Protocol, and is a text-based client/server network protocol, usually spoken over a TCP connection. It uses a simple request/response loop to let web clients (usually browsers) fetch web resources (usually HTML documents) from web servers.
(See: HTTP on MDN)
After establishing the connection, the browser will use HTTP to request a web resource by sending a request message. It will look something like this:
GET /docs/index.html HTTP/1.1
Host: www.example.com
Accept: image/gif, image/jpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
(blank line)
The first line, called the request line or start line, has three parts:
- HTTP Method or Verb: In this case, the method is GET, which simply retrieves a resource. HTTP has a small set of methods that includes POST, PUT, PATCH, DELETE, HEAD, and a few more obscure ones.
- Request Target: This is usually the path component of the URL, plus the query string if there is one.
- HTTP Version: Originally HTTP/1.0, later updated to HTTP/1.1. (HTTP/2 is out now, but is an advanced topic.)
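As a rough illustration of the three parts of the request line, the sketch below splits one apart in Python. The function name `parse_request_line` is made up for this example, and it assumes a well-formed line with single spaces between the parts.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split an HTTP request line into (method, target, version)."""
    # Drop the trailing line break, then split on the single spaces
    # that separate the three parts. A malformed line raises ValueError.
    method, target, version = line.rstrip("\r\n").split(" ")
    return method, target, version


method, target, version = parse_request_line("GET /docs/index.html HTTP/1.1")
print(method)   # GET
print(target)   # /docs/index.html
print(version)  # HTTP/1.1
```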
In HTTP/1.0, the connection was closed after each request/response by default. In HTTP/1.1, connections are persistent by default: the web server keeps the connection open for a short time so the client can send multiple requests without the overhead of establishing a new connection each time.
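The difference between the two versions can be sketched as a small decision function. This is a simplification, not how any particular server is implemented: the name `keeps_connection_open` is hypothetical, and it only looks at a Connection header holding a single value (`close` or `keep-alive`).

```python
def keeps_connection_open(version: str, headers: dict[str, str]) -> bool:
    """Guess whether the connection stays open after this message."""
    connection = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "connection":
            connection = value.strip().lower()

    if connection == "close":
        return False  # either side may ask to close
    if connection == "keep-alive":
        return True   # explicit opt-in, used by HTTP/1.0 clients
    # No Connection header: HTTP/1.1 defaults to persistent,
    # HTTP/1.0 defaults to closing after one request/response.
    return version == "HTTP/1.1"


print(keeps_connection_open("HTTP/1.0", {}))
print(keeps_connection_open("HTTP/1.1", {}))
print(keeps_connection_open("HTTP/1.1", {"Connection": "close"}))
```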
After the first line comes a series of (mostly optional) headers, which provide more information about either the request or the client making it. Each header consists of a name, a colon (:), and a value, and is terminated by a line break (CRLF, that is, a carriage return followed by a newline).
For example, the Accept-Language header informs the web server which languages the client (or user) is willing and able to consume.
(See: Headers on MDN)
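To make the name/colon/value structure concrete, here is a minimal sketch that turns header lines into a dictionary. The helper `parse_headers` is invented for this example. It lowercases the names because header names are case-insensitive, and it ignores details such as repeated headers.

```python
def parse_headers(lines: list[str]) -> dict[str, str]:
    """Turn 'Name: value' lines into a dict keyed by lowercase name."""
    headers = {}
    for line in lines:
        # Split on the first colon only, since values may contain colons
        # (for example "Host: www.google.com:80").
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


example = [
    "Host: www.example.com",
    "Accept-Language: en-us",
    "Accept-Encoding: gzip, deflate",
]
headers = parse_headers(example)
print(headers["accept-language"])  # en-us
```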
The Host header is of special importance.
When you tell the browser to request a web resource, it converts the domain name to an IP address, then connects to that IP address.
However, multiple domains can point to the same IP address, and since the request line only specifies the path (not the domain), the web server might not know which web site you're trying to reach.
The Host header was introduced (and made required) in HTTP/1.1 to solve this problem.
The value is simply the domain and (optionally) the port.
Example: www.google.com or www.google.com:80
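The split between the request line and the Host header can be sketched by building a request from a URL: the path goes into the request line, and the domain (plus optional port) goes into Host. The function `build_get_request` is hypothetical and deliberately minimal. It assumes a plain http URL without user credentials and sends no other headers.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Build a bare-bones HTTP/1.1 GET request message for a URL."""
    parts = urlsplit(url)
    # The request target is the path (plus query string, if any).
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        # netloc is "domain" or "domain:port", which is what Host carries.
        f"Host: {parts.netloc}",
        "",  # blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines)


print(build_get_request("http://www.example.com/docs/index.html"))
```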
The web server will respond (also using HTTP) with a response message, which may or may not include either the requested resource or an error message.
If the request is successful, the response will look something like this:
HTTP/1.1 200 OK
Date: Sun, 18 Oct 2009 08:56:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Sat, 20 Nov 2004 07:16:26 GMT
ETag: "10000000565a5-2c-3e94b66c2e680"
Accept-Ranges: bytes
Content-Length: 44
Connection: close
Content-Type: text/html
X-Pad: avoid browser bug
(empty line)
<html><body><h1>It works!</h1></body></html>
The first line, called the status line, has three parts:
- HTTP Version
- Status Code: A three-digit number indicating a particular success or failure condition. The most common are 200 (request succeeded) and 404 (resource not found). (See: Status Codes on MDN)
- Status Text: A text label for the status code that's easier to understand if you don't have the codes memorized. (200 -> OK, 404 -> Not Found, etc.)
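The status line can be taken apart much like the request line. The sketch below uses a made-up helper, `parse_status_line`, and assumes a well-formed line. It also shows that Python's standard `http.HTTPStatus` enum knows the usual text label for each code.

```python
from http import HTTPStatus


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split an HTTP status line into (version, code, text)."""
    # Split at most twice, because the status text may contain spaces
    # (for example "Not Found").
    version, code, text = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), text


print(parse_status_line("HTTP/1.1 200 OK"))
print(parse_status_line("HTTP/1.1 404 Not Found"))

# The standard library maps codes to their usual labels.
print(HTTPStatus(404).phrase)  # Not Found
```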
After the status line come the response headers, which work just like request headers. (They even share many of the same header names for the same purpose.)
The last part of the response message (after the empty line) will contain either the requested resource or a longer error message / document.
Both the headers and the body are optional.
For example, when you make an HTTP request using the HEAD method, the server returns only headers, with no body.
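Putting the pieces together, a response can be split into status line, headers, and body at the first empty line. This is only a sketch: `split_response` is a hypothetical helper, the sample message is a shortened version of the response shown earlier, and it works on text for readability, whereas real code would handle raw bytes.

```python
RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Length: 44\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body><h1>It works!</h1></body></html>"
)


def split_response(raw: str) -> tuple[str, list[str], str]:
    """Split a response into (status line, header lines, body)."""
    # The first empty line separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    return status_line, header_lines, body


status_line, header_lines, body = split_response(RESPONSE)
print(status_line)
print(header_lines)
print(len(body))  # matches the Content-Length header: 44
```

A response to a HEAD request would go through the same function and simply come back with an empty body.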