Web Basics — HTTP - odigity/academy GitHub Wiki


HTTP stands for HyperText Transfer Protocol, and is a text-based client/server network protocol, usually spoken over a TCP connection. It uses a simple request/response loop to let web clients (usually browsers) fetch web resources (usually HTML documents) from web servers.

(See: HTTP on MDN)

Request Messages

After establishing the connection, the browser will use HTTP to request a web resource by sending a request message. It will look something like this:

GET /docs/index.html HTTP/1.1
Host: www.example.com
Accept: image/gif, image/jpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
(blank line)
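
As a rough illustration of how such a message is put together, here is a minimal Python sketch that assembles the request text by hand. The helper name build_request is hypothetical (it is not part of any library), and a real client would normally let an HTTP library do this work.

```python
def build_request(method, target, host, headers=None):
    """Assemble an HTTP/1.1 request message as bytes (illustrative helper)."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF; one extra empty line ends the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "GET",
    "/docs/index.html",
    "www.example.com",
    {"Accept-Language": "en-us", "Accept-Encoding": "gzip, deflate"},
)
print(request.decode("ascii"))
```

The bytes returned here are what would be written to the TCP connection; the trailing empty line tells the server that the headers are complete.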

The Request Line

The first line, called the request line or start line, has three parts:

  • HTTP Method or Verb: In this case, the method is GET, which simply retrieves a resource. HTTP defines a small set of methods, including POST, PUT, PATCH, DELETE, HEAD, and a few more obscure ones.
  • Request Target: This is usually the path component of the URL, plus the query string if there is one.
  • HTTP Version: Originally HTTP/1.0, later updated to HTTP/1.1. (HTTP/2 is out now, but is an advanced topic.)
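
The three parts of the request line can be pulled apart with a simple split on spaces. The sketch below assumes a well-formed line; the names RequestLine and parse_request_line are made up for this example.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    target: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split a request line into method, target, and version."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"unexpected version field: {version!r}")
    return RequestLine(method, target, version)


parsed = parse_request_line("GET /docs/index.html HTTP/1.1\r\n")
print(parsed.method, parsed.target, parsed.version)
```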

In HTTP/1.0, the connection was closed after each request/response exchange by default. HTTP/1.1 keeps the connection open by default (either side can still close it, for example by sending a Connection: close header), so the client can send multiple requests without the overhead of establishing a new connection each time.
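
Connection reuse can be observed locally with Python's standard library. The sketch below is only an illustration: it starts a throwaway server on the loopback interface (the Handler class and the seen_ports list are names invented for this example), sends two requests through one http.client connection, and records the client port the server saw each time. If both requests travelled over the same TCP connection, the two ports should match.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_ports = []  # client port observed by the server for each request


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def do_GET(self):
        seen_ports.append(self.client_address[1])
        body = b"hello"
        self.send_response(200)
        # Content-Length tells the client where this response ends.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
for path in ("/a", "/b"):
    conn.request("GET", path)
    conn.getresponse().read()  # drain the body before sending the next request
conn.close()
server.shutdown()
server.server_close()

print("same connection reused:", seen_ports[0] == seen_ports[1])
```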

Request Headers

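After the first line comes a series of headers (most of them optional) which provide more information about either the request or the client doing the requesting. Each header consists of a name, a colon (:), and a value, and is terminated by a line break (a carriage return followed by a line feed, written CRLF). Header names are case-insensitive, and an empty line marks the end of the headers.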

For example, the Accept-Language header informs the web server what languages the client (or user) is willing and able to consume.
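
To make the name/colon/value structure concrete, here is a small sketch that turns a block of header lines into a dictionary. The function name parse_headers is hypothetical, and the sketch ignores details such as repeated header names.

```python
def parse_headers(block: str) -> dict:
    """Parse CRLF-separated header lines into a dict keyed by lowercase name."""
    headers = {}
    for line in block.split("\r\n"):
        if not line:
            break  # an empty line marks the end of the headers
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


raw = (
    "Host: www.example.com\r\n"
    "Accept-Language: en-us\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "\r\n"
)
headers = parse_headers(raw)
print(headers["accept-language"])
```

Lowercasing the names is one simple way to respect the fact that header names are case-insensitive.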

(See: Headers on MDN)

The Host Header

The Host header is of special importance. When you tell the browser to request a web resource, it converts the domain name to an IP address, then connects to that IP address. However, multiple domains can point to the same IP address, and since the request line only specifies the path (not the domain), the web server might not know which website you're trying to reach.

The Host header was introduced (and made required) in HTTP/1.1 to solve this problem. The value is simply the domain and (optionally) the port. Example: www.google.com or www.google.com:80
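
A server hosting several sites on one IP address can use the Host value to decide which site a request is for. The sketch below shows the idea under some assumptions: the SITES table, its directory paths, and the function names are all invented for illustration, and IPv6 address literals are not handled.

```python
# Hypothetical mapping from domain name to the directory serving that site.
SITES = {
    "www.example.com": "/srv/example",
    "blog.example.com": "/srv/blog",
}


def split_host(host_header, default_port=80):
    """Split a Host value into (domain, port); the port part is optional."""
    host, sep, port = host_header.strip().partition(":")
    return host.lower(), (int(port) if sep else default_port)


def pick_site(host_header):
    """Return the directory for the requested domain, or None if unknown."""
    host, _port = split_host(host_header)
    return SITES.get(host)


print(split_host("www.google.com:80"))
print(pick_site("blog.example.com"))
```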

Response Messages

The web server replies (also using HTTP) with a response message, which may include the requested resource or an error message.

If the request is successful, the response will look something like this:

HTTP/1.1 200 OK
Date: Sun, 18 Oct 2009 08:56:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Sat, 20 Nov 2004 07:16:26 GMT
ETag: "10000000565a5-2c-3e94b66c2e680"
Accept-Ranges: bytes
Content-Length: 44
Connection: close
Content-Type: text/html
X-Pad: avoid browser bug
(empty line)  
<html><body><h1>It works!</h1></body></html>

Status Line

The first line, called the status line, has three parts:

  • HTTP Version
  • Status Code: A three-digit number indicating a particular success or failure condition. The most common are 200 (the request succeeded) and 404 (the resource was not found). (See: Status Codes on MDN)
  • Status Text: A text label for the status code that's easier to understand if you don't have the codes memorized. (200 -> OK, 404 -> Not Found, etc.)
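
The status line can be split much like the request line, with the difference that the status text may itself contain spaces. This is a minimal sketch (parse_status_line is a made-up name); it also uses Python's standard http.HTTPStatus enum to look up the conventional label for a code.

```python
from http import HTTPStatus


def parse_status_line(line: str):
    """Split a status line into (version, numeric code, status text)."""
    # Split at most twice so that text such as "Not Found" stays in one piece.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed status line: {line!r}")
    version, code = parts[0], parts[1]
    text = parts[2] if len(parts) == 3 else ""
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"status code must be three digits: {code!r}")
    return version, int(code), text


version, code, text = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, text)
print(HTTPStatus(200).phrase)  # standard label for a given code
```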

Response Headers

Response headers work like request headers: the same name, colon, value format, one per line. (The two even share many header names, used for the same purpose.)

Response Body

The last part of the response message (after the empty line) will contain either the requested resource or a longer error message / document.

The body is optional. For example, the response to a request made with the HEAD method contains only the status line and headers, with no body.
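
The split between headers and body can be sketched as follows, assuming the whole response is already in memory as bytes. The function name split_response is hypothetical, and a real client would read from the socket incrementally, using Content-Length to know how many body bytes to expect.

```python
def split_response(raw: bytes):
    """Split a raw response into (status line, headers dict, body bytes)."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 44\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body><h1>It works!</h1></body></html>"
)
status_line, headers, body = split_response(raw)
print(status_line)
print(len(body) == int(headers["content-length"]))

# A HEAD-style response: same status line and headers, but nothing after them.
head_only = raw[: raw.index(b"\r\n\r\n") + 4]
print(split_response(head_only)[2])
```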
