Hypertext Transfer Protocol
🖥️ Slides
🖥️ Lecture Videos
📖 Optional Reading: MDN An overview of HTTP
“You affect the world by what you browse”
– Tim Berners-Lee (Source: Answers for Young People)
The Uniform Resource Locator (URL) represents the location of a web resource. A web resource can be anything, such as a web page, font, image, video stream, database record, or JSON object. It can also be completely ephemeral, such as a visitation counter or a gaming session.
Looking at the different parts of a URL is a good way to understand what it represents. Here is an example URL that represents a query to BYU for cities that contain `pro` using secure HTTP.
The URL syntax uses the following convention. Notice the delimiting punctuation between the parts of the URL. Most parts of the URL are optional. Only the scheme and the domain name are required.
<scheme>://<domain name>:<port>/<path>?<parameters>#<anchor>
Part | Example | Meaning |
---|---|---|
Scheme | https | The protocol required to ask for the resource. For web applications, this is usually HTTPS. But it could be any internet protocol such as FTP or MAILTO. |
Domain name | byu.edu | The domain name that owns the resource represented by the URL. |
Port | 3000 | The port specifies the numbered network port used to connect to the domain server. Lower numbered ports are reserved for common internet protocols; higher numbered ports can be used for any purpose. The default port is 80 if the scheme is HTTP, or 443 if the scheme is HTTPS. |
Path | /school/byu/user/8014 | The path to the resource on the domain. The resource does not have to physically be located on the file system with this path. It can be a logical path representing endpoint parameters, a database table, or an object schema. |
Parameters | filter=names&highlight=intro,summary | The parameters represent a list of key value pairs. Usually it provides additional qualifiers on the resource represented by the path. This might be a filter on the returned resource or how to highlight the resource. The parameters are also sometimes called the query string. |
Anchor | summary | The anchor usually represents a sub-location in the resource. For HTML pages this represents a request for the browser to automatically scroll to the element with an ID that matches the anchor. The anchor is also sometimes called the hash, or fragment ID. |
Technically you can also provide a user name and password before the domain name. This was used historically to authenticate access, but for security reasons this is deprecated. However, you will still see this convention for URLs that represent database connection strings.
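To see how the parts in the table map onto an actual string, here is a minimal Python sketch that uses the standard library's `urllib.parse`. The URL is hypothetical: it is assembled from the example column of the table and is not an address given on this page. The `DEFAULT_PORTS` table and the `effective_port` helper are illustrative names, not part of any library.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL assembled from the example column of the table above.
url = "https://byu.edu:3000/school/byu/user/8014?filter=names&highlight=intro,summary#summary"

parts = urlsplit(url)

# Ports implied by the scheme when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(u: str) -> int:
    """Return the explicit port, or the scheme's default when none is given."""
    p = urlsplit(u)
    if p.port is not None:
        return p.port
    return DEFAULT_PORTS[p.scheme]  # raises KeyError for schemes not listed here


print("scheme:     ", parts.scheme)
print("domain name:", parts.hostname)
print("port:       ", parts.port)
print("path:       ", parts.path)
print("parameters: ", parse_qs(parts.query))
print("anchor:     ", parts.fragment)
print("default port for https:", effective_port("https://byu.edu/"))
```

Note that `parse_qs` returns a list for every key because a key may legally repeat in a query string.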
Hypertext Transfer Protocol (HTTP) is how the web talks. When a web browser makes a request to a web server, it does so using HTTP. The purpose of this instruction is to help you master the high level internals of HTTP. Just like becoming fluent in a foreign language makes a visit to another country more enjoyable, understanding how to speak HTTP helps you communicate effectively when talking on the web.
When a web client (e.g. a web browser) and a web server talk they exchange HTTP requests and responses. The browser will make an HTTP request and the server will generate an HTTP response. You can see the HTTP exchange by using the browser's debugger or by using a console tool like `curl`. For example, in your console you can use `curl` to make the following request.
curl -v -s http://info.cern.ch/hypertext/WWW/Helping.html
The HTTP request for the above command would look like the following.
GET /hypertext/WWW/Helping.html HTTP/1.1
Host: info.cern.ch
Accept: text/html
An HTTP request has this general syntax.
<verb> <url path, parameters> <version>
[<header key: value>]*
[
<body>
]
The first line of the HTTP request contains the verb of the request, followed by the path and parameters of the URL, and finally the version of HTTP being used. The anchor is not part of the request because the client keeps it and never sends it to the server. The following lines are optional headers that are defined by key value pairs. After the headers you have an optional body. The body is separated from the headers by a blank line, that is, two consecutive line breaks.
In the above example, we are asking to `GET` a resource found at the path `/hypertext/WWW/Helping.html`. The version used by the request is `HTTP/1.1`. This is followed by two headers. The first specifies the requested host (i.e. domain name). The second specifies what type of resources the client will accept. The resource type is always a MIME type as defined by the internet governing body IANA. In this case we are asking for HTML.
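As a rough illustration of the request layout described above, the following Python sketch splits the example request into its first line, headers, and body. The `parse_request` function is a hypothetical helper written for this page, not a library API, and it assumes well-formed input with `\r\n` line endings, which is what HTTP/1.1 uses on the wire.

```python
def parse_request(raw: str):
    """Split a raw HTTP/1.1 request into (verb, target, version, headers, body)."""
    # The blank line (two consecutive line breaks) separates headers from body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # First line: <verb> <target> <version>
    verb, target, version = lines[0].split(" ", 2)

    # Remaining lines: one "key: value" header per line.
    headers = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()

    return verb, target, version, headers, body


# The example request from this page, written with explicit line endings.
raw_request = (
    "GET /hypertext/WWW/Helping.html HTTP/1.1\r\n"
    "Host: info.cern.ch\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)

verb, target, version, headers, body = parse_request(raw_request)
print(verb, target, version)
print(headers)
print(repr(body))
```

A production parser would also need to handle malformed lines, repeated headers, and binary bodies. This sketch skips all of that to keep the structure visible.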
The response to the above request looks like this.
HTTP/1.1 200 OK
Date: Tue, 06 Dec 2022 21:54:42 GMT
Server: Apache
Last-Modified: Thu, 29 Oct 1992 11:15:20 GMT
ETag: "5f0-28f29422b8200"
Accept-Ranges: bytes
Content-Length: 1520
Connection: close
Content-Type: text/html

<TITLE>Helping -- /WWW</TITLE>
<NEXTID 7>
<H1>How can I help?</H1>There are lots of ways you can help if you are interested in seeing
the <A NAME=4 HREF=TheProject.html>web</A> grow and be even more useful...
An HTTP response has the following syntax.
<version> <status code> <status string>
[<header key: value>]*
[
<body>
]
You can see that the response syntax is similar to the request syntax. The major difference is that the first line represents the version and the status of the response.
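The same idea applies to responses. The sketch below, again in Python, pulls the version, status code, and status string out of the first line and collects the headers. The `parse_response` helper is hypothetical, and the sample text is a shortened copy of the response shown earlier, so the body here is not the full page.

```python
def parse_response(raw: str):
    """Split a raw HTTP/1.1 response into (version, code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # First line: <version> <status code> <status string>
    version, code, reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()

    # The status code is numeric, so convert it for easy comparison.
    return version, int(code), reason, headers, body


# Shortened copy of the example response (only some headers, truncated body).
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<TITLE>Helping -- /WWW</TITLE>\n"
)

version, code, reason, headers, body = parse_response(raw_response)
print(version, code, reason)
print(headers["Content-Type"])
print(body)
```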
You will use the common HTTP verbs, status codes, and headers constantly when developing a web application, so it is important to know what they mean. Take some time to internalize the following common values.
There are several verbs that describe what the HTTP request is asking for. The list below only describes the most common ones.
Verb | Meaning |
---|---|
GET | Get the requested resource. This can represent a request to get a single resource or a resource representing a list of resources. |
POST | Create a new resource. The body of the request contains the resource. The response should include a unique ID of the newly created resource. |
PUT | Update a resource. Either the URL path, HTTP header, or body must contain the unique ID of the resource being updated. The body of the request should contain the updated resource. The body of the response may contain the resulting updated resource. |
DELETE | Delete a resource. Either the URL path or HTTP header must contain the unique ID of the resource to delete. |
OPTIONS | Get metadata about a resource. Usually only HTTP headers are returned. The resource itself is not returned. |
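One way to internalize the verbs is to see them applied to a tiny in-memory collection. The Python sketch below is only an illustration under stated assumptions: the `ResourceStore` class, its `handle` method, and the `/game` paths are all hypothetical, and the ID is taken from the last segment of the path. It returns 405 (Method Not Allowed, a standard code not listed on this page) for verbs it does not support.

```python
import itertools


class ResourceStore:
    """A toy in-memory service that maps HTTP verbs onto dictionary operations."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def handle(self, verb, path, body=None):
        """Return a (status_code, response_body) pair for the request."""
        resource_id = path.rsplit("/", 1)[-1]  # last path segment is the ID

        if verb == "POST":
            # Create a new resource and report its unique ID.
            new_id = str(next(self._ids))
            self._items[new_id] = body
            return 201, {"id": new_id}

        if verb in ("GET", "PUT", "DELETE") and resource_id not in self._items:
            return 404, None

        if verb == "GET":
            return 200, self._items[resource_id]
        if verb == "PUT":
            # Replace the stored resource with the one in the request body.
            self._items[resource_id] = body
            return 200, body
        if verb == "DELETE":
            del self._items[resource_id]
            return 204, None

        return 405, None  # verb not supported by this sketch


store = ResourceStore()
print(store.handle("POST", "/game", {"name": "chess"}))
print(store.handle("GET", "/game/1"))
print(store.handle("PUT", "/game/1", {"name": "checkers"}))
print(store.handle("DELETE", "/game/1"))
print(store.handle("GET", "/game/1"))
```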
It is important that you use the standard HTTP status codes in your HTTP responses so that the client of a request can know how to interpret the response. The codes are partitioned into five blocks.
- 1xx - Informational.
- 2xx - Success.
- 3xx - Redirect to some other location, or that the previously cached resource is still valid.
- 4xx - Client errors. The request is invalid.
- 5xx - Server errors. The request cannot be satisfied due to an error on the server.
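Because the block is determined by the first digit, a client can classify any status code with integer division. The Python sketch below shows one way to do that. The `status_class` function is a hypothetical helper, and its labels simply mirror the list above.

```python
def status_class(code: int) -> str:
    """Return the block a status code belongs to, based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    classes = {
        1: "Informational",
        2: "Success",
        3: "Redirect",
        4: "Client error",
        5: "Server error",
    }
    return classes[code // 100]


for code in (100, 200, 304, 404, 503):
    print(code, status_class(code))
```

Clients often branch on the block first and only then on the specific code, so an unfamiliar code such as 422 can still be handled sensibly as a client error.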
Within those ranges here are some of the more common codes. See the MDN documentation for a full description of status codes.
Code | Text | Meaning |
---|---|---|
100 | Continue | The server has received the start of the request and the client should continue sending the rest of it. |
200 | OK | The requested resource was found and returned as appropriate. |
201 | Created | The request was successful and a new resource was created. |
204 | No Content | The request was successful but no resource is returned. |
304 | Not Modified | The cached version of the resource is still valid. |
307 | Temporary redirect | The resource is temporarily located at a different location. The temporary location is specified in the response location header. |
308 | Permanent redirect | The resource is no longer at the requested location. The new location is specified in the response location header. |
400 | Bad request | The request was malformed or invalid. |
401 | Unauthorized | The request did not provide a valid authentication token. |
403 | Forbidden | The provided authentication token is not authorized for the resource. |
404 | Not found | An unknown resource was requested. |
408 | Request timeout | The server timed out waiting for the client to finish sending the request. |
409 | Conflict | The provided resource represents an out of date version of the resource. |
418 | I'm a teapot | The service refuses to brew coffee in a teapot. |
429 | Too many requests | The client is making too many requests in too short of a time period. |
500 | Internal server error | The server failed to properly process the request. |
503 | Service unavailable | The server is temporarily down. The client should try again with an exponential back off. |
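The exponential back off mentioned for 503 can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `backoff_delays` and `send_with_retry` names, the 0.5 second base delay, and the 8 second cap are not prescribed by HTTP. The `send` argument stands in for whatever function actually issues the request and returns its status code.

```python
import time


def backoff_delays(retries, base=0.5, cap=8.0):
    """Return the wait before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def send_with_retry(send, retries=5, sleep=time.sleep):
    """Call send() and retry with growing delays while it returns 503."""
    status = send()
    for delay in backoff_delays(retries):
        if status != 503:
            break
        sleep(delay)  # wait longer after every failed attempt
        status = send()
    return status


# Simulated server that is unavailable twice and then recovers.
responses = iter([503, 503, 200])
waits = []  # record the delays instead of actually sleeping

final_status = send_with_retry(lambda: next(responses), sleep=waits.append)
print(final_status, waits)
```

Passing `sleep` as a parameter keeps the example fast and testable. Real clients usually also add random jitter so that many clients do not retry at the same instant.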
📖 Optional Reading: MDN HTTP headers
HTTP headers specify metadata about a request or response. This includes things like how to handle security, caching, data formats, and cookies. Some common headers that you will use include the following.
Header | Example | Meaning |
---|---|---|
Authorization | Bearer bGciOiJIUzI1NiIsI | A token that authorizes the user making the request. |
Accept | image/* | What content format the client accepts. This may include wildcards. |
Content-Type | text/html; charset=utf-8 | The format of the content in the body of the request or response. These are described using standard MIME types. |
Cookie | SessionID=39s8cgj34; csrftoken=9dck2 | Key value pairs that are generated by the server and stored on the client. |
Host | info.cern.ch | The domain name of the server. This is required in all requests. |
Origin | https://cs260.click | Identifies the origin (scheme, domain name, and port) that caused the request. A host may only allow requests from specific origins. |
Access-Control-Allow-Origin | https://cs260.click | Server response of what origins can make a request. This may include a wildcard. |
Content-Length | 368 | The number of bytes contained in the body of the request or response. |
Cache-Control | public, max-age=604800 | Tells the client how it can cache the response. |
User-Agent | Mozilla/5.0 (Macintosh) | The client application making the request. |
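Several of these header values have internal structure of their own. As a small sketch, the Python standard library can take apart the `Content-Type` and `Cookie` examples from the table. The `Message` class comes from the `email` package because HTTP headers share the same "key: value" format. Using it this way is a convenience chosen for this example, not the only option.

```python
from email.message import Message
from http.cookies import SimpleCookie

# Example values copied from the table above.
headers = Message()
headers["Content-Type"] = "text/html; charset=utf-8"
headers["Cookie"] = "SessionID=39s8cgj34; csrftoken=9dck2"

# Header names are case-insensitive, so any capitalization finds the value.
print(headers["content-type"])

# Content-Type carries a MIME type plus optional parameters such as charset.
media_type = headers.get_content_type()
charset = headers.get_content_charset()
print(media_type, charset)

# Cookie carries key value pairs separated by semicolons.
cookies = SimpleCookie()
cookies.load(headers["Cookie"])
session_id = cookies["SessionID"].value
csrf_token = cookies["csrftoken"].value
print(session_id, csrf_token)
```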
The format of the body of an HTTP request or response is defined by the `Content-Type` header. For example, it may be HTML text (text/html), a binary image format (image/png), JSON (application/json), or JavaScript (text/javascript). A client may specify what formats it accepts using the `Accept` header.
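To make the relationship between the body and these headers concrete, here is a minimal Python sketch. It encodes a resource as JSON, labels it with a matching `Content-Type` and `Content-Length`, and checks whether a client's `Accept` value allows that type. Both `build_json_body` and `accepts` are hypothetical helpers. The matching is deliberately simplified: it honors wildcards but ignores the quality weights (`q=`) that a full implementation would rank by.

```python
import json
from fnmatch import fnmatch


def build_json_body(resource):
    """Serialize a resource to JSON bytes and describe it with headers."""
    body = json.dumps(resource).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),  # length is counted in bytes
    }
    return headers, body


def accepts(accept_header, content_type):
    """Return True if content_type matches any pattern in the Accept value."""
    # Drop parameters such as ";q=0.8" and keep only the MIME type patterns.
    patterns = [item.split(";")[0].strip() for item in accept_header.split(",")]
    return any(fnmatch(content_type, pattern) for pattern in patterns)


headers, body = build_json_body({"name": "chess"})
print(headers)
print(body)

print(accepts("image/*", "image/png"))
print(accepts("text/html, application/json;q=0.9", headers["Content-Type"]))
print(accepts("image/*", headers["Content-Type"]))
```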
- Internet basics: IP addresses, domain names, port numbers
- Web basics: URLs, the HTTP protocol (Headers, methods, and body)
- URL schema
- 🎥 Chess Server Review (3:39) - [transcript]
- 🎥 HTTP Overview (10:42) - [transcript]
- 🎥 HTTP GET Requests (23:29) - [transcript]
- 🎥 HTTP POST Requests (8:43) - [transcript]
- 🎥 HTTP Methods (2:49) - [transcript]