HTTP(Hypertext Transfer Protocol) Request and Response
In this article we will be discussing the foundational protocol of the internet. During 1990 Tim Berners-Lee, was the first one to propose a project called world wide web. The original purpose was to have a fast and reliable information exchange between scientists working in different parts of the world.
HTTP(Hyper Text Transfer Protocol), was implemented first by Tim Berners-Lee in CERN.
HTTP protocol works in a client and server model like many other protocol. A web browser using which a request is initiated is called as a client and a web server software which respond's to that request is called as a server.
Below mentioned are some key points & terms to note about HTTP protocol, before we go ahead and understand a complete http request and response in practical.
- HTTP is an application layer protocol
- The default port if not mentioned in the request, is assumed as 80
- The hostname in the request is case insensitive
- World Wide Web Consortium and the Internet Engineering Task Force, both coordinates in the standardization of the HTTP protocol
- HTTP allows the improvement of its request and response with the help of intermediates in between(for example a gateway, a proxy, or a tunnel)
- The resources that can be requested by using HTTP protocol is made available with the help of a type of URI(Uniform Resource Identifier) called URL(Uniform Resource Locator).
- TCP (Transmission Control Protocol), is used to establish a connection to the application layer port 80 used by HTTP.(its not at all necessary to use port 80 for http connections, but if not explicitly mentioned in the URL, port 80 is assumed)
- A series of request and response in http is called as a session in HTTP
- HTTP version 0.9 was the first documented version of HTTP
- HTTP is a stateless protocol(which means each and every connection is independant of each other.)
For this tutorial i have setup two Machine's. One will act as a client, and the other will act as a server, containing apache version 2.2.3-22.
A default page which will show a "hello" message for this demosnstration is setup. We will initiate connection from the client to the server, and will capture all the traffic on the server that's coming from the client and see what's happening.
Below shown is the request i initiated from the client (a centos machine), with the help of wget.
[root@slashroot1 ~]# wget http://192.168.0.103 --15:51:09-- http://192.168.0.103/ Connecting to 192.168.0.103:80... connected. HTTP request sent, awaiting response... 200 OK Length: 6 [text/html] Saving to: `index.html.7' 100%[=======================================>] 6 --.-K/s in 0s 15:51:09 (283 KB/s) - `index.html.7' saved [6/6]
Let's see what happened during the http request initiated by the client on the server side by capturing all network traffic using tcpdump.
[root@slashroot2 ~]# tcpdump -i eth0 -s0 -n -A host 192.168.0.104 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes 05:36:32.781066 IP 192.168.0.104.52802 > 192.168.0.103.http: S 3151905201:3151905201(0) win 5840 <mss 1460,sackOK,timestamp 6458684 0,nop,wscale 4> E..<..@.@......h...g.B.P..A.........UR......... .b.<........ 05:36:32.781957 IP 192.168.0.103.http > 192.168.0.104.52802: S 3165795006:3165795006(0) ack 3151905202 win 5792 <mss 1460,sackOK,timestamp 6465077 6458684,nop,wscale 4> E..<..@.@......g...h.P.B..2...A......h......... .b.5.b.<.... 05:36:32.783047 IP 192.168.0.104.52802 > 192.168.0.103.http: . ack 1 win 365 <nop,nop,timestamp 6458685 6465077> E..4..@.@......h...g.B.P..A...2....m.d..... .b.=.b.5 05:36:32.784509 IP 192.168.0.104.52802 > 192.168.0.103.http: P 1:121(120) ack 1 win 365 <nop,nop,timestamp 6458685 6465077> E.....@.@......h...g.B.P..A...2....m....... .b.=.b.5GET / HTTP/1.0 User-Agent: Wget/1.10.2 (Red Hat modified) Accept: */* Host: 192.168.0.103 Connection: Keep-Alive 05:36:32.784551 IP 192.168.0.103.http > 192.168.0.104.52802: . ack 121 win 362 <nop,nop,timestamp 6465080 6458685> E..4..@.@.$....g...h.P.B..2...B*...j....... .b.8.b.= 05:36:32.786377 IP 192.168.0.103.http > 192.168.0.104.52802: P 1:268(267) ack 121 win 362 <nop,nop,timestamp 6465082 6458685> E..?..@.@.#....g...h.P.B..2...B*...j.Q..... .b.:.b.=HTTP/1.1 200 OK Date: Thu, 21 Feb 2013 10:36:32 GMT Server: Apache/2.2.3 (Red Hat) Last-Modified: Thu, 21 Feb 2013 09:47:39 GMT ETag: "377709-6-f71984c0" Accept-Ranges: bytes Content-Length: 6 Connection: close Content-Type: text/html; charset=UTF-8 hello 05:36:32.787006 IP 192.168.0.103.http > 192.168.0.104.52802: F 268:268(0) ack 121 win 362 <nop,nop,timestamp 6465082 6458685> E..4..@.@.$....g...h.P.B..3...B*...j....... .b.:.b.= 05:36:32.787622 IP 192.168.0.104.52802 > 192.168.0.103.http: . ack 268 win 432 <nop,nop,timestamp 6458689 6465082>
Let's examine the tcpdump output by a close examination. I have devided the tcpdump output of our sample http request to different steps below, for easy understanding.
Step 1: Establishing a TCP connection to the server by the client
Like any other application layer protocol, HTTP also works over TCP. So the first step done by the client in initiating a http connection is establishing a TCP connection with the http server.
You can clearly see that from the first three lines of our tcpdump output above. Those first three lines indicate's the initial TCP Handshake.
Read: TCP Three way handshake
If you see the tcpdump output, the first line is a connection initiate request with "S" flag set(so its a syn request). The second is a reply from the server to the client with "S & ACK" flag set. The third is an "ACK" reply from the client to complete the three way handshake.
Keep in mind that, we only completed the TCP connection establishment till here.
Step 2: Initiating a HTTP GET Request by the client to the server
The fourth line in the tcpdump output is from the client that initiates a http "GET" request.
Get requests are used by the clients for retrieving data from the server. If you closely see the GET request from the client, it reveals some information as below.
- what is the client requesting. For example in our case, its a simple "/". GET "/", such type of request tell's the server to retrieve the root directory(the default page of the website.)
- It also tells the http version the client is using. In our case its HTTP/1.0.
- The client inform's the server what type of user agent is being used for the request(type of browser, its version, whether its an application etc. This part is send by the client software used to initiate the connection. In our case its "Wget/1.10.2 (Red Hat modified)")
- Another important detail that the client informs the server is the type of data that the client is ready to accept. HTTP allows all MIME media types. In our example our client is ready to accept any kind of data hence its */*
- HOST field tells the hostname of the server, from which client is requesting the data.
- Most of the clients request a keep-alive type of connection from the server. Keep-alive is used to keep the tcp connection made by the client, so that the overhead of creating a tcp connection is reduced for subsequent requests. Although the client requests a keep-alive kind of connection from the server, its the server who decides whether to keep tcp connection active or not(based upon the server configuration).
All the above mentioned GET request parameter's are shown below (which is also shown in the tcpdump output above.)
05:36:32.784509 IP 192.168.0.104.52802 > 192.168.0.103.http: P 1:121(120) ack 1 win 365 <nop,nop,timestamp 6458685 6465077> E.....@.@......h...g.B.P..A...2....m....... .b.=.b.5GET / HTTP/1.0 User-Agent: Wget/1.10.2 (Red Hat modified) Accept: */* Host: 192.168.0.103 Connection: Keep-Alive
Web server logs will also report the same information that we saw above, Lets confirm it by seeing the get request from our web server logs.
[root@slashroot2 ~]# tail -f /var/log/httpd/access_log 192.168.0.104 - - [21/Feb/2013:05:34:14 -0500] "GET / HTTP/1.0" 200 6 "-" "Wget/1.10.2 (Red Hat modified)"
Step 3: HTTP Server Response to a HTTP GET Request
On getting the GET request from the client, the server responds, by revealing some information about itself, and metadata about the data asked by the client, along with the data.
- The server sends a status code(this status code informs the client software about the result of the request). There are many status codes used by HTTP. We will see some common status codes in HTTP later in this post. Discussing all the status codes used HTTP is beyond the scope of this article.
The server replies to the client with a status code of 200 and the http version it is using. In our case its HTTP /1.1.
Status code of 200 means the request has succeeded.
2. Date specifies the date and time at which the response was originated
3. Server filed in the response tell's the the server software that's being used(this field is modifiable in almost all web server's. This at times is a security breach as it reveals the server information to the client.)
In our case its Apache 2.2.3 (Red Hat)
4. Last modified field informs the client, the modified date and time of the data requested.
5. Etag is a string that identifies the modification of the data requested. This is used for caching. This improves the quality of web caching.
Etag is an identifier assigned by the web server to a specific version of the resource requested.
6. Another important factor that makes http a wonderful protocol is that you can request your required bytes of a resource.
For example, if you are downloading a file of 100M and in between the connection dropped, you can later resume the download by specifying the range of bytes from where to start the download in the GET request.
In our case the server informs the client in its response that it supports bytes range.
7. Content-Length specifies the size of the resource requested in bytes. In our case its just a "hello" printed. Hence its 6 bytes.
8. Connection: close , in our case indicates that the connection will be closed after completion of this particular request.
9. Content-Type specifies the type of the content. In our example its a simple Text/HTML.
10. finally comes the data after these header's.
All the above mentioned details from the tcpdump output in an http get response is shown below.
05:36:32.786377 IP 192.168.0.103.http > 192.168.0.104.52802: P 1:268(267) ack 121 win 362 <nop,nop,timestamp 6465082 6458685> E..?..@.@.#....g...h.P.B..2...B*...j.Q..... .b.:.b.=HTTP/1.1 200 OK Date: Thu, 21 Feb 2013 10:36:32 GMT Server: Apache/2.2.3 (Red Hat) Last-Modified: Thu, 21 Feb 2013 09:47:39 GMT ETag: "377709-6-f71984c0" Accept-Ranges: bytes Content-Length: 6 Connection: close Content-Type: text/html; charset=UTF-8 hello
HTTP Request Types:
We have already seen the GET request type in the above example, let's see some more types of HTTP request types.
HTTP Head Request
HTTP head request is very much similar to the GET request. Its the easiest method to know the complete details of the resource available on a particular URL, without downloading the entire data.
For example, if we use HEAD request in our above example, we will get all the header's in the response except our page containing the "Hello" message.
This kind of a request is used only to retrieve attributes of the data without the data. This can give you information about the resource by saving your bandwidth.
HTTP Post Request
POST request is mostly used to send data from the client to the server. Let's see an example of HTTP post request from the client to the server.
POST /my_data_send.php HTTP/1.1 Host: 192.168.0.105 User-Agent: ELinks/0.11.1 (textmode; Linux; 80x25-2) Referer: http://192.168.0.105/ Accept: */* Accept-Encoding: gzip Accept-Language: en Connection: Keep-Alive Content-Type: application/x-www-form-urlencoded Content-Length: 62 name=sarath&last=pillai&email=&telephone=&comments=
In the above shown post request the request is being send to "my_data-send.php". This data being send to the server will contain other header's as well that we saw during our GET request example.
The last line sends the exact data to the server.
HTTP Put request:
HTTP put request is very much similar to the post request. PUT request sends or creates a resource in the specified URI.
IF the resource is already present in that specified URI, it will update that URI, otherwise it will create the resource.
HTTP Delete Request:
HTTP delete request deletes a specified resource on a specified URI.
It's not at all advisable to configure a webserver for HTTP delete operation. However if you want to enable such functionalities, its better to that with a http POST operation using a web form, which intern will delete a resource.
HTTP Trace Request
HTTP trace request is used to trouble shoot http web pages.
For example, if suppose a web page is not getting loaded the way you want in your browser. Then in such cases http trace request can be used to retrieve the complete request that the server got from the client back to the client itself.
This kind of request kind is mostly disabled in most of the web server's. The main reason is that its very much similar to viewing the web server log of the request you send.
Some Famous Error Codes
Some of the widely know HTTP status codes are mentioned below.
Http 400 error code stands for "Bad request Type". the server responds with 400 error when the server receives a malformed request of any kind.
This status code error is shown by the server when the requested resource is forbidden.
In other words it can be said that you do not have access to the resource you are trying to access.
The most famous error code in http is 404. It indicates a not found error. The requested resource is not found on the server.
This error code of 500, is used to inform the client that some internal error happened in the server. But this error does not specify a specific problem in the web server.
Or say an unknown error in the web server.
Note: The main purpose of this article was to go through the steps involved in completing a HTTP request.
For more information on HTTP i will recommend reading the below links.
- Logstash Tutorial: Linux Central logging server
- Linux Network (TCP) Performance Tuning with Sysctl
- Linux Booting Process: A step by step tutorial for understanding Linux boot sequence
- DNS Root Servers: The most critical infrastructure on the internet
- Bash shell shortcuts that every system administrator must know