HTTP Headers [Repost]

2010-07-03 20:32:46 | Category: Miscellaneous


This article is partially copied from http://net.tutsplus.com/tutorials/other/http-headers-for-dummies/ for learning purposes. Thanks to the author, Burak Guzel.

What are HTTP Headers?

HTTP stands for "Hypertext Transfer Protocol". The entire World Wide Web uses this protocol. It was established in the early 1990s. Almost everything you see in your browser is transmitted to your computer over HTTP. For example, when you opened this article page, your browser probably sent over 40 HTTP requests and received an HTTP response for each.

HTTP headers are the core part of these HTTP requests and responses, and they carry information about the client browser, the requested page, the server and more.


Example

When you type a url in your address bar, your browser sends an HTTP request and it may look like this:

GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1  
Host: net.tutsplus.com  
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)  
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8  
Accept-Language: en-us,en;q=0.5  
Accept-Encoding: gzip,deflate  
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7  
Keep-Alive: 300  
Connection: keep-alive  
Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120 
Pragma: no-cache  
Cache-Control: no-cache 

The first line is the "Request Line", which contains some basic info on the request. The rest are the HTTP headers.

After that request, your browser receives an HTTP response that may look like this:

 HTTP/1.x 200 OK  
Transfer-Encoding: chunked  
Date: Sat, 28 Nov 2009 04:36:25 GMT  
Server: LiteSpeed  
Connection: close  
X-Powered-By: W3 Total Cache/0.8  
Pragma: public  
Expires: Sat, 28 Nov 2009 05:36:25 GMT  
Etag: "pub1259380237;gz"  
Cache-Control: max-age=3600, public  
Content-Type: text/html; charset=UTF-8  
Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT  
X-Pingback: http://net.tutsplus.com/xmlrpc.php
Content-Encoding: gzip
Vary: Accept-Encoding, Cookie, User-Agent

<HTML xmlns="http://www.w3.org/1999/xhtml">
<HEAD>  
  
<!-- ... rest of the html ... -->  

The first line is the "Status Line", followed by "HTTP headers", until the blank line. After that, the "content" starts (in this case, an HTML output).
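The layout described above (status line, headers, blank line, content) can be sketched in a few lines of Python. This is only an illustration: the function name `split_response` and the shortened sample response are assumptions, not part of the original article, and a real client would use a proper HTTP library instead.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        # (A plain dict keeps only the last value of a repeated header.)
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Shortened, hypothetical response modelled on the example above.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: LiteSpeed\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<html>...</html>"
)
status_line, headers, body = split_response(raw)
print(status_line)
print(headers["content-type"])
print(body)
```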

When you look at the source code of a web page in your browser, you will only see the HTML portion and not the HTTP headers, even though they actually have been transmitted together as you see above.

These HTTP requests are also sent and received for other things, such as images, CSS files, JavaScript files etc. That is why I said earlier that your browser sent over 40 HTTP requests as you loaded just this article page.

Now, let’s start reviewing the structure in more detail.

HTTP Request Structure


The first line of the HTTP request is called the request line and consists of 3 parts:
  • The "method" indicates what kind of request this is. The most common methods are GET, POST and HEAD.
  • The "path" is generally the part of the url that comes after the host (domain). For example, when requesting "http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/", the path portion is "/tutorials/other/top-20-mysql-best-practices/".
  • The "protocol" part contains "HTTP" and the version, which is usually 1.1 in modern browsers.

    The remainder of the request contains HTTP headers as "Name: Value" pairs on each line. These contain various information about the HTTP request and your browser. For example, the "User-Agent" line provides information on the browser version and the Operating System you are using. "Accept-Encoding" tells the server if your browser can accept compressed output like gzip.

    You may have noticed that the cookie data is also transmitted inside an HTTP header. And if there was a referring url, that would have been in the header too.

    Most of these headers are optional. This HTTP request could have been as small as this:

    GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1  
    Host: net.tutsplus.com

    And you would still get a valid response from the web server.
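As a rough sketch of that minimal request, the Python below assembles the request line and the Host header by hand, then splits the request line back into its three parts. The helper name `build_request` is an assumption made for this illustration.

```python
def build_request(method: str, path: str, host: str) -> bytes:
    """Build a minimal HTTP/1.1 request: a request line plus a Host header."""
    request_line = f"{method} {path} HTTP/1.1"
    # Each line ends with CRLF; an empty line marks the end of the headers.
    lines = [request_line, f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


raw = build_request(
    "GET", "/tutorials/other/top-20-mysql-best-practices/", "net.tutsplus.com"
)
print(raw.decode("ascii"))

# The request line is three space-separated parts: method, path, protocol.
method, path, protocol = raw.split(b"\r\n")[0].decode("ascii").split(" ")
print(method, path, protocol)
```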

    Request Methods

    The three most commonly used request methods are: GET, POST and HEAD. You're probably already familiar with the first two, from writing html forms.

    GET: Retrieve a Document

    This is the main method used for retrieving html, images, JavaScript, CSS, etc. Most data that loads in your browser was requested using this method.

    For example, when loading a Nettuts+ article, the very first line of the HTTP request looks like so:

    GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1  
    ... 

    Once the html loads, the browser will start sending GET requests for images, which may look like this:

    GET /wp-content/themes/tuts_theme/images/header_bg_tall.png HTTP/1.1  
    ... 

    Web forms can be set to use the method GET. Here is an example.

    <form method="GET" action="foo.php">
    First Name: <input type="text" name="first_name" /> <br />
    Last Name: <input type="text" name="last_name" /> <br />
    <input type="submit" name="action" value="Submit" />
    </form>

    When that form is submitted, the HTTP request begins like this:

    GET /foo.php?first_name=John&last_name=Doe&action=Submit HTTP/1.1  
    ... 

    You can see that each form input was added into the query string.
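The same encoding can be reproduced with Python's standard `urllib.parse` module. This sketch assumes the form values from the example above; it shows how the fields become a query string and how a server-side script could read them back.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Form fields in the order they appear in the form.
fields = {"first_name": "John", "last_name": "Doe", "action": "Submit"}

# Browser side: the fields are URL-encoded and appended after "?".
query = urlencode(fields)
request_line = f"GET /foo.php?{query} HTTP/1.1"
print(request_line)

# Server side: pull the query string out of the request target and decode it.
target = request_line.split(" ")[1]
params = parse_qs(urlsplit(target).query)
print(params)
```

Note that `parse_qs` returns a list for every name, because a query string may repeat a field.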

    POST: Send Data to the Server

    Even though you can send data to the server using GET and the query string, in many cases POST will be preferable. Sending large amounts of data using GET is not practical and has limitations.

    POST requests are most commonly sent by web forms. Let's change the previous form example to a POST method.

    <form method="POST" action="foo.php">  
    First Name: <input type="text" name="first_name" /> <br /> 
    Last Name: <input type="text" name="last_name" /> <br />  
    <input type="submit" name="action" value="Submit" />  
    </form> 

    Submitting that form creates an HTTP request like this:

    POST /foo.php HTTP/1.1  
    Host: localhost  
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)  
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8  
    Accept-Language: en-us,en;q=0.5  
    Accept-Encoding: gzip,deflate  
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7  
    Keep-Alive: 300  
    Connection: keep-alive  
    Referer: http://localhost/test.php
    Content-Type: application/x-www-form-urlencoded  
    Content-Length: 43  

    first_name=John&last_name=Doe&action=Submit
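A small Python sketch can rebuild the essential parts of that POST request and confirm that the Content-Length value is simply the byte length of the encoded body. The helper name `build_post` is an assumption for this example, and the optional browser headers are left out.

```python
from urllib.parse import urlencode


def build_post(path: str, host: str, fields: dict) -> bytes:
    """Build a form-encoded POST request with a matching Content-Length."""
    # The body uses the same name=value&name=value format as a query string.
    body = urlencode(fields).encode("ascii")
    head = "\r\n".join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",  # size of the body in bytes
        "",
        "",
    ]).encode("ascii")
    return head + body


request = build_post(
    "/foo.php",
    "localhost",
    {"first_name": "John", "last_name": "Doe", "action": "Submit"},
)
print(request.decode("ascii"))
```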

    There are three important things to note here:

  • The path in the first line is simply /foo.php and there is no query string anymore.
  • Content-Type and Content-Length headers have been added, which provide information about the data being sent.
  • All the data is now sent after the headers, in the same format as the query string.

    HEAD: Retrieve Header Information

    HEAD is identical to GET, except the server does not return the content in the HTTP response. When you send a HEAD request, it means that you are only interested in the response code and the HTTP headers, not the document itself.

    With this method the browser can check if a document has been modified, for caching purposes. It can also check if the document exists at all.

    For example, if you have a lot of links on your website, you can periodically send HEAD requests to all of them to check for broken links. This will work much faster than using GET.
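A broken-link check along these lines could be sketched with Python's standard `http.client` module. The function names `head_status`, `status_means_broken` and `is_broken` are assumptions for this illustration, and treating every 4xx or 5xx status as "broken" is a simplification (some servers reject HEAD even though GET would work).

```python
import http.client
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 5.0) -> int:
    """Send a HEAD request and return the status code; no body is downloaded."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def status_means_broken(status: int) -> bool:
    """Assumption for this sketch: any 4xx or 5xx answer counts as broken."""
    return status >= 400


def is_broken(url: str) -> bool:
    """Return True if the link cannot be reached or answers with an error."""
    try:
        return status_means_broken(head_status(url))
    except (OSError, http.client.HTTPException):
        return True  # DNS failure, refused connection, timeout, bad reply

# Usage (performs real network requests):
#   broken = [url for url in links if is_broken(url)]
```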

    HTTP Response Structure

    After the browser sends the HTTP request, the server responds with an HTTP response. Excluding the content, it consists of a status line (protocol, status code and a short message) followed by the headers.

    The first piece of data is the protocol. This is again usually HTTP/1.x or HTTP/1.1 on modern servers.

    The next part is the status code followed by a short message. Code 200 means that our GET request was successful and the server will return the contents of the requested document, right after the headers.

    We have all seen "404" pages. This number actually comes from the status code part of the HTTP response. If a GET request is made for a path that the server cannot find, it responds with a 404 instead of 200.

    The rest of the response contains headers just like the HTTP request. These values can contain information about the server software, when the page/file was last modified, the mime type etc…

    Again, most of those headers are actually optional.

    HTTP Status Codes

  • 200s are used for successful requests.
  • 300s are for redirections.
  • 400s are used if there was a problem with the request.
  • 500s are used if there was a problem with the server.
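The grouping above depends only on the first digit of the code, which the following Python sketch makes explicit. The function name `status_class` and the class labels are assumptions for this example; `HTTPStatus` comes from Python's standard library and supplies the short message that accompanies each code.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Map a status code to its group by looking at the first digit."""
    classes = {
        2: "success",
        3: "redirection",
        4: "problem with the request",
        5: "problem with the server",
    }
    # Codes outside the four groups listed in the article fall through.
    return classes.get(code // 100, "other")


for code in (200, 206, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```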

    200 OK

    As mentioned before, this status code is sent in response to a successful request.

    206 Partial Content

    If an application requests only a range of the requested file, the 206 code is returned.

    It's most commonly used with download managers that can stop and resume a download, or split the download into pieces.
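A download manager asks for a piece of a file with the standard `Range` request header, which the article does not name explicitly. The Python sketch below only computes the header values for splitting or resuming a download; the function names are assumptions, and no request is actually sent.

```python
def byte_ranges(total_size: int, chunk_size: int):
    """Yield Range header values that cover total_size bytes in chunks."""
    for start in range(0, total_size, chunk_size):
        # Both offsets are inclusive, so the last byte is total_size - 1.
        end = min(start + chunk_size, total_size) - 1
        yield f"bytes={start}-{end}"


def resume_range(bytes_already_downloaded: int) -> str:
    """Range value asking for everything after the bytes already on disk."""
    return f"bytes={bytes_already_downloaded}-"


ranges = list(byte_ranges(1000, 400))
print(ranges)
print(resume_range(512))
```

A server that honours such a request answers with 206 Partial Content; one that ignores the header simply returns the whole file with 200.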

    404 Not Found


    When the requested page or file was not found, a 404 response code is sent by the server.

    401 Unauthorized

    Password protected web pages send this code. If you don't enter a login correctly, your browser shows an authorization error page instead of the requested page.

    Note that this only applies to pages protected by HTTP authentication, which make the browser pop up a login prompt.

    403 Forbidden

    If you are not allowed to access a page, this code may be sent to your browser. This often happens when you try to open a url for a folder, that contains no index page. If the server settings do not allow the display of the folder contents, you will get a 403 error.

    For example, on my local server I created an images folder. Inside this folder I put an .htaccess file with this line: "Options -Indexes". Now when I try to open http://localhost/images/, I get a 403 Forbidden error page.

    There are other ways in which access can be blocked, and 403 can be sent. For example, you can block by IP address, with the help of some htaccess directives.

    order allow,deny  
    deny from 192.168.44.201  
    deny from 224.39.163.12  
    deny from 172.16.7.92  
    allow from all 

    302 (or 307) Moved Temporarily & 301 Moved Permanently

    These two codes are used for redirecting a browser. For example, when you use a url shortening service, such as bit.ly, that's exactly how they forward the people who click on their links.

    Both 302 and 301 are handled very similarly by the browser, but they can have different meanings to search engine spiders. For instance, if your website is down for maintenance, you may redirect to another location using 302. The search engine spider will continue checking your page later in the future. But if you redirect using 301, it will tell the spider that your website has moved to that location permanently. To give you a better idea: http://www.nettuts.com redirects to http://net.tutsplus.com/ using a 301 code instead of 302.

    500 Internal Server Error


    This code is usually seen when a web script crashes. Most CGI scripts do not output errors directly to the browser, unlike PHP. If there are any fatal errors, they will just send a 500 status code, and the programmer then needs to search the server error logs to find the error messages.

    Complete List

    The complete list of HTTP status codes, with their explanations, is in section 10 of the HTTP/1.1 specification (RFC 2616).

    HTTP Headers in HTTP Requests

    Host

    An HTTP request is sent to a specific IP address. But since most servers are capable of hosting multiple websites under the same IP, they must know which domain name the browser is looking for.

    Host: net.tutsplus.com

    This is basically the host name, including the domain and the subdomain.

    User-Agent

    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)

    This header can carry several pieces of information such as:

  • Browser name and version.
  • Operating System name and version.
  • Default language.

    This is how websites can collect certain general information about their surfers' systems. For example, they can detect if the surfer is using a cell phone browser and redirect them to a mobile version of their website which works better with low resolutions.

    Accept-Language

    Accept-Language: en-us,en;q=0.5

    This header indicates the language preference of the user. If a website has different language versions, it can redirect a new surfer based on this data.

    It can carry multiple languages, separated by commas. The first one is the preferred language, and each other listed language can carry a "q" value, which is an estimate of the user's preference for the language (min. 0 max. 1).
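A site that wants to pick a language version could parse the header roughly as follows. This is a simplified Python sketch: the function name `parse_accept_language` is an assumption, and malformed values are not handled.

```python
def parse_accept_language(value: str):
    """Return (language, q) pairs, most preferred first."""
    prefs = []
    for item in value.split(","):
        lang, _, params = item.strip().partition(";")
        q = 1.0  # a language listed without a q value gets the maximum, 1
        params = params.strip()
        if params.startswith("q="):
            q = float(params[2:])
        prefs.append((lang.strip(), q))
    # sorted() is stable, so languages with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_accept_language("en-us,en;q=0.5"))
print(parse_accept_language("de;q=0.3, fr;q=0.8, en"))
```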

    Accept-Encoding

    Accept-Encoding: gzip,deflate

    Most modern browsers support gzip, and will send this in the header. The web server then can send the HTML output in a compressed format. This can reduce the size by up to 80% to save bandwidth and time.
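The effect is easy to try with Python's standard `gzip` module. The sample markup below is made up and highly repetitive, so it compresses far better than a typical page would; the exact saving depends entirely on the content.

```python
import gzip

# Hypothetical, very repetitive HTML used only to demonstrate compression.
html = (
    "<p>HTTP headers carry information about the client and the server.</p>\n"
    * 200
).encode("utf-8")

compressed = gzip.compress(html)  # what a server sends with Content-Encoding: gzip
restored = gzip.decompress(compressed)  # what the browser does on receipt

saving = 1 - len(compressed) / len(html)
print(len(html), "bytes before,", len(compressed), "bytes after")
print(f"saved {saving:.0%}")
```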

    If-Modified-Since

    If a web document is already cached in your browser, and you visit it again, your browser can check if the document has been updated by sending this:

    If-Modified-Since: Sat, 28 Nov 2009 06:38:19 GMT

    If it was not modified since that date, the server will send a "304 Not Modified" response code, and no content – and the browser will load the content from the cache.
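The server-side decision can be sketched in Python as a comparison of two HTTP dates. The function name `conditional_status` is an assumption for this example; `parsedate_to_datetime` from the standard `email.utils` module parses dates in the format shown above.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: str, if_modified_since: Optional[str]) -> int:
    """Return 304 if the cached copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # no cached copy announced, send the full document
    modified = parsedate_to_datetime(last_modified)
    cached = parsedate_to_datetime(if_modified_since)
    # Not modified since the browser's date: answer 304 with no content.
    return 304 if modified <= cached else 200


print(conditional_status("Sat, 28 Nov 2009 03:50:37 GMT",
                         "Sat, 28 Nov 2009 06:38:19 GMT"))
print(conditional_status("Sat, 28 Nov 2009 03:50:37 GMT",
                         "Sat, 28 Nov 2009 01:00:00 GMT"))
```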

    Cookie

    As the name suggests, this sends the cookies stored in your browser for that domain.

    Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120; foo=bar

    These are name=value pairs separated by semicolons. Cookies can also contain the session id.
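On the server, the header value can be turned back into individual cookies. This sketch uses `SimpleCookie` from Python's standard `http.cookies` module with the example value above.

```python
from http.cookies import SimpleCookie

header_value = "PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120; foo=bar"

jar = SimpleCookie()
jar.load(header_value)  # splits the name=value pairs on semicolons

# Each entry is a Morsel object; .value holds the cookie's value.
cookies = {name: morsel.value for name, morsel in jar.items()}
print(cookies)
```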

    Referer

    As the name suggests, this HTTP header contains the referring url.

    For example, if I visit the Nettuts+ homepage and click on an article link, my browser sends this header to the server:

    Referer: http://net.tutsplus.com/

    Authorization

    When a web page asks for authorization, the browser opens a login window. When you enter a username and password in this window, the browser sends another HTTP request, but this time it contains this header.

    Authorization: Basic bXl1c2VyOm15cGFzcw==

    The data inside the header is base64 encoded. For example, base64_decode('bXl1c2VyOm15cGFzcw==') would return 'myuser:mypass'
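The same decoding in Python is shown below; the function name `decode_basic_auth` is an assumption for this example. Since base64 is an encoding and not encryption, anyone who can read the header can recover the password, which is why Basic authentication is normally combined with HTTPS.

```python
import base64


def decode_basic_auth(header_value: str):
    """Extract (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only, because the password may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


print(decode_basic_auth("Basic bXl1c2VyOm15cGFzcw=="))
```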

    HTTP Headers in HTTP Responses

    Cache-Control

    Definition from w3.org: "The Cache-Control general-header field is used to specify directives which MUST be obeyed by all caching mechanisms along the request/response chain." These "caching mechanisms" include gateways and proxies that your ISP may be using.

    Example:

    Cache-Control: max-age=3600, public

    "public" means that the response may be cached by anyone. "max-age" indicates how many seconds the cache is valid for. Allowing your website to be cached can reduce server load and bandwidth, and also improve load times at the browser.

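    The "no-cache" directive tells caches not to reuse a stored response without first revalidating it with the server. To prevent a response from being stored at all, use "no-store".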

    Cache-Control: no-cache

    For more detailed info, see w3.org.
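To illustrate how a cache might read these directives, here is a simplified Python sketch. The function names `parse_cache_control` and `is_fresh` are assumptions for this example, and real caches follow more rules than the single max-age check shown here.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into a {directive: argument-or-True} dict."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if not name:
            continue
        # Directives such as "public" have no argument; store True for them.
        directives[name.lower()] = arg.strip('"') if sep else True
    return directives


def is_fresh(directives: dict, age_seconds: int) -> bool:
    """True while the stored response is younger than max-age seconds."""
    max_age = directives.get("max-age")
    return isinstance(max_age, str) and max_age.isdigit() and age_seconds < int(max_age)


d = parse_cache_control("max-age=3600, public")
print(d)
print(is_fresh(d, 120), is_fresh(d, 7200))
```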

    Content-Type

    This header indicates the "mime-type" of the document. The browser then decides how to interpret the contents based on this. For example, an html page (or a PHP script with html output) may return this:

    Content-Type: text/html; charset=UTF-8

    "text" is the type and "html" is the subtype of the document. The header can also contain more info such as charset.

    For a gif image, this may be sent.

    Content-Type: image/gif
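The type, subtype and extra parameters can be pulled apart with Python's standard `email.message` module, which understands this header format. The helper name `describe_content_type` is an assumption for this sketch.

```python
from email.message import Message


def describe_content_type(value: str):
    """Return (type, subtype, charset) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    # charset is None when the header carries no charset parameter.
    return (
        msg.get_content_maintype(),
        msg.get_content_subtype(),
        msg.get_param("charset"),
    )


print(describe_content_type("text/html; charset=UTF-8"))
print(describe_content_type("image/gif"))
```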

    The browser can decide to use an external application or browser extension based on the mime-type. For example this will cause the Adobe Reader to be loaded:

    Content-Type: application/pdf

    When serving a file directly, Apache can usually detect the mime-type of the document and send the appropriate header. Also, most browsers have some amount of fault tolerance and auto-detection of mime-types, in case the headers are wrong or not present.

    The official registry of mime types (media types) is maintained by IANA.

    Content-Disposition

    This header instructs the browser to open a file download box, instead of trying to parse the content. Example:

    Content-Disposition: attachment; filename="download.zip"


    Note that the appropriate Content-Type header should also be sent along with this:

    Content-Type: application/zip
    Content-Disposition: attachment; filename="download.zip"

    Content-Length

    When content is going to be transmitted to the browser, the server can indicate the size of it (in bytes) using this header.

    Content-Length: 89123

    This is especially useful for file downloads. That's how the browser can determine the progress of the download.

    Etag

    This is another header that is used for caching purposes. It looks like this:

    Etag: "pub1259380237;gz"

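    The server can send this value with a document, and the browser saves it along with the cached copy. On a later request for the same document, the browser sends the saved value back in an If-None-Match request header. If the current Etag value of the document matches it, the server will send a 304 code instead of 200, and no content. The browser will load the contents from its cache.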
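The comparison can be sketched in Python as follows, assuming the browser returns the stored value in an `If-None-Match` request header, which is the standard mechanism. The function name `respond` and the placeholder body are assumptions for this example.

```python
from typing import Optional, Tuple


def respond(current_etag: str, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, body): 304 with no content when the Etag still matches."""
    if if_none_match is not None:
        # The header may list several values separated by commas, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or current_etag in candidates:
            return 304, b""
    return 200, b"<html>...</html>"  # placeholder document body


etag = '"pub1259380237;gz"'
print(respond(etag, '"pub1259380237;gz"'))
print(respond(etag, '"some-older-version"'))
```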

    Last-Modified

    As the name suggests, this header indicates the last modified date of the document, in GMT format:

    Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT

    It offers another way for the browser to cache a document. The browser may send this in the HTTP request:

    If-Modified-Since: Sat, 28 Nov 2009 06:38:19 GMT

    We already talked about this earlier in the "If-Modified-Since" section.

    Location

    This header is used for redirections. If the response code is 301 or 302, the server must also send this header. For example, when you go to http://www.nettuts.com your browser will receive this:

    HTTP/1.x 301 Moved Permanently
    ...
    Location: http://net.tutsplus.com/
    ...
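How a client might act on such a response is sketched below in Python. The function name `redirect_target` is an assumption; `urljoin` from the standard library resolves the Location value against the requested url, which also covers servers that send a relative path.

```python
from typing import Optional
from urllib.parse import urljoin


def redirect_target(status: int, headers: dict, request_url: str) -> Optional[str]:
    """Return the url to follow for a redirect response, or None otherwise."""
    if status in (301, 302, 307) and "Location" in headers:
        # An absolute Location is returned as-is; a relative one is resolved.
        return urljoin(request_url, headers["Location"])
    return None


print(redirect_target(301, {"Location": "http://net.tutsplus.com/"},
                      "http://www.nettuts.com/"))
print(redirect_target(302, {"Location": "/maintenance.html"},
                      "http://example.com/articles/1"))
```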

    Set-Cookie

    When a website wants to set or update a cookie in your browser, it will use this header.

    Set-Cookie: skin=noskin; path=/; domain=.amazon.com; expires=Sun, 29-Nov-2009 21:42:28 GMT
    Set-Cookie: session-id=120-7333518-8165026; path=/; domain=.amazon.com; expires=Sat Feb 27 08:00:00 2010 GMT

    Each cookie is sent as a separate header. Note that the cookies set via JavaScript do not go through HTTP headers.

    If the expiration date is not specified, the cookie is deleted when the browser window is closed.

    WWW-Authenticate

    A website may send this header to authenticate a user through HTTP. When the browser sees this header, it will open up a login dialogue window.

    WWW-Authenticate: Basic realm="Restricted Area"


    Content-Encoding

    This header is usually set when the returned content is compressed.

    Content-Encoding: gzip
