HTML is a relatively simple language, but that doesn't stop people from having problems with it. Why is that? It's mainly because, while the HTML tags themselves are easy, creating an HTML document that works as intended on a web server requires you to know a few extra things that aren't often explained. Here, then, is a guide to understanding those parts of HTML that they just don't tell you about in the books.
Step 1: Understand Doctypes.
It isn't often noted that valid HTML documents don't actually start with the <html> tag – they have one extra line before it. This is the doctype declaration, and it must be present right at the top of your document for it to be valid HTML.
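There are really only two doctypes that you need to know about. The HTML4 doctype looks like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
The XHTML one looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">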
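These are the versions of the doctypes that are a little more forgiving. If you're a purist, you can use the strict ones instead: in the XHTML doctype, change both occurrences of 'transitional' to 'strict'; in the HTML4 one, change 'loose' to 'strict' and drop the word 'Transitional'.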
But what is the doctype for? Well, its purpose is simple enough: it tells web browsers exactly what version of HTML your page was written in, to help them to interpret it correctly.
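If you want a quick way to spot pages that are missing their doctype, a rough check is easy to script. The following is a minimal sketch in Python, not a validator: the helper name starts_with_doctype and the sample documents are made up for illustration, and the check only looks at the very start of the text.

```python
import re

# Hypothetical helper pattern: a doctype declaration at the very start of the
# document, allowing leading whitespace and ignoring letter case.
DOCTYPE_PATTERN = re.compile(r"\s*<!DOCTYPE\s+html\b[^>]*>", re.IGNORECASE)


def starts_with_doctype(document: str) -> bool:
    """Return True if the document text begins with an HTML or XHTML doctype."""
    return DOCTYPE_PATTERN.match(document) is not None


# Sample documents, invented for this example.
with_doctype = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">\n'
    "<html><head><title>Example</title></head><body></body></html>"
)
without_doctype = "<html><head><title>Example</title></head><body></body></html>"

print(starts_with_doctype(with_doctype))     # True
print(starts_with_doctype(without_doctype))  # False
```

A check like this only tells you that a doctype is present, not that the rest of the page actually follows it.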
Step 2: Understand HTTP Errors.
A truly shocking number of people writing HTML pages don't know how HTTP works – and they quickly run into trouble because of it. HTTP is the way a web browser communicates with a web server, and this communication carries extra information alongside your pages, such as cookies.
You don't need to worry too much about the internals of HTTP, but it's worth knowing that it works by the browser sending a request to the server for a certain page, and the server then responding with a status code and, if all went well, the page itself.
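To make the exchange a little more concrete, here is a minimal sketch in Python of what the two sides send. It only builds and parses text, with no network involved; the function names and the sample response are assumptions made for illustration.

```python
from typing import Tuple


def build_request(host: str, path: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request for one page."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"


def parse_status_line(response_text: str) -> Tuple[str, int, str]:
    """Split the first line of a response into version, status code, and reason."""
    status_line = response_text.split("\r\n", 1)[0]
    parts = status_line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase may be absent
    return parts[0], int(parts[1]), reason


# A made-up response, as a server might send it for a missing page.
sample_response = (
    "HTTP/1.1 404 Not Found\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Sorry, no such page.</body></html>"
)

print(build_request("www.example.com", "/index.html"))
print(parse_status_line(sample_response))  # ('HTTP/1.1', 404, 'Not Found')
```

The three-digit number on the first line of the response is the code the browser acts on; everything after the blank line is the page content.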
Your website should be set up to handle error codes well. For example, a 404 (page not found) error should show a page with links to the most useful parts of your site. Other common status codes include:
200 - OK (the request succeeded).
301 - Page moved permanently.
403 - Forbidden (no authorisation to access).
500 - Internal server error.
For more information, visit www.w3.org/protocols.
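As an illustration of handling a 404 well, here is a minimal sketch using the http.server module from Python's standard library. The page paths, page contents, and the names PAGES, response_for, and SiteHandler are all hypothetical; a real site would normally configure this in its web server instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: paths and bodies are invented for this example.
PAGES = {
    "/": "<html><body><h1>Home</h1></body></html>",
    "/contact.html": "<html><body><h1>Contact</h1></body></html>",
}

# A friendly error page that links to the most useful parts of the site.
NOT_FOUND_PAGE = (
    "<html><body><h1>Page not found</h1>"
    '<p>Try the <a href="/">home page</a> or the '
    '<a href="/contact.html">contact page</a>.</p></body></html>'
)


def response_for(path):
    """Return the (status code, HTML body) pair to send for a requested path."""
    if path in PAGES:
        return 200, PAGES[path]
    return 404, NOT_FOUND_PAGE


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = response_for(self.path)
        data = body.encode("utf-8")
        self.send_response(status)  # writes the status line, e.g. 404
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port=8000):
    """Create (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), SiteHandler)
```

Calling make_server().serve_forever() would start it on port 8000 (an arbitrary choice); requesting any path not listed in PAGES then returns the 404 status together with the page of links.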
Step 3: Understand MIME Types.
MIME types are another part of the HTTP response headers, and an important one. Sent in the Content-Type header, they tell the browser what kind of file the server is about to send. Browsers don't rely on HTML files ending in .html, JPEG images ending in .jpeg, and so on: they rely on the Content-Type header. If you don't know about this, you can have problems when you need to configure your server to send anything unusual.
Here are some common MIME types:
text/html - HTML.
text/css - CSS.
text/plain - plain text.
image/gif - GIF image.
image/jpeg - JPEG image.
image/png - PNG image.
audio/mpeg - MP3 audio file.
application/x-shockwave-flash - Flash movie.
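On the server side, the choice of MIME type usually comes down to a lookup table keyed on the file extension. The following is a minimal sketch of that idea in Python, using the types listed above; the file extensions, the fallback type, and the name content_type_for are assumptions made for illustration, and real servers keep a much longer table in their configuration.

```python
import os

# Extension-to-MIME-type table. The extensions are assumed; the types are the
# common ones from the list in this step.
CONTENT_TYPES = {
    ".html": "text/html",
    ".css": "text/css",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".mp3": "audio/mpeg",
    ".swf": "application/x-shockwave-flash",
}

# Fallback for anything unrecognised: a generic "arbitrary bytes" type.
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(filename: str) -> str:
    """Pick the Content-Type value a server might send for this file name."""
    extension = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(extension, DEFAULT_TYPE)


print(content_type_for("index.html"))   # text/html
print(content_type_for("PHOTO.JPEG"))   # image/jpeg
print(content_type_for("archive.xyz"))  # application/octet-stream
```

If a file type is missing from the server's table, the browser receives the fallback type and will typically offer the file as a download instead of displaying it, which is the kind of problem this step warns about.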
Step 4: Understand Link Paths.
One of the hardest things to understand about HTML is all the different things that you can put in an 'href' attribute. Abbreviated (relative) URLs follow the path rules of old text-based operating systems, and there are plenty of people writing HTML today who are completely unfamiliar with these rules.
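If you are ever unsure where an abbreviated URL will lead, you can ask a URL library to resolve it for you. This is a minimal sketch using urljoin from Python's standard library, assuming the link sits on a page at http://www.example.com/example1/example1.html; the href values and the helper name resolve are made up for illustration.

```python
from urllib.parse import urljoin

# Address of the page that contains the links (an assumed example address).
PAGE_URL = "http://www.example.com/example1/example1.html"


def resolve(href: str) -> str:
    """Return the full URL a browser would request for this href value."""
    return urljoin(PAGE_URL, href)


print(resolve("example2.html"))    # http://www.example.com/example1/example2.html
print(resolve("images/logo.png"))  # http://www.example.com/example1/images/logo.png
print(resolve("../index.html"))    # http://www.example.com/index.html
print(resolve("/index.html"))      # http://www.example.com/index.html
print(resolve("http://www.example.org/"))  # http://www.example.org/
```

A bare name stays in the current directory, '..' climbs one directory up, a leading slash starts from the root of the site, and a full URL is used unchanged.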
Here are some examples. For each one, the assumption is that the link is on a page at http://www.example.com/example1/example1.html.