molly.com

Sunday 4 December 2005

Anatomy of a Document

Here’s the deconstructed, annotated XHTML document for your education and enjoyment.

<?xml version="1.0" encoding="utf-8"?>

This is the XML declaration. It declares the document to be XML, states the XML version, and can also specify the document’s encoding. The W3C recommends but does not require it, largely because of backward compatibility concerns: many user agents do not interpret the XML declaration properly. This can cause a number of problems: the document markup is displayed in the viewport instead of the rendered document; the document is rendered as an XML tree; or, in the case of IE 6.0, the document remains in Quirks Mode, because anything other than white space placed above a DOCTYPE declaration prevents IE 6.0 from switching to Compliance Mode.

Note: Because of the browser compatibility issues with the XML declaration, most designers and developers leave it off of documents to avoid problems.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

This is the DOCTYPE declaration. DOCTYPE declarations are bits of SGML that describe the document type, including where the Document Type Definition (DTD) for the document is published and by whom. DOCTYPE declarations are required in HTML 4.x and XHTML 1.x in order for the document to be compliant. With the exception of DOCTYPE switching, DOCTYPE declarations are passive until the document is tested for conformance with validation, at which point the validator uses the information within the DOCTYPE declaration to compare the document to the DTD with which it claims to be associated.

Note: The XML declaration plus the DOCTYPE declaration make up what is known as the XML prolog.

<html xmlns="http://www.w3.org/1999/xhtml">

The opening tag of the root element html and the XML namespace attribute. In HTML and XHTML documents, the root element is html. This means that html is the grand ancestor: It has no ancestors, only children and descendants. The namespace attribute further describes which markup language or languages are being used in the document. Despite the fact that many validators will validate an XHTML document that has no namespace attribute, it should always be present.

Interestingly, many designers and developers are unaware that the html element can be styled. Styling it is especially useful when creating a site signature or other means of page identification, as well as for setting styles at the top of the tree so that they are inherited on down.
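
As a rough sketch of this point, the markup below gives the html element an id and styles it from an embedded style sheet. The id value www-example-com and the font and color values are hypothetical, chosen only for illustration.

```html
<html xmlns="http://www.w3.org/1999/xhtml" id="www-example-com">
<head>
<title>Sample XHTML Document</title>
<style type="text/css">
  /* font-family and color set on html are inherited by body and its descendants */
  html {
    font-family: Georgia, serif;
    color: #333;
  }
  /* The id (a hypothetical value) lets a style sheet single out pages from this one site */
  #www-example-com h1 {
    color: #900;
  }
</style>
</head>
<body>
<h1>Header Level One</h1>
</body>
</html>
```

Because the id sits on the root element, any rule that begins with #www-example-com applies only to pages carrying that signature.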

<head>

The opening tag for the head portion of the document. The head contains meta information about the document, along with style and script information. The head is a required part of the document in XHTML 1.x.

<title>Sample XHTML Document</title>

The title element including the opening and closing title tags and the title text, which displays in the title bar of the browser. Properly written title text is important as it aids in usability, accessibility and search engine optimization.

Note: The title element is required in HTML 4.x and XHTML 1.x documents.

<link rel="stylesheet" type="text/css" href="css/global.css" media="all" />

The link element is used here to link a style sheet, define the link relationship using the rel attribute, define the MIME type as text/css, point to the source of the .css document, and define the media for this style sheet as being a value of all, for all media that supports CSS.

The link element is what’s referred to as an empty element or a replaced element. Empty elements do not contain text content, and typically they are replaced by something, such as an image for img, a line break for br or a horizontal rule for hr. Empty elements are sometimes referred to as singletons because empty elements are represented as one tag, <br>. In XHTML, the empty element must be terminated with a trailing slash: <br />.

Note: The space between the element name and the trailing slash is a convention, not a requirement. It’s used for backward compatibility as some browsers, including Netscape 4.x browsers, have rendering issues with the slash when it immediately follows a non-space character. With the space, older browsers properly render the markup.
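
To make the difference concrete, here is a small sketch of the same three empty elements written first as HTML and then as XHTML. The image path and alt text are hypothetical.

```html
<!-- HTML 4.x: each empty element is written as a single tag -->
<br>
<hr>
<img src="images/logo.gif" alt="Site logo">

<!-- XHTML 1.x: the same elements, each terminated with a space and a trailing slash -->
<br />
<hr />
<img src="images/logo.gif" alt="Site logo" />
```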

</head>

The closing tag for the head element.

Note: The head element is made up of its opening tag, the element’s children and descendant elements, and the closing tag for the head element.

<body>

The opening tag for the body element. The elements that follow, up to the closing body tag, make up the content displayed within the viewport.

<h1>Header Level One</h1>

Structural and semantic heading element. The h1 element is the most important header on the page, is often used to identify the site by name, and is important to search engines.

<p>Paragraph of text with a relative link and an absolute link.</p>

Headers and paragraphs are considered non-empty elements because they contain text content. In XHTML, non-empty elements must terminate with a closing tag. HTML allows certain non-empty elements to be represented only by their opening tag, such as p and li. Headers and paragraphs are also good examples of block elements, and the anchor element is considered to be an inline element.
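
A short sketch of the closing-tag rule, using made-up paragraph and list text: the first fragment relies on the closing tags that HTML 4.x lets you omit, and the second is the XHTML equivalent.

```html
<!-- HTML 4.x: the closing tags for p and li may be left out -->
<p>First paragraph
<p>Second paragraph
<ul>
  <li>First item
  <li>Second item
</ul>

<!-- XHTML 1.x: every non-empty element is closed explicitly -->
<p>First paragraph</p>
<p>Second paragraph</p>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
```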

Note: Display properties can be modified using CSS. This is why we are able to use lists for horizontal navigation, for example.
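
For instance, a horizontal navigation list might be sketched like this. The nav id, the link text and the file names are all hypothetical; the style element belongs in the head and the list in the body.

```html
<!-- In the head: change how the list items are displayed -->
<style type="text/css">
  /* li displays as a list item by default; display: inline sets the items side by side */
  ul#nav li {
    display: inline;
    list-style-type: none;
    padding-right: 1em;
  }
</style>

<!-- In the body: an ordinary unordered list of links -->
<ul id="nav">
  <li><a href="index.html">Home</a></li>
  <li><a href="about.html">About</a></li>
  <li><a href="contact.html">Contact</a></li>
</ul>
```

The markup stays a plain list, so it still reads as a list when the style sheet is not applied.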

</body>

Closing tag for the body element.

Note: Readers might find it of interest to know that in HTML, the body, head and html elements weren’t required! Don’t believe me? Here’s an HTML 3.2 document free of those elements - validate it for yourself. Fun!

</html>

Closing tag for the root element, html.
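
Assembled in order, the pieces above form the document below. Treat it as a sketch rather than the original file: the link element uses href, which is the attribute link takes for its target, and the two anchors in the paragraph point at hypothetical URLs (about.html and http://www.example.com/) because the original link targets are not shown.

```html
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Sample XHTML Document</title>
<link rel="stylesheet" type="text/css" href="css/global.css" media="all" />
</head>
<body>
<h1>Header Level One</h1>
<!-- The href values below are hypothetical placeholders -->
<p>Paragraph of text with a <a href="about.html">relative link</a> and an
<a href="http://www.example.com/">absolute link</a>.</p>
</body>
</html>
```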


Now, if you would like to help me gather up the cited terms and begin working on this glossary, I’ll be happy to take your input in the comments.

Filed under:   general
Posted by:   Molly | 2:13 am |

50 Responses to “Anatomy of a Document”

  1. molly.com » XHTML Anatomy: A Document Deconstructed Says:

    […] ction, including an example of a completely valid headless, body-less, HTML 3.2 document. Continue reading . . . Filed under:   professional, standards, web design Posted by:   Mol […]

  2. Rimantas Says:

    Seems like this glossary may confuse some
    with ‘elements’ vs. ‘tags’

    Tags for HTML, HEAD and BODY are not required in HTML4.01 as well, they are present in DOM even if they are not in markup. The only element with required tags is - and this is true even for HTML4.01 strict.

    Proof is in DTD and here: http://rimantas.com/bits/minimal_html.html

  3. Rimantas Says:

    oh, the only required element which must be marked up with tags is TITLE, html was filtered in previous comment.

  4. Russ Weakley Says:

    A very interesting idea! Two quick questions:

    1. Describing meta elements would probably open a can of worms, but what about particular ones like character encoding?

    <meta http-equiv="content-type" content="text/html; charset=utf-8" />

    2. There are many people who could benefit from a brief description of what elements and attributes are. Is it worth placing some basic definitions of these before starting the deconstruction of a document itself?

    Anyway… a good start!

  5. Molly Says:

    Rimantas: I’ve fixed it in the text, thank you!

    Russ: That would all go into the glossary. I know it’s backwards, but that’s what happens when I’m left to myself :)

  6. Adam Says:

    What about the other tags that are optional but quite often used within documents?

    Bold, Italics, Underline, Span and Javascript immediately come to mind. I think there should be a section of your document that discusses the major ones as well.

  7. CM Harrington Says:

    I think this is a really interesting project. It is so easy to let a small idea balloon into something larger. When that happens, I usually go back to the original idea and construct that, with the intention that I will add the additional functionality. This allows me to keep “shipping” product, generating interest, and gaining more real-world feedback.

    This iterative approach allows your audience to grow along with your product, so when you add new capability, the learning curve is flattened. Actually, this has given me a good topic on which to write. I’m currently designing a very small web-based application that is going through this exact process. Thanks!

  8. CM Harrington Says:

    Oh, suggestion:

    I’d like to see (via “title” attribute, or similar.), the definitions of ACRONYMS and ABBRs. That will make the definition commentary more valuable. (for example, SGML should tell me that it stands for “Standard Generalised Markup Language” and a blurb on what that means in english)

  9. Christian Montoya Says:

    Wow, it’s about time, thanks Molly. Maybe this will stop people from saying things like “alt tag” … they know who they are.

    The only thing I would change with this example is:

    *h1*Header Level One*/h1*

    Structural and semantic heading element. The h1 element is the most important header on the page, is often used to identify the site by name, and is important to search engines.

    I would change to:

    … is often used to identify the site by name or identify the title of the current page…

    since many sites put the title of the page in the h1 and leave the site title out of headers for semantic and SEO reasons.

  10. Dronpus Says:

    How about, like, for those people who actually use XHTML? :P

  11. Dronpus Says:

    … <?xml-stylesheet … ?> that is.

  12. Dustin Diaz Says:

    Just don’t use the word “alt tag” when you get to it. You’re really going to piss off Roger (456Berea) and me.

  13. kerri Says:

    From what I understand, an XHTML document should be served up with a content-type of application/xml+xhtml, no?

  14. Molly Says:

    Kerri:

    XHTML 1.0 can be served as both text/html and application/xml+xhtml. This is explained as being due to backward compatibility for XHTML 1.0. Many people believe that using XHTML and not serving it as application/xml+xhtml is useless, but I disagree. There are other advantages to learning or using XHTML 1.0 in my opinion.

    However, XHTML 1.1 should be served as application/xml+xhtml or not used.

  15. Rimantas Says:

    One more nitpicking - it’s application/xhtml+xml, not other way round.

  16. Lachlan Hunt Says:

    > Readers might find it of interest to know that in HTML, the body, head and html elements weren’t required!

    For someone so interested in nomenclature and calling things by their correct name, I’m surprised that you could make such a mistake. The elements themselves are required, they are present in all documents. Their start- and end-tags, however, aren’t required, which is what you’re referring to. The same is true of tbody, but all other elements require at least a start-tag, some of which still allow optional end-tags.

    Russ Weakley, the meta element for setting the character encoding or the MIME type, for that matter, is completely useless and must not be used in XHTML. It works for HTML, but it is an inferior substitute for real HTTP headers.

    Of course, it does appear to have an effect in XHTML when the document is incorrectly served as text/html, but under XML conditions it is completely meaningless.

    One more very important point about the XML declaration is that if it is omitted from the document, then the file must be encoded in either UTF-8 or UTF-16 or the encoding must be specified in a higher-level protocol, like HTTP.

  17. Chip D Says:

    I’m trying to change the text color of my blinking, scrolling banner in frame 6 of my dog’s web page using Mikersoft Word - your little guide doesn’t help at all. :)

  18. » Blog Archive - » Anatomy of a Document Alex Jones - No, not that Alex Jones… Really, I’m not the Alex Jones you think I am. Says:

    […]
    « Silkscreening at Home

    Anatomy of a Document

    Anatomy of a Document - “deconstructed, annotated XHTML document for your education and enjoyme […]

  19. Molly Says:

    Well, Lachlan, the entire point of this exercise is exactly that - to work on this with our collaborative knowledge and therefore have a stronger result.

    It just proves a person can know a lot and know nothing. I humbly submit that would describe me perfectly :)

  20. Joerg Petermann Says:

    Nice to read. Thank you!

  21. Keri Henare Says:

    Personally I think that ‘application/xhtml+xml’ mime-type and the XML declaration go hand in hand. It should be optional to use them both for XHTML1.0 but compulsory for XHTML1.1+. However you can’t really do that until all browsers accept XML properly and IE probably won’t until IE8

  22. Take My Advice - I’m Not Using It! » Using XHTML and CSS Says:

    […] ndards - The title really says it all. Another wonderful article from 456 Berea Street. Anatomy of a Document - a deconstructed, annotated XHTML document. Rachel Andrew has an article at 2 […]

  23. Dave McFarland Says:

    Just thought you’d like to know: the link tag isn’t correct in the example above. There isn’t a src attribute; it should be href="css/global.css"

  24. Anatomie eines XHTML-Dokumentes » Peruns Weblog - Webwork und Internet Says:

    […] s Abgelegt unter: HTML-CSS — Perun um 8:11 Wie ein XHTML-Dokument aufgebaut ist beschreibt Molly Holzschlag in ihrem Weblog. Stichwörter: keine « Neue Version […]

  25. Robert Wellock Says:

    An XHTML document can be standalone, and I wouldn’t classify the LINK as a replaced element either: IMG and OBJECT can be considered ‘replaced elements’ though.

  26. About Web Designing » Blog Archive » The anatomy of an XHTML document Says:

    […] typography, in this following post I would like to point you to a link that elucidates the anatomy of an XHTML document. Molly is creating a human-readable glossary of XHTML and CSS terminolog […]

  27. Peter Jacobson Says:

    Shouldn’t the lang attribute be specified in the opening html tag? At least for accessibility reasons?

  28. Simple Inside » Anatomy of an XHTML Document Says:

    […] Dec 10 2005 | Tagged as: Quick Posts
    Molly has written a good article explaining the anatomy of an XHTML Document. Good to Learn.

    […]

  29. Jay Fienberg Says:

    What distinction are you making between structural and semantic when you write (about the H1) that it is a “structural and semantic heading element”?

  30. edu.volpe.ch  •  XHTML-Dokumentaufbau Says:

    […] au eines XHTML-Dokuments widmet sich auch Molly Holzschlag in ihrem aktuellen Blogeintrag: […]

  31. Aden Albert Says:

    Jay Fienberg–coming mostly from an English theory background with a dollop of web design experience, I’d say the distinction between structural and semantic is a distinction of viewership. The structural markup denotes meaning to the browser, while the semantic markup denotes meaning in the overall hierarchy of the document as seen by the user.

  32. Jules Says:

    I disagree with Molly’s description of the application of the H1 tag.

    WordPress blogs, because of the nature of the default templates, and possibly other blog software as well, markup the name of the blog with the H1 tag but I don’t believe that this is correct use of this tag. If you understand that the H1 tag is the top-most text-structure tag, then every article in your blog has the same heading, the name of your blog.

    If you are interested, you can read more of my opinions about the H1 tag in my own article entitled The Heading 1 Challenge.

  33. GRUPO ALIANZA EMPRESARIAL Says:

    Good…

  34. Cheyne Says:

    Great article. I blogged it on www.thewebdesignblog.com

  35. The Web Design Blog Says:

    […] Molly.com explains the anatomy of a web document including what each tag is used for, what it does, and whether it’s essential or not to implement. « Notepad2 - Just as fast, but with much, much more   […]

  36. Marjolein Katsma Says:

    Molly, you write:
    “The opening tag of the root element html and the XML namespace attribute. In HTML and XHTML documents, the root element is html. This means that html is the grand ancestor: It has no ancestors, only children and descendants. The namespace attribute further describes which markup language or languages are being used in the document. Despite the fact that many validators will validate an XHTML document that has no namespace attribute, it should always be present.”

    Unfortunately, what you say about the xmlns attribute is a widely held misconception. The xmlns attribute for the html tag in XHTML is *not* defined as REQUIRED (which would mean you have to specify it), and *not* defined as optional, by giving a DEFAULT value (which would mean it gets the default value if you don’t specify it but otherwise you may override it with a legal value), but instead as having a FIXED value. This means it *has* this value, as defined in the DTD, anyway whether you specify it or not; this also means that while it is *allowed* to specify it with the value it already has, it is *not* allowed to give it any other value - as would be suggested by specifying the attribute in the first place.

    Hence it is not only entirely correct that documents not specifying it validate, it is actually *better* not to specify it in the first place to avoid the suggestion it could have any other value. It certainly should not “always be present”, as you claim.

  37. XHTML - anatomia dokumentu » Barczentewicz.com Says:

    […] Tekst ten przedstawia strukturę dokumentu XHTML, wraz z opisem poszczególnych elementów. Jest to tłumaczenie ze strony molly.com. […]

  38. DesignStage.Net Says:

    nice!!…

  39. Ravings of an Intermittent Fool Says:

    Anatomy of a XHTML Document

    Anatomy of a XHTML Document is a good, step by step introduction to what makes up a fully valid XHTML document. Right now the example is pretty basic and probably won’t help people who have spent a bit of time with XHTML but it would be a good link to…

  40. nur Bahnhof » Anatomie eines xHTML Dokuments Says:

    […] http://www.molly.com/anatomy-of-a-document/ […]

  41. سردال » من هنا وهناك … الحلقة الأخيرة Says:

    […] تشريح صفحة XHTML. […]

  42. Martin Hamann Says:

    *h1*Header Level One*/h1*

    Consider:

    Only semantic heading element exists. The h1 element is important “header” on every website, is often used to identify the HP by name, and is very important to Google.
    greetings from Kiel.

  43. Kram Turnover Says:

    Cutting the swath through the tutorial with constant checking against w3c.org, then verifying/changes/re-changes … i think i now have a proper XHTML1.0 Transitional plan, but at the tail end of it, would have wished more of the blogged changes were re-integrated into the body, specifically the errors. Stimulating though…

  44. irfan Says:

    hi dear how are u do u like tellme me alt mean and full defention plz ………….. i wait u

    thanks

  46. Yello Says:

    Thank you, i think its very useful for me. h1 and Title are one of the important things as i see.

  49. Blue Says:

    Hi!
    A friend of mine told me about that site and i am really impressed. This site was very useful to me and i just wanted to thank you for so much useful information. Greetings
    Blue

  50. gifts Says:

    Yeah, but I’m sure that the BODY and HEAD elements are greatly required at the moment in order for all browsers to read the HTML appropriately. Perhaps it wasn’t in the old days, when there was only HTML, but these days there’s a heck of a lot more for those browsers to read and I’m sure we would absolutely have to have these tags. They probably also help in ensuring the CSS is read correctly.
