molly.com
Sunday 4 December 2005
Anatomy of a Document
Here’s the deconstructed, annotated XHTML document for your education and enjoyment.
- <?xml version="1.0" encoding="utf-8"?>
-
This is the XML declaration. It declares the document as XML, states the XML version, and can also describe the encoding of the document. The W3C recommends but does not require it, largely because of backward compatibility concerns: many user agents do not interpret the XML declaration properly. This can cause several problems: the document markup is displayed in the viewport instead of the rendered document; the document is rendered as an XML tree; or, in the case of IE 6.0, the document remains in Quirks Mode, because anything other than white space placed above a DOCTYPE declaration prevents IE 6.0 from switching to Compliance Mode.
Note: Because of the browser compatibility issues with the XML declaration, most designers and developers leave it off their documents to avoid problems.
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-
This is the DOCTYPE declaration. DOCTYPE declarations are bits of SGML that describe the document type, including where the Document Type Definition (DTD) for the document is published and by whom. DOCTYPE declarations are required in HTML 4.x and XHTML 1.x in order for the document to be compliant. With the exception of DOCTYPE switching, DOCTYPE declarations are passive until the document is validated, at which point the validator uses the information in the DOCTYPE declaration to compare the document against the DTD with which it claims to be associated.
Note: The XML declaration plus the DOCTYPE declaration make up what is known as the XML prolog.
- <html xmlns="http://www.w3.org/1999/xhtml">
-
The opening tag of the root element html and the XML namespace attribute. In HTML and XHTML documents, the root element is html. This means that html is the grand ancestor: it has no ancestors, only children and descendants. The namespace attribute further describes which markup language or languages are being used in the document. Despite the fact that many validators will validate an XHTML document that has no namespace attribute, it should always be present.
Interestingly, many designers and developers are unaware that the html element can be styled. Styling it is especially useful when creating a site signature or other means of page identification, as well as for setting styles that are inherited from the top of the tree on down.
- <head>
-
The opening tag for the head portion of the document. The head contains meta information about the document, along with style and script information. The head is a required part of the document in XHTML 1.x.
- <title>Sample XHTML Document</title>
-
The title element, including the opening and closing title tags and the title text, which displays in the title bar of the browser. Properly written title text is important as it aids usability, accessibility and search engine optimization.
Note: The title element is required in HTML 4.x and XHTML 1.x documents.
- <link rel="stylesheet" type="text/css" href="css/global.css" media="all" />
-
The link element is used here to link a style sheet, define the link relationship using the rel attribute, define the MIME type as text/css, point to the location of the .css document, and define the media for this style sheet as all, meaning all media that support CSS.
The link element is what’s referred to as an empty element. Empty elements do not contain text content. Many are replaced by something when the page is rendered, such as an image for img, a line break for br or a horizontal rule for hr. Empty elements are sometimes referred to as singletons because they are represented as one tag, <br>. In XHTML, the empty element must be terminated with a trailing slash: <br />.
Note: The space between the element name and the trailing slash is a convention, not a requirement. It’s used for backward compatibility as some browsers, including Netscape 4.x browsers, have rendering issues with the slash when it immediately follows a non-space character. With the space, older browsers properly render the markup.
- </head>
-
The closing tag for the head element.
Note: The head element is made up of its opening tag, the element’s children and descendant elements, and the closing tag for the head element.
- <body>
-
The opening tag for the body element. The elements that follow produce the content displayed within the viewport.
- <h1>Header Level One</h1>
-
Structural and semantic heading element. The h1 element is the most important header on the page, is often used to identify the site by name, and is important to search engines.
- <p>Paragraph of text with a relative link and an absolute link.</p>
-
Headers and paragraphs are considered non-empty elements because they contain text content. In XHTML, non-empty elements must terminate with a closing tag. HTML allows certain non-empty elements, such as p and li, to be represented only by their opening tag. Headers and paragraphs are also good examples of block elements, whereas the anchor element is considered to be an inline element.
Note: Display properties can be modified using CSS. This is why we are able to use lists for horizontal navigation, for example.
- </body>
-
Closing tag for the body element.
Note: Readers might find it of interest to know that in HTML, the opening and closing tags for the body, head and html elements aren’t required! Don’t believe me? Here’s an HTML 3.2 document free of those tags – validate it for yourself. Fun!
- </html>
-
Closing tag for the root element, html.
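To see the pieces working together, here is a minimal sketch that reassembles the sample document and runs it through an XML parser. It uses Python's standard xml.etree.ElementTree module. The helper names (SAMPLE, is_well_formed, local_name) are made up for this illustration, the XML declaration is left off as the note above suggests, and the link element uses href, since link has no src attribute. Keep in mind that this only checks well-formedness (every tag closed and properly nested); it does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

# The sample document from the walkthrough, reassembled (assumption: href on link).
SAMPLE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Sample XHTML Document</title>
<link rel="stylesheet" type="text/css" href="css/global.css" media="all" />
</head>
<body>
<h1>Header Level One</h1>
<p>Paragraph of text with a relative link and an absolute link.</p>
</body>
</html>"""


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return element.tag.split("}", 1)[-1]


root = ET.fromstring(SAMPLE)

# The root element carries the XHTML namespace from the xmlns attribute.
print(root.tag)

# html is the grand ancestor; its children are head and body.
print([local_name(child) for child in root])

# The document as written is well-formed.
print(is_well_formed(SAMPLE))

# Dropping the trailing slash from the empty link element breaks it.
print(is_well_formed(SAMPLE.replace(" />", ">")))

# So does leaving a non-empty element such as p without its closing tag.
print(is_well_formed(SAMPLE.replace("</p>", "")))
```

The last two checks mirror the rules described above: in XHTML an empty element must be terminated with a trailing slash, and a non-empty element must have a closing tag.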
Now, if you would like to help me gather up the cited terms and begin working on this glossary, I’ll be happy to take your input in the comments.
Filed under: general
Posted by: Molly | 02:13 | Comments (47)

Seems like this glossary may confuse some with ‘elements’ vs. ‘tags’.
Tags for HTML, HEAD and BODY are not required in HTML 4.01 either; the elements are present in the DOM even if the tags are not in the markup. The only element with required tags is – and this is true even for HTML 4.01 Strict.
Proof is in the DTD and here: http://rimantas.com/bits/minimal_html.html
Oh, the only required element which must be marked up with tags is TITLE; html was filtered in the previous comment.
A very interesting idea! Two quick questions:
1. Describing meta elements would probably open a can of worms, but what about particular ones like character encoding?
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
2. There are many people who could benefit from a brief description of what elements and attributes are. Is it worth placing some basic definitions of these before starting the deconstruction of a document itself?
Anyway… a good start!
Rimantas: I’ve fixed it in the text, thank you!
Russ: That would all go into the glossary. I know it’s backwards, but that’s what happens when I’m left to myself
What about the other tags that are optional but quite often used within documents?
Bold, Italics, Underline, Span and Javascript immediately come to mind. I think there should be a section of your document that discusses the major ones as well.
I think this is a really interesting project. It is so easy to let a small idea balloon into something larger. When that happens, I usually go back to the original idea and construct that, with the intention that I will add the additional functionality. This allows me to keep “shipping” product, generating interest, and gaining more real-world feedback.
This iterative approach allows your audience to grow along with your product, so when you add new capability, the learning curve is flattened. Actually, this has given me a good topic on which to write. I’m currently designing a very small web-based application that is going through this exact process. Thanks!
Oh, suggestion:
I’d like to see (via the “title” attribute, or similar) the definitions of ACRONYMs and ABBRs. That will make the definition commentary more valuable. (For example, SGML should tell me that it stands for “Standard Generalised Markup Language”, with a blurb on what that means in English.)
Wow, it’s about time, thanks Molly. Maybe this will stop people from saying things like “alt tag” … they know who they are.
The only thing I would change with this example is:
<h1>Header Level One</h1>
Structural and semantic heading element. The h1 element is the most important header on the page, is often used to identify the site by name, and is important to search engines.
I would change to:
… is often used to identify the site by name or identify the title of the current page…
since many sites put the title of the page in the h1 and leave the site title out of headers for semantic and SEO reasons.
How about, like, for those people who actually use XHTML?
… <?xml-stylesheet … ?> that is.
Just don’t use the word “alt tag” when you get to it. You’re really going to piss off Roger (456Berea) and me.
From what I understand, an XHTML document should be served up with a content-type of application/xml+xhtml, no?
Kerri:
XHTML 1.0 can be served as both text/html and application/xml+xhtml. This is explained as being due to backward compatibility for XHTML 1.0. Many people believe that using XHTML and not serving it as application/xml+xhtml is useless, but I disagree. There are other advantages to learning or using XHTML 1.0 in my opinion.
However, XHTML 1.1 should be served as application/xml+xhtml or not used.
One more nitpick: it’s application/xhtml+xml, not the other way round.
> Readers might find it of interest to know that in HTML, the body, head and html elements weren’t required!
For someone so interested in nomenclature and calling things by their correct name, I’m surprised that you could make such a mistake. The elements themselves are required, they are present in all documents. Their start- and end-tags, however, aren’t required, which is what you’re referring to. The same is true of tbody, but all other elements require at least a start-tag, some of which still allow optional end-tags.
Russ Weakley, the meta element for setting the character encoding or the MIME type, for that matter, is completely useless and must not be used in XHTML. It works for HTML, but it is an inferior substitute for real HTTP headers.
Of course, it does appear to have an effect in XHTML when the document is incorrectly served as text/html, but under XML conditions it is completely meaningless.
One more very important point about the XML declaration is that if it is omitted from the document, then the file must be encoded in either UTF-8 or UTF-16 or the encoding must be specified in a higher-level protocol, like HTTP.
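A quick way to see this point in action is to feed the same text to an XML parser in different encodings. The sketch below uses Python's standard xml.etree.ElementTree module; the root_text helper and the sample markup are made up for illustration, and the parser stands in for any conforming XML processor, which is an assumption about how a given browser would behave.

```python
import xml.etree.ElementTree as ET


def root_text(document):
    """Return the root element's text, or None if the bytes fail to parse."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


# The same markup, encoded two different ways ("\u00e9" is e with an acute accent).
latin1_bytes = "<p>caf\u00e9</p>".encode("iso-8859-1")
utf8_bytes = "<p>caf\u00e9</p>".encode("utf-8")

# With an XML declaration naming the encoding, the ISO-8859-1 bytes parse.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + latin1_bytes
print(root_text(declared))

# Without the declaration, the parser assumes UTF-8 and rejects the same bytes.
print(root_text(latin1_bytes))

# UTF-8 bytes need no declaration at all.
print(root_text(utf8_bytes))
```

In other words, leaving the XML declaration off is only safe when the file really is UTF-8 or UTF-16, or when something outside the document, such as an HTTP header, states the encoding.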
I’m trying to change the text color of my blinking, scrolling banner in frame 6 of my dog’s web page using Mikersoft Word – your little guide doesn’t help at all.
Well, Lachlan, the entire point of this exercise is exactly that – to work on this with our collaborative knowledge and therefore have a stronger result.
It just proves a person can know a lot and know nothing. I humbly submit that would describe me perfectly
Nice to read. Thank you!
Personally I think that the ‘application/xhtml+xml’ MIME type and the XML declaration go hand in hand. It should be optional to use them both for XHTML 1.0 but compulsory for XHTML 1.1+. However, you can’t really do that until all browsers accept XML properly, and IE probably won’t until IE8.
Just thought you’d like to know: the link tag isn’t correct in the example above. There isn’t a src attribute; it should be href="css/global.css"
An XHTML document can be standalone, and I wouldn’t classify the LINK as a replaced element either: IMG and OBJECT can be considered ‘replaced elements’ though.
Shouldn’t the lang attribute be specified in the opening html tag? At least for accessibility reasons?
What distinction are you making between structural and semantic when you write (about the H1) that it is a “structural and semantic heading element”?
Jay Fienberg–coming mostly from an English theory background with a dollop of web design experience, I’d say the distinction between structural and semantic is a distinction of viewership. The structural markup denotes meaning to the browser, while the semantic markup denotes meaning in the overall hierarchy of the document as seen by the user.
I disagree with Molly’s description of the application of the H1 tag.
WordPress blogs, because of the nature of the default templates, and possibly other blog software as well, mark up the name of the blog with the H1 tag, but I don’t believe that this is the correct use of this tag. If you understand that the H1 tag is the top-most text-structure tag, then every article in your blog has the same heading: the name of your blog.
If you are interested, you can read more of my opinions about the H1 tag in my own article entitled The Heading 1 Challenge.
Good…
Molly, you write:
“The opening tag of the root element html and the XML namespace attribute. In HTML and XHTML documents, the root element is html. This means that html is the grand ancestor: It has no ancestors, only children and descendants. The namespace attribute further describes which markup language or languages are being used in the document. Despite the fact that many validators will validate an XHTML document that has no namespace attribute, it should always be present.”
Unfortunately, what you say about the xmlns attribute is a widely held misconception. The xmlns attribute for the html tag in XHTML is *not* defined as REQUIRED (which would mean you have to specify it), and *not* defined as optional, by giving a DEFAULT value (which would mean it gets the default value if you don’t specify it but otherwise you may override it with a legal value), but instead as having a FIXED value. This means it *has* this value, as defined in the DTD, anyway whether you specify it or not; this also means that while it is *allowed* to specify it with the value it already has, it is *not* allowed to give it any other value – as would be suggested by specifying the attribute in the first place.
Hence it is not only entirely correct that documents not specifying it validate, it is actually *better* not to specify it in the first place to avoid the suggestion it could have any other value. It certainly should not “always be present”, as you claim.
Cutting a swath through the tutorial with constant checking against w3c.org, then verifying, changing and re-changing … I think I now have a proper XHTML 1.0 Transitional plan, but at the tail end of it, I would have wished more of the blogged changes had been re-integrated into the body, specifically the errors. Stimulating though…
Hi, how are you? Could you please tell me what alt means, with a full definition? I’ll wait for your reply.
thanks
Thank you, I think it’s very useful for me. h1 and title are among the important things, as I see it.
Hi!
A friend of mine told me about that site and i am really impressed. This site was very useful to me and i just wanted to thank you for so much useful information. Greetings
Blue
Yeah, but I’m sure that the BODY and HEAD elements are greatly required at the moment in order for all browsers to read the HTML appropriately. Perhaps it wasn’t in the old days, when there was only HTML, but these days there’s a heck of a lot more for those browsers to read and I’m sure we would absolutely have to have these tags. They probably also help in ensuring the CSS is read correctly.
Can this be made in partnership with Flex? This may solve the browser issues.
“Can this be made in partnership with Flex?” –> I dont think so. Or I just dont see that possibility. But who knows? I would try google chrome
Ian: Google “Masturbatory”
“Yeah, but I’m sure that the BODY and HEAD elements are greatly required at the moment in order for all browsers to read the HTML appropriately”
–> For sure. I got a full Flash site a few months ago and it was terrible. No search engine found anything. Too bad if you don’t know browsers can’t read Flash XD