molly.com
Sunday 4 December 2005
Anatomy of a Document
Here’s the deconstructed, annotated XHTML document for your education and enjoyment.
- <?xml version="1.0" encoding="utf-8"?>
-
This is the XML declaration. Its role is to declare that the document is XML and which version of XML it uses, and it can also describe the character encoding of the document. Its use is recommended but not required by the W3C, largely due to backward compatibility concerns, as many user agents do not properly interpret the XML declaration. This can result in a number of issues: the document markup being displayed in the viewport instead of the rendered document; the document being rendered as an XML tree; or, in the case of IE 6.0, the document remaining in Quirks Mode, as anything other than white space placed above a DOCTYPE declaration will prevent IE 6.0 from switching to Compliance Mode.
Note: Because of the browser compatibility issues with the XML declaration, most designers and developers leave it off of documents to avoid problems. If it is left off, the document must be encoded as UTF-8 or UTF-16, or the encoding must be specified by a higher-level protocol such as HTTP.
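To see what the declaration carries, here's a minimal sketch in Python (the language is just a convenient choice for illustration; nothing in the document depends on it). It uses the standard library's expat bindings, whose XmlDeclHandler receives the version, encoding and standalone values; the function name read_xml_declaration is hypothetical. A document without a declaration simply returns None.

```python
import xml.parsers.expat
from typing import List, Optional, Tuple

# (version, encoding, standalone) as reported by expat
Declaration = Tuple[str, Optional[str], int]


def read_xml_declaration(data: bytes) -> Optional[Declaration]:
    """Return the XML declaration's values, or None if the document has none.

    standalone is 1 for standalone="yes", 0 for standalone="no",
    and -1 when the standalone clause is omitted.
    """
    found: List[Declaration] = []
    parser = xml.parsers.expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        # expat calls this only when the document begins with <?xml ... ?>
        found.append((version, encoding, standalone))

    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True: this is the final (and only) chunk
    return found[0] if found else None


with_decl = b'<?xml version="1.0" encoding="utf-8"?><html xmlns="http://www.w3.org/1999/xhtml"/>'
without_decl = b'<html xmlns="http://www.w3.org/1999/xhtml"/>'

print(read_xml_declaration(with_decl))     # version 1.0, encoding as written, standalone -1
print(read_xml_declaration(without_decl))  # None
```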
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-
This is the DOCTYPE declaration. DOCTYPE declarations are bits of SGML that describe the document type, including where the Document Type Definition (DTD) for the document is published and by whom. DOCTYPE declarations are required in HTML 4.x and XHTML 1.x in order for the document to be compliant. With the exception of DOCTYPE switching (browsers using the DOCTYPE to choose between Quirks Mode and Compliance Mode), DOCTYPE declarations are passive until the document is tested for conformance with validation, at which point the validator uses the information within the DOCTYPE declaration to compare the document to the DTD with which it claims to be associated.
Note: The XML declaration plus the DOCTYPE declaration make up what is known as the XML prolog.
- <html xmlns="http://www.w3.org/1999/xhtml">
-
The opening tag of the root element html and the XML namespace attribute. In HTML and XHTML documents, the root element is html. This means that html is the grand ancestor: It has no ancestors, only children and descendants. The namespace attribute further describes which markup language or languages are being used in the document. Many validators will accept an XHTML document that has no namespace attribute, because the DTD supplies it as a fixed value, but it should always be present: the XHTML 1.0 specification requires the xmlns declaration on the root element of a strictly conforming document.
Interestingly, many designers and developers are unaware that the html element can be styled, which is especially useful when creating a site signature or other means of page identification, as well as when applying inherited styles from the top of the tree on down.
- <head>
-
The opening tag for the head portion of the document. The head contains meta information about the document, along with style and script information. The head is a required part of the document in XHTML 1.x.
- <title>Sample XHTML Document</title>
-
The title element, including the opening and closing title tags and the title text, which displays in the title bar of the browser. Properly written title text is important, as it aids usability, accessibility and search engine optimization.
Note: The title element is required in HTML 4.x and XHTML 1.x documents.
- <link rel="stylesheet" type="text/css" href="css/global.css" media="all" />
-
The link element is used here to link a style sheet, define the link relationship using the rel attribute, define the MIME type as text/css, point to the location of the .css document using the href attribute, and define the media for this style sheet as being a value of all, for all media that support CSS.
The link element is what's referred to as an empty element. Empty elements do not contain text content. Some empty elements are also replaced elements, meaning they are replaced by something external, such as the image for img; others, such as br (a line break) and hr (a horizontal rule), simply have no content. Empty elements are sometimes referred to as singletons because they are represented by a single tag, <br>. In XHTML, an empty element must be terminated with a trailing slash: <br />.
Note: The space between the element name and the trailing slash is a convention, not a requirement. It's used for backward compatibility, as some browsers, including Netscape 4.x browsers, have rendering issues with the slash when it immediately follows a non-space character. With the space, older browsers properly render the markup.
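Because XHTML follows XML's rules, leaving off the trailing slash is a well-formedness error rather than a matter of style. The following minimal Python sketch, using the standard library's xml.etree.ElementTree, hands fragments to an XML parser to show the difference. is_well_formed is a hypothetical helper, the link fragment uses href (the attribute link actually takes), and an XML parser checks well-formedness only, not validity against the DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the fragment parses as XML (XHTML's syntax rules)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# XHTML style: the empty element is closed with " />"
xhtml_link = '<link rel="stylesheet" type="text/css" href="css/global.css" media="all" />'
# HTML style: no trailing slash, so an XML parser sees an unclosed element
html_link = '<link rel="stylesheet" type="text/css" href="css/global.css" media="all">'

print(is_well_formed(xhtml_link))             # True
print(is_well_formed(html_link))              # False
print(is_well_formed('<p>One<br />two</p>'))  # True
print(is_well_formed('<p>One<br/>two</p>'))   # True: the space before the slash is optional
print(is_well_formed('<p>One<br>two</p>'))    # False: <br> is never closed
```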
- </head>
-
The closing tag for the head element.
Note: The head element is made up of its opening tag, the element's children and descendant elements, and the closing tag for the head element.
- <body>
-
The opening tag for the body element. Everything that follows, up to the closing body tag, makes up the document content displayed within the viewport.
- <h1>Header Level One</h1>
-
Structural and semantic heading element. The h1 element is the most important heading on the page, is often used to identify the site by name or the title of the current page, and is important to search engines.
- <p>Paragraph of text with a relative link and an absolute link.</p>
-
Headings and paragraphs are considered non-empty elements because they contain text content. In XHTML, non-empty elements must be terminated with a closing tag. HTML allows certain non-empty elements, such as p and li, to be represented by their opening tag alone. Headings and paragraphs are also good examples of block elements, and the anchor element is considered to be an inline element.
Note: Display properties can be modified using CSS. This is why we are able to use lists for horizontal navigation, for example.
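The difference between HTML's optional end tags and XHTML's required ones is easy to see with two parsers. In this minimal Python sketch, the standard library's lenient html.parser reads markup with the p and li end tags left off without complaint (it just reports the tags it sees; browsers go further and infer the missing end tags), while xml.etree.ElementTree rejects the same markup. TagLogger and xml_accepts are hypothetical names used only for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records start and end tags exactly as they appear in the markup."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(markup: str) -> bool:
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML style: the li and p end tags are omitted
html_style = "<div><ul><li>One<li>Two</ul><p>First paragraph<p>Second paragraph</div>"
# XHTML style: every non-empty element has its closing tag
xhtml_style = ("<div><ul><li>One</li><li>Two</li></ul>"
               "<p>First paragraph</p><p>Second paragraph</p></div>")

logger = TagLogger()
logger.feed(html_style)
logger.close()
print(logger.events)             # no ("end", "li") or ("end", "p") events
print(xml_accepts(html_style))   # False: </ul> arrives while <li> is still open
print(xml_accepts(xhtml_style))  # True
```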
- </body>
-
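Closing tag for the body element.
Note: Readers might find it of interest to know that in HTML, the start and end tags for the body, head and html elements aren't required! The elements themselves are still implied and present in the document tree; only their tags can be left out. Don't believe me? Here's an HTML 3.2 document free of those tags - validate it for yourself. Fun!
- </html>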
-
Closing tag for the root element, html.
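To tie the pieces together, here's a minimal Python sketch that parses the complete sample document with the standard library's xml.etree.ElementTree. It assumes the link element uses href, and SAMPLE and XHTML_NS are illustrative names. Because of the xmlns attribute on the root element, the parser places every element in the XHTML namespace, so lookups have to include that namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The sample document from this post, with the link element using href.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Sample XHTML Document</title>
<link rel="stylesheet" type="text/css" href="css/global.css" media="all" />
</head>
<body>
<h1>Header Level One</h1>
<p>Paragraph of text with a relative link and an absolute link.</p>
</body>
</html>
"""

# Parse bytes so the parser reads the encoding from the XML declaration.
root = ET.fromstring(SAMPLE.encode("utf-8"))

# Map a prefix of our choosing ("x") to the XHTML namespace for lookups.
ns = {"x": XHTML_NS}

print(root.tag)                                     # {http://www.w3.org/1999/xhtml}html
title = root.find("x:head/x:title", ns).text
print(title)                                        # Sample XHTML Document
print([child.tag.split("}")[1] for child in root])  # ['head', 'body']
```

The DTD named in the DOCTYPE isn't fetched here; the parser only checks that the document is well-formed, so validating against the Strict DTD is still a job for a validator.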
Now, if you would like to help me gather up the cited terms and begin working on this glossary, I’ll be happy to take your input in the comments.
Filed under: general
Posted by: Molly | 2:13 am |

December 4th, 2005 at 3:11 am
[…] ction, including an example of a completely valid headless, body-less, HTML 3.2 document. Continue reading . . . Filed under: professional, standards, web design Posted by: Mol […]
December 4th, 2005 at 4:41 am
Seems like this glossary may confuse some with ‘elements’ vs. ‘tags’
Tags for HTML, HEAD and BODY are not required in HTML4.01 as well, they are present in DOM even if they are not in markup. The only element with required tags is - and this is true even for HTML4.01 strict.
Proof is in DTD and here: http://rimantas.com/bits/minimal_html.html
December 4th, 2005 at 4:43 am
oh, the only required element which must be marked up with tags is TITLE, html was filtered in previous comment.
December 4th, 2005 at 5:28 am
A very interesting idea! Two quick questions:
1. Describing meta elements would probably open a can of worms, but what about particular ones like character encoding?
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
2. There are many people who could benefit from a brief description of what elements and attributes are. Is it worth placing some basic definitions of these before starting the deconstruction of a document itself?
Anyway… a good start!
December 4th, 2005 at 10:34 am
Rimantas: I’ve fixed it in the text, thank you!
Russ: That would all go into the glossary. I know it’s backwards, but that’s what happens when I’m left to myself
December 4th, 2005 at 10:35 am
What about the other tags that are optional but quite often used within documents?
Bold, Italics, Underline, Span and Javascript immediately come to mind. I think there should be a section of your document that discusses the major ones as well.
December 4th, 2005 at 12:19 pm
I think this is a really interesting project. It is so easy to let a small idea balloon into something larger. When that happens, I usually go back to the original idea and construct that, with the intention that I will add the additional functionality. This allows me to keep “shipping” product, generating interest, and gaining more real-world feedback.
This iterative approach allows your audience to grow along with your product, so when you add new capability, the learning curve is flattened. Actually, this has given me a good topic on which to write. I’m currently designing a very small web-based application that is going through this exact process. Thanks!
December 4th, 2005 at 12:23 pm
Oh, suggestion:
I’d like to see (via “title” attribute, or similar.), the definitions of ACRONYMS and ABBRs. That will make the definition commentary more valuable. (for example, SGML should tell me that it stands for “Standard Generalised Markup Language” and a blurb on what that means in english)
December 4th, 2005 at 2:53 pm
Wow, it’s about time, thanks Molly. Maybe this will stop people from saying things like “alt tag” … they know who they are.
The only thing I would change with this example is:
*h1*Header Level One*/h1*
Structural and semantic heading element. The h1 element is the most important header on the page, is often used to identify the site by name, and is important to search engines.
I would change to:
… is often used to identify the site by name or identify the title of the current page…
since many sites put the title of the page in the h1 and leave the site title out of headers for semantic and SEO reasons.
December 4th, 2005 at 6:29 pm
How about, like, for those people who actually use XHTML?
December 4th, 2005 at 6:30 pm
… <?xml-stylesheet … ?> that is.
December 4th, 2005 at 10:08 pm
Just don’t use the word “alt tag” when you get to it. You’re really going to piss off Roger (456Berea) and me.
December 4th, 2005 at 10:15 pm
From what I understand, an XHTML document should be served up with a content-type of application/xml+xhtml, no?
December 5th, 2005 at 2:06 am
Kerri:
XHTML 1.0 can be served as both text/html and application/xml+xhtml. This is explained as being due to backward compatibility for XHTML 1.0. Many people believe that using XHTML and not serving it as application/xml+xhtml is useless, but I disagree. There are other advantages to learning or using XHTML 1.0 in my opinion.
However, XHTML 1.1 should be served as application/xml+xhtml or not used.
December 5th, 2005 at 4:52 am
One more nitpicking - it’s application/xhtml+xml, not other way round.
December 5th, 2005 at 5:02 am
> Readers might find it of interest to know that in HTML, the body, head and html elements weren’t required!
For someone so interested in nomenclature and calling things by their correct name, I’m surprised that you could make such a mistake. The elements themselves are required, they are present in all documents. Their start- and end-tags, however, aren’t required, which is what you’re referring to. The same is true of tbody, but all other elements require at least a start-tag, some of which still allow optional end-tags.
Russ Weakley, the meta element for setting the character encoding or the MIME type, for that matter, is completely useless and must not be used in XHTML. It works for HTML, but it is an inferior substitute for real HTTP headers.
Of course, it does appear to have an effect in XHTML when it is incorrectly served as text/html, but under XML conditions it is completely meaningless.
One more very important point about the XML declaration is that if it is omitted from the document, then the file must be encoded in either UTF-8 or UTF-16 or the encoding must be specified in a higher-level protocol, like HTTP.
December 5th, 2005 at 8:31 am
I’m trying to change the text color of my blinking, scrolling banner in frame 6 of my dog’s web page using Mikersoft Word - your little guide doesn’t help at all.
December 5th, 2005 at 8:47 am
[…]
« Silkscreening at Home
Anatomy of a Document
Anatomy of a Document - “deconstructed, annotated XHTML document for your education and enjoyme […]
December 5th, 2005 at 1:49 pm
Well, Lachlan, the entire point of this exercise is exactly that - to work on this with our collaborative knowledge and therefore have a stronger result.
It just proves a person can know a lot and know nothing. I humbly submit that would describe me perfectly
December 6th, 2005 at 1:05 am
Nice to read. Thank you!
December 6th, 2005 at 4:06 pm
Personally I think that ‘application/xhtml+xml’ mime-type and the XML declaration go hand in hand. It should be optional to use them both for XHTML1.0 but compulsory for XHTML1.1+. However you can’t really do that until all browsers accept XML properly and IE probably won’t until IE8
December 8th, 2005 at 8:46 am
[…] ndards - The title really says it all. Another wonderful article from 456 Berea Street. Anatomy of a Document - a deconstructed, annotated XHTML document. Rachel Andrew has an article at 2 […]
December 8th, 2005 at 1:18 pm
Just thought you’d like to know: the link tag isn’t correct in the example above. There isn’t a src attribute; it should be href="css/global.css"
December 9th, 2005 at 12:12 am
[…] s Filed under: HTML-CSS, Perun at 8:11. Molly Holzschlag describes how an XHTML document is structured in her weblog. Keywords: none « New version […]
December 9th, 2005 at 2:51 am
An XHTML document can be standalone, and I wouldn’t classify the LINK as a replaced element either: IMG and OBJECT can be considered ‘replaced elements’ though.
December 9th, 2005 at 4:43 pm
[…] typography, in this following post I would like to point you to a link that elucidates the anatomy of an XHTML document. Molly is creating a human-readable glossary of XHTML and CSS terminolog […]
December 10th, 2005 at 2:06 am
Shouldn’t the lang attribute be specified in the opening html tag? At least for accessibility reasons?
December 10th, 2005 at 8:47 am
[…] Dec 10 2005 | Tagged as: Quick Posts
Molly has written a good article explaining the anatomy of an XHTML Document. Good to Learn.
[…]
December 10th, 2005 at 9:09 pm
What distinction are you making between structural and semantic when you write (about the H1) that it is a “structural and semantic heading element”?
December 11th, 2005 at 6:54 am
[…] Molly Holzschlag also devotes her current blog entry to the structure of an XHTML document: […]
December 13th, 2005 at 1:05 pm
Jay Fienberg–coming mostly from an English theory background with a dollop of web design experience, I’d say the distinction between structural and semantic is a distinction of viewership. The structural markup denotes meaning to the browser, while the semantic markup denotes meaning in the overall hierarchy of the document as seen by the user.
December 14th, 2005 at 1:15 pm
I disagree with Molly’s description of the application of the H1 tag.
WordPress blogs, because of the nature of the default templates, and possibly other blog software as well, mark up the name of the blog with the H1 tag but I don’t believe that this is correct use of this tag. If you understand that the H1 tag is the top-most text-structure tag, then every article in your blog has the same heading, the name of your blog.
If you are interested, you can read more of my opinions about the H1 tag in my own article entitled The Heading 1 Challenge.
January 6th, 2006 at 9:49 pm
Good…
January 25th, 2006 at 3:32 am
Great article. I blogged it on www.thewebdesignblog.com
January 25th, 2006 at 9:19 am
[…] Molly.com explains the anatomy of a web document including what each tag is used for, what it does, and whether it’s essential or not to implement. « Notepad2 - Just as fast, but with much, much more […]
March 13th, 2006 at 2:13 am
Molly, you write:
“The opening tag of the root element html and the XML namespace attribute. In HTML and XHTML documents, the root element is html. This means that html is the grand ancestor: It has no ancestors, only children and descendants. The namespace attribute further describes which markup language or languages are being used in the document. Despite the fact that many validators will validate an XHTML document that has no namespace attribute, it should always be present.”
Unfortunately, what you say about the xmlns attribute is a widely held misconception. The xmlns attribute for the html tag in XHTML is *not* defined as REQUIRED (which would mean you have to specify it), and *not* defined as optional, by giving a DEFAULT value (which would mean it gets the default value if you don’t specify it but otherwise you may override it with a legal value), but instead as having a FIXED value. This means it *has* this value, as defined in the DTD, anyway whether you specify it or not; this also means that while it is *allowed* to specify it with the value it already has, it is *not* allowed to give it any other value - as would be suggested by specifying the attribute in the first place.
Hence it is not only entirely correct that documents not specifying it validate, it is actually *better* not to specify it in the first place to avoid the suggestion it could have any other value. It certainly should not “always be present”, as you claim.
March 31st, 2006 at 1:40 pm
[…] This text presents the structure of an XHTML document, along with a description of the individual elements. It is a translation from the molly.com site. […]
July 12th, 2006 at 7:14 pm
nice!!…
September 5th, 2006 at 7:48 pm
Anatomy of a XHTML Document
Anatomy of a XHTML Document is a good, step by step introduction to what makes up a fully valid XHTML document. Right now the example is pretty basic and probably won’t help people who have spent a bit of time with XHTML but it would be a good link to…
December 16th, 2006 at 9:26 am
[…] http://www.molly.com/anatomy-of-a-document/ […]
January 1st, 2007 at 11:17 am
[…] Anatomy of an XHTML page. […]
May 2nd, 2007 at 4:32 pm
*h1*Header Level One*/h1*
Consider:
Only semantic heading element exists. The h1 element is important “header” on every website, is often used to identify the HP by name, and is very important to Google.
greetings from Kiel.
May 9th, 2007 at 1:35 pm
Cutting the swath through the tutorial with constant checking against w3c.org, then verifying/changes/re-changes … i think i now have a proper XHTML1.0 Transitional plan, but at the tail end of it, would have wished more of the blogged changes were re-integrated into the body, specifically the errors. Stimulating though…
June 23rd, 2007 at 11:20 pm
hi dear how are u do u like tellme me alt mean and full defention plz ………….. i wait u
thanks
September 26th, 2007 at 4:05 am
Thank you, i think its very useful for me. h1 and Title are one of the important things as i see.
June 21st, 2008 at 8:35 pm
Hi!
A friend of mine told me about that site and i am really impressed. This site was very useful to me and i just wanted to thank you for so much useful information. Greetings
Blue
July 12th, 2008 at 1:46 am
Yeah, but I’m sure that the BODY and HEAD elements are greatly required at the moment in order for all browsers to read the HTML appropriately. Perhaps it wasn’t in the old days, when there was only HTML, but these days there’s a heck of a lot more for those browsers to read and I’m sure we would absolutely have to have these tags. They probably also help in ensuring the CSS is read correctly.