XHTML versus HTML

22nd March 2008

From the start of my “awakening” to the knowledge of web standards, accessibility, and the like, I’ve been coding in XHTML. I started with XHTML 1.0 Transitional, and then as I got more into web standards, XHTML 1.0 Strict. I take web standards, accessibility, and validity more and more seriously as time goes by. But the question is: is XHTML the correct choice, or was HTML the document type I should have been using all along?

Gasp! I can hear it now. I’ve been an avid XHTML supporter for quite some time, making sure my scripts chug out valid XHTML markup. A lot of my online friends feel the same about XHTML. But I’ve been hearing so many arguments for and against XHTML lately that I’ve decided to sit down and really think about it.

Here are my reasons for choosing XHTML waaaay back when, and what makes these reasons invalid:

False: HTML is parsed as tag soup, but since XHTML should be valid when parsed, it should be parsed faster and better, not in “quirks mode”.

Unfortunately, almost all XHTML today is served as HTML, not as XML. This means that it is parsed as “tag soup” anyway:

“XHTML is an XML format; this means that strictly speaking it should be sent with an XML-related media type (application/xhtml+xml, application/xml, or text/xml). However XHTML 1.0 was carefully designed so that with care it would also work on legacy HTML user agents as well. If you follow some simple guidelines, you can get many XHTML 1.0 documents to work in legacy browsers. However, legacy browsers only understand the media type text/html, so you have to use that media type if you send XHTML 1.0 documents to them. But be well aware, sending XHTML documents to browsers as text/html means that those browsers see the documents as HTML documents, not XHTML documents.” XHTML Frequently Asked Questions

If you want your XHTML to be parsed as XML (and take advantage of the marginally faster parser; and when they say “marginally”, apparently they do mean “marginally”!), you have to send it as XML. Unfortunately IE doesn’t support that: you will get a document tree instead of your website layout unless you give it extra instructions. (See here.) And since IE is still the dominant browser around (yes, I know you’re annoyed), one just can’t ignore it.
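One common workaround is to look at the browser's Accept request header and only send the XML media type to browsers that explicitly ask for it. Here is a minimal sketch of that idea in Python. The function name choose_media_type is made up for illustration, the Accept parsing is deliberately simplified, and this is not necessarily the approach the linked instructions describe.

```python
def choose_media_type(accept_header: str) -> str:
    """Pick a media type for an XHTML page (hypothetical helper).

    Returns application/xhtml+xml only when the client lists it
    explicitly with a non-zero q value; otherwise falls back to
    text/html, which legacy browsers understand.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # a media type without a q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not accepted"
        if q > 0:
            return "application/xhtml+xml"
    # A bare */* is not treated as real XHTML support.
    return "text/html"
```

A browser that lists application/xhtml+xml in its Accept header would get the XML media type, while one that only sends something like "image/gif, image/jpeg, */*" would get text/html. Keep in mind that the same markup is then parsed by two different parsers, so it has to behave under both.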

If XHTML is parsed the way it should be parsed (as XML), once your document is found to be not well-formed, the browser is supposed to choke and stop parsing. Period.
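To get a feel for the difference, here is a small sketch using the XML and HTML parsers from Python's standard library as stand-ins for a browser's parsers. That is an assumption for illustration only: real browsers have their own parsers and their own error recovery. The helper names and the sample URL are made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The unescaped ampersand in the URL makes this fragment ill-formed XML.
FRAGMENT = '<p><a href="page.php?a=1&b=2">link</a></p>'


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record every start tag the lenient HTML parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_seen_as_html(markup: str) -> list:
    """Feed the markup to the lenient HTML parser and list its start tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

With these helpers, parses_as_xml(FRAGMENT) is False because the XML parser gives up at the bare ampersand, while tags_seen_as_html(FRAGMENT) still finds both the p and the a element. Writing the ampersand as &amp;amp; makes the fragment well-formed again.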

“To minimize the occurrence of nasty surprises when parsing the document, XML user agents are told to not be flexible with error handling: if a user agent comes upon a problem in the XML document, it will simply give up trying to read it. Instead, the user will be presented with a simple parse error message instead of the webpage. This eliminates the compatibility issues with incorrectly-written markup and browser-specific error handling methods by requiring documents to be “well-formed”, while giving webpage authors immediate indication of the problem. This does, however, mean that a single minor issue like an unescaped ampersand (&) in a URL would cause the entire page to fail, and so most of today’s public web applications can’t safely be incorporated in a true XHTML page.” Beware of XHTML

False: HTML is so old-school, it’s getting deprecated.

Apparently not. I’ve heard of HTML 5 for a while now, but only recently realized what it fully means. The W3C renewed the HTML working group, and apparently, browser vendors have leaned more towards HTML 5 than XHTML 2.

Even more shocking, XHTML 2 is not backwards-compatible!

“XHTML 1.x is not “future-compatible”. XHTML 2, currently in the drafting stages, is not backwards-compatible with XHTML 1.x. XHTML 2 will have lots of major changes to the way documents are written and structured, and even if you already have your site written in XHTML 1.1, a complete site rewrite will usually be necessary in order to convert it to proper XHTML 2. A simple XSL transformation will not be sufficient in most cases, because some semantics won’t translate properly.

“HTML 4.01 is actually more future-compatible. An HTML 4.01 document written to modern support levels will be valid HTML 5, and HTML 5 is where the majority of attention is from browser developers and the W3C.”

Beware of XHTML

These are the most important arguments, ones that I can’t ignore as a web developer. The Beware of XHTML document is a good read and covers both the myths and the benefits of using XHTML. What’s even more important is that XHTML is mostly used as if it were just “the new HTML”, when it shouldn’t be that way. XHTML is XML, and should be treated as XML. The extension shouldn’t be .html. Browsers should “give up” when there’s an error, and not try to repair the document. After all, that’s what browsers do with ill-formed XML documents, right?

What doctype declaration do you use? And why?