XHTML versus HTML

From the start of my “awakening” to the knowledge of web standards, accessibility, and the like, I’ve been coding in XHTML. I started with XHTML 1.0 Transitional, and then, as I got more into web standards, moved to XHTML 1.0 Strict. Web standards, accessibility, and validity matter more and more to me as time goes by. But the question is: is XHTML the correct choice, or is HTML the document type I should have been using all along?

Gasp! I can hear it now. I’ve been an avid XHTML supporter for quite some time, making sure my scripts chug out valid XHTML markup. A lot of my online friends feel the same about XHTML. But I’ve been hearing so much about the arguments for and against XHTML that I’ve decided to sit down and really think about it.

Here are my reasons for choosing XHTML waaaay back when, and what makes these reasons invalid:

  1. False: HTML is parsed as tag soup, but since XHTML should be valid when parsed, it should be parsed faster and better, not in “quirks mode”.

    Unfortunately, almost all XHTML today is served as HTML, not as XML. This means that it is parsed as “tag soup” anyway:

    “XHTML is an XML format; this means that strictly speaking it should be sent with an XML-related media type (application/xhtml+xml, application/xml, or text/xml). However XHTML 1.0 was carefully designed so that with care it would also work on legacy HTML user agents as well. If you follow some simple guidelines, you can get many XHTML 1.0 documents to work in legacy browsers. However, legacy browsers only understand the media type text/html, so you have to use that media type if you send XHTML 1.0 documents to them. But be well aware, sending XHTML documents to browsers as text/html means that those browsers see the documents as HTML documents, not XHTML documents.” XHTML Frequently Asked Questions

    If you want your XHTML to be parsed as XML (and take advantage of the marginally-faster parser… and when they say “marginally”, apparently they really do mean “marginally”!), you have to send it as XML. Unfortunately, IE doesn’t support that: you will get a document tree instead of your website layout, unless you give it extra instructions. (See here.) And since IE is still the dominant browser around (yes, I know you’re annoyed), one just can’t ignore it.

    If XHTML is parsed the way it should be parsed (as XML), once your document is found to be not well-formed, the browser is supposed to choke and stop parsing. Period.

    “To minimize the occurrence of nasty surprises when parsing the document, XML user agents are told to not be flexible with error handling: if a user agent comes upon a problem in the XML document, it will simply give up trying to read it. Instead, the user will be presented with a simple parse error message instead of the webpage. This eliminates the compatibility issues with incorrectly-written markup and browser-specific error handling methods by requiring documents to be “well-formed”, while giving webpage authors immediate indication of the problem. This does, however, mean that a single minor issue like an unescaped ampersand (&) in a URL would cause the entire page to fail, and so most of today’s public web applications can’t safely be incorporated in a true XHTML page.” Beware of XHTML

  2. False: HTML is so old-school, it’s getting deprecated.

    Apparently not. I’ve heard of HTML 5 for a while now, but only recently fully realized what it means. The W3C renewed the HTML working group, and browser vendors appear to be leaning more towards HTML 5 than XHTML2.

    Even more shocking, XHTML2 is not backwards-compatible!

    “XHTML 1.x is not “future-compatible”. XHTML 2, currently in the drafting stages, is not backwards-compatible with XHTML 1.x. XHTML 2 will have lots of major changes to the way documents are written and structured, and even if you already have your site written in XHTML 1.1, a complete site rewrite will usually be necessary in order to convert it to proper XHTML 2. A simple XSL transformation will not be sufficient in most cases, because some semantics won’t translate properly.

    “HTML 4.01 is actually more future-compatible. An HTML 4.01 document written to modern support levels will be valid HTML 5, and HTML 5 is where the majority of attention is from browser developers and the W3C.”

    Beware of XHTML
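
To make point 1 a bit more concrete: serving XHTML “as XML” comes down to which media type the server sends. Here is a rough Python sketch of how a script might choose between application/xhtml+xml and text/html based on the browser’s Accept header. The function name and the sample header strings are my own illustration, not anything taken from the XHTML FAQ, and a real site would want a more complete Accept parser.

```python
def choose_media_type(accept_header: str) -> str:
    """Pick a media type for an XHTML 1.0 page.

    Returns application/xhtml+xml only when the client lists it
    explicitly with a non-zero q-value; otherwise falls back to
    text/html, which is what legacy browsers understand.
    """
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if fields[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # q defaults to 1 when it is not given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not accepted"
        if quality > 0:
            return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    # Hypothetical Accept headers, for illustration only.
    print(choose_media_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_media_type("text/html, */*"))
```

A browser that never mentions application/xhtml+xml gets text/html, which means it goes right back to treating the page as plain HTML.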

These are the most important arguments, ones that I can’t ignore as a web developer. The Beware of XHTML document is a good read that covers both the myths and the benefits of using XHTML. What’s even more important is that XHTML is currently used as if it were just “the new HTML”, when it shouldn’t be that way. XHTML is XML, and should be treated as XML. The extension shouldn’t be .html. Browsers should “give up” when there’s an error, and not try to repair the document. After all, that’s what browsers do with ill-formed XML documents, right?
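
If you want to see the “give up” behaviour for yourself, here is a small Python sketch comparing a strict XML parser with a forgiving HTML parser on the same snippet. It uses the unescaped-ampersand case from the Beware of XHTML quote. The sample markup and the helper names are made up for illustration, and Python’s parsers only stand in for what a browser does; they are not a browser’s actual parsing code.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative markup: the "&" in the URL is not escaped as "&amp;".
PAGE = '<p><a href="page?a=1&b=2">link</a></p>'


class LinkCollector(HTMLParser):
    """Forgiving HTML parser that just records href values."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def hrefs_as_html(markup: str) -> list:
    """Return the link targets a tolerant HTML parser finds."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    print("accepted as XML: ", parses_as_xml(PAGE))
    print("links seen as HTML:", hrefs_as_html(PAGE))
```

The XML parser rejects the whole snippet because of that one ampersand, while the HTML parser carries on and still finds the link. Escaping the ampersand as &amp;amp; is enough to make the XML parser accept it.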

What doctype declaration do you use? And why?

9 comments

  • I use XHTML 1.1.

    I prefer XHTML over HTML because I’m used to XML, and it’s much easier to conform to a single coding standard than two.

    I prefer Strict over Transitional doctypes because the tags and attributes that are allowed under transitional doctypes are ones I don’t use anyway (http://24ways.org/2005/transitional-vs-strict-markup).

    XHTML 1.1 does not provide much in the way of semantics over 1.0 strict (http://www.w3.org/TR/xhtml11/changes.html), but it is designed to support namespaces, which appears to be one of the paths XHTML is going down.

  • Hello Alex, sorry for the delay in replying, I was out of town :)

    If XHTML2 is not backwards compatible with anything, then XHTML1.x will be forward compatible only up to a point. If XHTML1.x is not usable as XML at the moment, and I will need to recode anyway (or rely on converters) to get the benefits of XHTML2 if I ever need them in the hazy future, why should I use it now?

    I’m interested in what you say about XHTML rendering faster than HTML in browsers — do you have any links for these studies?

  • Hi Angela,

    XHTML2 is not backwards compatible with anything. That does not mean XHTML1.0 is not forwards compatible. XHTML1.0 is compatible with XHTML1.1, for a start, which is supposed to be served as application/xhtml+xml and benefits from faster rendering etc. Of course it isn’t usable at the moment.

    By maintenance I suspect we’re talking about different things. Converters from XHTML (XML-based) to other XML-based document formats are easier to write than converters from HTML (SGML-based) to an XML document format. Furthermore, XHTML can use XQuery, XPath and other such emerging technologies that will sooner or later be very useful tools.

    As I said above, XHTML1.0 can be served as application/xhtml+xml, so if IE9 turns out to be capable of rendering XML applications you will be able to immediately take advantage of that. Or not. It’s all about maximising compatibility and choice. That’s why I chose to use XHTML1.0.

    Alex

  • Hi Alex,

    Thanks! However, the issue seems to be that XHTML 1.0 is not forward-compatible, and there will be significant changes to the specifications once XHTML 2.0 rolls around. In Beware of XHTML this is included:

    “XHTML 1.x is not “future-compatible”. XHTML 2, currently in the drafting stages, is not backwards-compatible with XHTML 1.x. XHTML 2 will have lots of major changes to the way documents are written and structured, and even if you already have your site written in XHTML 1.1, a complete site rewrite will usually be necessary in order to convert it to proper XHTML 2. A simple XSL transformation will not be sufficient in most cases, because some semantics won’t translate properly.”

    This is a big issue, and I don’t see the reason to work with XHTML 1.0 now when I will have to rework everything again for XHTML 2.

    Your argument regarding “what do you want from your documents? do you want ease of maintenance?” seems to imply that coding in XHTML1 makes for easier maintenance. When I sit down and think about it, though, I haven’t observed any maintenance gains from coding in XHTML compared to when I was coding in HTML. That argument doesn’t make sense.

  • Ultimately you must ask yourself what you want from your documents. Do you want ease of maintenance?
    Do you want to be able to convert effortlessly between their current format and future XML formats (OpenDocument Format for example)?
    When XHTML finally comes of age and IE finally gets support, do you want to be able to simply alter one line in one file to take advantage of faster rendering?

    The argument that XHTML is not yet useful so we should continue to use HTML doesn’t really make sense. XHTML1.0 can be rendered as HTML. HTML4, however, cannot be rendered as XHTML.

  • I chose XHTML 1.0 for the exact reasons you chose, Angela, actually. After reading your post, I will take a serious look at which is the correct doctype to use. Interesting.

  • @Ilona: Yes, that’s another reason why I code in XHTML as well; seems like pretty much everyone in the fanlistings community codes in XHTML! Which presents a problem when it comes to Enth 4.0… about which to use. :P

    @Roberto: I’ve seen the Hixie advocacy text, but not the one in Coding Paradise! Thanks for sharing — those are important points and as you get deeper into web development, you end up needing to make these decisions.

  • Most of my fanlistings are coded in XHTML 1.0 Transitional. I don’t know why, exactly, I chose to code in that doctype. I guess because I thought it was cleaner and looked tidier. Or maybe everyone was switching to XHTML and I just followed the trend.
