It’s nearly a decade since W3C produced the first XHTML standard. In all that time, very few sites adopting it have gone as far as to serve the preferred MIME type (`application/xhtml+xml`). This is because it has been difficult to do well, and `text/html` sort-of works, so most website administrators don’t bother. Here are some tips to make things easier.
First of all, this article isn't about whether XHTML is a Good Thing - personally, I take it for granted that it is, and merely note in passing that a minority of people disagree. Rather, I'm assuming XHTML has already been adopted and discuss how to move on from serving it as `text/html` to get the benefit of XML by serving it as `application/xhtml+xml` instead (good introductions: W3C.org: Serving XHTML 1.0; XML.com: The Road to XHTML 2.0 - MIME Types). Also, if you go for the slimmed-down XHTML 1.1, you really ought to use `application/xhtml+xml`. Remember that using XHTML properly has its perils: you must be aware of them and take action as needed, though it's not actually hard to do.
When I set about researching this topic, I found that only one existing article covers the topic broadly (MIME Types and Content Negotiation) and this only provides an overview. There are two broad approaches used to change the MIME type:
- Use one of Apache’s features to detect the browser’s capability
- Use server-side scripting - typically PHP - to detect the browser’s capability
Whilst a lot has been written on using PHP, this article instead is concerned with using Apache’s features.
There are some good articles on using `mod_rewrite` to change the MIME type, and some good articles on using content negotiation for language switching, but none describe in detail using content negotiation for the purpose of switching MIME type. Almost all focus has been on using `mod_rewrite`. Although this isn’t a problem, arguably Apache’s content negotiation feature is more appropriate. It provides an alternative, so that `mod_rewrite` can be used for the other things it’s good at, without the added complexity of MIME type switching.
How does content negotiation work? In summary, your browser’s requests include several ‘`Accept...`’ header fields to tell the server what it accepts. The server sends back whatever it thinks the browser would be best able to display.
In fact, Apache supports ‘server-driven’ content negotiation as defined in the HTTP/1.1 specification, fully supporting the Accept, Accept-Language, Accept-Charset and Accept-Encoding request headers. My Firefox browser just made a request including these headers:
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-gb,en;q=0.7,cy;q=0.3
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.
For the sake of this particular discussion, we’ll only consider the first of these. The `Accept` header states that my Firefox supports `text/html` with an implied quality of 1.0, `application/xhtml+xml` with the same implied quality but second preference, `application/xml` with the lower preference of 0.9 and finally anything else (`*/*`) with quality 0.8.
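To make the reading of that header concrete, here is a minimal shell sketch that splits an `Accept` value into media ranges and prints each one with its quality, filling in 1.0 where no `q` parameter is given. The function name `parse_accept` is made up for this illustration; it is not part of Apache or of any browser.

```shell
#!/bin/sh
# Split an Accept header into media ranges and print each with its
# quality value; a range without an explicit q parameter defaults to 1.0.
accept='text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'

# parse_accept is a hypothetical helper name used only in this sketch.
parse_accept() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F';' '
    {
        gsub(/[ \t]/, "", $0)          # drop optional whitespace
        q = "1.0"                      # implied quality
        for (i = 2; i <= NF; i++)
            if ($i ~ /^q=/) q = substr($i, 3)
        print $1, q
    }'
}

parse_accept "$accept"
# text/html 1.0
# application/xhtml+xml 1.0
# application/xml 0.9
# */* 0.8
```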
How Do Various Browsers Compare?
If we want to use server-driven content negotiation, we need to be sure it will work in practice. So several browsers were checked and these `Accept` headers were seen:
| Layout engine | Browser | Accept header |
|---|---|---|
| Amaya | Amaya 11.1 | `*/*;q=0.1, image/svg+xml, application/mathml+xml, application/xhtml+xml` |
| Gecko | Epiphany 2.22 | `text/html, application/xhtml+xml, application/xml;q=0.9, */*;q=0.8` |
| Gecko | Firefox 3.0 | `text/html, application/xhtml+xml, application/xml;q=0.9, */*;q=0.8` |
| Trident | Internet Explorer 6 | `*/*` |
| Trident | Internet Explorer 7 | `*/*` |
| Trident | Internet Explorer 8 | `*/*` |
| | | `text/html, image/jpeg;q=0.9, image/png;q=0.9, text/*;q=0.9, image/*;q=0.9, */*;q=0.8` |
| Built-in | Lynx 2.8.5 | `text/html, text/plain, text/css, text/sgml, */*;q=0.01` |
| Presto | Opera 9.64 | `text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1` |
| | | `text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png,*/*;q=0.5` |
| WebKit | Safari 528.16 | `application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png,*/*;q=0.5` |
Generally, it’s obvious from the table that the layout engine determines the accepted types. For example, Epiphany and Firefox are both based on the Gecko engine and send exactly the same header. Likewise Chrome and Safari are almost the same, and share the WebKit engine.
Internet Explorer and Elinks blandly say they support everything. I suppose as a rough approximation this might be nearly true if it is interpreted as ‘having a go’ at any type of content, but it does seem rather unlikely that every kind of content will be handled. Nevertheless the HTTP standard allows it and that’s what these browsers do. Amaya is a bit more honest about the ‘have a go at anything’ stance: it states “`*/*;q=0.1`” which, in English, means it will have a go at anything but with only a 0.1 (i.e. 10%) preference for unknown content compared with its preferred list of alternatives.
This information tells us IE accepts anything, but Microsoft have stated that IE doesn’t yet officially handle XHTML. So does this prevent us from serving XHTML via content negotiation? Does IE make it hopeless to attempt to select content based only on the ‘Accept’ header because we know IE can’t accept XHTML even though it says it accepts anything and everything?
Well, no, fortunately. Investigation suggests that IE does not actually let us down. In a short while, we’ll look at the results of a compatibility test. But first, having heaved a sigh of relief, let’s consider how we can actually set up Apache to do content negotiation.
What we have to do is quite simple. There are three steps: set up our web pages; set some AddType directives; and alter the DirectoryIndex directive. I’m assuming you’re at ease with HTML page creation, but we go one step further: for every HTML page, we also create an XHTML page. This might sound like we’re wasting space, but we won’t be if you have a Linux server. You simply make all your pages valid XHTML (important!), and then give each file two different names. Linux makes this easy: you use the ‘`ln`’ command to make symbolic or ‘hard’ links just like this:
    for f in *.html; do ln "$f" "$(basename "$f" .html).xhtml"; done
This command creates a ‘hard’ link `something.xhtml` for every file called `something.html`. You could use symlinks if you prefer (‘`ln -s`’). You could also just copy the files (via ‘`cp`’ on Linux) but that will use up more disk space. Of course, ‘`ln`’ could be used either way round: if you start with the XHTML files, you can ‘`ln`’ them to make identical HTML files. Either way, we end up with each XHTML file having an identical HTML file with an identical filename except for the extensions, which are .html and .xhtml.
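If you want to convince yourself that a hard link really is the same file under a second name, rather than a copy, a quick check along these lines may help. It is only a sketch: it works in a throwaway directory created with `mktemp`, and the file names `page.html` and `page.xhtml` are placeholders.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '<html/>\n' > page.html      # stand-in for a real page
ln page.html page.xhtml             # hard link: a second name, no extra data

# ls -i prints the inode number first; both names should report the same one.
inode_html=$(ls -i page.html | awk '{print $1}')
inode_xhtml=$(ls -i page.xhtml | awk '{print $1}')

if [ "$inode_html" = "$inode_xhtml" ]; then
    echo "same file: inode $inode_html"
fi
```

Because both names point at one inode, editing either file changes what is served under both extensions.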
OK so let’s have a look at the Apache directives. With the AddType directive, you can map a given filename extension onto the specified content type. We’ll use two of them: the first one maps our XHTML content type, `application/xhtml+xml`, to all files with filenames ending .xhtml. The second one maps old-fashioned `text/html` to files with filenames ending in .html, but it does so with a lowered preference of 70%. This means that our default preference is for the server to serve XHTML, whereas HTML is served as second-best. Here are the two directives:
    AddType application/xhtml+xml .xhtml
    AddType text/html;q=0.7 .html
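To see how the 70% figure plays out, it helps to know that the first step of Apache's negotiation multiplies the quality the browser gives a media type by the quality the server attaches to the variant, and prefers the highest product. The sketch below reproduces only that first step. It ignores language, charset and encoding, Apache's special handling of wildcard-only headers, and tie-breaking, and `score` is a made-up helper name.

```shell
#!/bin/sh
# score ACCEPT TYPE QS   (hypothetical helper, simplified on purpose)
# Prints q * qs for one variant, where q comes from the most specific
# matching media range in ACCEPT: exact type, then type/*, then */*.
score() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F';' -v type="$2" -v qs="$3" '
    BEGIN { split(type, parts, "/"); family = parts[1] "/*"; best = 0; q = 0 }
    {
        gsub(/[ \t]/, "", $0)
        rq = 1                                 # implied quality
        for (i = 2; i <= NF; i++)
            if ($i ~ /^q=/) rq = substr($i, 3)
        if ($1 == type)         rank = 3
        else if ($1 == family)  rank = 2
        else if ($1 == "*/*")   rank = 1
        else                    rank = 0
        if (rank > best) { best = rank; q = rq }
    }
    END { printf "%.3f\n", q * qs }'
}

firefox='text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
lynx='text/html, text/plain, text/css, text/sgml, */*;q=0.01'

score "$firefox" application/xhtml+xml 1.0   # 1.000
score "$firefox" text/html 0.7               # 0.700
score "$lynx" application/xhtml+xml 1.0      # 0.010
score "$lynx" text/html 0.7                  # 0.700
```

On these numbers Firefox is sent the .xhtml variant, while Lynx, which only admits XHTML through its low-quality `*/*` entry, is sent the .html one.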
Our other Apache directive makes a small adjustment to the directory indexing so that Apache can choose what to do when the user asks for a URL ending with ‘/’. We want either the index.xhtml or index.html file to be chosen according to the content negotiation. Apache makes this simple using DirectoryIndex: ‘`DirectoryIndex index`’ is all you need to configure Apache to do just this. You may have seen the DirectoryIndex directive used with a list of alternatives; that would do ok, but it’s simpler and clearer just to specify ‘`index`’.
OK, so now we have three Apache directives set up and all our pages exist in both HTML and XHTML form. Now for some testing.
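If you'd like to repeat this kind of test without installing a dozen browsers, you can imitate their `Accept` headers from the command line. The helper below is a sketch built on curl's `%{content_type}` write-out variable. The name `negotiated_type` is invented for this example, and it assumes negotiation is already active for the directory (for example through Apache's MultiViews option, which this sketch does not configure).

```shell
#!/bin/sh
# negotiated_type URL ACCEPT   (hypothetical helper name)
# Requests URL with the given Accept header, discards the body and
# prints the Content-Type the server chose for the response.
negotiated_type() {
    curl -s -o /dev/null -w '%{content_type}\n' -H "Accept: $2" "$1"
}
```

For instance, `negotiated_type http://www.example.com/ 'text/html, */*;q=0.01'` (a placeholder host) imitates a Lynx-like request, and swapping in the Firefox header from the table shows what a Gecko browser would be given.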
The next table shows the same browsers we looked at earlier, but this time with the outcome of a compatibility test.
| Browser | Accept header | Observations |
|---|---|---|
| Amaya 11.1 | `*/*;q=0.1, image/svg+xml, application/mathml+xml, application/xhtml+xml` | yes (incorrect layout) |
| Elinks 0.11.1 | `*/*` | yes, only as text |
| Epiphany 2.22 | `text/html, application/xhtml+xml, application/xml;q=0.9, */*;q=0.8` | yes |
| Firefox 3.0 | `text/html, application/xhtml+xml, application/xml;q=0.9, */*;q=0.8` | yes |
| Internet Explorer 6 | `*/*` | yes (incorrect layout) |
| Internet Explorer 7 | `*/*` | yes (slightly incorrect layout) |
| Internet Explorer 8 | `*/*` | yes |
| | `text/html, image/jpeg;q=0.9, image/png;q=0.9, text/*;q=0.9, image/*;q=0.9, */*;q=0.8` | yes |
| Lynx 2.8.5 | `text/html, text/plain, text/css, text/sgml, */*;q=0.01` | yes, only as text |
| Opera 9.64 | `text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1` | yes |
| | `text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png,*/*;q=0.5` | yes |
| Safari 528.16 | `application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png,*/*;q=0.5` | yes |
The right-hand column shows my observations. Until I ran this test and saw the results, I had been worrying that Internet Explorer would somehow let me down. We know it doesn’t support XHTML fully, unlike the Gecko and WebKit browsers, and we also know it doesn’t tell the server what it wants very clearly: `*/*` is the best it can do. So there was reason for concern that any attempt to do content negotiation would fail.
But I was pleased to learn that IE happily receives XHTML and displays the pages just as normal (there are a few layout and CSS problems - but that’s a different story). Of the other browsers, only Lynx explicitly doesn’t handle XHTML - Apache negotiates the content seamlessly and we see the HTML page displayed correctly (albeit as text only in Lynx’s case).
This investigation of server-side content negotiation has shown that there is a practical way to configure Apache to serve XHTML using its preferred `application/xhtml+xml` MIME type. All major browsers behave well and the solution is relatively painless to apply. There is no need to use `mod_rewrite` - this is handy if you’re already using `mod_rewrite` for some other purpose.
Footnote - A Boost for Performance
I heartily recommend Firefox with the Firebug and YSlow plugins. I’ve learnt a lot from YSlow about making my websites load more slickly. I can also recommend Charles Proxy as a diagnostic tool for undertaking this sort of investigation.
Appendix - The Final Configuration
Below I’ve listed a virtual host configuration illustrating the points made above.
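As a minimal sketch of such a configuration, the snippet below writes a virtual host to a scratch file so it can be reviewed before being copied into place. The server name, document root and file name are all hypothetical. `Options +MultiViews` is an assumption about how negotiation between same-named files is switched on, since that option is not discussed above. The two AddType lines and the DirectoryIndex line are the ones described in the text.

```shell
#!/bin/sh
# Write a sketch of the virtual host to a scratch file for review.
# ServerName, DocumentRoot and the file name are hypothetical.
conf="${TMPDIR:-/tmp}/xhtml-negotiation.conf"

cat > "$conf" <<'EOF'
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/example

    <Directory /var/www/example>
        # Assumption: MultiViews lets Apache choose between same-named files
        Options +MultiViews
        # Negotiate between index.xhtml and index.html for URLs ending in /
        DirectoryIndex index
        # Prefer XHTML; HTML is the fallback at 70%
        AddType application/xhtml+xml .xhtml
        AddType text/html;q=0.7 .html
    </Directory>
</VirtualHost>
EOF

echo "wrote $conf"
```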
- Apache 2.2 docs: Content Negotiation; mod_negotiation; mod_rewrite
- mod_rewrite: Serving XHTML Correctly Using Apache; CodeSnippets mod_rewrite rules to serve application/xhtml+xml; Tip: Configure Apache to send the right MIME type for XHTML - IBM DeveloperWorks
- Setting headers in PHP: http://www.codingforums.com/showpost.php?p=445540; Serving XHTML with the correct mime type using PHP
- Configuring Apache for Maximum Performance
- application/xhtml+xml from a static file
- W3C: XHTML Media Types