“Reference to undefined entity” error in XML file

Published in 2007.

“Reference to undefined entity” or “XML Parsing Error: undefined entity” seems to be a very common error that periodically breaks someone’s Atom or RSS feed. Here is my short investigation of this issue.

First of all, what we call an entity is a special construct in XML documents (site feeds are XML documents): a group of characters that begins with “&” and ends with “;”, with alphanumeric characters in between. So, for example, “&nbsp;”, “&lt;” and other similar things you may know about are entities.

In a regular XML file only a few entities are allowed: “&lt;”, “&gt;”, “&quot;”, “&amp;” and “&apos;”.
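To see the difference in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The helper name parses is made up for this illustration; it only reports whether a string is well-formed XML.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The five predefined entities need no declaration at all.
predefined_ok = parses("<p>&lt; &gt; &quot; &amp; &apos;</p>")

# &nbsp; comes from the (X)HTML DTDs, so a plain XML parser rejects it.
nbsp_ok = parses("<p>a&nbsp;b</p>")

print(predefined_ok, nbsp_ok)
```

The second document is rejected only because of the undeclared “&nbsp;”, which is the same situation a feed reader runs into.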

Additional entities can be defined and loaded, and they actually are loaded for XHTML and HTML. In other words, when you specify the DOCTYPE of the document (as in the sample below), the XML parser is told what to do with “&auml;”. To make our lives easier, browsers load these entities automatically for HTML documents even when the DOCTYPE declaration is omitted.

But not for XML documents.

Sample of well-formed XHTML with DOCTYPE declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Sample XHTML page</title>
 </head>
 <body><h1>Sample XHTML page</h1></body>
</html>

As mentioned above, an RSS feed is an XML document. Moreover, this document is the result of joining multiple HTML/XHTML fragments into one file. These fragments may contain entities that are defined in HTML/XHTML but not in the RSS feed, and these entities cause XML parsing errors like the ones quoted at the top of this post.

You may think that if entities can be added to XHTML documents, they can be added to an XML feed too. Good idea, but it will add about 30 KB of DOCTYPE definitions to the feed. If that is OK, then all you need to do is download the three XHTML entity set files as plain text: xhtml-lat1.ent, xhtml-symbol.ent and xhtml-special.ent.

Merge their content into one file and add the following DOCTYPE declaration before the XML of your feed:

<!DOCTYPE feed [
 paste the content of xhtml-lat1.ent, 
 xhtml-symbol.ent and xhtml-special.ent here ]>
<feed xmlns="http://www.w3.org/2005/Atom">
 feed body
</feed>
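As a rough check of this approach, the sketch below declares a single entity in the internal DTD subset instead of pasting all three files, and parses the result with Python's xml.etree.ElementTree. The feed content is invented for the example; the “nbsp” declaration has the same form as the one in xhtml-lat1.ent.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a real feed: one entity declared inline,
# where the full solution would paste the three .ent files.
FEED = """<!DOCTYPE feed [
  <!ENTITY nbsp "&#160;">
]>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>a&nbsp;b</title>
</feed>"""

root = ET.fromstring(FEED)
title = root.find("{http://www.w3.org/2005/Atom}title")

# The parser replaces &nbsp; with U+00A0 (no-break space).
print(repr(title.text))
```

With the declaration in place the parser no longer complains, and the title text contains a real no-break space character.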

If you are familiar with DTDs, you may know that entities can also be loaded from an external file instead of being pasted inline. The only problem is that Mozilla (and Firefox) does not load external entities from the web.

The better solution is to use Unicode and avoid named entities in XHTML documents. Use plain Unicode characters or their numeric form. In other words, your documents should contain “&#160;” instead of “&nbsp;”.
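If the fragments already contain named entities, one possible way to apply this advice is to rewrite them before they go into the feed. The following is only a sketch: the function name to_numeric_refs is hypothetical, and it relies on the HTML 4 entity table in Python's html.entities module. Names it does not know are left untouched, and so are the five entities that XML predefines.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities every XML parser understands; these are left as they are.
XML_PREDEFINED = {"lt", "gt", "quot", "amp", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(text):
    """Replace named HTML entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: leave it alone rather than guess.
            return match.group(0)
        return "&#%d;" % codepoint
    return ENTITY_RE.sub(replace, text)


fragment = "a&nbsp;b &auml; &lt;"
converted = to_numeric_refs(fragment)
print(converted)

# The converted fragment now parses as plain XML without any DOCTYPE.
element = ET.fromstring("<p>" + converted + "</p>")
```

Numeric references work in HTML, XHTML and plain XML alike, so the same fragment can be reused on the page and in the feed.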