"Reference to undefined entity" error in XML file

2007-01-23

"Reference to undefined entity" or "XML Parsing Error: undefined entity" seems to be a very common error that periodically breaks a site's Atom or RSS feed. Here is my short investigation of the issue.

First of all, what we call an entity is a special construct in XML documents (site feeds are XML documents): a group of characters that begins with "&" and ends with ";", with alphanumeric characters in between. So, for example, "&nbsp;", "&lt;" and other similar things you may know about are entities.

In a regular XML file only five entities are predefined: "&lt;", "&gt;", "&quot;", "&amp;" and "&apos;".
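A quick way to see this is to hand a few fragments to a strict XML parser. The sketch below uses Python's standard xml.etree.ElementTree module, chosen here only for illustration; the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the text is accepted by the XML parser, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An undefined entity such as &nbsp; ends up here.
        return False

# The five predefined entities need no DOCTYPE.
predefined_ok = parses("<p>&lt; &gt; &quot; &amp; &apos;</p>")

# &nbsp; is defined by the HTML/XHTML DTDs, not by XML itself.
nbsp_ok = parses("<p>a&nbsp;b</p>")

# A numeric character reference is always allowed.
numeric_ok = parses("<p>a&#160;b</p>")

print(predefined_ok, nbsp_ok, numeric_ok)
```

Only the fragment with "&nbsp;" should be rejected; the other two are plain well-formed XML.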

Additional entities can be defined and loaded, and they are in fact loaded for XHTML and HTML. In other words, when you specify the DOCTYPE of the document (as in the sample below), the XML parser is told what to do with "&auml;". To make our life simpler, browsers load these entities automatically for HTML documents even when the DOCTYPE declaration is omitted.

But not for XML documents.

Sample of well-formed XHTML with DOCTYPE declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Sample XHTML page</title>
 </head>
 <body><h1>Sample XHTML page</h1></body>
</html>

As mentioned above, an RSS feed is an XML document. Moreover, it is usually the result of joining multiple HTML/XHTML fragments into one file. These fragments may contain entities that are defined in HTML/XHTML but not in the feed, and such entities cause XML parsing errors like "Reference to undefined entity" or "XML Parsing Error: undefined entity".


You may think that if entities can be added to XHTML documents, they can be added to an XML feed too. It is a good idea, but it will add about 30 KB of entity declarations to the feed. If that is OK, all you need to do is download the three plain-text entity files that accompany the XHTML 1.0 DTD: xhtml-lat1.ent, xhtml-symbol.ent and xhtml-special.ent.

Merge their contents into one file and add the following DOCTYPE declaration before the XML of your feed:

<!DOCTYPE feed [
 paste the content of xhtml-lat1.ent, 
 xhtml-symbol.ent and xhtml-special.ent here ]> 
<feed xmlns="http://www.w3.org/2005/Atom">
 feed body
</feed>
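To check that an internal DOCTYPE subset really makes a named entity usable, here is a minimal sketch in Python with the standard xml.etree.ElementTree module. It declares only "nbsp" instead of pasting the full entity files, and the feed content is made up for the example.

```python
import xml.etree.ElementTree as ET

# A tiny Atom-like feed whose internal subset declares a single entity.
# The real .ent files contain a few hundred declarations of this form.
feed_text = '''<!DOCTYPE feed [
 <!ENTITY nbsp "&#160;">
]>
<feed xmlns="http://www.w3.org/2005/Atom">
 <title>a&nbsp;b</title>
</feed>'''

root = ET.fromstring(feed_text)

# Elements live in the Atom namespace, so the lookup must include it.
title = root.find("{http://www.w3.org/2005/Atom}title").text

# The entity reference was expanded to the no-break space character.
print(repr(title))
```

The same parser that rejects a bare "&nbsp;" accepts it here, because the declaration travels inside the document itself.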

If you are familiar with DTDs, you may know that entities can also be loaded from an external file instead of being pasted into the document. The only problem is that Mozilla (and Firefox) does not load external entities from the web.

The better solution is to use Unicode and avoid named entities in XHTML documents. Use plain Unicode characters or their numeric character references. In other words, your documents should contain "&#160;" instead of "&nbsp;".
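If the fragments already contain named entities, the replacement can be automated. Below is a rough sketch in Python, assuming the HTML 4 entity table from the standard html.entities module is good enough for your content; the function name named_to_numeric is hypothetical, and the regular expression does not try to skip CDATA sections or comments.

```python
import re
from html.entities import name2codepoint

# These are predefined in XML itself and must stay as they are.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

def named_to_numeric(text):
    """Replace HTML named entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names untouched.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)

converted = named_to_numeric("a&nbsp;b &amp; c &reg; &unknown;")
print(converted)
```

Unknown names are left alone on purpose, so a typo in an entity name still shows up as a parsing error instead of being silently changed.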

Comments

2011-07-23 07:56 - Phil - undefined entity and html5

I had exactly the same problem when I started playing with html5. undefined entity errors showing in the error console. With standard html5, the doctype does not include any DTD information, and (at least) FF5 appears to not load them. Was quite a puzzle why &nbsp; and &reg; were not working until I found this article. Thanks. Same general solution: encode in and specify utf-8 character set, and use the actual characters instead of the named entities.

2012-01-20 16:42 - Jason - Thanks

I was stumped until I read the very last sentence. I changed the &nbps; I thought was the problem to &#160; and my problem was fixed.