XML Tutorial

XML External Entities

External Entities

External entities offer a mechanism for dividing your document into logical chunks. Rather than authoring a monolithic document, you can, for example, store each of the 10 chapters of a book in a separate file and use external entities to "source in" the chapters.
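A minimal sketch of the chapter example follows. It uses Python's standard `xml.parsers.expat` module. The chapter files, their contents, and the `FILES` dictionary that stands in for the file system are all hypothetical. A real resolver would open the file named by the system identifier, relative to the document's base URI.

```python
import xml.parsers.expat

# Hypothetical stand-in for the file system: system identifier -> file bytes.
FILES = {
    "chap1.xml": b"<chapter><title>Getting Started</title></chapter>",
    "chap2.xml": b"<chapter><title>Entities</title></chapter>",
}

# The "book" only declares the chapters and references them.
BOOK = b"""<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY chap1 SYSTEM "chap1.xml">
  <!ENTITY chap2 SYSTEM "chap2.xml">
]>
<book>&chap1;&chap2;</book>
"""


def chapter_titles(document: bytes, files: dict) -> list:
    """Parse document, sourcing in external entities from files, and
    return the text of every <title> element in document order."""
    titles = []
    in_title = False

    def start(name, attrs):
        nonlocal in_title
        in_title = name == "title"
        if in_title:
            titles.append("")

    def end(name):
        nonlocal in_title
        if name == "title":
            in_title = False

    def chars(data):
        if in_title:
            titles[-1] += data

    def install(p):
        """Attach the handlers to parser p (the main parser or a child)."""
        def resolve(context, base, system_id, public_id):
            # Parse the referenced entity with a child parser that continues
            # in the parent's context, so nested references also work.
            child = p.ExternalEntityParserCreate(context)
            install(child)
            child.Parse(files[system_id], True)
            return 1  # non-zero: the reference was handled

        p.StartElementHandler = start
        p.EndElementHandler = end
        p.CharacterDataHandler = chars
        p.ExternalEntityRefHandler = resolve

    parser = xml.parsers.expat.ParserCreate()
    install(parser)
    parser.Parse(document, True)
    return titles


print(chapter_titles(BOOK, FILES))
```

Each `&chapN;` reference triggers `ExternalEntityRefHandler`. The handler parses the referenced text with a child parser from `ExternalEntityParserCreate`, so the chapter elements reach the application as if they had been typed inline.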

Because external entities in different documents can refer to the same files on your file system, external entities provide an opportunity to implement reuse. Reuse of small, discrete components (figures, legal boilerplate, warning messages) is fairly easy to manage. Implementing reuse on a large scale requires an entity management system which XML, by itself, does not provide.

A few notes about external entities

  • External entities do not have to consist of a single element. You can put a sequence of three paragraphs, or even character data with embedded inline markup, into an external entity. The tags in an external entity must be balanced, however: you can't start an element in an entity and end it in your document or in another entity.
  • External entities can reference internal or other external entities, but you cannot have circular references.
  • You can refer to the same external entity several times in a single document. If you do, however, avoid ID attributes in that external entity if you're concerned about validity. Every reference inserts another copy of the ID, and a document that contains duplicate IDs is not valid.
  • It is legal to have several external entities that all refer to the same external file.
  • There are no additional restrictions placed on the character encodings used by external entities. In particular, external entities with differing encodings can be used in the same document.
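The ID caveat can be reproduced by expanding a document that references the same entity twice. Expat is a non-validating parser, so it reports no validity error and simply delivers both copies. The sketch below therefore counts ID values itself. The file `warning.xml`, its contents, and the helper `duplicate_ids` are hypothetical. The helper also assumes that any attribute named `id` is the ID attribute, whereas a real validator reads the attribute types from the DTD.

```python
import xml.parsers.expat
from collections import Counter

# Hypothetical shared file whose single element carries an ID.
FILES = {"warning.xml": b'<note id="w1">Do not touch the hot plate.</note>'}

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE manual [
  <!ATTLIST note id ID #IMPLIED>
  <!ENTITY warning SYSTEM "warning.xml">
]>
<manual><step>&warning;</step><step>&warning;</step></manual>
"""


def duplicate_ids(document: bytes, files: dict, id_attr: str = "id") -> list:
    """Expand external entities and return the sorted ID values that occur
    more than once. Assumption: any attribute named id_attr is an ID."""
    seen = Counter()

    def install(p):
        def start(name, attrs):
            if id_attr in attrs:
                seen[attrs[id_attr]] += 1

        def resolve(context, base, system_id, public_id):
            # Every reference is parsed again, so its IDs are counted again.
            child = p.ExternalEntityParserCreate(context)
            install(child)
            child.Parse(files[system_id], True)
            return 1

        p.StartElementHandler = start
        p.ExternalEntityRefHandler = resolve

    parser = xml.parsers.expat.ParserCreate()
    install(parser)
    parser.Parse(document, True)
    return sorted(value for value, count in seen.items() if count > 1)


print(duplicate_ids(DOC, FILES))
```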

Declaring External Entities

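External entity declarations come in two forms. If the external entity contains XML text, the declaration has one of the following forms, depending on whether a public identifier is supplied (note that the SYSTEM keyword is omitted when PUBLIC is used):

<!ENTITY entityname SYSTEM "system-identifier">
<!ENTITY entityname PUBLIC "public-identifier" "system-identifier">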

The system identifier is a URI that locates the resource, most commonly a simple filename. The public identifier, if supplied, may be used by an XML system to generate an alternate URI (this provides a handy level of indirection on systems that support public identifiers).
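The indirection that public identifiers provide can be sketched with a lookup table. In this sketch a plain dictionary, `CATALOG`, stands in for a real catalog mechanism such as OASIS XML Catalogs. The identifiers, file names, and `expand_text` helper are hypothetical. If the resolver finds the entity's public identifier in the catalog, it uses the mapped URI. Otherwise it falls back to the system identifier.

```python
import xml.parsers.expat

# Hypothetical catalog: public identifier -> local URI.
CATALOG = {"-//Example//TEXT Legal Notice//EN": "local/legal.xml"}

# Hypothetical storage: "legal.xml" is what the system identifier names.
FILES = {
    "legal.xml": b"<legal>Old notice</legal>",
    "local/legal.xml": b"<legal>Current notice</legal>",
}

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY legal PUBLIC "-//Example//TEXT Legal Notice//EN" "legal.xml">
]>
<doc>&legal;</doc>
"""


def expand_text(document: bytes, files: dict, catalog: dict) -> str:
    """Return the document's character data, resolving each external
    entity through the catalog first and the system identifier second."""
    parts = []

    def install(p):
        def resolve(context, base, system_id, public_id):
            # public_id is None when the declaration has no PUBLIC part.
            uri = catalog.get(public_id, system_id)
            child = p.ExternalEntityParserCreate(context)
            install(child)
            child.Parse(files[uri], True)
            return 1

        p.CharacterDataHandler = parts.append
        p.ExternalEntityRefHandler = resolve

    parser = xml.parsers.expat.ParserCreate()
    install(parser)
    parser.Parse(document, True)
    return "".join(parts)


print(expand_text(DOC, FILES, CATALOG))
print(expand_text(DOC, FILES, {}))
```

Changing where the entity comes from requires editing only the catalog, not every document that declares the entity.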

An external entity that incorporates chap1.xml into your document might be declared like this:

<!ENTITY chap1 SYSTEM "chap1.xml">

Despite the growing trend to store everything in XML, there are some legacy systems that still store data in non-XML formats. Graphics are sometimes stored in odd formats like PNG and GIF, for example ;-).

External entities that refer to these files must declare that the data they contain is not XML. They do so by naming the format of the external entity with the NDATA keyword followed by a notation:

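<!ENTITY entityname SYSTEM "system-identifier" NDATA notation>
<!ENTITY entityname PUBLIC "public-identifier" "system-identifier" NDATA notation>

Here notation is the name of a notation declared elsewhere in the DTD with a NOTATION declaration.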
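A parser does not read or interpret the data in an unparsed entity. It only reports the declaration, and the application decides what to do with the file. This sketch uses Expat's `NotationDeclHandler` and `EntityDeclHandler` to list each unparsed entity with its file and the system identifier of its notation. The `png` notation, its `image/png` system identifier, and the `figure`/`src` markup are hypothetical. An unparsed entity cannot be referenced as `&logo;` in content, so the document refers to it through an attribute declared with type `ENTITY`.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION png SYSTEM "image/png">
  <!ENTITY logo SYSTEM "logo.png" NDATA png>
  <!ATTLIST figure src ENTITY #REQUIRED>
]>
<doc><figure src="logo"/></doc>
"""


def unparsed_entities(document: bytes) -> dict:
    """Map each unparsed entity name to (file system identifier,
    system identifier of its notation)."""
    notations = {}  # notation name -> notation system identifier
    entities = {}   # entity name -> (system identifier, notation name)

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        # Only NDATA (unparsed) entities carry a notation name.
        if notation is not None:
            entities[name] = (system_id, notation)

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.Parse(document, True)
    return {name: (sid, notations.get(nt)) for name, (sid, nt) in entities.items()}


print(unparsed_entities(DOC))
```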