XML Tutorial

XML processing instructions (PIs), comments, and whitespace

Introduction

Processing instructions are used to provide information to the application processing an XML document. Such information may include instructions on how to process the document, how to display it, and so forth. Processing instructions can appear as children of elements. They can also appear as top-level constructs (children of the document), either before or after the document element.

Processing instructions are composed of two parts: the target (or name) of the processing instruction, and the data (or information). The syntax takes the form <?target data?>. The target follows the same construction rules as element and attribute names. Apart from the terminating character sequence ?>, all markup is ignored in processing instruction content. Processing instructions defined by organizations other than the World Wide Web Consortium (W3C) may not have targets that begin with the character sequence xml or any recapitalization thereof.

Namespace declarations do not apply to processing instructions, so a target cannot be qualified with a namespace prefix. As a result, creating targets that are guaranteed to be unique is problematic.

Examples of processing instructions:

<?display table-view?>
<?sort alpha-ascending?>
<?textinfo whitespace is allowed ?>
<?elementnames <fred>, <bert>, <harry> ?>
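A rough sketch of how an application might read these instructions, assuming Python's standard-library xml.dom.minidom parser. The surrounding <catalog> and <item/> elements are hypothetical, added only so the processing instructions have a document to live in; they show PIs before, inside, and after the document element.

```python
from xml.dom import Node, minidom

# Hypothetical document: <catalog> and <item/> exist only to host the PIs.
# PIs appear before the document element, as its children, and after it.
DOC = """<?display table-view?>
<catalog>
  <?sort alpha-ascending?>
  <?textinfo whitespace is allowed ?>
  <item/>
</catalog>
<?elementnames <fred>, <bert>, <harry> ?>"""


def collect_pis(node):
    """Return (target, data) pairs for every PI under node, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data))
        found.extend(collect_pis(child))
    return found


document = minidom.parseString(DOC)
for target, data in collect_pis(document):
    print(f"target={target!r} data={data!r}")
```

The data of the elementnames instruction comes back as the literal text <fred>, <bert>, <harry>: the parser does not treat those angle brackets as tags, because only ?> ends a processing instruction.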

Comments

XML supports comments, which provide information to a human reader about the XML content; they are not used to encode actual data. Comments can appear as children of elements. They can also appear as top-level constructs (children of the document), either before or after the document element. Comments begin with the character sequence <!-- and end with the character sequence -->. The text of the comment is serialized between the start and end sequences. The character sequence -- may not appear inside a comment. Other markup characters such as less-than (<), greater-than (>), and ampersand (&) may appear inside comments but are not treated as markup. Thus, entity references that appear inside comments are not expanded.

Examples of legal comments:

<!-- This is a comment about how to open ( 
<![CDATA[ ) and 
close ( ]]> ) CDATA sections -->
<!-- I really like having elements called <fred> in my 
markup languages -->
<!-- Comments can contain all sorts of character literals
including &, <, >, ' and ". -->
<!-- If entities are used inside comments ( &lt; for 
example ) they are not expanded. -->
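To confirm that these comments are well-formed, and that &lt; inside a comment really is left unexpanded, they can be parsed with Python's xml.dom.minidom. This is a minimal sketch; the wrapping <root> element is a hypothetical addition, since a document needs exactly one document element.

```python
from xml.dom import Node, minidom

# The legal comments from the text, wrapped in a hypothetical <root> element.
DOC = """<root>
<!-- This is a comment about how to open (
<![CDATA[ ) and
close ( ]]> ) CDATA sections -->
<!-- I really like having elements called <fred> in my
markup languages -->
<!-- Comments can contain all sorts of character literals
including &, <, >, ' and ". -->
<!-- If entities are used inside comments ( &lt; for
example ) they are not expanded. -->
</root>"""

root = minidom.parseString(DOC).documentElement
# Keep only comment nodes; the newlines between them are separate text nodes.
comments = [n.data for n in root.childNodes if n.nodeType == Node.COMMENT_NODE]

print(len(comments))           # → 4
print("&lt;" in comments[3])   # → True: the entity reference stays as literal text
```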

Examples of illegal comments:

<!-- Comments cannot contain the -- 
character sequence -->
<!-- Comments cannot end with a hyphen --->
<!-- Comments cannot <!-- be nested --> -->
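A conforming parser must reject each of these. A minimal sketch using Python's xml.etree.ElementTree, where a well-formedness error surfaces as ET.ParseError; the is_well_formed helper and the wrapping <r> element are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# The illegal comments from the text; each is wrapped in a hypothetical <r> element.
ILLEGAL = [
    "<!-- Comments cannot contain the -- character sequence -->",
    "<!-- Comments cannot end with a hyphen --->",
    "<!-- Comments cannot <!-- be nested --> -->",
]


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


for comment in ILLEGAL:
    print(is_well_formed(f"<r>{comment}</r>"))  # → False for each

# A space between the trailing hyphen and --> makes the comment legal.
print(is_well_formed("<r><!-- Ends with a hyphen - --></r>"))  # → True
```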

Whitespace

Whitespace characters in XML are space, tab, carriage return, and line feed. XML requires whitespace to separate attributes and namespace declarations from each other and from the element tag name. Whitespace is also required between the target and data portions of a processing instruction, and between the text of a comment and the closing comment sequence (-->) if that text ends with a hyphen (-).

XML allows whitespace inside element content, attribute values, processing instruction data, and comment text. Whitespace is also allowed between an attribute name and the equals character, and between the equals character and the attribute value; the same is true for namespace declarations. Whitespace is allowed between the tag name of a start or end tag and the closing character sequence for that tag.

Whitespace is not allowed between the opening less-than character and the element tag name, or between the prefix, colon, and local name of an element or attribute. Nor is it allowed between the start processing instruction sequence <? and the target.
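Several of these rules can be probed with a small well-formedness check. A minimal sketch assuming Python's xml.etree.ElementTree; the element and attribute names (a, x, y, t, p) and the namespace URI urn:example are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


CASES = [
    # Allowed: whitespace around '=' and before the '>' of start and end tags.
    ("<a x = '1' y='2' ></a >", True),
    # Allowed: the same freedom applies to namespace declarations.
    ("<p:a xmlns:p = 'urn:example'/>", True),
    # Required: attributes must be separated by whitespace.
    ("<a x='1'y='2'/>", False),
    # Not allowed: whitespace between '<' and the tag name.
    ("< a/>", False),
    # Not allowed: whitespace between '<?' and the PI target.
    ("<a><? t data?></a>", False),
]

for text, expected in CASES:
    assert is_well_formed(text) is expected, text
print("all cases behave as described")
```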

Example of correct use of whitespace:

<pre:Vehicle xmlns:pre='urn:example-org:Transport'
type='car' >
<seats> 4 </seats>
<colour> White </colour>
<engine>
<petrol />
<capacity units='cc' >1598</capacity>
</engine >
</pre:Vehicle >
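Parsing this example shows which whitespace actually reaches the application. A sketch assuming Python's xml.etree.ElementTree: whitespace inside tags is consumed by the parser, while whitespace in character content, such as the spaces around 4 in seats, is passed through unchanged.

```python
import xml.etree.ElementTree as ET

# The correct-whitespace example from the text.
DOC = """<pre:Vehicle xmlns:pre='urn:example-org:Transport'
type='car' >
<seats> 4 </seats>
<colour> White </colour>
<engine>
<petrol />
<capacity units='cc' >1598</capacity>
</engine >
</pre:Vehicle >"""

vehicle = ET.fromstring(DOC)
print(vehicle.tag)                                   # → {urn:example-org:Transport}Vehicle
print(vehicle.get("type"))                           # → car
print(repr(vehicle.find("seats").text))              # → ' 4 ' (content whitespace kept)
print(vehicle.find("engine/capacity").get("units"))  # → cc
```

ElementTree reports the prefixed name in {namespace-URI}local-name form; the unprefixed children have no namespace because the example declares no default namespace.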

Whitespace is used in various places in this document: between the tag name, namespace declaration, attribute, and closing greater-than character of the top-level element's start tag; between elements; in the character content of the seats and colour elements; between the tag name and the /> sequence of the petrol element; and between the tag name and the closing greater-than character of the end tags for the engine element and the top-level element.

Example of incorrect use of whitespace:

<pre :Vehicle xmlns: pre='urn:example-org:Transport'
type='car'>
< seats>4</ seats>
</pre:Vehicle>

Whitespace used incorrectly in various places in an XML document: between pre and :Vehicle in the start tag of the top-level element, between xmlns: and pre of the namespace declaration of the top-level element, between the opening less-than character and seats in the start tag of the child element, and between </ and seats in the end tag of the child element.
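Each of these mistakes is fatal on its own, not only in combination. A minimal sketch assuming Python's xml.etree.ElementTree, isolating one stray space per test document; the is_well_formed helper is a hypothetical name for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


NS = "xmlns:pre='urn:example-org:Transport'"

# Each case isolates one of the mistakes from the incorrect example.
BROKEN = {
    "space between prefix and colon": f"<pre :Vehicle {NS}/>",
    "space after xmlns:": "<pre:Vehicle xmlns: pre='urn:example-org:Transport'/>",
    "space after <": f"<pre:Vehicle {NS}>< seats>4</seats></pre:Vehicle>",
    "space after </": f"<pre:Vehicle {NS}><seats>4</ seats></pre:Vehicle>",
}

for mistake, text in BROKEN.items():
    print(f"{mistake}: {is_well_formed(text)}")  # → False for every case

# With the stray spaces removed, the same markup parses.
print(is_well_formed(f"<pre:Vehicle {NS}><seats>4</seats></pre:Vehicle>"))  # → True
```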