XML Files

Overview

All HL data files are stored as XML files, as are saved portfolios and most other files intended for user access. Consequently, you'll need to be familiar with the structure of these files in order to manipulate them appropriately. This section outlines the details you'll need to know for this purpose.

If you are not already familiar with XML, it is quite easy to learn, since XML files are simple text files that can be easily created or modified. Countless books have been published on the topic, and there are extensive resources available on the internet. The official site is http://www.w3.org/XML/

HL utilizes only the basic mechanisms of the XML standard, so HL files are quite simple to work with. Since XML and HTML both derive from the same set of standards, anyone even passingly familiar with HTML will be able to pick up XML very readily, at least to the level of complexity (or rather the lack of complexity) employed by Hero Lab.

There are numerous commercial, shareware, and freeware tools available for easily editing XML files. Each has advantages and disadvantages, so you will need to make the determination of which tool is "better" for yourself. It's also perfectly reasonable to edit XML files with a simple text editor, although you'll want an editor that at least offers line numbers, since errors are reported with line numbers to allow easy correction of problems.

Basic XML Terminology

You are assumed to be familiar with XML before attempting to write data files for HL. However, it's quite likely that you may be reviewing this documentation before deciding whether you want to try your hand at writing data files, in which case you might not know XML yet. So we've provided very basic definitions of a few fundamental XML terms below to help you better understand the Kit documentation.

Element: HL data files are composed of XML elements that define all of the information for a particular game system. An element is a named unit of the file, delimited by tags, that can carry attributes and contain text or other elements.
Attribute: When creating data files, almost all information is conveyed through the use of attributes within the XML file format. Each XML element contains zero or more attributes, where each attribute defines a specific piece of information about the element.
PCDATA: When a block of free-form text is required for an element within Hero Lab data files, that text is specified as XML PCDATA. If the text uses any of the XML reserved characters (see below), you must either substitute the appropriate replacement sequences or enclose the entire text within a CDATA block as a wrapper.
CDATA: If you need to include special formatting and/or reserved XML characters within a PCDATA region, you will want to wrap your text within a CDATA block. A CDATA block begins with the character sequence "<![CDATA[" and ends with the sequence "]]>". The list of reserved XML characters is defined further below.
DTD: Every XML file must be assigned a formalized structure for its contents. The formal specification of an XML file's structure is given by a DTD (short for Document Type Definition). An appropriate DTD should be included with this documentation for every public HL file format.
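
As a rough illustration of how these terms fit together, the fragment below shows an element with attributes, a PCDATA region, and a CDATA block. The element and attribute names ("item", "notes", "id", "name") are invented for this sketch and are not taken from any actual HL file format.

```xml
<?xml version="1.0" encoding="ISO-8859-1" ?>
<document>
  <!-- "item" is an element; "id" and "name" are attributes of that element -->
  <item id="itmSample" name="Sample Item">
    <!-- The free-form text between the tags is PCDATA -->
    <notes>A short block of free-form text.</notes>
    <!-- A CDATA block lets reserved characters appear without replacement -->
    <notes><![CDATA[Usable when level < 5 & rank > 2.]]></notes>
  </item>
</document>
```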

Reserved XML Characters

The XML language has a number of reserved characters that have special meaning. If you need to use these characters within your data files, an appropriate replacement must be used in accordance with the XML language specification. Alternatively, a CDATA block can be used within a PCDATA region. These special characters are documented in detail within any reference on XML, but they are repeated below for convenience. Any time you need to use one of the characters below within HL files, use the corresponding character sequence given on the right.

< &lt;
> &gt;
& &amp;
" &quot;
' &apos;
Literals: If you need to specify a character with a code of 128 or higher, you need to specify the character as a literal (called a numeric character reference in the XML specification). The syntax for this is "&#ddd;", where "ddd" is the decimal value of the character code (e.g. "&#149;"). You can also specify hexadecimal values via the syntax "&#xhh;", where "hh" is the hexadecimal value of the character code (e.g. "&#xAE;").
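
The replacements above might be used as in the following sketch. The element and attribute names ("item", "notes", "name") are invented for illustration and are not part of any HL file format.

```xml
<document>
  <!-- Attribute value containing double quotes and an ampersand -->
  <item name="The &quot;Big&quot; Sword &amp; Shield">
    <!-- PCDATA containing reserved characters, written with replacements -->
    <notes>Requires Strength &gt; 12 &amp; Size &lt; 3.</notes>
    <!-- The same text inside a CDATA block needs no replacements -->
    <notes><![CDATA[Requires Strength > 12 & Size < 3.]]></notes>
    <!-- Decimal 174 and hexadecimal AE refer to the same character code -->
    <notes>Decimal form: &#174; Hexadecimal form: &#xAE;</notes>
  </item>
</document>
```

Note that the replacement sequences are not interpreted inside a CDATA block, so the two approaches should not be mixed within the same block of text.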

XML Comments

If you are editing data files by hand, then you can freely insert comments into the XML data files using standard XML syntax. Comment blocks begin with the character sequence "<!--" and end with the sequence "-->". Any number of lines of text can appear within an XML comment block, with the one restriction that the sequence "--" may not appear inside the comment. Consequently, comments within XML files may not be nested.

For example, the following XML includes a comment that effectively omits the "dropped" element from the document, including all of its attributes and the "dropchild" child element as well.

<document>
  <first attrib="value"/>
  <second attrib="value" another="junk">
    <child>This is PCDATA</child>
  </second>
<!--
  <dropped attr1="x" attr2="y" attr3="z">
    <dropchild attr="ignore"/>
  </dropped>
-->
  <third dummy="nothing"/>
</document>

NOTE! If you create a data file outside of HL and then use HL's integrated Editor to edit the file, all comments will be thrown away by the Editor. When using XML comments, be sure to only edit those files within tools that will preserve the comments.

XML Character Encoding Sets

The XML specification identifies a number of character sets that can be utilized within a given document. Unfortunately, none of them fully supports the Windows ANSI character set, and HL is a Windows application. HL assumes all XML documents subscribe to the XML character encoding set that most closely approximates Windows ANSI, and all characters within that set are assumed to be the corresponding Windows ANSI characters. This means that HL assumes all XML documents utilize the "ISO-8859-1" character set (more commonly referred to as Latin-1), with a number of exceptions that are detailed in the Kit Reference section of the documentation.

The identity element at the top of all XML files (formally, the XML declaration) should specify an encoding of "ISO-8859-1" for completeness. If no encoding is given, HL assumes ISO-8859-1. An example is given below:

<?xml version="1.0" encoding="ISO-8859-1" ?>

NOTE! There is an unofficial XML encoding named "Windows-1252" that properly reflects the Windows ANSI character set and is often used. However, various XML parsers do not recognize this encoding set due to its unofficial nature. In the interest of maximum compatibility, the modified Latin-1 set is used instead.