Chapter 1: Introduction
What Is XML?
Extensible Markup Language (XML) is a globally accepted, vendor-independent standard for representing structured, text-based data. An XML document is an excellent medium for encapsulating any kind of information that can be arranged or structured in some way. For example, an XML document can contain a list of personal or business contacts, books in a library’s card catalogue, or products in a warehouse.
If we looked at any one of these examples—say, the library card catalogue—in the more traditional “table-oriented” view with which most developers would be familiar, we would see something like the following:
book_isbn   book_genre       firstname  middlename  lastname  title
0812589041  Science Fiction  Orson      Scott       Card      Ender’s Game
0883853280  Biography        William                Dunham    Euler: The Master of Us All
An XML document, on the other hand, would present this information hierarchically, where the column names would become tags or possibly “attributes.” For example:
<title>Euler: The Master of Us All</title>
Listing 1.1 gives you a first look at XML. Compared with a familiar table, its structure can seem overwhelming at first. However, as you become more familiar with XML, you will see that this hierarchical structure has many important advantages over a traditional table.
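To make the mapping from rows to a hierarchy concrete, here is a minimal Python sketch, using the standard library's xml.etree.ElementTree, that converts the two catalogue rows above into nested XML. The element and attribute names (library, book, isbn, genre, author) and the rows_to_xml function are illustrative choices of our own, not necessarily the layout of Listing 1.1; the sketch only shows identifying columns becoming attributes and the name columns being grouped under a single author element.

```python
import xml.etree.ElementTree as ET

# The two rows of the card-catalogue table, keyed by column name.
# The element/attribute names produced below are illustrative assumptions.
ROWS = [
    {"book_isbn": "0812589041", "book_genre": "Science Fiction",
     "firstname": "Orson", "middlename": "Scott", "lastname": "Card",
     "title": "Ender's Game"},
    {"book_isbn": "0883853280", "book_genre": "Biography",
     "firstname": "William", "middlename": "", "lastname": "Dunham",
     "title": "Euler: The Master of Us All"},
]

def rows_to_xml(rows):
    """Turn flat table rows into a nested <library> element tree."""
    library = ET.Element("library")
    for row in rows:
        # Identifying columns become attributes of <book>...
        book = ET.SubElement(library, "book",
                             isbn=row["book_isbn"], genre=row["book_genre"])
        # ...while the three name columns are grouped under one <author>.
        author = ET.SubElement(book, "author")
        for part in ("firstname", "middlename", "lastname"):
            if row[part]:  # skip empty cells, such as a missing middle name
                ET.SubElement(author, part).text = row[part]
        ET.SubElement(book, "title").text = row["title"]
    return library

library = rows_to_xml(ROWS)
print(ET.tostring(library, encoding="unicode"))
```

Note that the empty middle name simply produces no element at all, something a fixed table column cannot express as naturally.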
XML and HTML
It can help to think of XML, at its most basic level, as being very similar to a HyperText Markup Language (HTML) web page. However, the tags in an XML document do not have a fixed meaning the way they do in HTML (e.g., <b>, <body>, etc.). When a developer writes an XML document, he or she decides on the names of the elements (e.g., book, title, year, and author) and the data the elements will contain (e.g., the <year> tags contain the year the book was published). The developer chooses these elements with the expectation that some client application exists, or will be written, to read the XML file and process those particular elements in some way. Referring back to Listing 1.1, one can imagine a book-search application running on a library computer that reads XML files with this structure (perhaps receiving them via the Web from some central server), allowing library patrons to search for the books they wish to check out.
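As a rough illustration of such a client, the following Python sketch searches a small catalogue for titles containing a search term. The catalogue layout and the search_titles function are hypothetical; the point is only that the program is written to look for the specific book, title, and author elements that the document's author chose.

```python
import xml.etree.ElementTree as ET

# A hypothetical catalogue file using element names mentioned in the text.
CATALOGUE = """<library>
  <book>
    <title>Ender's Game</title>
    <author>Orson Scott Card</author>
  </book>
  <book>
    <title>Euler: The Master of Us All</title>
    <author>William Dunham</author>
  </book>
</library>"""

def search_titles(xml_text, term):
    """Return (title, author) pairs whose title contains term, ignoring case."""
    root = ET.fromstring(xml_text)
    matches = []
    # The client "knows" the vocabulary: it looks for <book>, <title>, <author>.
    for book in root.iter("book"):
        title = book.findtext("title", default="")
        if term.lower() in title.lower():
            matches.append((title, book.findtext("author", default="")))
    return matches

print(search_titles(CATALOGUE, "euler"))
```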
What Is XML Used For?
One misconception regarding XML is that it is simply an alternate way of transporting and storing data. However, that is only one small facet of how XML is used today. To give only a few examples, XML can be used to:
-invoke methods on a remote server through a firewall (this protocol is called SOAP)
-represent relational database data such that it can be easily translated into HTML, viewable by any browser without programming
-store configuration and deployment data for applications, providing operating-system-independent formats for initialization and configuration files
-create template documents describing the various fields and attributes of a business form
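Taking the configuration-file use as an example, the sketch below reads settings from a small XML file with Python's xml.etree.ElementTree. The file format (config, database, logging, and feature elements) and the load_config function are invented for illustration; a real application would define its own vocabulary.

```python
import xml.etree.ElementTree as ET

# A hypothetical configuration file; all names here are invented.
CONFIG_XML = """<config>
  <database host="localhost" port="5432"/>
  <logging level="INFO"/>
  <feature name="search" enabled="true"/>
  <feature name="export" enabled="false"/>
</config>"""

def load_config(xml_text):
    """Read the hypothetical config format into a plain dictionary."""
    root = ET.fromstring(xml_text)
    db = root.find("database")
    return {
        "db_host": db.get("host"),
        "db_port": int(db.get("port")),  # attribute values are always strings
        "log_level": root.find("logging").get("level"),
        # Keep only the <feature> elements whose enabled attribute is "true".
        "features": [f.get("name") for f in root.findall("feature")
                     if f.get("enabled") == "true"],
    }

print(load_config(CONFIG_XML))
```

Because the file is plain text with a standard syntax, the same configuration can be read on any operating system by any language with an XML parser.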
XML Tools and Technologies
In spite of its power and wide range of uses, XML itself is very straightforward. The more subtle aspects of XML do not have to do with XML itself, but rather with various third-party applications and technologies such as XML editors and authoring tools, and XML-related APIs.
XML Authoring Tools
XML files can become very large and may have many layers of nested elements. While the basic grammar of XML is relatively simple, finding a deeply buried element in a large document, or tracking down a missing “/” or a mismatched tag, will very quickly try the patience of most software developers. For this reason, many tools have been developed to make authoring XML easier. You can, of course, work with any simple text editor, but you will find a listing of some popular XML authoring tools in Chapter 3.
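Even without a dedicated authoring tool, an XML parser will report where it noticed a mismatched tag. This minimal Python sketch, assuming only the standard library, uses the line and column carried by xml.etree.ElementTree.ParseError; check_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return None if xml_text parses, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        line, column = err.position  # where the parser detected the problem
        return line, column, str(err)

# The second <title> is missing its "/", so the parser only notices the
# problem when it reaches </book> on line 3.
BROKEN = """<book>
  <title>Euler: The Master of Us All<title>
</book>"""

print(check_well_formed(BROKEN))
```

Note that the reported position is where the parser detected the error, which can be some distance from the actual typo; in a large file this is exactly why dedicated tools help.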
Translation and Styling
In addition to applications that make writing XML documents easier, there are a number of technologies that can actually extend XML’s capabilities. Most web browsers are capable of displaying XML files. Internet Explorer, for example, will show an XML file as a dynamic collapsible tree much like a simple Windows File Explorer; you can click on nodes to open them and reveal their child elements, or close them to get to a top-level view of the XML document. Suppose, however, you would like your XML document to display just like a web page with proper formatting and, perhaps, a colored background? There are two ways to do this:
1. Cascading Style Sheets (CSS): CSS files are text files containing formatting information. CSS is an older technology originally developed for HTML, and style sheets are written in their own specialized (non-XML) syntax.
2. Extensible Stylesheet Language Transformations (XSLT): Whereas CSS files use their own specialized syntax, XSLT documents are themselves written in XML. An XSLT file maps XML elements to HTML tags and, in so doing, is used to translate an XML file into an HTML file.
The separation of data and presentation using either technology allows for a much cleaner and more efficient application design.
While the XML specification defines a structure for encapsulating data, it does not prescribe any method for querying the data in an XML document. A technology known as XPath, however, does provide a mechanism for querying an XML file. If you are familiar with relational databases, you can think of XPath as XML’s much less sophisticated brand of SQL (Structured Query Language). By way of example, the XPath expression /stocks/stock[1.0 > @price] will return all penny stocks from an appropriately structured XML file.
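Python's standard library does not include a full XPath 1.0 engine: xml.etree.ElementTree supports only a limited subset of XPath, and that subset has no numeric comparisons. The sketch below therefore uses the subset to select stock elements that carry a price attribute, then applies the "less than 1.0" test in Python. The document layout, sample companies, and the penny_stocks function are invented for illustration. (Third-party libraries such as lxml can evaluate full XPath 1.0 expressions directly.)

```python
import xml.etree.ElementTree as ET

# An invented "appropriately structured" file: <stocks> holding <stock>
# elements with a price attribute, matching the expression in the text.
STOCKS = """<stocks>
  <stock symbol="AAA" price="0.45">Alpha Mining</stock>
  <stock symbol="BBB" price="12.30">Beta Foods</stock>
  <stock symbol="CCC" price="0.99">Gamma Labs</stock>
</stocks>"""

def penny_stocks(xml_text):
    """Return the names of stocks priced below 1.0."""
    root = ET.fromstring(xml_text)
    # "stock[@price]" is within ElementTree's XPath subset; the numeric
    # comparison [1.0 > @price] is not, so it is done in Python instead.
    return [s.text for s in root.findall("stock[@price]")
            if float(s.get("price")) < 1.0]

print(penny_stocks(STOCKS))
```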
Programming with XML
In order to read an XML document, an application must parse it. Parsing is complex: a parser must take a text document, break it into meaningful segments (while making certain that these segments are correctly formed and conform to the rules of the language), and store the data and elements in memory. Writing a proper parser for any language or grammar is no easy task, and XML would not have caught on as a standard unless freely available parsers with friendly programming interfaces had relieved developers of this task. Fortunately, there are two standard interfaces: the Simple API for XML (SAX) and the Document Object Model (DOM). A developer can use either API to read an XML file and extract data from it. The approach taken by each of these two APIs, however, is different, as follows:
-DOM is a passive API. DOM reads an entire XML document, creates a tree structure in memory, and gives the developer read and write access to this tree. DOM must process the entire file and bring it into memory before a developer may access it.
-SAX is an active API. SAX will actually call methods on your application (or fire events) as it moves through the XML document. You can think of SAX as adhering to an event-driven model, triggering events in your application whenever it encounters anything important your application needs to know about, such as an element, text data, etc. Note that SAX does not allow modification of the XML document.
DOM and SAX are two different APIs useful for parsing, reading, and (in the case of DOM) manipulating XML. Each API has strengths and weaknesses that will be explained in Chapters 6 and 7.
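The contrast between the two models shows up clearly in code. This Python sketch, using the standard library's xml.dom.minidom (a DOM implementation) and xml.sax modules, extracts the same list of titles both ways; the sample document, TitleHandler class, and function names are illustrative assumptions.

```python
import xml.sax
from xml.dom import minidom

DOC = """<library>
  <book><title>Ender's Game</title></book>
  <book><title>Euler: The Master of Us All</title></book>
</library>"""

def titles_with_dom(xml_text):
    # DOM: the whole document is parsed into an in-memory tree first,
    # and only then does the program walk that tree.
    dom = minidom.parseString(xml_text)
    return [t.firstChild.data for t in dom.getElementsByTagName("title")]

class TitleHandler(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it moves through the document."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

def titles_with_sax(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles

print(titles_with_dom(DOC))
print(titles_with_sax(DOC))
```

The DOM version is shorter because the tree is already built; the SAX version never holds the whole document in memory but must track its own state between events.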
Integrating XML with Your World
XML’s popularity is due to the sheer usefulness of the technology. By defining a standard, vendor-independent format that can represent any kind of data, the uses for XML are boundless. Database vendors have taken notice of XML and are making their systems XML-friendly.
You may recall from the earlier section in this chapter, “Translation and Styling,” that an XML file can easily be translated to HTML via XSLT. The simplicity of this translation technology makes XML an ideal way to return data from a database since it can so easily be translated into a web-based report. CSS or additional XSLT can then be used to further enhance the appearance of the HTML page. As we will see, database vendors are quick to take advantage of these capabilities.
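The first half of that pipeline, turning query results into XML, can be sketched in a few lines. The example below uses Python's built-in sqlite3 module with an in-memory database as a stand-in; it is a hand-rolled illustration, not any database vendor's built-in XML feature, and query_to_xml is a hypothetical helper that assumes every column name is a valid XML element name. The resulting XML could then be handed to an XSLT stylesheet to produce the HTML report (Python's standard library has no XSLT processor, so that step is not shown).

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_xml(conn, sql, root_tag="results", row_tag="row"):
    """Run sql and wrap each result row in XML, one child element per column."""
    cursor = conn.execute(sql)
    # cursor.description holds one 7-tuple per column; item 0 is its name.
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_element = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            # SQL NULL becomes an empty element rather than the text "None".
            ET.SubElement(row_element, name).text = "" if value is None else str(value)
    return root

# An in-memory stand-in for a real database, holding the catalogue rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_isbn TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("0812589041", "Ender's Game"),
     ("0883853280", "Euler: The Master of Us All")],
)
report = query_to_xml(conn, "SELECT book_isbn, title FROM books ORDER BY book_isbn")
print(ET.tostring(report, encoding="unicode"))
```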
Upcoming XML Technologies
The XML family of technologies represents a tremendously fast-moving field where products, capabilities, and interoperability change daily. New standards are on the way for areas such as vector graphics (SVG), distributed computing (SOAP), and changes to HTML (XHTML). In addition, many industries have embraced XML as a standard for communicating specific types of information. For example, the financial industry is slowly accepting FIXML as a standard for transmitting financial information between institutions. In the next few years, you should expect to see many more standards that are based directly on XML.
Copyright 2002. Edited by Gregory Brill.