Preface
As I complete this book today, I have two exciting new publications fresh from the Net, still warm from the laser printer: the final texts of XML 1.0 (the World Wide Web Consortium's subset of SGML, the Extensible Markup Language) and Web SGML (the corrections and enhancements to the SGML standard, ISO 8879, for WWW uses such as XML).
These two publications together revolutionize electronic publishing; they make SGML's traditional advantages for large-scale corporate publishing available at the desktop. Most importantly, they give you, the owner of data and the creator of data systems, control of your documents: SGML and its Web-optimized subset XML represent the triumph of the Open Systems movement. Not only can we have open systems for network protocols and languages, but also for the data which animate them.
This book is a practical guide for this brave new world - ideas and declarations for SGML and XML element type sets and entity sets that implement the important and useful document structures. I have tried to put in many resources which may be hard to track down otherwise. And I have paid particular attention to trying to bring out the deeper model in XML, one which will not be familiar to readers coming from HTML or proprietary markup languages: processing instructions and notations in particular.
This is not a tutorial on syntax: there are many excellent books available, including the previous books in this series. In particular I have avoided reference to SGML declarations and to the more rarely implemented, optional features of SGML. I have included a particularly detailed treatment of one area that has until now received little attention: characters, glyphs and internationalization.
Order, Structures, Patterns & Forms
This book is about order, structures, patterns and forms.
In this book, order means the underlying, abstract (and sometimes ineffable) relationships and natures of things, structure means how some order is captured in some concrete markup, pattern means a kind of template or recipe used for creating structures (i.e., "pattern" in the dressmaking sense, not the text-processing sense), and form means a particular conformance between one structure and another (i.e., "form" in the concrete-laying sense, not the metaphysical sense).
Like any good cookbook, as well as showing how to make a structure, The XML & SGML Cookbook also tries to explain why, to explore the alternatives, and give the various pros and cons.
The freedom of a highly generalized technology like SGML can cause unease for new document-system designers. Being able to move in any direction is not much comfort if you cannot afford to go in the wrong direction! Fortunately, during the ten years of SGML's existence as an International Standard, convergent approaches and solutions to many common document patterns have emerged. This book attempts to catalog and discuss the best, the most instructive and the most useful of them.
Document Systems
Considering system constraints and factors outside the scope of the document type declaration is often what makes a project successful. Because of this, this book is aimed at the document-system designer rather than just the "DTD writer." Information is never managed in a vacuum; documents exist as part of a document system. Sometimes the system is closed, sometimes open-ended. If the information is valuable enough to warrant management, you should consider your whole document system when creating a great DTD.
In any case, this book also will be of use to those who generate XML documents, and who might not ever even create formal declarations for element types using SGML as their notation. If you are one of these people, I urge you to learn and attempt to use SGML content models in your informal documentation at least: SGML provides a very convenient and well-thought-out notation, suitable for many kinds of structures, and there are graphical visualization tools available to help.
So even though order can be discovered in all kinds of places in documents, many times the structures are loose, have exceptions or are incomplete. Consequently, the patterns for element type sets in this book are presented as prototypes and exemplars that you can take and reshape to your particular needs, rather than as templates which you must obediently cut and paste.
The document-system designer needs to be aware of the limits of DTD elegance. A pattern that the designer may perceive as the archetype for authors may in fact merely be a stereotype of their needs. A pattern can only be used successfully to reveal some actual order, never to impose a spurious order.
Document-system designers tend to have neat and schematic minds that reject disorder, sometimes at the price of wanting to see order where there is none: a mirage from some previous document. So this book, as well as giving patterns, also gives some principles for selecting patterns. The need for elegance must be moderated by the need for success. I hope that readers coming to this book expecting neat cookie-cutter solutions will instead be empowered to understand the issues and tradeoffs most appropriate to their individual needs.
Terminology
XML is bringing a rich influx of people from different disciplines and technologies into the SGML world, and so there is quite a variety and duplication of terminology. In order to keep sentences under control, I have used some common simplified terms which emphasize the SGML keywords used.
This Book               ISO-ese
ANY element             element having a declared content type of ANY
CDATA attribute         attribute having a declared value of CDATA
CDATA element           element having a declared content type of CDATA
CDATA entity            CDATA entity
CDATA marked section    CDATA marked section
container element       element having subelements
EMPTY element           empty element
ID attribute            attribute having a declared value of ID
IDREF attribute         attribute having declared value of IDREF
NDATA entity            NDATA entity
NMTOKEN attribute       attribute having declared value of NMTOKEN
RCDATA element          element having a declared content type of RCDATA
SDATA entity            SDATA entity
SUBDOC entity           subdocument entity
In this book, an attribute ID means an attribute with the name ID; an ID attribute means an attribute having the declared value ID; but the attribute ID means that attribute with the name ID in the example snippet. It is good usage that an attribute ID should be an ID attribute and that an attribute IDREF should be an IDREF attribute.
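As a rough illustration of this terminology (not an example taken from the book), the sketch below declares a small hypothetical document type and uses the expat parser in Python's standard library to report each declaration under the simplified term. All element and attribute names are invented. Because the sample is XML rather than full SGML, CDATA and RCDATA elements cannot appear in it, and treating sequence and choice content models as "container element" is a simplifying assumption.

```python
from xml.parsers import expat

# Hypothetical document type: every element and attribute name below is
# invented for illustration and does not come from the book's DTDs.
DOC = """<?xml version="1.0"?>
<!DOCTYPE report [
  <!ELEMENT report    (section+)>
  <!ELEMENT section   (title, para+)>
  <!ELEMENT title     (#PCDATA)>
  <!ELEMENT para      ANY>
  <!ELEMENT pagebreak EMPTY>
  <!ATTLIST section
            ID     ID     #REQUIRED
            IDREF  IDREF  #IMPLIED
            label  CDATA  #IMPLIED>
]>
<report><section ID="s1"><title>One</title><para>Text<pagebreak/></para></section></report>
"""

# Map expat's content-model type codes to the simplified terms.
# Assumption: sequence and choice models count as "container element".
ELEMENT_TERMS = {
    expat.model.XML_CTYPE_EMPTY: "EMPTY element",
    expat.model.XML_CTYPE_ANY: "ANY element",
    expat.model.XML_CTYPE_SEQ: "container element",
    expat.model.XML_CTYPE_CHOICE: "container element",
}


def classify(xml_text):
    """Return (elements, attributes) keyed by name, valued by simplified term."""
    elements, attributes = {}, {}
    parser = expat.ParserCreate()

    def on_element(name, content_model):
        # content_model[0] is the content-model type code.
        elements[name] = ELEMENT_TERMS.get(content_model[0], "other")

    def on_attlist(elname, attname, declared_value, default, required):
        # declared_value is the keyword from the declaration, e.g. "ID".
        attributes[(elname, attname)] = declared_value + " attribute"

    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return elements, attributes


ELEMENTS, ATTRIBUTES = classify(DOC)
```

Here the attribute ID on section is also an ID attribute, and the attribute IDREF is an IDREF attribute, which is the good usage recommended above; the attribute label is a CDATA attribute.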
I intend to maintain a Web page giving any errata for this book, at the Prentice Hall PTR Web site phptr.
Rick Jelliffe
Sydney, Australia
From the Back Cover
Every month, the demand for SGML expertise grows, yet few people have mastered this breakthrough technology for managing information. With The XML & SGML Cookbook, you can move from SGML novice to expert faster than ever before. Based on a successful training course, this book provides dozens of instantly usable Document Type Definition (DTD) "recipes" for virtually every type of document, and it delivers a practical understanding of document structure, patterns and form, so you can go "beyond the cookbook."
* Proven recipes for all the most common editorial structures.
* Databases, tables, forms, lists, and multiple-version documents.
* Frontmatter, metadata, formatting, and backmatter.
* Practical tips and warnings for SGML, XML, HTML, TEI, and CALS publishing.
* Detailed coverage of building documents for international use.
* All DTDs on CD-ROM - plus extensive state-of-the-art SGML tools!
Quickly learn the skills and sensitivities it's taken SGML experts years to develop.
Discover how to manage critical tradeoffs between simplicity and richness, and between immediate and future applications.
Learn to build DTDs that serve the needs of different users and different media, using techniques that are equally applicable in both SGML and XML environments.
The CD-ROM contains all the book's DTDs, plus an extensive library of great SGML tools, including EditTime SGML Editor sampler and OmniMark Light sampler.
Whether you're a publishing manager, information professional, system integrator or anyone else who needs stronger SGML expertise fast, there's no better solution than The XML & SGML Cookbook.