Tuesday, January 27, 2009

How should you format text?

So here's a question: suppose you work for a publisher or are a publisher yourself. What format should you start with? I hear you saying "Hermagoras, what a silly question: whatever form the author gives me." And it's true that authors will likely give you files in a word processing format like Microsoft Word. Some technical writers, mathematicians, engineers, and computer scientists will submit files in formats like LaTeX or perhaps even HTML (the code for most web pages). But almost all other authors will give you a Word file or, perhaps, a Rich Text Format (RTF) file that can be translated among various word processors.

Over the last twenty years, editors and layout people have mainly worked with desktop publishing programs such as Adobe PageMaker and FrameMaker, QuarkXPress, and even Microsoft Publisher (for small companies publishing newsletters and the like). Such programs took care of layout issues and, combined with graphic design programs for illustration, formed the backbone of publishing technology (at least in terms of editing and page design). When a publisher needed a freelance editor, the advertisement would almost invariably request expertise in FrameMaker or QuarkXPress.

But technology is evolving, and the need to produce both electronic and print products (a kind of dual-format imperative) changes the technological landscape. Here's what I mean. In a dual-format environment, an editor needs to make decisions with a single text that can be translated quickly and easily into a variety of formats. A format decision should make global changes in the text, and those changes should take effect in all the chosen formats -- and in new ones that are not yet designed.

I want to take a cue from the technical publisher O'Reilly (they produce some of the best computer books around) and suggest that would-be editors and publishers start with XML. I think this might be a good idea even though I'm not an XML expert myself. My one real experience with XML was a freelance project in which I used the XML language DocBook. The project was a manual for a computer application. The manual ended up existing in three formats: as a physical book, as a set of web pages, and as a Help file (the kind that pops up as a window in a computer program). With DocBook, I coded the text semantically -- that is, I marked text elements for their purpose -- rather than for their appearance. So, for example, I wouldn't mark a word as "bold" or "italic," the way you'd do it in Word or in Blogger. Instead I'd mark it as a certain kind of word: a keyword, for example, or a cross-reference. This is significant because if you make a word italic, it's italic in every format: book, web page, help file, Kindle, etc. But if I mark it as a keyword, then each format's style sheet recognizes the keyword and styles all the keywords in that format accordingly. For various reasons, you might do this differently in different formats (perhaps keywords are in italics in a print book but bolded or even linked to a glossary in a help file). With XML, you can do this.
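
To make the keyword example concrete, here is a minimal sketch in Python. It is not the DocBook toolchain I used on that project: the `<keyword>` tag, the sample sentence, the `print` and `help` format names, and the `glossary.html` link are all made up for illustration, and the "style sheets" are just small Python functions. The point is only that one semantically marked source can be styled differently for each output.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified markup: <keyword> says what the word is,
# not how it should look. This is not the real DocBook schema.
SOURCE = "<para>Open the <keyword>style sheet</keyword> before you edit.</para>"

# One stand-in "style sheet" per output format. Each decides how a keyword
# is presented in that format. Names and choices here are illustrative.
STYLES = {
    "print": lambda text: f"<i>{text}</i>",
    "help": lambda text: f'<a href="glossary.html"><b>{text}</b></a>',
}


def render(xml_text: str, fmt: str) -> str:
    """Render a one-paragraph source using the style for the given format."""
    para = ET.fromstring(xml_text)
    style = STYLES[fmt]
    parts = [escape(para.text or "")]
    for child in para:
        if child.tag == "keyword":
            parts.append(style(escape(child.text or "")))
        else:
            # Tags this sketch does not know about fall back to plain text.
            parts.append(escape(child.text or ""))
        # Text that follows the child element inside the paragraph.
        parts.append(escape(child.tail or ""))
    return "<p>" + "".join(parts) + "</p>"


print(render(SOURCE, "print"))
print(render(SOURCE, "help"))
```

Changing how keywords look in the help file means editing one entry in `STYLES`; the source text is never touched. Adding a new format later means adding one more entry.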

The point is this: by coding for content, XML allows editors to mark a text so that it can be changed in different formats more cleanly and easily. In a sense, it separates the practices of editing and layout. As multiple-format publication becomes the norm, this will be more and more important, and XML skills will be something to seek out and develop.
