Friday, April 19, 2013

Five tips for converting content to DITA

Meat grinder photo (flickr: klwatts)Five tips for converting content to DITA:
So, you’ve decided to move to a DITA-based workflow. Before you convert your existing content to DITA, consider these five tips, which encompass both big-picture and coding-specific issues.

flickr: klwatts
  1. Read a book about DITA best practices before you convert. Educating yourself about what coding works well (and doesn’t work so well) in the real world can save you a lot of headaches and rework. Merely reading the DITA specification is not going to give you advice, for example, on the best way to code commands in your content.  The DITA Style Guide by Tony Self is a good resource, and I’m not saying that just because Scriptorium Press published it. That book has provided me with really useful information while working on DITA projects.
  2. Don’t assume the first sentences in a section are the short description for a DITA topic. There is a strong temptation to convert the first bit of information in a section to a topic’s short description (shortdesc element). Don’t succumb to that temptation. In my experience, it is rare that the first sentences in legacy content are a true short description, which should be a standalone summary of a topic’s content. For information on best practices for short descriptions and how different outputs (HTML, PDF, help) from the DITA Open Toolkit use shortdesc elements, see Kristen Eberlein’s Art of the short description.
  3. Use the right topic types for your content. DITA offers four topic types: generic topic, concept, task, and reference, and you should convert your content to match the purposes of those topic types. For example, don’t shoehorn a procedure better suited to a task topic into a concept topic. Yes, the DITA spec will let you code an ordered list in a concept topic that may seem like sensibly coded task. However, when it comes time for transforming your DITA content to HTML or PDF, the styling for procedures may rely on coding specific to the task element. An ordered list in a concept may not be formatted the same.
  4. Consider how cross-references are processed by the DITA Open Toolkit. During conversion, it is a good idea to add ID attributes to items that are commonly referenced (tables and figures, for example); you need those ID attributes to create cross-references to elements. However, just because the DITA spec enables you to put an ID attribute on the title element within a fig or table element, that does not mean you should point to that title element when creating cross-references. For example, in output based on the default XHTML plugin that comes with the DITA Open Toolkit,  a cross-reference to a figure will not work when the xref element points to the title element within the fig element instead of the fig element itself:screenshot showing incorrect cross-reference from xref element pointing to title in fig element
  5. Know that valid DITA content is not the same as good DITA content. Don’t be fooled when a conversion vendor makes a big deal about how quickly it can convert your legacy information into valid DITA. The problems I mentioned in tips 2–4 can exist in valid DITA topics. The validation feature in a DITA authoring tool is not going to tell you, for example, that the two sentences you converted to a short description are not a true short description.

    Valid DITA ≠ semantically correct, useful DITA.
There are many other tips I could offer, but these five are a good starting point. Feel free to share your own conversion tips and war stories below.