XML Hacking: Editing in OS X

When you’re hand-editing Office files in Windows, it’s pretty straightforward: unzip the file, edit, re-zip and you’re done. Editing in OS X requires a couple of extra precautions. This is because the graphical user interface adds Mac attributes to files and plants hidden files in folders. Office will not tolerate either of these:

The Open XML file cannot be opened because there are problems with the contents. Details The file is corrupt and cannot be opened.

XML error message in 2008

The Open XML file cannot be opened because there are problems with the contents or the file name might contain invalid characters (for example, \/). Details The file is corrupt and cannot be opened.

XML error message in 2011

The Open XML file cannot be opened because there are problems with the contents or the file name might contain invalid characters (for example, \/). Details The file is corrupt and cannot be opened.

XML error message in 2016

If you use OS X’s Archive Utility to unzip or zip the files, Word will refuse to open the resulting file. On top of that, if you look in any of the folders using the Finder, a hidden .DS_Store file will be created in the folder. When re-zipped, Word will not accept the extra file and again report an XML error. The solution to these issues is to use the command line, like the Unix warrior you want to be! Remember to run each Terminal command by pressing the Return key after typing the command.
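If you’re curious what the Finder or Archive Utility left behind in a file that Word rejects, you can list the archive’s contents without unzipping it. This is a minimal sketch, assuming the Info-ZIP unzip that ships with OS X. The function name list_mac_residue is made up for illustration, and the pattern only covers the usual suspects (.DS_Store files, the __MACOSX folder and ._ files):

```shell
# Hypothetical helper: print archive entries that look like Mac residue.
# Prints nothing and returns a non-zero status when none are found.
list_mac_residue() {   # usage: list_mac_residue SomeFile.docx
    # unzip -Z1 lists the entry names only, one per line.
    unzip -Z1 "$1" | grep -E '(^|/)(\.DS_Store$|__MACOSX/|\._)'
}
```

Run it as list_mac_residue TestDoc.docx. Any line it prints is an entry that probably doesn’t belong in an Office file.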

A valuable utility for this is OpenTerminalHere. Open any Finder window, click on OpenTerminalHere and a terminal window opens pointed to the Finder window. So download and install it, then follow these steps to open, edit and re-zip Office files:

  1. Move a copy of the Office document (let’s call it TestDoc.docx) to a separate folder and open that folder in the Finder.
  2. Click on OpenTerminalHere to open a copy of Terminal aimed at the folder.
  3. In the Terminal, type
    unzip TestDoc.docx

    then press Return. The file is unzipped into several folders plus a file called [Content_Types].xml.

  4. Do not look in any of the folders using the Finder, or you’ll have to start over. To examine a folder’s contents, use the Terminal to change the folder, then list the contents:
    cd word

    ls -l
  5. To go back up to the previous folder, type:
    cd ..
  6. To edit the files, open your text editor, then navigate using the File>Open dialog to find the file. Edit the file, then save and close.
  7. When you’re all done, double-check that Terminal is pointing at the original folder holding the document and the expanded folders. If you’re unsure, close Terminal, then click on OpenTerminalHere to reopen in the right spot.
  8. In Terminal, re-zip the files with this style of command:
    zip -r RevisedDoc.docx '[Content_Types].xml' _rels docProps word

    This example is for Word. The general syntax after zip -r is the name of the final document, followed by the file and folders to include, each separated by a space. The quotes around [Content_Types].xml stop the shell from treating the square brackets as a wildcard pattern. The file is reassembled into an Office file.

  9. Test that you can open it. If you get an XML error notice, re-read the above steps and try again.
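If you do this often, the unzip and re-zip steps above could be wrapped in two small shell functions. This is a sketch under assumptions, not the one true way: the names unpack_office and repack_office are made up, it assumes the zip and unzip tools that ship with OS X, and the -X flag (leave out extra file attributes) and the .DS_Store cleanup are additions that aren’t part of the steps above.

```shell
# Hypothetical helpers; the names are illustrative only.

# Unzip an Office file into a working folder.
unpack_office() {   # usage: unpack_office TestDoc.docx work
    mkdir -p "$2" && unzip -q "$1" -d "$2"
}

# Re-zip a working folder into a new Office file.
# The body runs in a subshell, so the cd doesn't affect your Terminal.
repack_office() (   # usage: repack_office work RevisedDoc.docx
    # Work out the full path of the output file before changing folders.
    out="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
    # Start fresh; zip would otherwise update an existing archive.
    rm -f "$out"
    cd "$1" || exit 1
    # Delete hidden Finder files in case a folder was opened in the Finder.
    find . -name .DS_Store -type f -delete
    # -r recurses into folders, -X leaves out extra file attributes.
    zip -q -r -X "$out" .
)
```

Zipping from inside the working folder with a single dot picks up everything, including the hidden _rels/.rels file, so there’s no need to type each folder name. Note that repack_office deletes any .DS_Store files in the working folder and overwrites an existing output file of the same name.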

Please note: these editing techniques are required when editing Word, PowerPoint and Excel documents and templates in OS X, plus Office Theme files (the kind exported from PowerPoint that combine all Theme elements).

If, on the other hand, you are editing a Font Theme or a Color Theme, those are simple XML files. They don’t need to be unzipped or re-zipped and Office doesn’t seem to care about OS X attributes attached to them. These plain XML files don’t need to be handled through the Terminal; just use the Finder.

Next time, we’ll be looking at managing Word styles in OS X. Finally, a way to get rid of the zombie styles automatically created by Word! Happy hacking!

March 2016 Update

A (somewhat lame) alternative to working entirely in Terminal is to work on a network disk. Then you can open Terminal in your choice of folder and run the command:

defaults write com.apple.desktopservices DSDontWriteNetworkStores true

While this will prevent future generation of .DS_Store files on the network disk, it’s very likely you already have such files, since they’re created almost as soon as you view a folder’s contents in the Finder. So I recommend that while Terminal is open, you also run:

defaults write com.apple.finder AppleShowAllFiles YES

followed by:

killall Finder

The killall command restarts the Finder to force a refresh of the view. Now you can see any .DS_Store files and delete them before re-zipping the files into an Office document. You’ll still have to do the zipping in Terminal. Also, no .DS_Store files means OpenTerminalHere doesn’t work, so you’ll have to navigate manually via Terminal commands. Now you know why this is a lame alternative.
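Since you’re in Terminal anyway, you could also hunt down the .DS_Store files there instead of deleting them by hand in the Finder. A minimal sketch, assuming the standard find command. The function name purge_ds_store is made up:

```shell
# Hypothetical helper: print, then delete, every .DS_Store file
# in the given folder and all of its subfolders.
purge_ds_store() {   # usage: purge_ds_store path/to/unzipped/folder
    find "$1" -name .DS_Store -type f -print -delete
}
```

For example, purge_ds_store . cleans the folder Terminal is currently pointing at. Each deleted file is listed, so you can see what was removed.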

If you try this technique, you can always restore the clean file view by running:

defaults write com.apple.finder AppleShowAllFiles NO
killall Finder

May 2016 Update

BBEdit 11 now has the ability to open and edit Office files directly, avoiding all of the above hassle. However, you still have to be a little careful about your working procedure:

  1. Open your Office file in BBEdit 11. In the left-hand pane, you’ll see a folder tree of the files contained within, so no unzipping is required.
  2. Select the file you want to edit. The file opens in the main BBEdit window, displaying two lines. The first is the XML header, the second is the actual content.
  3. Click at the left end of the second line.
  4. Choose Markup>Utilities>Format… (that’s Format with three dots after it; the plain Format command will wreck your document!)
  5. Very Important: Uncheck the first option, Normalize tag case. Leaving this option checked will cause Office to see an XML error in the file!
  6. Change Mode to Strict Hierarchical and click on Format. The XML is formatted as indented multi-line text.
  7. Make your edits and save. It’s not necessary to linearize the XML. The Office program will do that anyway the first time you save the file. However, if you like to leave things exactly the way you found them, click in front of the first line of content (after the header line), choose Markup>Utilities>Format… again, change the Mode to Compact and click on the Format button. Save the file and test.
