OOXML Hacking: An Introduction

With the introduction of Office 2007, Microsoft changed the basic file format that underlies Word, PowerPoint and Excel. Instead of the proprietary and mostly undocumented format that ruled from Office 97 to Office 2003, Microsoft made a smart decision and switched to XML. This is tagged text, similar in structure and concept to HTML code with which you may already be familiar.

XML opens up a world of possibilities for automated document construction, but that's a topic for another day. The everyday relevance for you and me is that if a Word or PowerPoint file isn't doing what you need it to do and there are no tools in the program for the job, we can now dive in and edit the file ourselves. If you're a point-and-click user, this is probably not thrilling. But if you're a hacker at heart, a midnight coder or just a curious tinkerer, you can do some cool stuff.

The main tool you're going to need is a text editor. While you can get away for a while with Notepad or TextEdit, those simple text editors don't quite have the tools that get the job done efficiently. On Mac, I use BBEdit and on Windows I reach for Notepad++. BBEdit is reasonably-priced shareware and Notepad++ is freeware. They have a similar style of operation, so if you're a cross-platform hacker it's easy to switch between them. Notepad++ uses a plugin system, so you can add tools. For this job, you're definitely going to want the free XML plugin.

macOS requires somewhat more care when handling expanded Office files, or they won't open after being rezipped. Please see this article for the best procedure on a Mac. The rest of this article describes Windows methods, but the XML file structure is the same on both platforms.

Word, Excel and PowerPoint files in the new format are actually simple Zip files with a different file extension. Getting into them couldn't be easier: if you're using Windows, add .zip to the end of the file name (use a copy of the file, if it's anything important). You'll get a warning from your OS, but you know what you're doing! Now unzip it. Out pop several folders of XML, plus a top-level file or two.
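If you'd rather script the copy-and-unzip step than rename files by hand, here is a minimal Python sketch using only the standard library. The function name `expand_office_file` and the folder names in the usage note are my own placeholders, not anything Office defines.

```python
import shutil
import zipfile
from pathlib import Path


def expand_office_file(office_path, out_dir):
    """Copy an Office file to <name>.zip, unzip the copy, return the part names."""
    office_path = Path(office_path)
    out_dir = Path(out_dir)
    # Work on a copy so the original document is never touched.
    zip_copy = office_path.with_name(office_path.name + ".zip")
    shutil.copyfile(office_path, zip_copy)
    with zipfile.ZipFile(zip_copy) as archive:
        archive.extractall(out_dir)
        return sorted(archive.namelist())
```

Calling `expand_office_file("report.docx", "report_expanded")` (both names hypothetical) leaves the original alone, creates `report.docx.zip` beside it, and fills `report_expanded` with the XML folders and files.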

Inside a Word File

Inside a simple Word file. The document text is stored in document.xml.

Select one of the files and open it in your text editor. All the files have been linearized to minimize file size. This is where your XML tools come into play. In Notepad++, choose Plugins>XML Tools>Pretty Print (XML Only - with line breaks). Now you have a nicely indented, easy-to-read page to edit. When you're done, it's not necessary to re-linearize. Word, PowerPoint or Excel will do that for you later.
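If you don't have Notepad++ handy, a rough equivalent of Pretty Print can be sketched in Python with the standard library's `xml.dom.minidom`. Treat this as a reading aid only: the function name is a placeholder of mine, and the XML declaration it writes may differ from the one in the original file, so I'd assume it's safer to keep making your actual edits in a text editor.

```python
from xml.dom import minidom


def pretty_print_xml(xml_text):
    """Return an indented, multi-line copy of a linearized XML string or bytes."""
    # parseString raises an error if the XML is not well formed.
    document = minidom.parseString(xml_text)
    return document.toprettyxml(indent="  ")
```

For example, `print(pretty_print_xml(open("document.xml", "rb").read()))` shows the indented structure of an expanded Word file's main part.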

For people using Windows' built-in zip utility, there is an easy mistake to watch out for. By default, unzipping a file in Windows creates a new folder named for the file being expanded. If, when you're re-assembling the file, you include this top-level folder, PowerPoint will raise an error about unreadable content in the presentation. To avoid this, first open the folder that Windows created. Select the _rels, docProps and ppt folders, plus the [Content_Types].xml file, then create a zip file from them.
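The same rule applies if you rezip with a script: the archive must hold the contents of the expanded folder, not the folder itself. Here's a minimal Python sketch of that idea, assuming the standard `zipfile` module. The function name is hypothetical.

```python
import zipfile
from pathlib import Path


def rezip_office_folder(expanded_dir, office_path):
    """Zip the contents of expanded_dir (not the folder itself) into office_path."""
    expanded_dir = Path(expanded_dir)
    with zipfile.ZipFile(office_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for item in sorted(expanded_dir.rglob("*")):
            if item.is_file():
                # Paths are stored relative to the folder, so _rels, docProps,
                # ppt and [Content_Types].xml land at the root of the archive.
                archive.write(item, item.relative_to(expanded_dir).as_posix())
    return office_path
```

Write the output file somewhere outside the expanded folder, for example `rezip_office_folder("deck_expanded", "deck_edited.pptx")`, so the new archive doesn't get swept up into itself.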

As an alternative to unzipping/rezipping files in Windows, download the free 7-Zip utility. After installing, set your text editor as the 7-Zip editor. Then right-click on the Office file you want to edit and choose 7-Zip>Open Archive. A window opens showing the OOXML folders and files. Find the file you want to edit, right-click and choose Edit. Edit only one file at a time in 7-Zip, closing your text editor and updating the file each time. Otherwise, some of your changes may be lost.

XML hacking is useful for Excel or Word when you want to add color themes or when you need to rescue a corrupt document. But it really shines with PowerPoint, allowing you to create custom table formats, add extra custom colors that don't fit into a theme, set the default text size for tables and charts and much more. This technique separates the PowerPoint pros from the wannabes.

In my next post, I'll get into the specifics of some cool Office XML hacking tricks. In the meantime, check out text editors and XML tools so you're ready to hack!

Inside a PowerPoint File

A plain vanilla PowerPoint file: more complex than Word.

If code editing isn't your thing, we can do it for you! Email me at production@brandwares.com.

6 thoughts on “OOXML Hacking: An Introduction”

  1. Hello
    I tried a few changes and they look good. But in some .xlsm files I see .bin files, and a few of them cannot be exported (unzipped). After rezipping the content I get a repair dialog box, and after repairing all seems to be OK.
    Is there a way to force the unzipping of the primary .bin files? Or do you know why such files can't be unzipped? Maybe they're protected?
    Thanks

    • A .bin file is not a zip archive, it's a binary file. It's most likely a VBA macro project, which is the most common use of .bin files in Office XML. VBA can be edited using the program interface after you make the Developer tab visible.

  2. I used the OOXML Chrome plug-in to edit the theme.xml file on my Mac. I downloaded the file and when I opened it in PowerPoint, I got an error "PowerPoint found a problem with content in FS Powerpoint template_2018.potx. PowerPoint can attempt to repair the presentation."

    When I repair the presentation, all of the formatting is stripped out. Thoughts?

    • When you edited the XML, you introduced an error. Whether the error is small, like an omitted quotation mark, or large, PowerPoint gives you the same error message. It's small help, but the section that was removed was the part that had the mistake.

      As mentioned in the article, because of Office's uninformative feedback, it's best to make one small change at a time, downloading and testing the file as you proceed. Once you gain experience and have a library of tested XML, it gets faster.

  3. I'm trying to accomplish something that seems like it should be simple, but isn't, to me. I want to remove the Style Gallery from the Word for Mac 16.24 toolbar. Is there a relatively straightforward way to do that?

    • Removing built-in controls is not only not simple, it's not possible. The best you can do is hide the entire Styles group, but then you have no access to styles at all, because the Style Pane opener gets hidden as well. Word pros depend on styles to create consistent, professional documents and templates. I can recognize a Word newbie by the lack of style use in their documents. You might want to re-evaluate your urge to hide this very important control.

      I'm writing an article about hacking Ribbon XML, but until that comes out, you might consider pruning the styles displayed in the Styles Gallery to just those that you use. Here's my article showing how to do that: XML Hacking: Managing Styles

      For an easier way to manage styles, please check out AuthorText Manage Styles, a free add-in from MVP Rich Michaels. It works both on macOS and Windows, a rarity in the world of Office add-ins.
