Wednesday, June 14, 2006
And now...
4 minutes to Brian Jones's Open XML presentation... woohoo!
The 3:45 InfoPath session was great. InfoPath 2007 is way faster on the client side, and the stuff they've done on the server side is great. They've even got the server all integrated with a Microsoft server diagnostic tool so you can track the health of your forms server - how many hits, clients, what bombed, what didn't, tracebacks, etc. Very, very cool.
I think one of the conference's biggest impacts is my realization about how very far software development has come. Edit HTML or SGML in notepad? That seems positively prehistoric compared to what Visual Studio 2005 can do....
OK. Now for Brian Jones and Office Open XML Formats - blogging live! live! live!
The files are now owned by ECMA because Office Open XML is now an ECMA standard. They're going for ISO standard next. I'm not sure why they feel the need to double-cross the "t" here, but I think it probably has something to do with playing in the same field as ODF - a conversation for another time.
Word, Excel, and PowerPoint 2007 are XML by default - and they get new extensions to go with it to prevent the nightmare that happened when people tried to upgrade from Office 95 to Office 97 (which was the last major file format change).
Goal: represent all features of binary Office files in XML - hence the level of detail in the ECMA submission - 4017 pages.
Give everyone the binary format described in XML, and you can get a wide open development door to Office documents... with the caveat that Microsoft has written a covenant "not to sue" anyone for using the formats. This leads to an entirely different argument regarding Open XML and ODF - most of which has been led by the pundits over at slashdot.org with some input from blogs on XML.com, and here.
OK... Away from the politics and back to the details. Zip files make stuff much smaller... So the Office team borrowed a page from the OpenOffice development group and decided that Open XML gets to live in a zip file with its binary buddies, and a manifest file to hold everything together. Do I really need to document the benefits of sending a zip file over the network instead of a binary file?
Do I really need to talk about the tendency of documents to spontaneously corrupt in previous versions of Office?
Nah.
The new Office formats will be backward compatible all the way back to Office 2000 via downloadable plugins that will let you read and write Office 2007 formats even if you don't have Office 2007.
XML formats from Office XP and Office 2003 will be forwards compatible. Apparently, there's not that much difference between WordML 2003 and WordML 2007 except for the fact that the 2007 version is chunked into semantic parts - document properties, application state, headers, footers, page stuff, borders, graphics, tables, endnotes, and paragraphs. The whole thing is held together with a ".rels" file, which is really an XML manifest file similar to a catalog for those of you who are familiar with the catalog file concept. All of this stuff is revealed merely by changing the .docx extension to .zip and extracting the contents. Voila. XML.
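For the curious, here's a minimal Python sketch of that rename-and-extract trick, done without the renaming. It only assumes the package is an ordinary zip file with a "_rels/.rels" part at the root; the function names are made up for illustration, and the relationships namespace is the one from the published standard, which may not match what the beta writes out.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of relationship parts in the published standard (assumption:
# a beta-era file may use a different URI).
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def list_parts(package):
    """Return the name of every part stored in the zip package."""
    with zipfile.ZipFile(package) as zf:
        return zf.namelist()


def read_relationships(package, rels_name="_rels/.rels"):
    """Return (type, target) pairs listed in a relationships part."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read(rels_name))
    return [
        (rel.get("Type"), rel.get("Target"))
        for rel in root.iter(RELS_NS + "Relationship")
    ]
```

Point either function at a file path (say, a hypothetical "report.docx") or at an open binary file object, and you get the part names or the manifest entries back as plain Python lists.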
The fact that the standard has to be backward compatible AND do the binary thing is really what has inflated the standard document. Is this good or bad? The "level of detail" argument is another prevalent argument among the ODF vs. Open XML debaters. More on that later, too.
What if you've got binary stuff embedded in your files, like graphics, OLE Objects, or VBA? That stuff gets saved into the zip, too. For files that carry VBA macros, they swap the trailing "x" in the extension for an "m" (.docm instead of .docx) to let you know that there's executable stuff in the .zip file, so BEWARE... Still, they definitely are separating the binary stuff from the content stuff. This makes sense to me seeing as XML for graphics can get really verbose really quickly, and require special support to view. Why bother if you can just put the binary stuff in the zip?
OK. Back to the point. The Office formats have seriously evolved over time, and this is very cool. Long ago in a job far away, I remember demo'ing for a Microsoft Office Word project manager. The subject of the demo was a proof-of-concept Arbortext Epic-based app I had built to help me explain that we wanted Word to be a "real" structured editor. 5 years later, it would appear that Word can definitely be a real structured editor.
And so the door to structured editing for the masses is now open... Let the structured content run amok (as only structured content can't)!
Gotta go. He's actually showing how to build a Word 2007 document from the ground up in notepad...
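If Notepad isn't your thing, the same ground-up build can be sketched in a few lines of Python. This is not what was shown in the session, just a rough equivalent under some assumptions: the three part names and the namespace URIs below come from the published standard, so a beta build may want different ones, and build_minimal_docx is a name invented for this example.

```python
import zipfile
from xml.sax.saxutils import escape

# Tells the consumer what kind of content each part holds.
CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# The root manifest: points at the main document part.
ROOT_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

# One paragraph containing one run of text.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body>
</w:document>"""


def build_minimal_docx(target, text="Hello, Open XML"):
    """Write a three-part package to a path or a binary file object."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        # Escape the text so characters like "<" and "&" stay valid XML.
        zf.writestr("word/document.xml", DOCUMENT.format(text=escape(text)))
```

Calling build_minimal_docx("hello.docx") (a hypothetical filename) should leave you with a zip holding a content-types part, the root .rels manifest, and a one-paragraph document part, which is about as small as one of these packages gets.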
Hey. I think this entry has come full circle. From notepad to eternity and back. Go figure.
