Wednesday, June 21, 2006

XML in the Wild Word environment tidbit #1...

XML markup must be well-formed. (Hey! No rabble-rousing from you veterans in the back row!!! This is for the uninitiated!)

Think of markup as boxes. You can't have a box half in and half out; you can have either a box in a box or a box next to a box. This is how tags work in XML. It's technically how they're supposed to work in HTML, too, but the rules got shot to you know where for HTML long, long ago.
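For the uninitiated, here's the box rule in action, as a minimal Python sketch using the standard library's ElementTree parser. The properly nested snippet parses, and the half-in, half-out one is rejected. The element names are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Boxes inside boxes and boxes next to boxes: well-formed.
nested = "<chapter><title>Boxes</title><para>A box <b>in a box</b>.</para></chapter>"
# A box half in and half out: <b> opens inside <para> but closes outside it.
overlapping = "<chapter><para>Half <b>in and</para> half out</b></chapter>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```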

Word does some interesting things if you try to break the rules and do the half-in, half-out thing when applying XML markup to content in a document. Microsoft specifically calls this behavior "snapping". If you try to apply markup that is half in one tag and half in another, Word says "uh-uh. Can't do that" and then "snaps" your cursor to the next legal place where you can put markup, something that can confuse users until they understand why it's happening. Especially if you get snapped 3 pages from where you thought you were putting the markup.

So how do you know if you've selected an entire box and aren't overlapping boxes? Word determines this by the good old paragraph break. New paragraph = new box. You can't apply markup to half of one paragraph and half of another (either preceding or following, or even somewhere else in your document). It's an "everyone in or out of the pool" deal here.
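Here's a rough way to picture the paragraph rule, as a simplified Python model. It is NOT Word's actual snapping algorithm (Microsoft doesn't publish that); the function name `snap_selection` and the choice to widen an illegal selection out to whole paragraphs are my own assumptions for illustration. Offsets are character positions, and `end` is exclusive.

```python
from bisect import bisect_right


def snap_selection(paragraph_starts, text_length, start, end):
    """Hypothetical model of the paragraph rule, not Word's real algorithm.

    paragraph_starts: sorted character offsets where paragraphs begin (first is 0).
    text_length: total length of the text; the last paragraph ends there.
    start, end: the requested selection, with end exclusive.
    Returns a (start, end) range that is legal under the paragraph rule.
    """
    # Each paragraph ends where the next one begins.
    paragraph_ends = paragraph_starts[1:] + [text_length]
    # Index of the paragraph holding the first and the last selected character.
    first = bisect_right(paragraph_starts, start) - 1
    last = bisect_right(paragraph_starts, max(start, end - 1)) - 1
    if first == last:
        # Inside a single paragraph: partial selections are fine.
        return start, end
    # Crossing a paragraph break: widen to whole paragraphs ("everyone in the pool").
    return paragraph_starts[first], paragraph_ends[last]


# Three paragraphs starting at offsets 0, 10 and 25 in a 40-character text.
starts = [0, 10, 25]
print(snap_selection(starts, 40, 3, 7))    # (3, 7): stays inside the first paragraph
print(snap_selection(starts, 40, 5, 15))   # (0, 25): straddles two paragraphs, so it widens
print(snap_selection(starts, 40, 10, 40))  # (10, 40): already whole paragraphs
```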

Tuesday, June 20, 2006

We Interrupt This Blog...

to recover from rocking out Sunday and Monday nights.

Sunday night was spent sweating and wondering about retinal damage from EXCELLENT center section, back row orchestra seats at the Bauhaus and Nine Inch Nails Saratoga Performing Arts Center gig.

I got to see Peter Murphy act like a bat and a vampire live!!!! (It only took 20 years to actually get there. It almost made me long for my vintage store black trench coat and some really black eyeliner. Almost.)

Total death count: 3 guitars, one tambourine, at least one mike stand, and maybe a 12-pack of Aquafina? The local paper made a big deal about chucking stuff into the audience, but at the time, I just figured that Trent Reznor wanted to keep the "young enough to think getting squished is cool so a rock star can sweat on you" crowd hydrated.

Of course, the local paper also made a big deal that the light show only let you see the band in profile most of the time... Do I really need to say DUH!!! on this one? What? You want full stage lighting and à la Streisand spotlights for proto-Goth and techno-Goth? And you were at this concert why?

Next up... Bruce Springsteen Seeger Sessions. Also at SPAC. This time from some pretty comfy camp chairs on the lawn and far less heat (Note: There's something appropriate about going to a Nine Inch Nails concert in 90 degree humidity. Must be the atmosphere.).

Total death count: Maybe a guitar string? Although with the Boss's tendency toward marathon concerts, I was kind of surprised some of the horns held out as long as they did. Guess they have good lungs.

I wonder if someone sent Bruce to music school behind our backs? Great concert, but I've already attended the history of American folk music lectures... Maybe this is Bruce's way of giving the band a break between songs since he makes them play for 3 hours! The performance was flawless and the musicians were in top form.

XM... wha? What's that? OHHH. You thought this was going to be the XML in Office blog. Yeah, well, I didn't get to do much with XML in Office today. I was too busy doing my day job and rehashing Bauhaus songs in my mind. OK. And Nine Inch Nails songs... And I do have to admit that John Henry's running around in there somewhere alongside the siren spiritual from the movie O Brother, Where Art Thou?, too.

Hey. Even geeks gotta step out from behind the monitor some time (and, contrary to popular belief, not just to go to the bathroom). Back to the Xs, Ms, Ls, and Office 200Xs next time... when I will regale you all with the woes of valid but incorrect OASIS table markup, and maybe a W3C standard or two.

Monday, June 19, 2006

Microsoft Tech Ed 2006 Links

Hey all -

A couple of things I wanted to make available to everyone. First of all, Microsoft has made all of the 2006 sessions available as a Webcast Series. Follow the link here to get to the listings. I went primarily to the OFC (Office) track, but I also attended a few of the development track sessions that specifically discussed VSTO 2005. For the type of work that I do, and my technical level, I got a lot more out of the Office sessions than the VSTO 2005 ones. I didn't think the VSTO sessions had the same impact or level of detail that was available in the plain old Office sessions.

The other thing you guys may want to check out is the hands-on virtual labs that were available at the conference. Microsoft is making these available to everyone, too. In my opinion, the hands-on labs opportunity was one of the best parts of the conference.

Top 10 TechEd 2006 Conclusions

  1. Hands-on labs are very cool. I think I did every single Office 2007 lab except for the one about Excel. Those of you who know me, and how I feel about Excel and Excel development, know why!
  2. Sharepoint 2007 is a user friendly and usable portal with all sorts of features that end users will love, like workflows, blogs, and wikis... OH MY!!!
  3. Do not bring your laptop bag to the first day of the conference. They just give you another one.
  4. Empty your laptop bag of all but the essentials every night to make room for more vendor stuff. Trust me on this one.
  5. The RibbonX Developer model is XML-based and way cool.
  6. Spend time talking to folks on the Microsoft Office Development teams. They're very nice people who can tell you a lot about where the products are going.
  7. XML - means to an end, or end unto itself? Microsoft seems to be more on the means to an end side of this fence. Yes, they have XML running through pretty much all of their systems as middleware - the RibbonX Developer XML file-based interface is an excellent example, but other than to continue Word's ability to work with a custom schema, they still haven't gone so far as to adopt XML as a content management way of life. They're getting closer, though.
  8. The absence of any real discussion of XSLT and the near absence of any real discussion of XPATH... Maybe this was because it was such a huge and high level conference, but no one really talked much about XSLT and why you would want to do transforms on XML. XPATH got mentioned only occasionally in terms of how to get at stuff in middleware XML files. Whas' up with that?
  9. Old dogs can learn new tricks. Once you convince them of the value of the trick. I used to think that applications were better in terms of maintenance and upgrades if you coded everything meticulously by hand. In practice, however, this also meant that your development team needed to follow a consistent methodology and approach, and that adding new people to the mix meant you needed to bring them up to speed on your methodology and approach, convince them why your way was THE WAY, and then look over their shoulders to make sure they weren't breaking the rules... This was back in the days when I was trying to make the code do back flips and look perfect for the sake of making the code do back flips and look perfect, when I really should've been thinking about the quickest way to solve the business problem AND build a good, solid, maintainable, and upgradeable application. Now I'm all about the quickest way between point A and point B. Visual Studio-generated code, whether you're using IntelliSense or snippets, gives you consistency and makes sure that you follow the rules. Microsoft also publishes best practices and all sorts of other information about how you can approach your projects. So my question is, if I'm not a technology guru (that is, I don't sit on standards committees and think deep thoughts of brilliant, Escher-esque symmetrical code patterns), why should I reinvent the wheel if I have a tool (Visual Studio) that does the hard stuff for me? I've long held to the motto "just because you can does not mean you should." I've only really come to live by that motto in the last few years, though. This, however, is not an excuse for not understanding how to do it the hard way. You really do have to know how to do it the hard way to stay on the edge of these technologies, and to better appreciate how much stuff you can turn over to Microsoft's tools when building a Microsoft-based application.
  10. Hah! I don't really need to go out and buy a Windows Mobile smart phone after all! I went to the Cingular site and figured out how to do everything I want to do on my current phone. Truth be told, I've used the Palm OS for too long to just give it up, and I don't need to be more connected than I already am. I can't believe I'm giving up the opportunity to lust after new techno toys, but there you go.
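Since items 5, 7, and 8 all circle around XML as middleware, here's a small Python sketch of the kind of XPath-style querying I'd have liked to hear more about. The customUI markup is a made-up RibbonX-style file (the ids and labels are invented; the namespace is the one I understand Office 2007 customUI files to use), and Python's standard ElementTree supports only a limited subset of XPath, so treat this as an illustration rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# A made-up RibbonX-style customUI file: one custom tab, one group, two buttons.
CUSTOM_UI = """<customUI xmlns="http://schemas.microsoft.com/office/2006/01/customui">
  <ribbon>
    <tabs>
      <tab id="tabXmlTools" label="XML Tools">
        <group id="grpMarkup" label="Markup">
          <button id="btnValidate" label="Validate"/>
          <button id="btnTransform" label="Transform"/>
        </group>
      </tab>
    </tabs>
  </ribbon>
</customUI>"""

# Map a prefix to the customUI namespace so the path expressions stay readable.
NS = {"ui": "http://schemas.microsoft.com/office/2006/01/customui"}


def button_labels(xml_text):
    """Return the label of every button, in document order."""
    root = ET.fromstring(xml_text)
    # ".//ui:button" is ElementTree's limited-XPath form of "any descendant button".
    return [button.get("label") for button in root.findall(".//ui:button", NS)]


def find_button(xml_text, button_id):
    """Return the button element whose id attribute matches, or None."""
    root = ET.fromstring(xml_text)
    # Attribute predicates like [@id='...'] are part of the supported subset.
    return root.find(f".//ui:button[@id='{button_id}']", NS)


print(button_labels(CUSTOM_UI))                             # ['Validate', 'Transform']
print(find_button(CUSTOM_UI, "btnTransform").get("label"))  # Transform
```

For real XSLT transforms (item 8), you'd need something beyond the standard library, since ElementTree doesn't do XSLT.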

And there you have it folks... The conclusion of XML in the wild Tech Ed 2006 environment.

But there are many other environments and habitats in which wild XML may thrive. For this reason, I've decided to continue my musings here. I can't guarantee any smoking gun revelations, but I do keep my ear to the web on lots of XML topics other than XML in Microsoft products and content management-based XML...

And with the previous brave statement, I open the virtual floor to whomever should care to request any further field studies of the wild XML beast. If I don't have an opinion (yeah. right.), or don't know a subject directly, chances are that I do know from whom, how, and where to find answers. I invite you to bring on your suggestions for further XML in the Wild adventures.

Wednesday, June 14, 2006

And now...

4 minutes to Brian Jones's Open XML presentation... woohoo!

The 3:45 InfoPath session was great. InfoPath 2007 is way faster on the client side, and the stuff they've done on the server side is great. They've even got the server all integrated with Microsoft's server diagnostic tools so you can track the health of your forms server - how many hits, clients, what bombed, what didn't, tracebacks, etc. Very, very cool.

I think one of the conference's biggest impacts is my realization about how very far software development has come. Edit HTML or SGML in notepad? That seems positively prehistoric compared to what Visual Studio 2005 can do....

OK. Now for Brian Jones and Office Open XML Formats - blogging live! live! live!

The files are now owned by ECMA because Office Open XML is now an ECMA standard. They're going for ISO standard next. I'm not sure why they feel the need to double-cross the "t" here, but I think it probably has something to do with playing in the same field as ODF - a conversation for another time.

Word, Excel, and Powerpoint 2007 are XML by default - and they get new extensions to go with it to prevent the nightmare that happened when people tried to upgrade from Office 95 to Office 97 (which was the last major file format change).

Goal: represent all features of binary Office files in XML - hence the level of detail in the ECMA submission - 4017 pages.

Give everyone the binary format described in XML, and you can get a wide open development door to Office documents... with the caveat that Microsoft has written a covenant "not to sue" anyone for using the formats. This leads to an entirely different argument regarding Open XML and ODF - most of which has been led by the pundits over at with some input from blogs on, and here.

OK... Away from the politics and back to the details. Zip files make stuff much smaller... So the Office team borrowed a page from the Open Office development group and decided that Open XML gets to live in a zip file with its binary buddies, and a manifest file to hold everything together. Do I really need to document the benefits of sending a zip file over the network instead of a binary file?
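Rather than argue the point, here's a quick Python experiment showing why zip packaging takes the sting out of XML's verbosity. The markup is a made-up, deliberately repetitive WordprocessingML-style fragment (not a real Word part), so the ratio you'd see on a real document will differ; the point is only that DEFLATE, the compression zip uses, makes short work of repeated tag names.

```python
import io
import zipfile


def deflated_size(part_name, data):
    """Store one part in an in-memory zip using DEFLATE and return its compressed size in bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(part_name, data)
    with zipfile.ZipFile(buffer) as zf:
        return zf.getinfo(part_name).compress_size


# A made-up, deliberately repetitive fragment in the spirit of WordprocessingML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
paragraph = "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Structured content</w:t></w:r></w:p>"
xml_bytes = (f'<w:body xmlns:w="{W_NS}">' + paragraph * 500 + "</w:body>").encode("utf-8")

raw = len(xml_bytes)
packed = deflated_size("word/document.xml", xml_bytes)
print(f"raw: {raw} bytes, deflated: {packed} bytes")
```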

Do I really need to talk about the tendency of documents to spontaneously corrupt in previous versions of Office?


The new office formats will be backward compatible through Office 2000 via downloadable plugins that will let you read and write Office 2007 formats even if you don't have Office 2007.

XML formats from Office XP and Office 2003 will be forwards compatible. Apparently, there's not that much difference between WordML 2003 and WordML 2007 except for the fact that the 2007 version is chunked into semantic parts - document properties, application state, headers, footers, page stuff, borders, graphics, tables, endnotes, and paragraphs. The whole thing is held together with a ".rels" file, which is really an XML manifest file similar to a catalog for those of you who are familiar with the catalog file concept. All of this stuff is revealed merely by changing the .docx extension to .zip and extracting the contents. Voila. XML.
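If you'd rather not rename and unzip by hand, here's a minimal Python sketch using only the standard library. Python's `zipfile` doesn't care about the .docx extension, so no renaming is needed. The function names are mine, and the sketch only reads the package-level `_rels/.rels`; as far as I know, individual parts can carry their own .rels files too (for example `word/_rels/document.xml.rels`), which this doesn't walk.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Open XML relationship parts (.rels files).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def list_parts(package):
    """List every part stored in an Open XML package (.docx, .xlsx, .pptx).

    package may be a file path or an open binary file object.
    """
    with zipfile.ZipFile(package) as zf:
        return zf.namelist()


def package_relationships(package):
    """Return (Id, Type, Target) tuples from the package-level _rels/.rels part."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.iter(REL_NS + "Relationship")
    ]
```

Pointed at a real file (say, a hypothetical `report.docx`), `list_parts` should show the same [Content_Types].xml file, _rels folder, and document parts you'd see after the rename-and-extract trick.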

The fact that the standard has to be backward compatible AND do the binary thing is really what has inflated the standard document. Is this good or bad? The "level of detail" argument is another prevalent argument among the ODF vs. Open XML debaters. More on that later, too.

What if you've got binary stuff embedded in your files, like graphics, OLE objects, or VBA? That stuff gets saved into the zip, too, as separate binary parts. Graphics and OLE objects can live in a regular .docx, but if the file carries macros (VBA), they append an "m" to the extension (.docm, .xlsm, .pptm) to let you know that there's executable binary stuff in the .zip file, so BEWARE... Still, they definitely are separating the binary stuff from the content stuff. This makes sense to me, seeing as XML for graphics can get really verbose really quickly and require special support to view. Why bother if you can just put the binary stuff in the zip?
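Here's a small Python sketch for seeing that separation for yourself: it lists the parts in a package that aren't XML and checks for a VBA project part. The assumptions: anything not ending in .xml or .rels is treated as binary (a rough heuristic), and VBA shows up as a part named vbaProject.bin (my understanding is that Word stores it at word/vbaProject.bin). The function names are hypothetical.

```python
import zipfile

# Rough heuristic: anything that isn't an .xml or .rels part is treated as binary.
XML_SUFFIXES = (".xml", ".rels")


def binary_parts(package):
    """Return the names of non-XML parts (images, OLE embeddings, VBA projects)."""
    with zipfile.ZipFile(package) as zf:
        return [
            name for name in zf.namelist()
            if not name.endswith("/") and not name.lower().endswith(XML_SUFFIXES)
        ]


def has_vba_project(package):
    """True if the package carries a part named vbaProject.bin (assumed VBA storage)."""
    return any(name.lower().endswith("vbaproject.bin") for name in binary_parts(package))
```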

OK. Back to the point. The Office formats have seriously evolved over time, and this is very cool. Long ago, in a job far away, I remember demo'ing for a Microsoft Office Word project manager. The subject of the demo was a proof-of-concept Arbortext Epic-based app I had built to help me explain that we wanted Word to be a "real" structured editor. 5 years later, it would appear that Word can definitely be a real structured editor.

And so the door to structured editing for the masses is now open... Let the structured content run amok (as only structured content can't)!

Gotta go. He's actually showing how to build a Word 2007 document from the ground up in notepad...
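In the same spirit, here's a minimal Python sketch that assembles a bare-bones .docx from three hand-written XML parts: the content types file, the package-level relationships, and the main document part. It's my own minimal reconstruction, not the demo from the session; a real Word-saved file carries many more parts (styles, settings, document properties), and I'm assuming Word will open a package this sparse.

```python
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml"
            ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# Package-level relationship pointing at the main document part.
PACKAGE_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
    Target="word/document.xml"/>
</Relationships>"""

# One paragraph (w:p) holding one run (w:r) of text (w:t).
DOCUMENT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>{text}</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def write_minimal_docx(target, text):
    """Write a three-part Open XML package to target (a path or binary file object)."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", PACKAGE_RELS)
        # Escape &, < and > so arbitrary text can't break the markup.
        zf.writestr("word/document.xml", DOCUMENT.format(text=escape(text)))
```

Something like `write_minimal_docx("hello.docx", "Hello from notepad")` (a hypothetical file name) would produce the package; the notepad part is left to the reader.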

Hey. I think this entry has come full circle. From notepad to eternity and back. Go figure.

The word of the day in these parts...

Is Sharepoint 2007 Server and Services. They slice, they dice, they julienne. They wiki, they blog, they direct workflow, and do other stuff, too.

The interface is slick. The pages are point-and-click to customize.

omigod. Microsoft has FINALLY put out a product that can really compete with the Documentums and LiveLinks and Plum Trees (oh my!)...

More later. Gotta head to the "Designing InfoPath for Client and Browser compatibility" session.

Monday, June 12, 2006

Twofer. Or, Little XML Grrrl arrives in the Big Microsoft City...

This is going to be 2 posts in one. I didn't post last night because I had a nightmare travel day and arrived at the hotel to find that I had no power cord for my computer.

Good thing Thomson has offices in Boston! They set me up with power this afternoon, so I'm good to go.

Note: Beware of pilots who tell their passengers that we can't take off until they "Ctrl+Alt+Del" the plane. Be very wary. Needless to say, I spent quite a bit of time familiarizing myself with the C & D concourses in PIA yesterday morning and afternoon.

But I got here in time for the TechEd Keynote speech where Microsoft laid out their 4 promises for taking over the world... woops... I mean satisfying their customers... via a theme based on the Fox show, 24. Only they called it 4 (4 promises) and there were only 4 episodes, each lasting 4 minutes. The promises? The usual stuff: infrastructure, security, agility, and end user results.

In my blurred sense of reality (having started the day at 5 a.m.), I had this little vision of boxes of Microsoft apps running agility courses (you know, jumps, weave poles, tunnels, etc - like dogs!) in this large conference center, which has surely hosted at least one dog show... Lack of sleep will do that to a body.

But enough of that. I'm here, I have a power cord, so let's get on with it.

This conference is the biggest conference I've attended. And those of you who know me know that I've got a few conferences under my belt. I'm definitely getting my exercise, what with all the walking from here to there and carrying varying amounts of weight throughout the day as vendors literally toss the swag at us as we walk by. Not that I'm complaining about swag, mind you. I needed new T-shirts, anyway. They're keeping us well fed and hydrated, so I really have no complaints about the venue.

One guy is getting around the convention center on his Segway (remember those weird gyroscopic people movers that were a flopping rage a few years ago?). My guess is that this wasn't his first TechEd experience.

Takeaways from today: 8 years of usability research went into the Office 2007 UI redesign. So all that time I thought Microsoft was watching me like Big Brother with their "customer experience improvement" program, I could've been contributing to usability research. Hmmm.

But seriously, the new Office 2007 UI is gorgeous. The new graphics engine that runs charting in Excel, Word, and Powerpoint is gorgeous. The "live preview" of formatting configurations and fonts is gorgeous. End users will be very excited by this interface. Developers and power users will hate it because it's like your mother-in-law came in and rearranged your kitchen while you were on vacation.

Where did she put the damn coffee cups?!? Oh... Well what the hell are they doing over there?!? That's not where they go...

The other thing that structured editor proponents aren't going to like is the fact that all of the really slick stuff that end users are really going to love happens to be formatting-related. This will put the cause for structured authoring back a few years, as we will have to reconvince our users all over again why they shouldn't care about formatting over semantics. Sigh.

The other big takeaway from today's sessions was how important Sharepoint Services are to pretty much anything and everything to do with Microsoft Office workflows in this New World of Office 12. Basically, Sharepoint pulls everything together so you don't have to create a patchwork system of some stuff in this flavor, some stuff in that flavor, and a whole lot of chewing gum in between to hold stuff together. Sharepoint is supposed to let you create your workflows and integrate everything like legos - snap it all together and you're golden. At least that's what they're telling us.

I've been hearing a lot about "snap together" application integration approaches lately, both at home and abroad, so I'm not entirely surprised to be hearing this from Microsoft. Snap together applications are apparently the "new black" in application integration this year. That realization aside, they do make a pretty good argument about the fact that our business is NOT playing with Microsoft applications - it's whatever our business is - publishing, regulatory submissions, whatever. If Microsoft can provide a lego block approach that really works, we spend less time playing with software and more time concentrating on our actual business. I think they could be on to something here...

And what about the XML stuff? What about the Open XML formats? What about smart documents and XSLT? Well, it is only Monday, and the Open XML format sessions don't happen 'til tomorrow and Wednesday, but overall, Microsoft is still treating XML very much as a middleware/data type of creature - something that's really useful under the hood, but that must be hidden from end users at all costs. They are definitely coming around to seeing the advantage of being able to get at information semantically - all the cool things you can do in knowledge management, driving workflows, and rearranging things are definitely up for discussion. But .NET books at all of the publishers' booths (Wiley has quite a presence here, BTW) far outnumber any XML offerings. Even though XML is running all through Microsoft products now (driving, for instance, the new Office UI Ribbon, and configuring processes across virtual servers), it's still considered very much a side dish in the Microsoft world.

And that's pretty much it for today. I got to see our InfoPath vendor's 20 minute "chalk talk" about their new InfoPath-based project management product, which was very cool. I got to talk to Altova XML Spy developers about table parsing (a conversation that will most likely be continued if I have anything to say about it). There are neat things coming around the bend for future versions of VSTO that will definitely ease current security and deployment issue pains.

And with all the information coming at me (at about the same rate as the T-shirts and pens), that's enough for what was essentially Microsoft TechEd 2006 Day #1 of 5.

Friday, June 09, 2006

And so it begins...

I have been thinking about setting up a blog to document my experiences with the wild XML beast in context of document management, publishing, and Office (be it Microsoft or other). I've been doing the SGML/XML thing for 10 years this summer, so I may actually have a cogent thought or two to contribute to the conversation.

Then my boss mentioned that he wanted status reports on my experiences and what I was learning at Microsoft's TechEd 2006 conference in Boston next week and it all sort of came together... Blog... Blog... Blog...

So here it is. The first entry in my official deep thoughts about XML in the Wild blog. Stay tuned... More reporting from the wild world of XML next week!
