Tuesday, December 05, 2006

The Murky Waters of Schemas

I attended two sessions that discussed schemas this morning. The first presenter was from Microsoft, and he explained how Microsoft decided they needed to know more about how people are actually using schemas (or whether they're using schemas at all) before they went and built a tool to build schemas that people might actually use.

The study revealed recurrent reasons people are currently using schemas:

  1. Because other people are using them.
  2. Because other people are using them and we need to work with their stuff.
  3. Because the tools don't support DTDs.

They didn't ask questions about best practices. They didn't ask questions about business need. They just took the answers from a bunch of developers and DBAs and went from there. Amusingly, they summarized the response:

"Developers are saying that DBAs use schemas, and everyone else says Developers use schemas."

Uhh... Where are the business requirements in this mix? Some context please? No. They didn't ask about that. They just asked about current use.


What was established is that schema use will increase in the future as people stop supporting DTDs. More products will incorporate some sort of XML. Other people are doing it, so we need to be on the bandwagon. Future projects will require schemas. Web services require schemas.

One thing that wasn't explicitly mentioned – Microsoft never bothered to support DTDs in the first place.

But here's the important stuff – My notes from the Q & A at the end of the session:

Q: Is there any relationship between the adoption of Schema and XQuery?

A: The presenter can't answer this question since they didn't really ask anything about XQuery.

Q: Tools for complicated schemas would be very useful for adoption – people are using tools to build SOA schemas now, which is why people who are developing more complex schemas are building schemas by hand. Will there be tools developed for people who are working with extensible and complex schemas?

A: Presenter says that Microsoft has worked with large schemas and that they want to ensure that their tools/APIs will handle large schemas effectively.

Note: Currently, it's hard to work with modular schemas – the schema include mechanism (xs:include) is not widely implemented – which means people are flattening their complex schemas just to make them work.
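
For the curious, "flattening" boils down to something like the rough Python sketch below. The file names and the two tiny schema modules are made up for illustration, and it only handles plain includes within a single namespace. It ignores xs:import, xs:redefine, and the prefix bookkeeping you'd need before writing the result back out, so treat it as a sketch of the idea rather than a real tool.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical two-module schema, held in memory so the sketch is self-contained.
MODULES = {
    "book.xsd": f"""<xs:schema xmlns:xs="{XS}">
  <xs:include schemaLocation="common.xsd"/>
  <xs:element name="book" type="xs:string"/>
</xs:schema>""",
    "common.xsd": f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="title" type="xs:string"/>
</xs:schema>""",
}


def flatten(name, modules, seen=None):
    """Return the schema root for `name` with every xs:include inlined."""
    seen = set() if seen is None else seen
    seen.add(name)
    root = ET.fromstring(modules[name])
    flat = []
    for child in list(root):
        if child.tag == f"{{{XS}}}include":
            target = child.get("schemaLocation")
            if target not in seen:  # skip modules that were already pulled in
                flat.extend(flatten(target, modules, seen))
        else:
            flat.append(child)
    root[:] = flat  # replace the children with the flattened list
    return root


flat_schema = flatten("book.xsd", MODULES)
names = [el.get("name") for el in flat_schema]
print(names)
```

The included declarations land where the include used to be, so one module's worth of XML is all a picky tool ever sees.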

Note from a W3C Working Group Member: The Schema 1.1 workgroup wants to make sure they're attacking a limited set of pain points to repair things that people want – did Microsoft's study tell them anything that might help the Schema 1.1 working group address these pain points? Why are people using Schematron and RELAX NG? What is the schema group missing?

(My un-voiced answer to this question: Uhhh.... go look at what Schematron and RELAX NG do, and you'll find out what XML Schema doesn't.)

I asked if they had asked any questions about methodologies and best practices, since my own attempts to follow methodologies resulted in a schema that Word 2003 chewed up and spit out. I was told that Microsoft's study never asked people about schema methodologies and tool support for how schemas are developed (like schemas in MS Word 2003). However, they are going to try to track best practices in the future. A huge barrier to completing this work is getting the Web Services group to cooperate with the SQL group and the Office group on what best practices are.

Q: Would the use of UML to develop schema requirements have any effect upon, or help increase the adoption of schemas in the future?

A: Cricket cricket cricket... Uh... No. That wasn't part of the survey.

In conclusion, people use schemas, but not because they want to. A small number of people started a conversation regarding RELAX NG, Schematron, and Trang during the session break.

The next session was "Daddy? Where do Schemas Come From" – the most I got out of that was practice safe validation. Someone even suggested digitally signed certificates to validate validation. Oh boy.

This presentation also failed to put reasons to practice safe validation into real world business requirements.

Sigh. Hopefully the desktop tools sessions this afternoon will have more relevance in terms of the real world.


Blogging via Word 2007 at XML 2006

For kicks and giggles, I figured I'd try out the blogging capability in Word 2007 for today's post.

What do y'all think?

Probably can't tell the difference, can you? If so, that's good.

Today's activities include the following sessions:

That's gotta be enough to make my head spin for one day...

Oh! And I found that there's a vendor here who actually says they can transform MS Word or XML into richly formatted, brand-consistent InDesign and PDF documents. The company is http://www.typefi.com, and I'm really surprised that I haven't heard of them before now, what with all of the InDesign research I've been doing lately. I'll get back to y'all on this one as soon as I get the scoop on what it is they actually do...

At any rate, I have just enough time to get another Mocha before this thing starts, so I'm outta here.


Monday, December 04, 2006

Paths and Queries and Transforms... Oh My!

So today's action was the XPath 2.0, XQuery and XSLT 2.0 Explained tutorial, taught by Priscilla Walmsley.

Priscilla Walmsley is a name I've heard before. Or at least a name I've read before. I have the Definitive XML Professional Toolkit edition that she edited with Goldfarb. I also have the XML in Office 2003 book that she co-wrote with Goldfarb.

She's written quite a few books on things related to XML, and she hangs out with Goldfarb. This is a woman who knows her stuff. Forward, backward, and upside down, since she's also written books on XML Schema AND sits on W3C working groups.

That said, I think the most valuable part of the tutorial came 7 pages before the end of 114 pages of material.

What, huh? 7 hours to get to the one hour of stuff that REALLY REALLY matters?


The last hour was dedicated to the topic of deciding which technology to use - the new, and sexy cool XQuery - a language to make the hearts of SQL lovers everywhere go pitter-pat, or XSLT 2.0... the very verbose and somewhat confusing (is it declarative or functional? What do you mean it's both?!?) transformation language that is currently the workhorse of all XML workflows requiring you to actually do something with your XML after you're done with the presentation stuff and have uploaded to your repository.

Don't worry. I haven't forgotten about the XPath part. It goes in between XQuery and XSLT. XPath isn't quite XQuery, but it's definitely useful in XSLT, and the 2.0 version of the candidate recommendation has added a lot of stuff that was sorely missing from previous versions. Stuff that I will let you go search out for yourself, rather than go all pedantic on you.

Back to the reason why the last 7 pages were the most important...

The business decision to use XQuery or XSLT is, as are most decisions regarding which technology to use, a matter of BUSINESS REQUIREMENTS. Yep. That's right. Same old, same old. It comes down to what you're trying to help your business accomplish.

XQuery looked all nice and cool and SQL-like, and more human-readable, and we spent most of the day discussing the various ins and outs of query construction. Yeah, we covered some overview stuff at the beginning, and some XPath stuff in between, but we really spent most of our day looking at XQuery examples.

Six hours in, we switched from XQuery to XSLT 2.0, and looked at the nifty new features that make the 2.0 version a great improvement over the 1.0 version - Grouping, Sequences, Temporary Trees, MULTIPLE RESULT documents, and other stuff. Very cool - and more stuff you can go read all about at the W3C website, among other places.

OK... Back to the business requirements...

The fact of the matter is, XSLT is the language you want to use when your XML is not regular. Not predictable. Not data. Huh? Not data? Yeah. XSLT is the language you want to use when your XML is text. It was drummed into me early on in my XML life that while all data is text, not all text is data. A very important distinction that I have passed on to pretty much every training class and presentation audience I've stood in front of.

Not all text is data.

Which means XQuery, as cool and simple as it is (besides the fact that it's all shiny new - OK - relatively new), is not the tool for processing narrative text.

Fact of the matter is, even though there's a serious overlap in capabilities between the two languages, you just aren't going to beat XSLT's capability to handle highly variable, presentation-oriented, and heavily recursive XML content. In short, don't fix what ain't broke. And don't move to the newer and cooler stuff because it's newer and cooler.
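
To give a feel for what "heavily recursive" means here, below is a rough sketch in Python (not XSLT, and not anything from the tutorial). The sample paragraph and the little rule table are invented for illustration. The shape is the point: one rule per element, applied recursively to whatever happens to be nested inside, loosely the way XSLT templates and xsl:apply-templates work.

```python
import xml.etree.ElementTree as ET

# Hypothetical narrative fragment: inline elements nest in no fixed order.
PARA = "<p>Not <em>all <b>text</b></em> is <b>data</b>.</p>"

# One rule per element name, loosely like one XSLT template per match pattern.
# Each rule is just (text to emit before, text to emit after).
RULES = {"p": ("", "\n"), "em": ("_", "_"), "b": ("*", "*")}


def render(node):
    """Recursively render mixed content, whatever the nesting turns out to be."""
    before, after = RULES.get(node.tag, ("", ""))
    parts = [before, node.text or ""]
    for child in node:
        parts.append(render(child))      # recurse, like xsl:apply-templates
        parts.append(child.tail or "")   # text that follows the child element
    parts.append(after)
    return "".join(parts)


rendered = render(ET.fromstring(PARA))
print(rendered)
```

Nothing in render() knows or cares how deep the nesting goes or in what order the inline elements show up, which is exactly what narrative text demands.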

XQuery is a language for querying XML databases. It doesn't matter if that database is a fancy RDBMS, or an XML document. What does matter is that XQuery is dependent on predictably structured pieces of content that behave. You can slice and dice content from multiple sources, you can make new documents, but it's all predicated on content that has the regularity of data.
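
Here's the flip side as a rough Python sketch (again, not XQuery itself, and the order data is made up). It has the same shape as a for / where / order by / return query, and it only works because every order element looks exactly like every other one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, perfectly regular data: every order has the same children.
ORDERS = """<orders>
  <order id="1"><customer>Ann</customer><total>40.00</total></order>
  <order id="2"><customer>Bob</customer><total>125.50</total></order>
  <order id="3"><customer>Cy</customer><total>310.00</total></order>
</orders>"""

doc = ET.fromstring(ORDERS)


def total(order):
    """Numeric value of an order's <total> child."""
    return float(order.findtext("total"))


# Roughly: for each order, where total > 100, order by total descending...
big_orders = sorted(
    (o for o in doc.iter("order") if total(o) > 100),
    key=total,
    reverse=True,
)

# ...return a brand new document built from the matches.
result = ET.Element("big-spenders")
for o in big_orders:
    ET.SubElement(result, "customer", id=o.get("id")).text = o.findtext("customer")

report = ET.tostring(result, encoding="unicode")
print(report)
```

If even one order were missing its total, the float() call would blow up, which is the "content that behaves" assumption in action.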

Text doesn't behave. Text isn't regular. Text can't be forced into a nice neat jello mold that you can produce over and over again.

I hadn't really compared the two standards before today, but now that I have, I plan to spend more time reading up on XSLT. Despite all my attempts to work in the world of data - the Access and SQL Server database training and implementations of my past - I work best with text. And while I really prefer to play with text in a graphic design environment, I do understand how structure implied by, or semantically applied to, text makes it easier to do the graphic typography thing.

Conclusion: XQuery is very cool. I enjoyed learning about it. I just don't see a place for it in my immediate business activities when XSLT is standing behind me, beckoning enticingly with the giant Michael Kay Wrox books that I've purchased, but have yet to really study...

Then again... maybe there's a place for XQuery in the stuff that helps us process our narrative text... like maybe on the XML produced by an InfoPath form? Hmmm. I'll have to think about that.

In the meantime, dinner beckons, and I need to cool my proverbial CPU for the remainder of the day.

I'm outta here.

