Posted on June 4, 2008 16:51 by swilliams

XML is kind of a neat thing. It is generally human-readable and well supported by nearly every language and platform. But, like everything else in programming, it is imperfect. The problem I have with it is that everyone seems to be trying to shoehorn technologies into it.

Let's back up a second. XML is a Declarative Programming Language. This means that it describes something rather than performing a process. The most obvious example of this is HTML*. At a basic level, a web page is just a document, and HTML describes its structure well enough. In other words, all it does is state "Here is a heading. Here is a paragraph..." However, when a web page needs to be manipulated or have some kind of logical flow, something else (typically a scripting language) must be used. I don't know if this separation was intended, but it ends up working very well.

[* Technically speaking, HTML came first, as an application of SGML. XML was created later as a simplified subset of SGML, and XHTML then reformulated HTML in XML.]

Most "classical" programming languages are considered Procedural (or in some circles, Imperative). Rather than describe something, they state the logical flow of execution. These languages have all the control expressions that people are familiar with: ifs, loops, functions, etc.

Each technology performs quite well in its own domain. But the road to pain and suffering begins when things get mixed up, and Procedural concepts are introduced into a Declarative syntax and vice versa.

The biggest example of this woe is an XML based rules engine that I stumbled across not too long ago. Here is how it handles a conditional statement:

<Logic>
  <If>
    <And>
      <Equals leftId="CLIENT_RATING" rightId="PREMIUM_RATING" />
      <Equals leftId="PRODUCT_TYPE" rightId="REGULAR_TYPE" />
    </And>
    <Do>
      <Integer id="DISCOUNT_PERCENT" value="5" />
    </Do>
  </If>
</Logic>

Just looking at that makes my skin itchy. For contrast, here's the same thing in JavaScript:

if (ClientRating === PremiumRating && ProductType === RegularType) {
  DiscountPercent = 5;
}

A common argument for using XML is that it is easier for a lay-person to read, meaning that a programmer is not needed for maintenance. This may be true for smallish files (say 20 lines or so), but what happens when a non-programmer opens up a thousand line build script to make a substantial change? Once a file gets to be sufficiently large, the maintainer needs to have significant knowledge of the domain to make reliable changes, no matter how "easy" it is to read.

The bigger problem, though, is that the XML here doesn't describe anything; it actually runs the process!
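To see why, here is a minimal sketch of the kind of interpreter that has to sit behind such a rules file. The rules engine in question is not named here, so everything in the sketch is an assumption: the object layout of `rule`, the `evaluate` and `run` function names, and the sample values. It only shows that the XML elements behave like the nodes of a program, not like a description of data.

```javascript
// A hypothetical object form of the <Logic> rule shown earlier.
const rule = {
  type: 'If',
  condition: {
    type: 'And',
    operands: [
      { type: 'Equals', leftId: 'CLIENT_RATING', rightId: 'PREMIUM_RATING' },
      { type: 'Equals', leftId: 'PRODUCT_TYPE', rightId: 'REGULAR_TYPE' },
    ],
  },
  actions: [{ type: 'Integer', id: 'DISCOUNT_PERCENT', value: 5 }],
};

// Evaluates a condition node against a table of named values.
function evaluate(node, values) {
  switch (node.type) {
    case 'And':
      return node.operands.every((operand) => evaluate(operand, values));
    case 'Equals':
      return values[node.leftId] === values[node.rightId];
    default:
      throw new Error('Unknown condition: ' + node.type);
  }
}

// Runs an If node: when the condition holds, each action assigns a value.
function run(ifNode, values) {
  if (evaluate(ifNode.condition, values)) {
    for (const action of ifNode.actions) {
      values[action.id] = action.value;
    }
  }
  return values;
}

// Sample inputs are made up for illustration.
const values = run(rule, {
  CLIENT_RATING: 'premium',
  PREMIUM_RATING: 'premium',
  PRODUCT_TYPE: 'regular',
  REGULAR_TYPE: 'regular',
});
console.log(values.DISCOUNT_PERCENT); // 5
```

Every element name in the XML ends up as a branch in an evaluator like this one, which is a good hint that the file is really source code for a small, awkward language.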

Build scripts are particular offenders here, MSBuild and NAnt being the bigger fish in the sea. In general I like both of these technologies; they provide incredibly useful services to programmers, but editing and maintaining them is arduous. On the surface, a Declarative language would fit in well here. After all, you are just describing a build. But when you get down and dirty, it becomes apparent that these are really procedures being run.

In both pieces of software, there is the concept of "Tasks." These tasks can clean a directory, compile code, publish it, and more. There isn't actually much of a description going on here; the XML is stating exactly what needs to be done. The individual components of a task are really just the lines of a function. This NAnt task:

<csc target="exe" output="HelloWorld.exe" debug="false">
  <sources>
    <include name="**/*.cs" />
  </sources>
  <references>
    <include name="System.dll" />
  </references>
</csc>

Could map directly to this JavaScript:

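// Compiler is a hypothetical wrapper object, not a real library.
function compile(target, output, debug) {
  var compiler = new Compiler(target, output, debug);
  compiler.sources.push('**/*.cs');        // same glob as the <sources> include
  compiler.references.push('System.dll');  // same file as the <references> include
  compiler.compile();
}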
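The `Compiler` object in that function does not exist anywhere; it stands in for whatever would wrap the real compiler. As a rough sketch, a stub version could look like the following. The class and its fields are assumptions, and its `compile` method only builds an argument list (modelled loosely on the `csc` command-line switches) instead of launching anything.

```javascript
// Hypothetical stand-in for a compiler wrapper; it only records its inputs.
class Compiler {
  constructor(target, output, debug) {
    this.target = target;
    this.output = output;
    this.debug = debug;
    this.sources = [];
    this.references = [];
  }

  // Builds the argument list rather than invoking a real compiler.
  // Source patterns are passed through unexpanded.
  compile() {
    return [
      '/target:' + this.target,
      '/out:' + this.output,
      this.debug ? '/debug+' : '/debug-',
      ...this.references.map((ref) => '/reference:' + ref),
      ...this.sources,
    ];
  }
}

// Same shape as the task: attributes become parameters,
// child elements become statements.
function compile(target, output, debug) {
  const compiler = new Compiler(target, output, debug);
  compiler.sources.push('**/*.cs');
  compiler.references.push('System.dll');
  return compiler.compile();
}

console.log(compile('exe', 'HelloWorld.exe', false).join(' '));
// /target:exe /out:HelloWorld.exe /debug- /reference:System.dll **/*.cs
```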

I've been using JavaScript in these examples for a reason: it lends itself well to the simple procedural programming described here. It's fairly lightweight, has a straightforward syntax (especially when certain conventions are enforced), and most programmers are already familiar with it because it is everywhere on the Web. You could have a whole scaffolding of a build script that looks like this:

// Build variables
var codeDir = 'c:\\code';   // backslashes must be escaped in JavaScript strings
var outDir = 'c:\\output';

// Entry Point
function init() {
  clean();
  getFromScm();
  compile();
  test();
  deploy();  
}

// Error Handling
function onError(msg) {
}

// Tasks
function clean() {
}

function getFromScm() {
}

function compile() {
}

function test() {
}

function deploy() {
}
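The scaffold leaves open how `onError` gets called. One possible wiring, sketched here under assumptions, is a small runner that executes the tasks in order and stops at the first failure. The `runBuild` helper, the placeholder task bodies, and the error message are all made up for illustration.

```javascript
// Runs each task in order; stops at the first failure and reports it.
function runBuild(tasks, onError) {
  const completed = [];
  for (const task of tasks) {
    try {
      task();
      completed.push(task.name);
    } catch (err) {
      onError(task.name + ' failed: ' + err.message);
      break;
    }
  }
  return completed;
}

// Placeholder tasks; real ones would touch the file system, SCM, and so on.
function clean() {}
function getFromScm() {}
function compile() {
  throw new Error('syntax error in Program.cs'); // simulated failure
}
function test() {}

const errors = [];
const completed = runBuild([clean, getFromScm, compile, test], (msg) => errors.push(msg));

console.log(completed.join(', ')); // clean, getFromScm
console.log(errors[0]);            // compile failed: syntax error in Program.cs
```

Because the tasks are ordinary functions, the order of the build is just the order of an array, and changing it needs no special build-file syntax.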

If JavaScript isn't your thing, there is a fascinating build utility out there called Rake. It is billed as a replacement for the old Unix and C Make utility. It does something similar to my JavaScript example, and has the power of Ruby behind it.

For programmers, and even laymen, this is readable and maybe even intuitive. Additionally, you get the host of tools that come with a popular procedural language. Can you create a full battery of unit tests for your build script? Quickly? Can you easily attach a debugger to it? Perform static analysis? All of those things would be useful in a build script, but are difficult to do with XML.

Well then, what exactly is XML good for? In my experience, XML is the "right" solution when you do not have to do any extra processing to handle it. Yes, most modern libraries have XmlReaders and such, but it is still a dreary task to use them to parse a large document. Perhaps the best example I can give is .NET's XML serialization support (the XmlSerializer class). With a few lines of code, you can turn an object into XML, and then back again. No mucking about with Readers, XPath, or anything. And generally, the resulting XML is lightweight enough that it isn't a huge chore to modify it by hand if needed.

The rule of thumb here is that XML is good when you do not need to put forth a huge effort to use it. Remember, things like this are supposed to be increasing our productivity, not lowering it.

I hope that I didn't come down too hard on XML here. Again, I think it is an incredibly useful tool, and there have been plenty of good technologies based on it. I just think that you need to be careful not to use it (or any other "hot" commodity) in domains that it is simply not suited for.



Comments

June 4. 2008 23:45

Is XML really a programming language at all? It doesn't seem like "Declarative Programming Language" is the best term for it; it seems more like a type of data definition language. Without another piece of software to parse the data, it's useless.

A true declarative language (like Prolog) allows you to define facts (e.g. dog(fido).), create rules (eats(X,Y) :- dog(X), cat(Y).), and ask questions (?- eats(fido,Y).).

Now I guess you can get XML to describe a process, with the multitude of ways it is used these days, but I think in the way you describe its proper use (which I'd agree with), it only describes data. Wikipedia calls XSLT a declarative language, and it certainly has characteristics that would fit the bill; and the domain-relational part of SQL is definitely declarative programming.

Anyway, good read. I've been toying with MSBuild lately, and there's no way to get around it, the use of XML definitely seems improper. I'd much rather write a build script.

jpager

June 6. 2008 11:10

I use "programming language" rather loosely. I wouldn't really consider HTML a programming language, but XML really blurs the lines since it lets you [obtusely] write logic.

Scott Williams

June 7. 2008 19:59

@ Scott Williams:

Notepad lets you write logic too. ;)

jpager
