The Anthropology of Software
November 10, 2005
I See Markup, Part 1 - Yet Another Parser Is Born
This is the story of my product CMarkup. To a C programmer accustomed to dealing with strings and files, the idea of using a huge parsing library or giant component to handle a little XML is like using a helicopter to make a run to the grocery store.
In January 1999 I started working as a subcontractor on a large government IT project. Like many government IT projects, it was meant to streamline a huge number of legacy and upgraded systems across the world. Anyway, we were looking at XML as a technology, and I discovered, as other developers did, that although it was a great thing, it was not all it was hyped up to be. There were no robust XML products that could be deployed across an array of platforms and configurations and that would enhance interoperability without requiring significant setup, configuration, and changes to the way the systems were used.
To me, XML is just a data format, an alternative to (and/or intermediary between) things like fixed-width fields, 512-byte records, EDI, and delimited formats. But I didn't get far telling anyone this. They wanted to hear about all the promise of XML that they imagined (and inside of distributed transactions too). Because the L in XML stands for "Language," people often assumed it could be used to program computers. People told me that XML could do all sorts of preposterous things; one senior manager even told me he thought XML could be used to go inside legacy systems and fix them, sort of like a good virus.
The XML solutions I found or read about carried a lot of system or software requirements with them. A Java parser required a certain version of the Java runtime, which was a huge download. MSXML required Internet Explorer 4.0 (at the time, my customer wanted to standardize on Netscape).
I spent a long time on the Apache website trying to figure out where to find their free source code parser, installing things and browsing endlessly through the documentation. The literature on XML.com seemed to be talking about DTD repositories and namespaces. If XML is just a simple data format, why is there so much to all of these so-called free resources? I guess they are trying to get you to buy into more than just a simple solution; they want to take you for a ride.
Say you have a small program meant to filter out some data transactions on a computer somewhere with a daily dial-up "Burst" connection to a command center. When a requirement says the computer is generally NOT connected to a network, it is ridiculous to be told to link to DTD files kept in a remote central repository! An example of what the XML solution vendor might say is "...well, if you don't have a reliable TCP/IP connection to your centralized DTD repository, ah, hmmm, maybe we can work around it somehow..." Suddenly you are working to make the XML solution happy, rather than it working for you. Your small program is about the size of a pamphlet, and the big vendor XML solutions out there are like taking a stack of phone books and, WHOMP, slamming the whole pile down on top of your little pamphlet. Who's in charge now?
The beauty of XML as a data format is that for an adept programmer it is possible to generate and process simple XML with normal string functions in any language. I wrote a small C++ class called CMarkup to encapsulate our XML needs for messaging and transactions. Its most useful feature was the encoding and decoding (escaping and unescaping) of special characters like less-than signs and quotes in data content and attributes. This is the single most important feature, the one you don't want to implement over and over everywhere you roll your own XML. I am sure many, many people have implemented this, and that is how another XML parser is born.
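CMarkup's own escaping code is not shown in this post, so here is a minimal C++ sketch of what that single most important feature involves, using nothing but std::string. It assumes only the five entities predefined by XML 1.0 (&amp;, &lt;, &gt;, &quot;, &apos;). The names EscapeXml, UnescapeXml and MakeElement are hypothetical illustrations, not the CMarkup API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the CMarkup API): replace the five characters
// XML 1.0 reserves with their predefined entities, so the result is safe
// in element content and in attribute values quoted with " or '.
std::string EscapeXml(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical helper: decode only the five predefined entities.
// Numeric references such as &#60; are passed through unchanged here.
std::string UnescapeXml(const std::string& text) {
    struct Entity { const char* name; char ch; };
    static const Entity kEntities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'},
        {"&quot;", '"'}, {"&apos;", '\''},
    };
    std::string out;
    out.reserve(text.size());
    std::size_t i = 0;
    while (i < text.size()) {
        bool matched = false;
        if (text[i] == '&') {
            for (const Entity& e : kEntities) {
                const std::string name(e.name);
                // compare() is 0 only when the text starting at i begins with name.
                if (text.compare(i, name.size(), name) == 0) {
                    out += e.ch;
                    i += name.size();
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            out += text[i];  // ordinary character or unrecognized '&'
            ++i;
        }
    }
    return out;
}

// Hypothetical usage: generate one element with ordinary string concatenation.
std::string MakeElement(const std::string& name, const std::string& attr,
                        const std::string& value, const std::string& content) {
    return "<" + name + " " + attr + "=\"" + EscapeXml(value) + "\">" +
           EscapeXml(content) + "</" + name + ">";
}
```

For example, MakeElement("NOTE", "title", "R&D", "5 < 6") yields `<NOTE title="R&amp;D">5 &lt; 6</NOTE>`. Escaping `>` and `'` everywhere is more than XML strictly requires, but it keeps one function safe for both content and attributes; a fuller version would also decode numeric character references.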