Ben Point

 

The Anthropology of Software



I See Markup, Part 1 - Yet Another Parser Is Born

This is the story of my product CMarkup.

To a C programmer accustomed to dealing with strings and files, the idea of using a huge parsing library or giant component to handle a little XML is like using a helicopter to make a run to the grocery store.

In January 1999 I started working as a subcontractor on a large government IT project. Like many government IT projects, it was meant to streamline a huge number of legacy and upgraded systems across the world. Anyway, we were looking at XML as a technology, and I discovered, as other developers did, that although it was a great thing, it was not all it was hyped up to be. There were no robust XML products that could be deployed across an array of platforms and configurations to enhance interoperability without requiring significant setup, configuration, and changes to the way the systems were used.

To me, XML is just a data format, an alternative to (and/or an intermediary between) things like fixed-width fields, 512-byte records, EDI, and delimited formats. But I didn't get far telling anyone this. They wanted to hear about all the promise they imagined for XML (inside distributed transactions, too). Because the L in XML stands for "Language," people often assumed it could be used to program computers.

People told me that XML could do all sorts of preposterous things; one senior manager even told me he thought XML could be used to go inside legacy systems and fix them, sort of like a good virus.

The XML solutions I found or read about carried a lot of system or software requirements with them. A Java parser required a certain version of the Java runtime, which was a huge download. MSXML required Internet Explorer 4.0 (at the time, my customer wanted to standardize on Netscape). I spent a long time on the Apache website trying to figure out where to find their free source code parser, installing things and browsing endlessly through the documentation. The literature on XML.com seemed to be talking about DTD repositories and namespaces. If XML is just a simple data format, why is there so much to all of these so-called free resources? I guess they are trying to get you to buy into more than just a simple solution; they want to take you for a ride.

Say you have a small program meant to filter some data transactions on a computer somewhere with a daily dial-up "Burst" connection to a command center. When a requirement says the computer is generally NOT connected to a network, it is ridiculous to be told to link to DTD files kept in a remote central repository! An XML solution vendor might say something like "...well, if you don't have a reliable TCP/IP connection to your centralized DTD repository, ah hmmm, maybe we can work around it somehow..." Suddenly you are working to make the XML solution happy, rather than it working for you.

Your small program is about the size of a pamphlet, and the big vendor XML solutions out there are like taking a stack of phone books and, WHOMP, slamming that huge pile down on top of your little pamphlet. Who's in charge now?

The beauty of XML as a data format is that an adept programmer can generate and process simple XML with normal string functions in any language. I wrote a small C++ class called CMarkup to encapsulate our XML needs for messaging and transactions. Its main useful feature was the encoding and decoding (escaping and unescaping) of special characters like less-than signs and quotes in data content and attributes. This is the single most important feature, and the one you don't want to reimplement everywhere you roll your own XML. I am sure many, many people have implemented this, and that is how another XML parser is born.
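To make the escaping idea concrete, here is a minimal sketch in C++ using only std::string. It is not CMarkup's code: the function names EscapeXml, UnescapeXml, and MakeElement are hypothetical and chosen for illustration. It assumes only the five entities XML predefines (&lt; &gt; &amp; &quot; &apos;) need handling, and it ignores numeric character references and CDATA sections.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not CMarkup's API): replace each special character
// with one of the five entities XML predefines. Safe for both text content
// and double- or single-quoted attribute values.
std::string EscapeXml(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical helper: reverse the five predefined entities in a single
// left-to-right pass. Unrecognized '&' sequences (including numeric
// references like &#60;, which this sketch does not handle) are copied as-is.
std::string UnescapeXml(const std::string& text) {
    static const struct { const char* entity; char ch; } kEntities[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&quot;", '"'}, {"&apos;", '\''},
    };
    std::string out;
    out.reserve(text.size());
    std::size_t i = 0;
    while (i < text.size()) {
        bool matched = false;
        if (text[i] == '&') {
            for (const auto& e : kEntities) {
                std::size_t len = std::char_traits<char>::length(e.entity);
                // Compare the substring starting at i against the entity.
                if (text.compare(i, len, e.entity) == 0) {
                    out += e.ch;
                    i += len;
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            out += text[i];
            ++i;
        }
    }
    return out;
}

// Hypothetical helper: build one element with one attribute using plain
// string concatenation, escaping the attribute value and the content.
std::string MakeElement(const std::string& name, const std::string& attrName,
                        const std::string& attrValue, const std::string& content) {
    return "<" + name + " " + attrName + "=\"" + EscapeXml(attrValue) + "\">" +
           EscapeXml(content) + "</" + name + ">";
}
```

For example, `MakeElement("item", "note", "5 > 3", "R&D")` yields `<item note="5 &gt; 3">R&amp;D</item>`. Unescaping in one pass is a deliberate choice: the text `&amp;lt;` must come back as the literal `&lt;`, whereas a naive chain of search-and-replace calls that handles `&amp;` first would wrongly turn it into `<`.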


 

Ben Bryant
Software Developer
Anthropologist

 
