<< Y is for Y2K (and other date/time problems)

X is for XML vs. JSON

Friday 27 April 2018 07:15 CDT David Braverman

Blogging A to Z Welcome to the antepenultimate day (i.e., the 24th) of the Blogging A-to-Z challenge.

Today we'll look at how communicating between foreign systems has evolved over time, leaving us with two principal formats for information interchange: eXtensible Markup Language (XML) and JavaScript Object Notation (JSON).

Back in the day, even before I started writing software, computer systems talked to each other using specific protocols. Memory, tape (!) and other storage, and communications had significant costs per byte of data. Systems needed to squeeze out every bit in order to achieve acceptable performance and storage costs. (Just check out my computer history, and wrap your head around the 2400 bit-per-second modem that I used with my 4-megabyte 386 box, which I upgraded to 8 MB for $350 in 2018 dollars.)

So, if you wanted to talk to another system, you and the other programmers would work out a protocol that specified what each byte meant at each position. Then you'd send cryptic codes over the wire and hope the other machine understood you. Then you'd spend weeks debugging minor problems.

Fast forward to 1996, when storage and communications costs finally dropped below labor costs, and the W3C created XML. Now, instead of doing something like this:

METAR KORD 261951Z VRB06KT 10SM OVC250 18/M02 A2988

You could do something like this:

<?xml version="1.0" encoding="utf-8"?>
<weatherReport>
	<station name="Chicago O'Hare Field">KORD</station>
	<observationTime timeZone="America/Chicago" utc="2018-04-26T19:51+0000">2018-04-26 14:51</observationTime>
	<winds>
		<direction degrees="">Variable</direction>
		<speed units="Knots">6</speed>
	</winds>
	<visibility units="miles">10</visibility>
	<clouds>
		<layer units="feet" ceiling="true" condition="overcast">25000</layer>
	</clouds>
	<temperature units="Celsius">18</temperature>
	<dewpoint units="Celsius">-2</dewpoint>
	<altimeter units="inches Hg">29.88</altimeter>
</weatherReport>

The XML only takes up a few bytes (612 uncompressed, about 300 compressed), but humans can read it, and so can computers. You can even create and share an XML Schema Definition (XSD) describing what the XML document should contain. That way, both the sending and receiving systems can agree on the format, and change it as needed without a lot of reprogramming.

To display XML, you can use eXtensible Style Language (XSL), which applies CSS styles to your XML. (My Weather Now project uses this approach.)

Only a few weeks later, Douglas Crockford defined an even simpler standard: JSON. It removes the heavy structure from XML and presents data as a set of key-value pairs. Now our weather report can look like this:

{
  "weatherReport": {
    "station": {
      "name": "Chicago O'Hare Field",
      "icao code": "KORD"
    },
    "observationTime": {
      "timeZone": "America/Chicago",
      "utc": "2018-04-26T19:51+0000",
      "local": "2018-04-26 14:51 -05:00"
    },
    "winds": {
      "direction": { "text": "Variable" },
      "speed": {
        "units": "Knots",
        "value": "6"
      }
    },
    "visibility": {
      "units": "miles",
      "value": "10"
    },
    "clouds": {
      "layer": {
        "units": "feet",
        "ceiling": "true",
        "condition": "overcast",
        "value": "25000"
      }
    },
    "temperature": {
      "units": "Celsius",
      "value": "18"
    },
    "dewpoint": {
      "units": "Celsius",
      "value": "-2"
    },
    "altimeter": {
      "units": "inches Hg",
      "value": "29.88"
    }
  }
}

JSON is easier to read, and JavaScript (and JavaScript libraries like JQuery) can parse it natively. You can add or remove key-value pairs as needed, often without the receiving system complaining. There's even a JSON Schema project that promises to give you the security of XSD.

Which format should you use? It depends on how structured you need the data to be, and how easily you need to read it as a human.

Others have commented

David Harper

Friday 27 April 2018 07:24 CDT

One of my favourite examples of using every available bit of memory goes back to the IBM System/370 machines that I used in the 1980s. They had a 32-bit architecture, but memory was still expensive, so only the low 24 bits of each 32-bit word were actually used to specify addresses, which were limited to 16 MB. That was a *huge* amount of memory back then, of course, so nobody actually felt limited by it. Anyhow, the high 8 bits were ignored in address translation, and canny assembly language programmers would use this "spare" byte to store program data, so no bit went unused. Everything was fine until IBM introduced VM/XA (aka System/370 Extended Architecture), which opened up the full 32-bit address space. Suddenly, those high 8 bits became part of the address, and many hand-crafted assembler programs would no longer work. Fortunately, IBM provided a compatibility mode which allowed a program to indicate whether it used 24-bit or 32-bit addresses.

Mrs Fever

Saturday 28 April 2018 03:24 CDT

My job takes me to a variety of places, and about a year ago I found myself in a computer programming class on a college campus. I don't remember the language (except that Computer Speak is a language all its own!), but I remember something about a "whale loop" (WAHL loop?), so now when I think about code, I picture Orcas. ;) Interesting and informative -- I like your approach to the A to Z challenge! :)