The Daily Parker

Politics, Weather, Photography, and the Dog

Tick tick tick

I always find it interesting when a literary magazine takes on technology. In that spirit, the New Yorker does its best to explain the Network Time Protocol:

Today, we take global time synchronization for granted. It is critical to the Internet, and therefore to civilization. Vital systems—power grids, financial markets, telecommunications networks—rely on it to keep records and sort cause from effect. N.T.P. works in partnership with satellite systems, such as the Global Positioning System (G.P.S.), and other technologies to synchronize time on our many online devices. The time kept by precise and closely aligned atomic clocks, for instance, can be broadcast via G.P.S. to numerous receivers, including those in cell towers; those receivers can be attached to N.T.P. servers that then distribute the time across devices linked together by the Internet, almost all of which run N.T.P. (Atomic clocks can also directly feed the time to N.T.P. servers.) The protocol operates on billions of devices, coördinating the time on every continent. Society has never been more synchronized.

In N.T.P., [David] Mills built a system that allowed for endless tinkering, and he found joy in optimization. “The actual use of the time information was not of central interest,” he recalled. The fledgling Internet had few clocks to synchronize. But during the nineteen-eighties the network grew quickly, and by the nineties the widespread adoption of personal computers required the Internet to incorporate millions more devices than its first designers had envisioned. Coders created versions of N.T.P. that worked on Unix and Windows machines. Others wrote “reference implementations” of N.T.P.—open-source codebases that exemplified how the protocol should be run, and which were freely available for users to adapt. Government agencies, including the National Institute of Standards and Technology (NIST) and the U.S. Naval Observatory, started distributing the time kept by their master clocks using N.T.P.

A loose community of people across the world set up their own servers to provide time through the protocol. In 2000, N.T.P. servers fielded eighteen billion time-synchronization requests from several million computers—and in the following few years, as broadband proliferated, requests to the busiest N.T.P. servers increased tenfold. The time servers had once been “well lit in the US and Europe but dark elsewhere in South America, Africa and the Pacific Rim,” Mills wrote, in a 2003 paper. “Today, the Sun never sets or even gets close to the horizon on NTP.” Programmers began to treat the protocol like an assumption—it seemed natural to them that synchronized time was dependably and easily available. Mills’s little fief was everywhere.

This being the New Yorker, one could describe the article as the author explaining how he met the programmer Mills, plus the politics around Mills's retirement from computing. It's better written than the Wikipedia article, anyway.
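The protocol itself is tiny on the wire, by the way. Here's a rough sketch of mine (not from the article) of a bare-bones SNTP query in C#: it sends a 48-byte request to a public pool server and reads back the server's transmit timestamp, skipping all the corrections a real NTP client would make:

using System;
using System.Net;
using System.Net.Sockets;

// Minimal SNTP sketch: one 48-byte request, one reply. pool.ntp.org is a public
// volunteer server pool; a real client would also handle stratum, round-trip
// delay, and leap-second flags.
var request = new byte[48];
request[0] = 0x1B; // LI = 0, version = 3, mode = 3 (client)

using var udp = new UdpClient("pool.ntp.org", 123);
udp.Send(request, request.Length);

var remote = new IPEndPoint(IPAddress.Any, 0);
var response = udp.Receive(ref remote);

// The server's transmit timestamp starts at byte 40: big-endian seconds since 1900-01-01 UTC
var seconds = ((uint)response[40] << 24) | ((uint)response[41] << 16)
	| ((uint)response[42] << 8) | response[43];
var time = new DateTime(1900, 1, 1, 0, 0, 0, DateTimeKind.Utc).AddSeconds(seconds);
Console.WriteLine($"NTP says it's {time:u}");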

Anthony's Song

I'm movin' out. A lovely young couple have offered to buy Inner Drive Technology World Headquarters v5.0, and the rest of the place along with it. I've already gotten through the attorney-review period for IDTWHQ v6.0, so I'm now more likely than not to move house next month.

Which means I have even less time to read stuff like this:

Finally, American Airlines plans to get rid of its First Class offerings, replacing them with high-tech Business Class and more premium coach seats. I'd better use my miles soon.

Notable Friday afternoon stories

Just a few before I take a brick to my laptop for taking a damned half-hour to reformat a JSON file:

Oh, good. My laptop has finished parsing the file. (In fairness it's 400,000 lines of JSON, but still, that's only 22 megabytes uncompressed.) I will now continue with my coding.

God save our gracious King

With the death of Queen Elizabeth II, the British National Anthem has changed for the third time in 185 years, back to "God Save the King". In other news:

By the way, the UK has a vacancy for the post of Prince of Wales, in case anyone would care to apply. I think we can bet on nepotism, though.

Lunchtime links

Happy Monday:

I would now like to take a nap, but alas...

Future heat

James Fallows highlights a new US government website that maps how bad the climate will get in your town:

Let me give just a few illustrations from the first such climate-based public map the White House has released, HEAT.gov. The main points about all this and related “digital dashboards” (like the one for Covid) and maps:

  • They are customizable. You can see your immediate neighborhood, or the entire world.
  • They are configurable. You can see the “real” weather as of 2020, and the projected weather as of many decades from now.
  • They can be combined. You can overlay a map of likely future flood zones, with areas of greatest economic and social vulnerabilities.

First, a map showing the priority list of communities most at risk from heat stress some decades from now. This is based on an overlay of likely future temperatures, with current resources and vulnerabilities, and other factors and trends.

Number one on this future vulnerability list is in the Rio Grande Valley of Texas. Number ten is in Arkansas. In between, at number seven, is my own home county in California. You can tune the map to your own interests here. It is meant to serve as a guide for preparation, avoidance, and resilience.

Pretty cool stuff. At the moment, Chicago's weather seems pretty reasonable for July, but the forecast calls for hot and awful weather later this week. And that will keep happening as climate change keeps pushing more energy into the atmosphere.

Tuesday morning...uh, afternoon reading

It's a lovely day in Chicago, which I'm not enjoying as much as I could because I'm (a) in my Loop office and (b) busy as hell. So I'll have to read these later:

Finally, Mick Jagger turns 79 today, which surprised me because I thought he was closer to 130.

Missed anniversary, weather app edition

I've been a little busy this weekend, so even though I remembered that yesterday was the 25th anniversary of Harry Potter's publication, I forgot that Friday was the 25th anniversary of Weather Now v0. Yes, I've had a weather application on the Internet for 25 years.

The actual, standalone Weather Now application launched on 11 November 1999, an anniversary I plan to make a bigger deal of, and one that comes after this blog's own 25th (the blog started on 13 May 1998). But it's kind of cool to have an app that has run continuously since the early days of Bill Clinton's second term.

Reading Excel files in code is harmful

I've finally gotten back to working on the final series of place-data imports for Weather Now. One of the data sources comes as a 20,000-row Excel spreadsheet. Both because I wanted to learn how to read Excel files and because I wanted to make updating the Gazetteer transparent, I wrote the first draft of the import module using the DocumentFormat.OpenXml package from Microsoft.

The recommended way of reading a cell using that package looks like this:

private static string? CellText(
	WorkbookPart workbook, 
	OpenXmlElement sheet, 
	string cellId)
{
	// Find the cell by its reference (e.g., "B7") by walking the sheet's XML tree
	var cell = sheet.Descendants<Cell>()
		.FirstOrDefault(c => c.CellReference == cellId);
	if (cell is null) return null;

	// Inline values (numbers, dates, formula results) live right in the cell
	if (cell.DataType is null || cell.DataType != CellValues.SharedString)
	{
		return cell.InnerText;
	}

	// Otherwise the cell holds an index into the workbook's shared-string table
	if (!int.TryParse(cell.InnerText, out var id))
	{
		return cell.InnerText;
	}
	var sharedString = workbook.SharedStringTablePart?
		.SharedStringTable
		.Elements<SharedStringItem>()
		.ElementAt(id);
	if (sharedString?.Text is not null)
	{
		return sharedString.Text.Text;
	}
	// Rich-text or unusual entries fall back to whatever text the item contains
	return sharedString?.InnerText is null 
		? sharedString?.InnerXml : 
		sharedString.InnerText;
}
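For context, the import's driver loop calls CellText once per cell reference, row by row. This is only an illustrative sketch; the column letters and variable names are invented, not the actual Gazetteer import:

// Hypothetical driver loop: one CellText lookup per cell, building the
// reference ("A2", "B2", ...) from a column letter and the row number.
for (var row = 2; row <= rowCount; row++)
{
	var icao = CellText(workbook, sheetData, $"A{row}");
	var name = CellText(workbook, sheetData, $"B{row}");
	var latitude = CellText(workbook, sheetData, $"C{row}");
	// ...and so on for the rest of the columns
}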

When I ran a dry import (meaning it only read the file and parsed it without writing the new data to Weather Now), it...dragged. A lot. It went so slowly, in fact, that I started logging the rate that it read blocks of rows:

2022-05-29 18:43:14.2294|DEBUG|Still loading at 100 (rate: 1.4/s)
2022-05-29 18:44:26.9709|DEBUG|Still loading at 200 (rate: 1.4/s)
2022-05-29 18:45:31.3087|DEBUG|Still loading at 300 (rate: 1.4/s)
...

2022-05-29 22:26:27.7797|DEBUG|Still loading at 8300 (rate: 0.6/s)
2022-05-29 22:31:01.5823|DEBUG|Still loading at 8400 (rate: 0.6/s)
2022-05-29 22:35:40.3196|DEBUG|Still loading at 8500 (rate: 0.6/s)

Yes. At first, it looked like it would take 4 hours to read 20,000 rows of data, but as you can see, it got even slower as it went on.

I finally broke out the profiler, and ran a short test that parsed 14 lines of data. The profiler showed a few hot spots:

  • 355,000 calls to OpenXmlElement<T>.MoveNext
  • 740,000 calls to OpenXmlCompositeElement.get_FirstChild
  • 906,000 calls to OpenXmlChildElements<GetEnumerator>.MoveNext

That's for 14 lines of data. No wonder it got slower as it went: each call to CellText walks the sheet from the top with Descendants<Cell>() until it reaches the requested cell, so every row costs a little more to read than the one before it.

So I gave up and decided to export the data file to a tab-delimited text file. This code block, which opens up the Excel workbook:

using var document = SpreadsheetDocument.Open(fileName, false);
var workbook = document.WorkbookPart;
if (workbook is null)
	throw new InvalidOperationException($"The file \"{fileName}\" was not a valid data file");

var sheet = workbook.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == "Airports");
if (sheet is null) throw new InvalidOperationException("Could not find the data sheet");

var sheetPart = (WorksheetPart)workbook.GetPartById(sheet.Id!);
var sheetData = sheetPart.Worksheet.Elements<SheetData>().First();
var rows = sheetData.Elements<Row>().Count();

Now looks like this:

var lines = File.ReadAllLines(fileName);

And the code to read the data from an individual cell becomes:

return columns.Count > index ? columns[index] : null;

Boom. Done. Took 30 minutes to refactor. My profiler now says the most frequent call for the 14-row test occurs just 192 times, and the whole thing finishes in 307 ms.
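In case those one-liners look too terse out of context, here's roughly what the refactored reader amounts to end to end. The file name and field names are invented for illustration; it's a sketch, not the actual Weather Now import code:

// Simplified sketch of the tab-delimited reader: split each line once,
// then index into the resulting columns.
using System.Collections.Generic;
using System.IO;
using System.Linq;

var fileName = "airports.tsv"; // hypothetical path to the exported file
var lines = File.ReadAllLines(fileName);
foreach (var line in lines.Skip(1)) // skip the header row
{
	var columns = line.Split('\t');
	var icao = Field(columns, 0);
	var name = Field(columns, 1);
	// ...map the remaining columns onto the Gazetteer record
}

static string? Field(IReadOnlyList<string> columns, int index) =>
	columns.Count > index ? columns[index] : null;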

So let's run it against the full file, now converted to tab-delimited text:

2022-05-30 09:19:33.6255|DEBUG|Still loading at 100 (rate: 211.3/s)
2022-05-30 09:19:33.8813|DEBUG|Still loading at 200 (rate: 274.2/s)
2022-05-30 09:19:34.1342|DEBUG|Still loading at 300 (rate: 305.4/s)
...
2022-05-30 09:20:14.9819|DEBUG|Still loading at 19600 (rate: 468.6/s)
2022-05-30 09:20:15.2609|DEBUG|Still loading at 19700 (rate: 467.8/s)
2022-05-30 09:20:15.5030|DEBUG|Still loading at 19800 (rate: 467.5/s)

Well, then. The first few hundred rows see a 200x improvement, and it actually gets faster as it goes, so the whole import takes 45 seconds instead of 6 hours.

So much time wasted yesterday. Just not worth it.