The Daily Parker

Politics, Weather, Photography, and the Dog

Hottest day in 10 years–almost

Chicago's official temperature last hit 38°C (100°F) on 6 July 2012, almost 10 years ago. As of 4pm O'Hare reported steady at 37°C (98°F), with the likelihood of breaking the record diminishing by the minute. At Inner Drive Technology World Headquarters, we have 37.2°C, still climbing but starting to level off.

In other hotness around the world:

Finally, Florida Fish and Wildlife officials captured a 95-kilogram, 5.4-meter Burmese python, the largest ever discovered in the state. Apparently it had recently dined on a deer. So far they have found more than 15,000 of the snakes, none of them quite so large.

Update: Not that I'm complaining, but after holding just under 37°C for three hours, the temperature finally started to drop. At 6pm O'Hare reported 36°C. So no record.

Day 2 of isolation

Even though I feel like I have a moderate cold (stuffy, sneezy, and an occasional cough), I recognize that Covid-19 poses a real danger to people who haven't gotten vaccinations or who have other comorbidities. So I'm staying home today except to walk Cassie. It's 18°C and perfectly sunny, so Cassie might get a lot of walks.

Meanwhile, I have a couple of things to occupy my time:

Finally, today is the 210th anniversary of the start of the War of 1812 and the 207th anniversary of the Battle of Waterloo.

High temperature record and other hot takes

Chicago's official temperature at O'Hare hit 35°C about two hours ago, tying the record high temperature set in 1994. Currently it's pushing 36°C with another hour of warming likely before it finally cools down overnight. After another 32°C day tomorrow, the forecast Friday looks perfect.

While we bake by the lake today, a lot has gone down elsewhere:

Finally, apparently John Scalzi and I have the same appreciation for Aimee Mann.

Extreme weather, early-summer edition

Last night we delayed the start of Terra Nostra fifteen minutes because a supercell thunderstorm decided to pass through:

The severe supercell thunderstorm that tore through Chicagoland Monday night toppled planes, ripped the roof off at least one apartment building, dropped hail as large as 1.5 inches in diameter and left tens of thousands without power in its wake.

In Cook County, 84 mph winds gusted at O’Hare International Airport. That was strong enough to turn over numerous planes at Schaumburg Regional Airport around 6:25 p.m.

Near Elk Grove Village around 6:30 p.m., roofing material started flying off an industrial building. The entire roof of a three-story apartment building was ripped off near Maywood around 6:50 p.m.

The system reached the Lake Michigan shoreline near downtown Chicago around 6:45 p.m., with “several tree branches downed just northwest of Montrose Harbor,” the weather service reported. Wind speeds of 64 knots were reported a few miles from Navy Pier and a buoy station near Calumet Harbor clocked wind speeds of 54 mph.

The weather report from O'Hare at 6:44pm gives you some indication of what we had in downtown Chicago half an hour later.

Today, the warm front that provided the energy driving that storm has already pushed temperatures over 30°C with a likely high of 36°C:

And wow, it's sticky, with dewpoints near 24°C and heat indices above 38°C. Can't wait for my commute home...

Friday, already?

Today I learned about the Zoot Suit Riots that began 79 years ago today in Los Angeles. Wow, humans suck.

In other revelations:

Finally, it's 22°C and sunny outside, which militates against my staying in my office much longer...

Reading Excel files in code is harmful

I've finally gotten back to working on the final series of place-data imports for Weather Now. One of the data sources comes as a 20,000-line Excel spreadsheet. Both because I wanted to learn how to read Excel files and because I wanted updating the Gazetteer to stay transparent, I wrote the first draft of the import module using the DocumentFormat.OpenXml package from Microsoft.

The recommended way of reading a cell using that package looks like this:

// Reads the text of a single cell, resolving shared strings when needed.
private static string? CellText(
	WorkbookPart workbook,
	OpenXmlElement sheet,
	string cellId)
{
	// Note: this scans the sheet's entire element tree on every call.
	var cell = sheet.Descendants<Cell>()
		.FirstOrDefault(c => c.CellReference == cellId);
	if (cell is null) return null;

	// Inline values (numbers, booleans) live in the cell itself...
	if (cell.DataType is null || cell.DataType != CellValues.SharedString)
	{
		return cell.InnerText;
	}

	// ...but string cells usually hold an index into the shared string table.
	if (!int.TryParse(cell.InnerText, out var id))
	{
		return cell.InnerText;
	}
	var sharedString = workbook.SharedStringTablePart?
		.SharedStringTable
		.Elements<SharedStringItem>()
		.ElementAt(id);
	if (sharedString?.Text is not null)
	{
		return sharedString.Text.Text;
	}
	return sharedString?.InnerText is null
		? sharedString?.InnerXml
		: sharedString.InnerText;
}

When I ran a dry import (meaning it only read the file and parsed it without writing the new data to Weather Now), it...dragged. A lot. It went so slowly, in fact, that I started logging the rate that it read blocks of rows:
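The rate logging itself is nothing fancy. A minimal sketch (the names here are mine, not the actual import code) looks something like this:

```csharp
using System;
using System.Diagnostics;

// Sketch only: report the cumulative read rate every 100 rows.
static string Progress(int rowsRead, TimeSpan elapsed) =>
    FormattableString.Invariant(
        $"Still loading at {rowsRead} (rate: {rowsRead / elapsed.TotalSeconds:F1}/s)");

var stopwatch = Stopwatch.StartNew();
for (var row = 1; row <= 300; row++)
{
    // ... parse one row here ...
    if (row % 100 == 0)
        Console.WriteLine(Progress(row, stopwatch.Elapsed));
}
```

Because the rate is cumulative, a falling number means each new block of rows takes longer than the last.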

2022-05-29 18:43:14.2294|DEBUG|Still loading at 100 (rate: 1.4/s)
2022-05-29 18:44:26.9709|DEBUG|Still loading at 200 (rate: 1.4/s)
2022-05-29 18:45:31.3087|DEBUG|Still loading at 300 (rate: 1.4/s)
...

2022-05-29 22:26:27.7797|DEBUG|Still loading at 8300 (rate: 0.6/s)
2022-05-29 22:31:01.5823|DEBUG|Still loading at 8400 (rate: 0.6/s)
2022-05-29 22:35:40.3196|DEBUG|Still loading at 8500 (rate: 0.6/s)

Yes, really. At first it looked like it would take 4 hours to read 20,000 rows of data, but as you can see, it got even slower as it went on.

I finally broke out the profiler, and ran a short test that parsed 14 lines of data. The profiler showed a few hot spots:

  • 355,000 calls to OpenXmlElement<T>.MoveNext
  • 740,000 calls to OpenXmlCompositeElement.get_FirstChild
  • 906,000 calls to OpenXmlChildElements<GetEnumerator>.MoveNext

That's for 14 lines of data.
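That pattern of starting slow and getting slower points at the FirstOrDefault scan inside CellText: every cell lookup walks the sheet's element tree from the top, so reading n cells costs O(n²). Here's a toy sketch of the trap and the usual fix, in plain C# with made-up data rather than OpenXML:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy illustration: repeated linear scans over n items cost O(n^2)
// overall; building a dictionary once makes each lookup O(1).
var cells = Enumerable.Range(0, 10_000)
    .Select(i => (Reference: $"A{i}", Value: i.ToString()))
    .ToList();

// Slow, like sheet.Descendants<Cell>().FirstOrDefault(...) per cell:
string? SlowLookup(string id) =>
    cells.FirstOrDefault(c => c.Reference == id).Value;

// Fast: index once up front, then each lookup is a hash probe.
var index = cells.ToDictionary(c => c.Reference, c => c.Value);
string? FastLookup(string id) =>
    index.TryGetValue(id, out var value) ? value : null;

Console.WriteLine(SlowLookup("A9999"));  // scans all 10,000 entries
Console.WriteLine(FastLookup("A9999"));  // one dictionary lookup
```

An index like that might have rescued the OpenXML version; I went a different way instead.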

So I gave up and decided to export the data file to a tab-delimited text file. This code block, which opens up the Excel workbook:

using var document = SpreadsheetDocument.Open(fileName, false);
var workbook = document.WorkbookPart;
if (workbook is null)
	throw new InvalidOperationException($"The file \"{fileName}\" was not a valid data file");

var sheet = workbook.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == "Airports");
if (sheet is null) throw new InvalidOperationException("Could not find the data sheet");

var sheetPart = (WorksheetPart)workbook.GetPartById(sheet.Id!);
var sheetData = sheetPart.Worksheet.Elements<SheetData>().First();
var rows = sheetData.Elements<Row>().Count();

now looks like this:

var lines = File.ReadAllLines(fileName);

And the code to read the data from an individual cell becomes:

return columns.Count > index ? columns[index] : null;
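For context, columns here is just the current line split on tabs. A minimal sketch of the whole replacement reader (the helper signature and the sample line are mine, for illustration) looks like:

```csharp
using System;
using System.Collections.Generic;

// Sketch only: each line splits on tabs once, and reading a cell becomes
// a bounds-checked index into the resulting array.
static string? CellText(IReadOnlyList<string> columns, int index) =>
    index < columns.Count ? columns[index] : null;

var line = "KORD\tChicago O'Hare International Airport\t41.98\t-87.90";
var columns = line.Split('\t');

Console.WriteLine(CellText(columns, 1));                // the name column
Console.WriteLine(CellText(columns, 9) ?? "(missing)"); // out of range
```

No XML tree walks, no shared string table: just an array index.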

Boom. Done. Took 30 minutes to refactor. My profiler now says the most frequent call for the 14-row test occurs just 192 times, and the whole thing finishes in 307 ms.

So let's run it against the full file, now converted to tab-delimited text:

2022-05-30 09:19:33.6255|DEBUG|Still loading at 100 (rate: 211.3/s)
2022-05-30 09:19:33.8813|DEBUG|Still loading at 200 (rate: 274.2/s)
2022-05-30 09:19:34.1342|DEBUG|Still loading at 300 (rate: 305.4/s)
...
2022-05-30 09:20:14.9819|DEBUG|Still loading at 19600 (rate: 468.6/s)
2022-05-30 09:20:15.2609|DEBUG|Still loading at 19700 (rate: 467.8/s)
2022-05-30 09:20:15.5030|DEBUG|Still loading at 19800 (rate: 467.5/s)

Well, then. The first few hundred rows load about 150 times faster than before (211/s versus 1.4/s), and it actually speeds up from there, so the whole import takes 45 seconds instead of 6 hours.

So much time wasted yesterday. Just not worth it.