The Daily Parker

Politics, Weather, Photography, and the Dog

One good thing about Texas

I did enjoy the barbecue:

That's a bit of brisket and accoutrements from Stiles Switch BBQ, and it was so good. We also got some from Black's BBQ, which might have actually been better. Of course, even if I ever go back to Austin, I'll have to try one of the other 42,167 BBQ places.

I also stopped by the Home for Developmentally-Disabled Adults and their Democratic Caretakers:

About three meters to my right, which I chose not to photograph, was a giant monument to "The Horrible Men Who Murdered for Slavery," which got mistranslated into "The Brave Men Who Died for States' Rights" by the Texas Lege when they erected it in the 1890s.

Once I got home and collected her from boarding, it took Cassie about ten minutes before she just passed out. This is a happy dog:

BBQ and Beer

I mean, when in Rome, right? My company offered four options for this afternoon. I didn't even need to read past "BBQ and Brewery Tour" to sign up. Totally worth it! I'll have more to say over the weekend when I have more time to say it, but I do like Texas BBQ, and the two beers I had were quite good.

Home tomorrow, just in time for our own heat wave. Yay.

Missed anniversary, weather app edition

I've been a little busy this weekend, so even though I remembered that yesterday was the 25th anniversary of Harry Potter's publication, I forgot that Friday was the 25th anniversary of Weather Now v0. Yes, I've had a weather application on the Internet for 25 years.

The actual, standalone Weather Now application launched on 11 November 1999, which I plan to make a bigger deal of. And that comes after this blog's 25th anniversary (13 May 1998). But it's kind of cool that I've had an app running continuously since the early days of Bill Clinton's second term.

Friday, already?

Today I learned about the Zoot Suit Riots that began 79 years ago today in Los Angeles. Wow, humans suck.

In other revelations:

Finally, it's 22°C and sunny outside, which militates against me staying in my office much longer...

Regulate crypto! And guns, too

Even though it seems the entire world has paused to honor HM The Queen on the 70th anniversary of her accession, the world in fact kept spinning:

Blogger Moxie Marlinspike wrote about their first impressions of web3 back in January. I just got around to reading it, and you should too.

Oh, and plastic recycling doesn't work, and probably can't.

And here, apropos of nothing, is a photo of St Boniface Cemetery I took this morning:

Reading Excel files in code is harmful

I've finally gotten back to working on the final series of place-data imports for Weather Now. One of the data sources comes as a 20,000-line Excel spreadsheet. Both because I wanted to learn how to read Excel files and because I wanted to make updating the Gazetteer transparent, I wrote the first draft of the import module using the DocumentFormat.OpenXml package from Microsoft.

The recommended way of reading a cell using that package looks like this:

private static string? CellText(
	WorkbookPart workbook,
	OpenXmlElement sheet,
	string cellId)
{
	// Scan the sheet for the cell with the requested reference (e.g., "B2")
	var cell = sheet.Descendants<Cell>()
		.FirstOrDefault(c => c.CellReference == cellId);
	if (cell is null) return null;

	// Inline values (numbers, dates, formula results) live in the cell itself
	if (cell.DataType is null || cell.DataType != CellValues.SharedString)
	{
		return cell.InnerText;
	}

	// Shared-string cells store an index into the workbook's shared string table
	if (!int.TryParse(cell.InnerText, out var id))
	{
		return cell.InnerText;
	}
	var sharedString = workbook.SharedStringTablePart?
		.SharedStringTable
		.Elements<SharedStringItem>()
		.ElementAt(id);
	if (sharedString?.Text is not null)
	{
		return sharedString.Text.Text;
	}
	// Fall back to inner text, or raw XML for rich-text items
	return sharedString?.InnerText is null
		? sharedString?.InnerXml
		: sharedString.InnerText;
}
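
For scale, a hypothetical call site (the variable names and cell reference here are illustrative, not from the actual import code):

```csharp
// Hypothetical usage: read one cell's text, falling back to null if
// the cell doesn't exist. Every call rescans the sheet.
var name = CellText(workbookPart, sheetData, "B2");
```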

When I ran a dry import (meaning it only read the file and parsed it without writing the new data to Weather Now), it...dragged. A lot. It went so slowly, in fact, that I started logging the rate that it read blocks of rows:

2022-05-29 18:43:14.2294|DEBUG|Still loading at 100 (rate: 1.4/s)
2022-05-29 18:44:26.9709|DEBUG|Still loading at 200 (rate: 1.4/s)
2022-05-29 18:45:31.3087|DEBUG|Still loading at 300 (rate: 1.4/s)
...

2022-05-29 22:26:27.7797|DEBUG|Still loading at 8300 (rate: 0.6/s)
2022-05-29 22:31:01.5823|DEBUG|Still loading at 8400 (rate: 0.6/s)
2022-05-29 22:35:40.3196|DEBUG|Still loading at 8500 (rate: 0.6/s)

Yes. At first it looked like it would take 4 hours to read 20,000 rows of data, but as you can see, it got even slower as it went on.

I finally broke out the profiler, and ran a short test that parsed 14 lines of data. The profiler showed a few hot spots:

  • 355,000 calls to OpenXmlElement<T>.MoveNext
  • 740,000 calls to OpenXmlCompositeElement.get_FirstChild
  • 906,000 calls to OpenXmlChildElements<GetEnumerator>.MoveNext

That's for 14 lines of data.
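
The likely culprit: every CellText call rescans the entire sheet via Descendants<Cell>(), so the cost grows quadratically with the number of cells. One could index each row's cells once instead; this is just an illustrative sketch, not what I ended up doing:

```csharp
// Illustrative sketch only: build a per-row lookup so each cell is
// visited once, instead of rescanning the sheet for every CellText call.
private static Dictionary<string, Cell> IndexRow(Row row) =>
	row.Elements<Cell>()
		.Where(c => c.CellReference?.Value is not null)
		.ToDictionary(c => c.CellReference!.Value!, c => c);

// A lookup then becomes a dictionary hit rather than a full scan:
// var cells = IndexRow(row);
// cells.TryGetValue("C12", out var cell);
```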

So I gave up and decided to export the data file to a tab-delimited text file. This code block, which opens up the Excel workbook:

using var document = SpreadsheetDocument.Open(fileName, false);
var workbook = document.WorkbookPart;
if (workbook is null)
	throw new InvalidOperationException($"The file \"{fileName}\" was not a valid data file");

var sheet = workbook.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == "Airports");
if (sheet is null) throw new InvalidOperationException("Could not find the data sheet");

var sheetPart = (WorksheetPart)workbook.GetPartById(sheet.Id!);
var sheetData = sheetPart.Worksheet.Elements<SheetData>().First();
var rows = sheetData.Elements<Row>().Count();

now looks like this:

var lines = File.ReadAllLines(fileName);

And the code to read the data from an individual cell becomes:

return index < columns.Count ? columns[index] : null;

Boom. Done. Took 30 minutes to refactor. My profiler now says the most frequent call for the 14-row test occurs just 192 times, and the whole thing finishes in 307 ms.
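
The whole replacement pattern fits in a few lines. A sketch (the file name and column index are hypothetical, not the actual data layout):

```csharp
// Illustrative sketch of the tab-delimited version; "Airports.txt" and
// column 0 are hypothetical stand-ins for the real file and layout.
var lines = File.ReadAllLines("Airports.txt");
foreach (var line in lines.Skip(1))  // skip the header row
{
	var columns = line.Split('\t');
	// Bounds-checked indexed access, mirroring the cell reader above
	var name = 0 < columns.Length ? columns[0] : null;
	// ... map the remaining columns to the Gazetteer record ...
}
```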

So let's run it against the full file, now converted to tab-delimited text:

2022-05-30 09:19:33.6255|DEBUG|Still loading at 100 (rate: 211.3/s)
2022-05-30 09:19:33.8813|DEBUG|Still loading at 200 (rate: 274.2/s)
2022-05-30 09:19:34.1342|DEBUG|Still loading at 300 (rate: 305.4/s)
...
2022-05-30 09:20:14.9819|DEBUG|Still loading at 19600 (rate: 468.6/s)
2022-05-30 09:20:15.2609|DEBUG|Still loading at 19700 (rate: 467.8/s)
2022-05-30 09:20:15.5030|DEBUG|Still loading at 19800 (rate: 467.5/s)

Well, then. The first few hundred rows show a roughly 150x improvement, and it actually gets faster as it goes, so the whole import takes 45 seconds instead of 6 hours.

So much time wasted yesterday. Just not worth it.

Stuff I didn't have time to read today

I had to put out a new version of the Inner Drive Azure tools for my day job today, and I had more meetings than I wanted (i.e., a non-zero number), so these kind of piled up:

There were other things I'll read later, but it's past 6pm and someone is staring at me because she needs a walk.

Just one or two stories today

Sheesh:

And finally, when I left for San Francisco on Saturday morning, it was 10°C and sunny. Here we are about 76 hours later and it's 30°C. We really don't have spring or fall here some years.