The Daily Parker

Politics, Weather, Photography, and the Dog

And hey, a bug

Not five minutes after my last post, I discovered a completely borked feature, caused by a change to the way Azure.Data.Tables executes queries.

The Daily Temperatures feature stores data in the same table as the History feature. Each row represents a weather report, where the table partition key is the weather station identifier and the row key is the date and time of the report. So, for example, the first row of data for Chicago-O'Hare in the 2023 table has a partition key of KORD and a row key of 20230101-0051.

Climate records use a row key of "Climate-" and the date. So yesterday's climate data for Chicago-O'Hare has a partition key of KORD and a row key of "Climate-20230415". Easy to remember, and easy to construct queries.

To that end, the original (.NET 6) code looked like this:

var query = 
	from entity in table.CreateQuery<ClimateRecordTableServiceEntity>()
	where entity.PartitionKey == locationId
		&& string.Compare(entity.RowKey, lowerRowKey, StringComparison.InvariantCultureIgnoreCase) >= 0
		&& string.Compare(entity.RowKey, upperRowKey, StringComparison.InvariantCultureIgnoreCase) <= 0
	select entity;

When I upgraded to .NET 7, I naïvely just changed the first line, to this:

var query = 
	from entity in table.Query<ClimateRecordTableServiceEntity>()
	where entity.PartitionKey == locationId
		&& string.Compare(entity.RowKey, lowerRowKey, StringComparison.InvariantCultureIgnoreCase) >= 0
		&& string.Compare(entity.RowKey, upperRowKey, StringComparison.InvariantCultureIgnoreCase) <= 0
	select entity;

When confronted with a 30-day query, though, it spun off into the abyss and crashed the whole app.

The correct code looks like this:

var query = table.QueryAsync<ClimateRecordTableServiceEntity>(entity =>
	entity.PartitionKey == locationId
		&& string.Compare(entity.RowKey, lowerRowKey, StringComparison.InvariantCultureIgnoreCase) >= 0
		&& string.Compare(entity.RowKey, upperRowKey, StringComparison.InvariantCultureIgnoreCase) <= 0);

See, now the filter part of the query goes inside the method call. (There's an extra step in reading the async results back, too.)

So the effect of the naïve fix was to hit the table 30 times getting back the entire partition each time. Remember that all of the weather reports go into the table? So, yeah, the 2023 table already has something like 7.5 million rows, or about 2,500 in each partition. So it tried to read 75,000 rows just to bring back 30. Oopsi.

I'm deploying the fix now.

Comments (1) -

  • David Harper

    4/16/2023 7:45:32 PM +00:00 |

    That’s far from the worst example that I have seen in my time as a DBA. I occasionally see queries that are Cartesian joins across several multi-million-row tables which would take so long to run (according to the explain plan) that the Sun would have gone cold long before the query finished.

Comments are closed