Politics, Weather, Photography, and the Dog
Sunday 27 January 2013

I'm sitting at my remote office working on a conundrum: how to balance human usability against good software design.

The problem is: how can I create an Azure table partitioning scheme that uses Azure efficiently and still allows the user (me) efficiently to troubleshoot problems with the feature in question. This is a direct consequence of the issues I worked on this morning.

The feature is the component of the Weather Now parsing system that stores raw weather data from NOAA temporarily. By "temporarily" I mean, until I delete it. Keeping the raw data will allow me to figure out why problems occur and will allow the application to apply new features to old data in future.

NOAA publishes "cycle files" about every 3-6 minutes. The cycle uses a predictable sequence of 750 file names that repeats about every 4 days. The files go from file000 to file750, then back to file000. Sometimes, however, NOAA restarts the sequence at 0, skips files, or just crashes entirely, so the feature has to handle the file names as random. That said, the files have definite publication times, and generally—to an extent that Weather Now can optimize itself based on the pattern—the files contain weather data gathered within a short time before NOAA publishes the files.

You can have practically unlimited Azure tables in a storage account; I would imagine the number is close to the Int32 maximum value of 2.1 billion. Each table can have billions of partition keys as well. Searching on a combination of Azure table name and partition key takes the same length of time no matter how many tables are in the storage account or how many partition keys each table has. Under the hood, Azure manages the indexing so efficiently that network latency will be the bigger problem in all but a few edge cases.

For Weather Now, my first thought was to create a new table for each month's NOAA files and partition the table by day. So, weather parsing process would put the metadata for a file downloaded right now in the table "noaa201301" and use the partition key "20130127". That would give me about 5,700 rows in each table and about 190 rows in each partition.

I'm reconsidering. Given it's taken 11 years to change the way that Weather Now retrieves and stores weather data, using that scheme would give me 132 tables and 4,017 partitions, each of them kind of small. Azure wouldn't care, but it would over time clutter up the application's storage account. (The account will be cluttered enough as it is, with the millions of individual weather reports tabled by station and partitioned by month.)

On reflection, then, I'm going to create a new table of metadata each year, and partition by month. An Azure table with 69,000 rows (the number of NOAA files produced each year) isn't appreciably less efficient than one with 69 rows or 69 million, as it turns out. It will still partition the data as efficiently as the partition key suggests. But cutting the partitions down 30-fold could make a big difference in efficiency.

I'm open to contrary evidence. In fact, I'd love to find some. But given the frequency of data reads (one every 5 minutes or so), and the thousands of tables already in the application's storage account, I think this is the best way to go.

Sunday 27 January 2013 17:32:33 CST (UTC-06:00)  |  | Cloud | Weather | Windows Azure#
Search
On this page....
Azure table partition schemes
Countdowns
The Daily Parker +3267d 09h 40m
To London +8d 11h 21m
Parker's 9th birthday 234d 03h 33m
My next birthday 315d 07h 38m
Categories
Aviation (338) Baseball (110) Best Bars (6) Biking (44) Chicago (893) Cubs (197) Duke (132) Geography (330) Higher Ground (5) Jokes (282) Kitchen Sink (638) London (52) Parker (189) Daily (204) Photography (142) Politics (303) US (1080) World (252) Raleigh (21) Readings (8) Religion (66) San Francisco (87) Software (200) Blogs (74) Business (225) Cloud (89) Cool links (132) Security (98) Travel (202) Weather (688) Astronomy (85) Windows Azure (59) Work (54) Writing (8)
Links
Archive
<October 2014>
SunMonTueWedThuFriSat
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678
Full archive
Blogroll
About
David Braverman and Parker
David Braverman is a software developer in Chicago, and the creator of Weather Now. Parker is the most adorable dog on the planet, 80% of the time.
Legal
All content Copyright ©2014 David Braverman.
Creative Commons License
The Daily Parker by David Braverman is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License, excluding photographs, which may not be republished unless otherwise noted.
Admin Login
Sign In
Blog Stats
Total Posts: 4518
This Year: 416
This Month: 31
This Week: 7
Comments: 0