It takes a while to transfer 162.4 million rows of data from a local SQL database to a remote Azure Tables collection. So far, after 4 hours and 20 minutes, I've transferred just over 4 million rows. That works out to about 260 rows per second, or 932,000 per hour. So, yes, the entire transfer will take 174 hours.
Good thing it can run in the background. Also, because it cycles through three distinct phases (disk-intensive data read, processor-intensive data transformation, and network-intensive data write), it doesn't really take a lot of effort for my computer to handle it. In fact, network writes take 75% of the cycle time for each batch of reports, because the Azure Tables API takes batches of 100 rows at a time.
Now, you might wonder why I don't just push the import code up to Azure and speed up the network writes a bit. Well, yes, I thought of that, but decided against the effort and cost. To do that, I would have to upload a SQL backup file to a SQL VM big enough to take a SQL Server instance. Any VM big enough to do that would cost about 67¢ per hour. So even if I cut the total time in half, it would still cost me $60 or so to do the transfer. That's an entire bottle of Bourbon's worth just to speed up something for a hobby project by a couple of days.
Speaking of cost, how much will all this data add to my Azure bill? Well, I estimate the entire archive of 2009-2022 data will come to about 50 gigabytes. The 2003-2009 data will probably add another 30. Azure Tables cost 6¢ per gigabyte per month for the geographically-redundant storage I use. I will leave the rest of the calculation as an exercise for the reader.
Update: I just made a minor change to the import code, and got a bit of a performance bump. We're now up to 381 rows per second, 46% faster than before, which means the upload should finish in only 114 hours or 4.7 days. All right, let's see if we're done early Monday morning after all! (It's already almost done with Canada, too!)
The data transfer from Weather Now v3 to v5 continues in the background. Before running it, I did a simple SQL query to find out how many readings each station reported between September 2009 and March 2013. The results surprised me a bit:
The v3 database recorded 162.4 million readings from 4,071 stations. Fully 75 of them only have one report, and digging in I can see that a lot of those don't have any data. Another 185 have fewer than 100, and a total of 573 have fewer than 10,000.
At the other end, Anderson AFB on Guam has 123,394 reports, Spangdahlem AB in Germany has 123,297, and Leeuwarden, Netherlands, has 119,533. In fact, seven Dutch weather stations account for 761,000 reports of the 162 million in the archive. I don't know why, but I'll find out at some point. (It looks like some of them have multiple weather recording devices with color designations. I'll do some more digging.)
How many should they have? Well, the archive contains 1,285 days of records. That's about 31,000 hourly reports or 93,000 20-minute updates—exactly where the chart plateaus. Chicago O'Hare, which reports hourly plus when the weather shifts significantly had 37,069 reports. Half Moon Bay, Calif., which just ticks along on autopilot without a human weather observer to trigger special reports, had 90,958. So the numbers check out pretty well. (The most prolific US-based station, whose 91,083 reports made the 10th most prolific in the world, was Union County Airport in Marysville, Ohio.)
Finally, I know that what the App has a lot of data sloppiness right now. After I transfer over these archives, I'll work on importing the FAA Airports database, which will fix the names and locations of most of the US stations.
Sunday night I finished moving all the Weather Now v4 data to v5. The v4 archives went back to March 2013, but the UI made that difficult to discover. I've also started moving v3 data, which would bring the archives back to September 2009. I think once I get that done then moving the v2 data (back to early 2003) will be as simple as connecting the 2009 import to the 2003 database. Then, someday, I'll import data from other sources, like NCEI (formerly NCDC) and the Met*, to really flesh out the archives.
One of the coolest parts of this is that you can get to every single archival report through a simple URL. For example, to see the weather in Chicago five years ago, simply go to https://wx-now.com/History/KORD/2017/03/30. From there, you can drill into each individual report (like the one from 6pm) or use the navigation buttons at the bottom to browse the data.
Meanwhile, work continues apace on importing geographic data. And I have discovered a couple of UI bugs, including a memory leak that caused the app to crash twice since launch. Oops.
* The Met has really cool archives, some of which go back to the 1850s.
I've just switched the DNS entries for wx-now.com over to the v5 App, and I've turned off the v4 App and worker role. It'll take some time to transfer over the 360 GB of archival data, and to upload the 9 million rows of Gazetteer data, however. I've set up a virtual machine in my Azure subscription specifically to do that.
This has been quite a lift. Check out the About... page for the whole history of the application. And watch this space over the next few months for more information about how the app works, and what development choices I made (and why).
Just for posterity, here's what the v4 Current Weather page looked like:
Good night, v4. You had a good 8-year run. And good night, Katie Zoellner's lovely design, which debuted 15 years ago.
This weekend, I built the Production assets for Weather Now v5, which means that the production app exists. I haven't switched over the domain name yet, for reasons I will explain. But I've created the Production Deploy pipeline in Azure DevOps and it has pushed all of the bits up to the Production workloads.
Everything works, but a couple of features don't work perfectly. Specifically, the Search feature will happily find everything in the database, but right now, the database only has about 31,000 places. Also, I haven't moved any of the archival data over from v4, so the Production app only has data back to yesterday. (The Dev/Test app has data back to last May, so for about the next week it'll have more utility than Production.)
I'm going to kick the tires on the Production app for a week or so before turning off v4. I expect my Azure bill will be about $200 higher than usual this month, but my June bill should be about what my January bill is. Uploading all the archival data and the Gazetteer will cost many, many database units, and I'll keep v4 running for probably half of April, for example.
The Dev/Test version and the Production version have the same bits as of this post. Going forward, Dev/Test will get all the new features probably a week or two ahead of Prod, just like in the real world.
Plus, over the next few months I'll post explanations of how and why I did everything in v5.
I'm pretty excited. Everything from here out is incremental, so every deployment from here out should be very boring.
I just finished upgrading an old, old, old Windows service to .NET 6 and a completely different back end. It took 6.4 hours, soup to nuts, and now the .NET 6 service is happily communicating with Azure and the old .NET Framework 4.6 service is off.
Meanwhile, the Post published a map (using a pretty lazy algorithm) describing county-by-county what sunrise times will look like in January 2024 if daylight saving time becomes permanent. I'd have actually used a curve tool but, hey, the jagged edges look much more "data-driven." (They used the center point of each county.)
Now it's 22:45 (daylight saving time), and I need to empty Cassie and go to bed. But I'm pretty jazzed by how I spent a rainy afternoon on PTO. It was definitely more rewarding than tramping out in the rain to a couple of breweries for the Brews & Choos project, which had been Plan A.
We had two incredible performances of Bach's Johannespassion this weekend. (Update: we got a great review!) It's a notoriously difficult work that Bach wrote for his small, amateur church chorus in Leipzig the year he started working there. I can only imagine what rehearsals were like in 1724. I'm also grateful that we didn't include the traditional 90-minute sermon between the 39-minute first part and the 70-minute second part, and that we didn't conclude the work with the equally-traditional pogrom against the Jews of Leipzig.
It's still a magnificent work of music.
Meanwhile, elsewhere in the world:
Finally, Rachel Feltman lists five myths about Daylight Saving Time. Our annual tradition of questioning it without changing anything will continue, of course.
And it's about 16°C outside, so it's time to take Cassie on her third half-hour walk of the day.
I surprised a colleague by suggesting that it won't get as cold as it did yesterday for the rest of 2022. The temperature bottomed out at -12°C around 6am (with a wind chill of -21°C), a record low. Plus, the climate normal low only goes below freezing until the 20th.
The upshot? I will now take Cassie on a 20-minute walk and enjoy the above-freezing temperatures as long as they last, which is currently forecast through...October?
Even as the East Coast gets bombed by an early-spring cyclone, we have sunny skies and bitter cold. But the -12°C at O'Hare at 6am will likely be the coldest temperature we get in Chicago until 2023. The forecast predicts temperatures above 10°C tomorrow and up to 16°C on Wednesday, with no more below-freezing temperatures predicted as far out as predictions can go.
Meanwhile, I'm about to leave for our first of two Bach Jonannespassion performances this weekend. We still have tickets available for tomorrow's, so come on down!
No, I didn't send Cassie's slobber to 78AndWoof.com or anything. Someone brought a DNA-shaped toy to the dog park today, which Cassie found irresistible:
This cattle dog also found the toy irresistible, leading to this irresistible tug-of-war that ranged around the park for a good five minutes: