The Daily Parker

Politics, Weather, Photography, and the Dog

Whee!

It's a bit windy in Chicago: winds steady at 25 knots, peaking at 47 knots at 1pm. WGN says:

The National Weather Service has issued a High Wind Warning through 7 p.m. Thursday.

Gusts are to build to greater than 60 mph at times–and there are indications a few of the strongest gusts could reach speeds of 70 to 80 mph.

Whitecaps were spotted in Lake Michigan and the gusts have the potential to send waves greater than 10 feet on the shoreline.

It’s a good idea to move objects indoors and out of the wind.

Great, thanks! That last bit helps. Even Cassie got tired of pushing against it.

Bonus graphic: can you spot when the cold front came through yesterday?

Spring, at least in some places

Canada has put the Prairie Provinces on a winter storm warning as "the worst blizzard in decades" descends upon Saskatchewan and Manitoba:

A winter storm watch is in effect for southern Manitoba and southeastern Saskatchewan, with snowfall accumulations of 30 to 50 centimetres expected mid-week, along with northerly wind gusts of up to 90 kilometres per hour, said Environment Canada on Monday.

“Do not plan to travel — this storm has the potential to be the worst blizzard in decades,” the agency warns.

The storm is expected to start Tuesday night, as a Colorado low pressure system moving toward Minnesota will bring a “heavy swath of snow” from southeastern Saskatchewan through most of southern Manitoba.

Snow will start to fall early in the evening near the U.S. border and move north overnight. Blowing snow and high winds will cause zero visibility and whiteout conditions, making driving treacherous.

Meanwhile, elsewhere:

And finally, prosecutors in Texas have declined to pursue charges against a 26-year-old woman arrested last week for infanticide after self-inducing an abortion. Welcome to the new 19th Century, at least in the religious South.

All the rows in the world

When I launched the final weather archive import on Tuesday, I predicted it would finish around 1pm today. See my accuracy for yourself:

2022-04-08 12:54:05.0975|INFO|Moved 118,773,651 weather archives from v3 to v5
2022-04-08 12:54:05.0975|INFO|Finished importing; duration 3.03:41:19.2445019
2022-04-08 12:54:05.0975|INFO|Import finished

Not a bad prediction.
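For anyone who wants to turn that log output into a throughput figure, here is a minimal Python sketch. It assumes the duration field follows a days.hours:minutes:seconds layout, which is what the log line appears to use; the function name `parse_duration` is made up for illustration and is not part of the import tool.

```python
from datetime import timedelta

def parse_duration(text: str) -> timedelta:
    """Parse 'd.hh:mm:ss.fffffff' (the day part is optional) into a timedelta."""
    hours_part, minutes, seconds = text.strip().split(":")
    days, _, hours = hours_part.rpartition(".")  # days is '' when absent
    return timedelta(
        days=int(days or 0),
        hours=int(hours),
        minutes=int(minutes),
        seconds=float(seconds),
    )

rows_moved = 118_773_651                          # from the log above
duration = parse_duration("3.03:41:19.2445019")   # from the log above
rows_per_second = rows_moved / duration.total_seconds()
print(f"{duration.total_seconds():,.0f} s at {rows_per_second:.1f} rows/s")
```

The result should land close to 436 rows per second over the whole run.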

So Weather Now 5 now has about 260 million historical records going back to 2006, including Chicago's weather from 15 years ago this hour. And where the weather station reported climate records, we've got those too.

Microsoft Azure recalculates storage use daily around 11 am Central time, so I don't have the complete picture yet, but it looks like I transferred about 245 GB of data. I'll find out for sure tomorrow, and in 3-4 days I'll get an accurate view of the storage cost.

Whew. I'm glad that's over.

Early afternoon roundup

Now that I've got a few weeks without travel, performances*, or work conferences, I can go back to not having enough time to read all the news that interests me. Like these stories:

Finally, Michelin has handed out its 2022 stars for Chicago. Nothing surprising on the list, but I now have four more restaurants to try.

* Except that I volunteered to help a church choir do five Messiah choruses on Easter Sunday, so I've got two extra rehearsals and a service in the next 12 days.

Bonus update: the fog this morning made St Boniface Cemetery especially spooky-looking when Cassie and I went out for her morning walk:

Archive import finished; final archive import starting today

On Friday, I used Arithmetic™ to predict that the 162-million-row weather data transfer from Weather Now v3 to v5 would end around 7pm last night. Let's check the logs:

2022-04-04 18:48:30.7196|INFO|Clearing v3 archival records for ZYTX
2022-04-04 18:49:27.7471|INFO|Moved 157,408,921 weather archives from v3 to v5
2022-04-04 18:49:27.7471|INFO|Finished importing; duration 4.04:14:55.0952715 

Nice prediction. (It logged 157 million rows because I made a performance tweak and re-started the app after 5 million.)

As I've mentioned, those 162 million rows only go back to September 2009. But v3 launched in January 2007. And it turns out I have a second archive, also containing about 25 GB of data, going back to August 2006. I had to think back to decisions I never documented to piece together why.

In August 2006, I had a single big machine that served as my domain controller, Web server, and Exchange endpoint, and a second big database server. In this photo from 31 July 2006, the Exchange/Web/DC server is the white one on the right ("DOPPELKUH") and the SQL server ("BULLE") is the big black one on the left:

Now, BULLE was a huge machine for the time, with about 200 GB of disk space and (I think) 16 GB of memory. That 200 GB expanse tempted me to turn off a data-purge feature from more parsimonious days, and apparently I deployed that change around 9am on 11 August 2006. (Sadly, the purge feature worked as designed, and I have no archival data before then.)

In October 2006, I finally bought a server rack and moved the database to an even bigger set of disks:

Everything ticked along until around the time I deployed the Weather Now v3.5 refresh and discovered that the un-purged data file had grown to 25 GB. So on 3 September 2009, I simply created a new database and changed the data connection strings to point to it. That new database kept growing until I switched the archival data store to the Cloud in 2013.

And now, I get to take advantage of the triviality of that change by making an equally trivial change to the import controller's data connection string. The 2006 data file has 118,774,028 rows covering 3,988 stations from 11 August 2006 to 3 September 2009. At 435 rows per second, the final archival import should finish around...let's see...1 pm on Friday. At that point, every single byte of data Weather Now has collected in the past 15 years will be available through the App for you to see.

More from the archives:

Quick update

The Apollo Chorus performed last night at the Big Foot Arts Festival in Walworth, Wis., so I haven't done a lot of useful things today. I did take a peek at the other weather archive I have lying around, and discovered (a) it has the same schema as the one I'm currently importing into Weather Now 5, and (b) it only goes back to August 2006.

Somewhere I have older archives that I need to find... But if not, NOAA might have some.

The update rolls on...

As of 17:16 CDT, the massive Weather Now v3 to v5 import had 115,441,906 records left to transfer. At 14:28 CDT yesterday, it was at 157,409,431, giving us a rate of ( 41,967,525 / 96,480 seconds = ) 435 records per second. A little more math gives us another 265,392 seconds to go, or 3 days, 1 hour, 43 minutes left.
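The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the two "records left" readings quoted in the paragraph; the function name `eta_from_checkpoints` is hypothetical.

```python
from datetime import timedelta

def eta_from_checkpoints(remaining_then, remaining_now, elapsed_seconds):
    """Return (rows per second, time left) from two 'rows remaining' readings."""
    rate = (remaining_then - remaining_now) / elapsed_seconds
    return rate, timedelta(seconds=round(remaining_now / rate))

# 14:28 CDT one day to 17:16 CDT the next is 26 hours 48 minutes.
elapsed = timedelta(hours=26, minutes=48).total_seconds()
rate, left = eta_from_checkpoints(157_409_431, 115_441_906, elapsed)
print(f"{rate:.0f} rows/s, {left} to go")
```

Run against the numbers in the post, this should reproduce the 435 records per second and the 3 days, 1 hour, 43 minutes remaining.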

So, OK then, what's the over-under on this thing finishing before 7pm Monday?

It's just finished station KCKV (Outlaw Field, Clarksville, Tenn.), with another 2,770 stations left to go. Because it's going in alphabetical order, this means it's finished all of the Pacific Islands (A stations), Northern Europe (B and E), Canada (C), and Africa (D, F, G, H). There are no I or J stations (at least not on Earth). K is by far the largest swath as it encompasses all of the continental US, which has more airports than any other land mass.

Once it finishes the continental US, it'll have only 38 million left to do! Whee!

How long will this take?

It takes a while to transfer 162.4 million rows of data from a local SQL database to a remote Azure Tables collection. So far, after 4 hours and 20 minutes, I've transferred just over 4 million rows. That works out to about 260 rows per second, or 932,000 per hour. So, yes, the entire transfer will take 174 hours.

Good thing it can run in the background. Also, because it cycles through three distinct phases (disk-intensive data read, processor-intensive data transformation, and network-intensive data write), it doesn't really take a lot of effort for my computer to handle it. In fact, network writes take 75% of the cycle time for each batch of reports, because the Azure Tables API takes batches of 100 rows at a time.
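To see why the network phase dominates, consider what a 100-row limit means: 162.4 million rows need at least 1.624 million separate write calls. The following Python sketch shows the batching pattern only. It is not the App's actual import code, `write_batch` is a stand-in for whatever performs the real network write, and a real Azure Tables batch has further constraints (as far as I know, all rows in one batch must share a partition key) that this sketch ignores.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
BATCH_SIZE = 100  # batch size quoted in the post

def batches(rows: Iterable[T], size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def import_rows(rows: Iterable[T], write_batch: Callable[[List[T]], None]) -> int:
    """Send rows batch by batch and return the number of write calls made."""
    calls = 0
    for batch in batches(rows):
        write_batch(batch)  # hypothetical stand-in for the network write
        calls += 1
    return calls

# 250 placeholder rows need three writes: 100 + 100 + 50.
sent: List[List[int]] = []
calls = import_rows(range(250), sent.append)
print(calls, [len(b) for b in sent])
```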

Now, you might wonder why I don't just push the import code up to Azure and speed up the network writes a bit. Well, yes, I thought of that, but decided against the effort and cost. To do that, I would have to upload a SQL backup file to a SQL VM big enough to take a SQL Server instance. Any VM big enough to do that would cost about 67¢ per hour. So even if I cut the total time in half, it would still cost me $60 or so to do the transfer. That's an entire bottle of Bourbon's worth just to speed up something for a hobby project by a couple of days.

Speaking of cost, how much will all this data add to my Azure bill? Well, I estimate the entire archive of 2009-2022 data will come to about 50 gigabytes. The 2003-2009 data will probably add another 30. Azure Tables cost 6¢ per gigabyte per month for the geographically-redundant storage I use. I will leave the rest of the calculation as an exercise for the reader.
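For readers who would rather not do the exercise by hand, here is a rough Python sketch of both cost estimates. Every figure comes from the paragraphs above; it assumes the storage bill is simply size times rate and ignores transaction or bandwidth charges, which the post does not quote.

```python
VM_RATE_PER_HOUR = 0.67       # dollars per hour, from the post
LOCAL_HOURS = 174             # local transfer estimate, from the post
STORAGE_RATE_PER_GB = 0.06    # dollars per GB per month, from the post
ARCHIVE_GB = 50 + 30          # the two archive size estimates, from the post

# Cost of renting the VM if it cut the run time in half.
vm_cost = VM_RATE_PER_HOUR * LOCAL_HOURS / 2

# Ongoing monthly storage cost for both archives together.
storage_per_month = STORAGE_RATE_PER_GB * ARCHIVE_GB

print(f"VM for the transfer: ${vm_cost:.2f}")
print(f"Storage: ${storage_per_month:.2f} per month")
```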

Update: I just made a minor change to the import code, and got a bit of a performance bump. We're now up to 381 rows per second, 46% faster than before, which means the upload should finish in only 114 hours or 4.7 days. All right, let's see if we're done early Monday morning after all! (It's already almost done with Canada, too!)

Strange data patterns

The data transfer from Weather Now v3 to v5 continues in the background. Before running it, I did a simple SQL query to find out how many readings each station reported between September 2009 and March 2013. The results surprised me a bit:

The v3 database recorded 162.4 million readings from 4,071 stations. Fully 75 of them only have one report, and digging in I can see that a lot of those don't have any data. Another 185 have fewer than 100, and a total of 573 have fewer than 10,000.
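A per-station count like this is a plain GROUP BY. The Python sketch below runs one against an in-memory SQLite table with toy data. The table name, column names, and station IDs are all hypothetical, since the post does not show the real v3 schema or query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per weather report.
conn.execute("CREATE TABLE readings (station TEXT, sequence INTEGER)")

# Toy data: one busy station, one sparse station, one with a single report.
toy = (
    [("STN1", i) for i in range(150)]
    + [("STN2", i) for i in range(20)]
    + [("STN3", 0)]
)
conn.executemany("INSERT INTO readings VALUES (?, ?)", toy)

# Count reports per station, as the post describes.
per_station = dict(
    conn.execute("SELECT station, COUNT(*) FROM readings GROUP BY station")
)

def stations_below(counts, limit):
    """Number of stations with fewer than `limit` reports."""
    return sum(1 for n in counts.values() if n < limit)

single_report = sum(1 for n in per_station.values() if n == 1)
under_100 = stations_below(per_station, 100)
print(per_station, single_report, under_100)
```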

At the other end, Anderson AFB on Guam has 123,394 reports, Spangdahlem AB in Germany has 123,297, and Leeuwarden, Netherlands, has 119,533. In fact, seven Dutch weather stations account for 761,000 reports of the 162 million in the archive. I don't know why, but I'll find out at some point. (It looks like some of them have multiple weather recording devices with color designations. I'll do some more digging.)

How many should they have? Well, the archive contains 1,285 days of records. That's about 31,000 hourly reports or 93,000 20-minute updates, exactly where the chart plateaus. Chicago O'Hare, which reports hourly plus when the weather shifts significantly, had 37,069 reports. Half Moon Bay, Calif., which just ticks along on autopilot without a human weather observer to trigger special reports, had 90,958. So the numbers check out pretty well. (The most prolific US-based station, whose 91,083 reports made it the 10th most prolific in the world, was Union County Airport in Marysville, Ohio.)
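The expected counts follow directly from the length of the archive. A minimal Python sketch, assuming a station reports at a fixed interval with no gaps (the function name is made up for illustration):

```python
ARCHIVE_DAYS = 1_285  # from the post

def expected_reports(days: int, minutes_between_reports: int) -> int:
    """Reports a station would file at a fixed interval with no gaps."""
    return days * 24 * 60 // minutes_between_reports

hourly = expected_reports(ARCHIVE_DAYS, 60)
every_20_minutes = expected_reports(ARCHIVE_DAYS, 20)

# O'Hare's reports beyond a strictly hourly schedule.
ohare_extra = 37_069 - hourly

print(hourly, every_20_minutes, ohare_extra)
```

The two baselines come to 30,840 and 92,520, which the post rounds to 31,000 and 93,000.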

Finally, I know that the App has a lot of data sloppiness right now. After I transfer over these archives, I'll work on importing the FAA Airports database, which will fix the names and locations of most of the US stations.