The Daily Parker

Politics, Weather, Photography, and the Dog

Under the hood of Weather Now

My most recent post mentioned finishing the GetWeather component of Weather Now, my demo project that provides near-real-time aviation weather for most of the world. I thought some readers might be interested to know how it works.

The GetWeather component has three principal tasks:

  • Retrieve the raw weather files from NOAA;
  • Parse the files; and
  • Store the results.

In the Inner Drive Technology world, an Azure worker process uses an arbitrary collection of objects that implement the IWorkerTask interface. The interface defines Interval and LastRun properties and an Execute method, which is all the worker process needs to know. The tasks are responsible for their own lifespans, reentry prevention, etc. (That's another discussion.)
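Roughly, and simplified for illustration, the contract looks something like the sketch below. The member names come from the description above; the property types, the WorkerLoop helper, and the CountingTask class are assumptions made for the example rather than the production code.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the contract described above. Member names are from the post;
// their types are assumed for illustration.
public interface IWorkerTask
{
    TimeSpan Interval { get; }   // how often the task wants to run
    DateTime LastRun { get; }    // when it last ran (UTC)
    void Execute();              // do the work; the task guards its own reentry
}

// Hypothetical helper: the worker process needs only the three members above.
public static class WorkerLoop
{
    // Runs every task whose interval has elapsed as of 'now'; returns how many ran.
    public static int RunDueTasks(IEnumerable<IWorkerTask> tasks, DateTime now)
    {
        var ran = 0;
        foreach (var task in tasks)
        {
            if (now - task.LastRun < task.Interval) continue;
            task.Execute();
            ran++;
        }
        return ran;
    }
}

// Trivial task used only to exercise the loop.
public sealed class CountingTask : IWorkerTask
{
    public TimeSpan Interval { get; private set; }
    public DateTime LastRun { get; private set; }
    public int Runs { get; private set; }

    public CountingTask(TimeSpan interval)
    {
        Interval = interval;
        LastRun = DateTime.MinValue;
    }

    public void Execute()
    {
        Runs++;
        LastRun = DateTime.UtcNow;
    }
}
```

The point of the design is that the loop never needs to know what a task does, only when it last ran and how often it wants to run.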

In order to decouple the data source (NOAA now, other sources in the future) from the application, I split the three tasks into two IWorkerTask classes:

  • The NoaaFileDownloadingWorkerTask opens an FTP connection to the NOAA public weather servers, retrieves the files it hasn't already retrieved, and stores the contents in Azure Blob Storage; and
  • The NoaaFileParsingWorkerTask pulls the files out of Azure Storage, parses them, and stores the results in an Azure SQL Database and Azure table storage.

I'm using Azure storage as an intermediary between the two sides of the process because my analysis led me to conclude that they're really independent of each other. Coupling the two tasks in the current (2002) version of GetWeather causes all kinds of problems, not least that a failure in one task can stop the whole thing. If, as happens given the nature of the Internet, the FTP side has an unrecoverable problem, the application has to restart. In practice it simply kills itself and waits for the next time it runs, which can be a while because it runs as a Windows Server 2008 scheduled task every 30 minutes.

The new architecture will allow the parser to run every minute or two, see if it has anything to do by looking at some metadata, and do its job if needed. I can change a system setting to stop it from running (for example, because I need to do some database maintenance), while letting the downloader continue to work separately.

On the other side, the downloader can run every 5 minutes, snatch the one or two files it needs from NOAA, and shut down cleanly without waiting for the parser. NOAA likes this because the connection is only open for a few seconds, instead of the 27 minutes it stays open right now. And if the NOAA server isn't available, so what? It's a clean shutdown and a clean start a few minutes later.

This design also allows me to do something else: manually upload files for parsing and storage. This helps with testing, migration, service interruptions—all things that the current architecture has made nearly impossible.
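Here is a minimal sketch of that decoupling, assuming an in-memory dictionary as a stand-in for the blob container. RawFileStore, ParsingTask, and the Enabled flag are hypothetical names for illustration; the real tasks talk to Azure storage and do real parsing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the blob container that sits between the two tasks.
public sealed class RawFileStore
{
    private readonly Dictionary<string, string> _pending = new Dictionary<string, string>();

    // Called by the downloader, or by hand for a manual upload.
    public void Put(string name, string contents) { _pending[name] = contents; }

    // Returns a snapshot of the names waiting to be parsed.
    public IList<string> PendingNames() { return _pending.Keys.OrderBy(n => n).ToList(); }

    // Removes a file from the store and returns its contents.
    public string Take(string name)
    {
        var contents = _pending[name];
        _pending.Remove(name);
        return contents;
    }
}

// Hypothetical parser task: it only ever talks to the store, never to NOAA.
public sealed class ParsingTask
{
    private readonly RawFileStore _store;

    public bool Enabled { get; set; }            // the "system setting" switch
    public int LinesParsed { get; private set; }

    public ParsingTask(RawFileStore store)
    {
        _store = store;
        Enabled = true;
    }

    // Returns the number of files handled on this pass.
    public int Execute()
    {
        if (!Enabled) return 0;                  // paused, e.g. for database maintenance
        var names = _store.PendingNames();
        foreach (var name in names)
        {
            var contents = _store.Take(name);
            // Real parsing would happen here; the sketch just counts lines.
            LinesParsed += contents.Split('\n').Length;
        }
        return names.Count;
    }
}
```

Because the parser sees only the store, a file dropped in by hand looks exactly like one the downloader fetched, and pausing the parser leaves the downloader free to keep filling the store.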

I'm not entirely done with the application (and while writing this I just thought of an improvement I'll need to make to prevent infinite retries), but it's close. And I'm really pleased with the application so far. Stay tuned; I can now set a tentative public launch date of March 31st.

Azure table partition schemes

I'm sitting at my remote office working on a conundrum: how to balance human usability against good software design.

The problem is: how can I create an Azure table partitioning scheme that uses Azure efficiently and still lets the user (me) troubleshoot problems with the feature in question efficiently? This is a direct consequence of the issues I worked on this morning.

The feature is the component of the Weather Now parsing system that stores raw weather data from NOAA temporarily. By "temporarily" I mean, until I delete it. Keeping the raw data will allow me to figure out why problems occur and will allow the application to apply new features to old data in future.

NOAA publishes "cycle files" about every 3-6 minutes. The cycle uses a predictable sequence of 750 file names that repeats about every 4 days. The files go from file000 to file750, then back to file000. Sometimes, however, NOAA restarts the sequence at 0, skips files, or just crashes entirely, so the feature has to handle the file names as random. That said, the files have definite publication times, and generally—to an extent that Weather Now can optimize itself based on the pattern—the files contain weather data gathered within a short time before NOAA publishes the files.
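One way to treat the names as random is to identify each file by its name plus its publication time, and to order by publication time rather than by name. The sketch below assumes that approach; CycleFile, CycleFileSelector, and the key format are hypothetical and shown only to illustrate the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public sealed class CycleFile
{
    public string Name { get; private set; }
    public DateTime PublishedUtc { get; private set; }

    public CycleFile(string name, DateTime publishedUtc)
    {
        Name = name;
        PublishedUtc = publishedUtc;
    }

    // Names repeat every few days, so the key combines name and publication time.
    public string Key
    {
        get
        {
            return Name + "@" + PublishedUtc.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
        }
    }
}

public static class CycleFileSelector
{
    // Returns the files whose key is not already in seenKeys, oldest first,
    // and records them as seen. Nothing is inferred from the order of the names.
    public static IList<CycleFile> SelectNew(IEnumerable<CycleFile> listing, ISet<string> seenKeys)
    {
        var fresh = listing
            .Where(f => !seenKeys.Contains(f.Key))
            .OrderBy(f => f.PublishedUtc)
            .ToList();
        foreach (var file in fresh) seenKeys.Add(file.Key);
        return fresh;
    }
}
```

With this keying, a recycled name with a later publication time counts as a new file, and a skipped or restarted sequence makes no difference.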

You can have practically unlimited Azure tables in a storage account; I would imagine the number is close to the Int32 maximum value of 2.1 billion. Each table can have billions of partition keys as well. Searching on a combination of Azure table name and partition key takes the same length of time no matter how many tables are in the storage account or how many partition keys each table has. Under the hood, Azure manages the indexing so efficiently that network latency will be the bigger problem in all but a few edge cases.

For Weather Now, my first thought was to create a new table for each month's NOAA files and partition the table by day. So, the weather parsing process would put the metadata for a file downloaded right now in the table "noaa201301" and use the partition key "20130127". That would give me about 5,700 rows in each table and about 190 rows in each partition.
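As a small sketch, deriving those two names from a file's timestamp could look like this. The MonthlyScheme class and its method names are made up for the example; only the "noaa201301" and "20130127" formats come from the text above.

```csharp
using System;
using System.Globalization;

public static class MonthlyScheme
{
    // One table per month, e.g. "noaa201301".
    public static string TableName(DateTime utc)
    {
        return "noaa" + utc.ToString("yyyyMM", CultureInfo.InvariantCulture);
    }

    // One partition per day, e.g. "20130127".
    public static string PartitionKey(DateTime utc)
    {
        return utc.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
    }
}
```

At about 190 files a day, a 30-day month works out to the roughly 5,700 rows per table mentioned above.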

I'm reconsidering. Given it's taken 11 years to change the way that Weather Now retrieves and stores weather data, using that scheme would give me 132 tables and 4,017 partitions, each of them kind of small. Azure wouldn't care, but it would over time clutter up the application's storage account. (The account will be cluttered enough as it is, with the millions of individual weather reports tabled by station and partitioned by month.)

On reflection, then, I'm going to create a new table of metadata each year, and partition by month. An Azure table with 69,000 rows (the number of NOAA files produced each year) isn't appreciably less efficient than one with 69 rows or 69 million, as it turns out. It will still partition the data as efficiently as the partition key suggests. But cutting the partitions down 30-fold could make a big difference in efficiency.
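The arithmetic behind those figures is easy to check. The sketch below assumes about 190 files a day and 365.25 days a year, and the yearly table and partition name formats are guesses for illustration (the post doesn't spell them out).

```csharp
using System;
using System.Globalization;

public static class SchemeMath
{
    public const int FilesPerDay = 190;   // approximate figure used in this post

    // Table per month, partition per day.
    public static int MonthlyTables(int years) { return years * 12; }
    public static int DailyPartitions(int years) { return (int)(years * 365.25); }

    // Table per year, partition per month.
    public static int YearlyTables(int years) { return years; }
    public static int MonthlyPartitions(int years) { return years * 12; }

    public static int RowsPerYear() { return FilesPerDay * 365; }
}

public static class YearlyScheme
{
    // Hypothetical format: one table per year, e.g. "noaa2013".
    public static string TableName(DateTime utc)
    {
        return "noaa" + utc.ToString("yyyy", CultureInfo.InvariantCulture);
    }

    // Hypothetical format: one partition per month, e.g. "201301".
    public static string PartitionKey(DateTime utc)
    {
        return utc.ToString("yyyyMM", CultureInfo.InvariantCulture);
    }
}
```

Over 11 years that is 132 tables and 4,017 partitions under the first scheme versus 11 tables and 132 partitions under the second, which is where the roughly 30-fold reduction in partitions comes from.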

I'm open to contrary evidence. In fact, I'd love to find some. But given the frequency of data reads (one every 5 minutes or so), and the thousands of tables already in the application's storage account, I think this is the best way to go.

Race against the machine

My efforts to move Weather Now up to Microsoft Azure took on some new urgency today when I noticed this:

That particular error code means the RAID battery has less than 24 hours of charge in it. Fortunately, this means only that the disk will slow down if the battery dies, unless there's a sudden power failure, in which case I could lose the entire RAID volume.

This is exactly the sort of thing that made me want to move all my applications to the Cloud in the first place.

I just hope I can finish the port before...well, before ol' Sparky dies...

Performance improvement; or, how one line of code can change your life

I'm in the home stretch moving Weather Now to Azure. I've finished the data model, data retrieval code, integration with the existing UI, and the code that parses incoming weather data from NOAA, so now I'm working on inserting that data into the database.

To speed up development, improve the design, and generally make my life easier, I'm using Entity Framework 5.0 with database-first modeling. The problem that consumed me yesterday afternoon and on into this morning has been how to ramp up to realistic volumes of data.

The Worker Role that will go out to NOAA and put weather data where Weather Now can use it will receive somewhere around 60,000 weather reports every hour. Often, NOAA repeats reports; sometimes, NOAA sends truncated copies of reports; sometimes, NOAA sends garbled reports. The GetWeather application (soon to be Azure worker task) has to handle all of that and still function in bursts of up to 10,000 weather reports at once.

The WeatherStore class takes parsed METARs and stores them in the CurrentObservations, PastObservations, and ClimateObservations tables, as appropriate. As I've developed the class, I've written unit tests for each kind of thing it has to do: "Store single report," "Store many reports" (which tests batching them up and inserting them in smaller chunks), "Store duplicate reports," etc. Then yesterday afternoon I wrote an integration test called "Store real-life NOAA file" that took the 600 KB, 25,000-line, 6,077-METAR update NOAA published at 2013-01-01 00:00 UTC, and stuffed it in the database.
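To give a flavor of the batching and duplicate handling those tests cover, here is a simplified sketch that works on raw report strings. ReportBatcher is a hypothetical helper written for this example; the real WeatherStore works with parsed METARs and a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReportBatcher
{
    // Drops exact repeats while keeping first-seen order.
    public static List<string> Distinct(IEnumerable<string> reports)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        return reports.Where(r => seen.Add(r)).ToList();
    }

    // Splits the reports into consecutive chunks of at most 'size' items.
    // The size check runs when the result is first enumerated.
    public static IEnumerable<List<string>> Chunk(IList<string> reports, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        for (var i = 0; i < reports.Count; i += size)
        {
            yield return reports.Skip(i).Take(size).ToList();
        }
    }
}
```

Each chunk can then be inserted and saved on its own, so one bad batch doesn't sink a whole file.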

Sucker took 900 seconds, or 15 minutes. In real life, that would mean a complete collapse of the application, because new files come in about every 4 minutes and similarly contain thousands of lines to parse.

This morning, I attached JetBrains dotTrace to the unit test (easy to do since JetBrains ReSharper was running the test), and discovered that 90% of the method's time was spent in—wait for it—DbContext.SaveChanges(). As I dug through the line-by-line tracing, it was obvious Entity Framework was the problem.

I'll save you the steps to figure it out, except to say Stack Overflow is the best thing to happen to software development since the keyboard.

Here's the solution:

using (var db = new AppDataContext())
{
	// Stop EF from scanning every tracked entity for changes on each operation
	db.Configuration.AutoDetectChangesEnabled = false;

	// do interesting work
}


The result: The unit test duration went from 900 seconds to...15. And that is completely acceptable. Total time spent on this performance improvement: 1.25 hours.

Another reason to finish moving to Azure

As I've noted before, only one Web application still lives in my living room, er, the Inner Drive Technology Worldwide Data Center: Weather Now. In the last few days, it has shown one more good reason that it needs to get to Windows Azure pronto.

Take a look at my Google Analytics view of incoming visitors:

What is going on? How do I go from 300 daily unique visitors to 1,800 in two days? Take a look at where they're coming from:

Yes, that's right. Close to 40% of Weather Now's traffic came from the Yukon Territory yesterday. And another 40% came from Alaska. And they're all going to this page for some reason. This might be why:

So how does Azure enter into it? Simply, if you have a Web application running on your own server, and you get a 750% increase in traffic, your server may not be able to handle it. Or, worse in a way, you might have been running the server capable of handling the peak load all the time, at great expense in electricity and hardware.

With Azure, you can simply bring another instance online, or increase the size of your running instance, or do any number of things to adapt quickly to the increased load, without having to buy or move the hardware. Then, when the load returns to normal, you can spin down the idle capacity. The trick is, you only pay for the capacity you're actually using.

I'm getting a lot closer to moving Weather Now, but a deadline looming at my paying job tomorrow has my attention at the moment. So more on this stuff later. Meanwhile, if you're in the Yukon or in central Alaska, stay warm, folks!

Debugging our first Azure 1.8 deployment

I've just spent three hours debugging something caused by a single missing line in a configuration file.

At 10th Magnitude, we've recently upgraded our framework and reference applications to the latest Windows Azure SDK. Since I'd already done it once, it didn't take too desperately long to create the new versions of our stuff.

However, the fact that something works in an emulator does not mean it will actually work in production. So, last night, our CTO attempted to deploy the first application we built with the new stuff out to Azure. It failed.

First, all we got was an HttpException, which is what ASP.NET MVC throws when something fails on a Razor view. The offending line was this:

   ViewBag.Title = Html.Resource("PageTitle");

This extension method indirectly calls our custom resource provider, cleverly obfuscated as SqlResourceProvider, which then looks up the string resource in a SQL database.

My first problem was to get to the actual exception being thrown. That required me to RDP into the running Web role, open a view (I chose About.cshtml because it was essentially empty), and replace the code above with this:

@using System.Globalization
@{
  try {
    var provider = new SqlResourceProvider("/Views/Home/About.cshtml");
    var title = provider.GetObject("PageTitle", CultureInfo.CurrentUICulture);
    ViewBag.Title = title;
  }
  catch (Exception ex) {
    ViewBag.Error = ex + Environment.NewLine + "Base:" + Environment.NewLine + ex.GetBaseException();
  }
}

That got me the real error stack, whose relevant lines were right at the top:

System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.WindowsAzure.ServiceRuntime, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.
File name: 'Microsoft.WindowsAzure.ServiceRuntime, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35'
at XM.UI.ResourceProviders.ResourceCache.LogDebug(String message)

Flash forward an hour of reading and testing things. I'll spare you. The solution is to add a second binding redirect in web.config:

  <dependentAssembly>
    <assemblyIdentity name="Microsoft.WindowsAzure.ServiceRuntime" 
      publicKeyToken="31bf3856ad364e35" culture="neutral" />
    <bindingRedirect oldVersion="" newVersion="" />
    <bindingRedirect oldVersion="" newVersion="" />
  </dependentAssembly>

Notice the second bindingRedirect line? That tells .NET to redirect all requests for the service runtime to the 1.8 version.

Also, in the Web application, you have to set the assembly references for Microsoft.WindowsAzure.Configuration and Microsoft.WindowsAzure.Storage to avoid using specific versions. In Solution Explorer, under the References folder for the web app, find the assemblies in question, view Properties, and set Specific Version to false.

I hope I have saved you three hours of your life. I will now go back to my deployment, already in progress...

Update, an hour and a half later: It turns out, there's a difference in behavior between <compilation debug="true"> and <compilation> on Azure Guest OS 3 (Windows Server 2012) that did not exist in previous guest OS versions. When an application is in debug mode on Azure Guest OS 3, it ignores some errors. Specifically, it ignores the FileNotFoundException thrown when Bundle.JavaScript().Add() has the wrong version number for the script it's trying to add. In Release mode, it just barfs up a 500 response. That is maddening—especially when you're trying to debug something else. At least it let our app log the error, eventually.

All done with the code reorg

Well, that was fun. I've just spent the last three days organizing, upgrading, and repackaging 9,400 lines of code in umpteen objects into two separate assemblies. Plus I upgraded the assemblies to all the latest cool stuff, like Azure Storage Client 2.0 and...well, stuff.

It's getting dark on the afternoon before the U.S. Thanksgiving holiday, and I'm a little fried. Goodbye, 10th Magnitude Office, until Monday.

Upgrading to Azure Storage Client 2.0

Oh, Azure Storage team, why did you break everything?

I love upgrades. I really do. So when Microsoft released the new version of the Windows Azure SDK (October 2012, v1.8) along with a full upgrade of the Storage Client (to 2.0), I found a little side project to upgrade, and went straight to the NuGet Package Manager for my prize.

I should say that part of my interest came from wanting to use some of the .NET 4.5 features, including the asynchronous helper methods, HTML 5, and native support for SQL 2012 spatial types, in the new version of Weather Now that I hope to complete before year's end. The Azure SDK 1.8 supports .NET 4.5; previous versions didn’t. And the Azure SDK 1.8 includes a new version of the Azure Emulator which supports 4.5 as well.

To support the new, Azure-based version (and to support a bunch of other projects that I migrated to Azure), I have a class library of façades supporting Azure. Fortunately, this architecture encapsulated all of my Azure Storage calls. Unfortunately, the upgrade broke every other line of code in the library.

0. Many of the namespaces have changed. But of course, you use ReSharper, which makes the problem go away.

1. The CloudStorageAccount.FromConfigurationSetting() method is gone. Instead, you have to use CloudStorageAccount.Parse(). Here is the delta from TortoiseHg:

- _cloudStorageAccount = CloudStorageAccount.FromConfigurationSetting(storageSettingName);
+ var setting = CloudConfigurationManager.GetSetting(storageSettingName);
+ _cloudStorageAccount = CloudStorageAccount.Parse(setting);

2. BlobContainer.GetBlobReference() is gone, too. Instead of getting a generic IBlobContainer reference back, you have to specify whether you want a page blob or a block blob. In this app, I only use block blobs, so the delta looks like this:

- var blob = _blobContainer.GetBlobReference(blobName);
+ var blob = _blobContainer.GetBlockBlobReference(blobName);

Note that BlobContainer also has a GetPageBlobReference() method. It also has a nearly-useless GetBlobReferenceFromServer method that throws a 404 error if the blob doesn’t exist, which makes it useless for creating new blobs.

3. Blob.DeleteIfExists() works somewhat differently, too:

- var blob = _blobContainer.GetBlobReference(blobName);
- blob.DeleteIfExists(new BlobRequestOptions 
-	{ DeleteSnapshotsOption = DeleteSnapshotsOption.IncludeSnapshots });
+ var blob = _blobContainer.GetBlockBlobReference(blobName);
+ blob.DeleteIfExists();

4. Remember downloading text directly from a blob using Blob.DownloadText()? Yeah, that’s gone too. Blobs are all about streams now:

- var blob = _blobContainer.GetBlobReference(blobName);
- return blob.DownloadText();
+ using (var stream = new MemoryStream())
+ {
+ 	var blob = _blobContainer.GetBlockBlobReference(blobName);
+ 	blob.DownloadToStream(stream);
+ 	using (var reader = new StreamReader(stream, true))
+ 	{
+ 		stream.Position = 0;
+ 		return reader.ReadToEnd();
+ 	}
+ }
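The stream.Position = 0 line matters more than it looks. Here is a self-contained sketch using only MemoryStream, with a made-up FakeDownload method standing in for blob.DownloadToStream(), that shows what happens with and without the rewind:

```csharp
using System.IO;
using System.Text;

public static class StreamText
{
    // Stand-in for blob.DownloadToStream(): writes the bytes and leaves the
    // position at the end of the stream, as any write does.
    public static void FakeDownload(Stream target, string text)
    {
        var bytes = Encoding.UTF8.GetBytes(text);
        target.Write(bytes, 0, bytes.Length);
    }

    // Reads the "downloaded" text back, optionally rewinding the stream first.
    public static string ReadAll(string text, bool rewind)
    {
        using (var stream = new MemoryStream())
        {
            FakeDownload(stream, text);
            if (rewind) stream.Position = 0;   // without this, nothing is left to read
            using (var reader = new StreamReader(stream, true))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling StreamText.ReadAll("METAR", true) returns the text, while passing false returns an empty string because the reader starts at the end of the stream.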

5. Because blobs are all stream-based now, you can’t simply upload a byte array to them. Here’s the replacement for the vanished Blob.UploadByteArray():

- var blob = _blobContainer.GetBlobReference(blobName);
- blob.UploadByteArray(value);
+ var blob = _blobContainer.GetBlockBlobReference(blobName);
+ using (var stream = new MemoryStream(value))
+ {
+ 	blob.UploadFromStream(stream);
+ }

6. Microsoft even helpfully corrected a spelling error which, yes, broke my code:

- _blobContainer.CreateIfNotExist();
+ _blobContainer.CreateIfNotExists();

Yes, if not existS. Notice the big red S, which is something I’d like to give the Azure team after this upgrade.*

7. We’re not done, yet. They fixed a "problem" with tables, too:

  var cloudTableClient = _cloudStorageAccount.CreateCloudTableClient();
- cloudTableClient.CreateTableIfNotExist(TableName);
- var context = cloudTableClient.GetDataServiceContext();
+ var table = cloudTableClient.GetTableReference(TableName);
+ table.CreateIfNotExists();
+ var context = cloudTableClient.GetTableServiceContext();

8. Finally, if you have used the CloudStorageAccount.SetConfigurationSettingPublisher() method, that’s gone too, but you don’t need it. Instead, use the CloudConfigurationManager.GetSetting() method directly. Instead of doing this:

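if (RoleEnvironment.IsAvailable)
	CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) => 
		configSetter(RoleEnvironment.GetConfigurationSettingValue(configName)));
else
	CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) => 
		configSetter(ConfigurationManager.AppSettings[configName]));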

You can simply do this:

var someSetting = CloudConfigurationManager.GetSetting(settingKey);

The CloudConfigurationManager.GetSetting() method first tries to get the setting from Azure, then from the ConfigurationManager (i.e., local settings).

I hope I have just saved you three hours of silently cursing Microsoft’s Azure Storage team.

* Apologies to Bill Cosby.

Lowest electricity usage ever

Last month I used less electricity than ever before at my current address, mainly because two of the five servers in the Inner Drive Technology Worldwide Data Center have had their duties migrated to Microsoft Windows Azure.

This past month, I used even less:

It wasn't my smallest-ever bill, though, thanks to Exelon's recent rate increases. But still: here's some more concrete evidence that the Cloud can save money.

And before people start pointing to the New York Times article from September about how wasteful the Cloud is, I can't help but point out that the writer left out the part where moving to the cloud lets businesses turn off their own on-premises servers. And given the way Azure works (and, I assume, Amazon's and Google's equivalents), instead of dedicated servers doing just about nothing all day, you have shared servers handling sometimes dozens of different virtual machines for a fraction of the cost.

Anyway, except for the part where I fly 100,000 km every year, I feel like moving to the Cloud is helping everyone.

Chicago electricity aggregation passes

Voters in the City of Chicago (including me) passed a referendum giving the city the authority to negotiate electricity prices on behalf of everyone. Implementation will be swift:

The timing of the deal is important because Chicagoans stand to save the most money over Commonwealth Edison's rate between now and June 2013, when ComEd's prices are expected to drop because pricey contracts they entered into years ago will expire. The timeline has Chicagoans moving to the new supplier in February 2013.

Michael Negron, deputy chief of policy and strategic planning for the mayor's office, said electricity suppliers have shown great interest in snagging Chicago's service. Nearly 100 people packed a conference Monday for the city's "request for qualifications" process. The bidders ranged from multi-billion corporations to smaller providers from all over the country, he said. Industry analysts say the deal could be worth hundreds of millions of dollars to the winning supplier or suppliers.

Residents and businesses may opt out of the scheme and negotiate supply prices separately. As readers of this blog know, I'm desperate for lower prices, and eagerly looking forward to my electric bills next year after the new rate deal hits right after I shut down the Inner Drive Technology Worldwide Data Center.