Under the hood of Weather Now

My most recent post mentioned finishing the GetWeather component of Weather Now, my demo project that provides near-real-time aviation weather for most of the world. I thought some readers might be interested to know how it works.

The GetWeather component has three principal tasks:

Get the raw data from the National Oceanic and Atmospheric Administration (NOAA) or, in future, any other source;
Parse the data; and
Store the data for the web application to use.

In the Inner Drive Technology world, an Azure worker process uses an arbitrary collection of objects that implement the IWorkerTask interface. The interface defines Interval and LastRun properties and an Execute method, which is all the worker process needs to know. The tasks are responsible for their own lifespans, reentry prevention, etc. (That's another discussion.)
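To make the shape of that contract concrete, here is a minimal sketch in Python. It is not the project's actual code: the names `WorkerTask`, `CountingTask`, and `run_due_tasks` are hypothetical, and having the loop (rather than the task) update `last_run` is an assumption made only to keep the example short.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta
from typing import Optional


class WorkerTask(ABC):
    """Python stand-in for the IWorkerTask interface described above."""

    interval: timedelta            # how often the task wants to run
    last_run: Optional[datetime]   # None until the first run

    @abstractmethod
    def execute(self) -> None:
        """Do one unit of work."""


class CountingTask(WorkerTask):
    """Hypothetical task that only counts how often it ran."""

    def __init__(self, interval: timedelta) -> None:
        self.interval = interval
        self.last_run = None
        self.runs = 0

    def execute(self) -> None:
        self.runs += 1


def run_due_tasks(tasks, now: datetime) -> int:
    """Run every task whose interval has elapsed; return how many ran."""
    ran = 0
    for task in tasks:
        if task.last_run is None or now - task.last_run >= task.interval:
            task.execute()
            # Assumption: the loop records the run time; real tasks may do this themselves.
            task.last_run = now
            ran += 1
    return ran
```

The loop only touches the interval, the last run time, and the execute method, so any object with those three members can be dropped into the collection.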

In order to decouple the data source (NOAA now, other sources in the future) from the application, I split the three tasks into two IWorkerTask classes:

The NoaaFileDownloadingWorkerTask opens an FTP connection to the NOAA public weather servers, retrieves the files it hasn't already retrieved, and stores the contents in Azure Blob Storage; and
The NoaaFileParsingWorkerTask pulls the files out of Azure Storage, parses them, and stores the results in an Azure SQL Database and Azure table storage.
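The split between the two tasks can be sketched roughly as follows. This is an illustration in Python under loud assumptions, not the real implementation: `InMemoryBlobStore` stands in for Azure Blob Storage, the `fetch` callable stands in for the FTP retrieval, a plain list stands in for the database and table storage, and the "one report per non-empty line" file format is invented purely for the example.

```python
from typing import Callable, Dict, List


class InMemoryBlobStore:
    """Stand-in for Azure Blob Storage: maps a file name to its raw text."""

    def __init__(self) -> None:
        self.blobs: Dict[str, str] = {}

    def put(self, name: str, content: str) -> None:
        self.blobs[name] = content

    def take_all(self) -> Dict[str, str]:
        """Remove and return every pending blob."""
        pending, self.blobs = self.blobs, {}
        return pending


class DownloadTask:
    """Copies raw files from a source into the store; knows nothing about parsing."""

    def __init__(self, fetch: Callable[[], Dict[str, str]], store: InMemoryBlobStore) -> None:
        self.fetch = fetch      # hypothetical stand-in for the FTP retrieval
        self.store = store

    def execute(self) -> None:
        for name, content in self.fetch().items():
            self.store.put(name, content)


class ParseTask:
    """Reads raw files from the store and saves parsed records; knows nothing about FTP."""

    def __init__(self, store: InMemoryBlobStore, records: List[str]) -> None:
        self.store = store
        self.records = records  # stand-in for the SQL database and table storage

    def execute(self) -> None:
        for _name, content in sorted(self.store.take_all().items()):
            # Assumed toy format: one report per non-empty line.
            self.records.extend(
                line.strip() for line in content.splitlines() if line.strip()
            )
```

Because the only thing the two classes share is the store, anything that can call `put` on it (including a person uploading a file by hand) can feed the parser.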

I'm using Azure storage as an intermediary between the two sides of the process because my analysis led me to conclude that they're really independent of each other. Coupling the two tasks in the current (2002) version of GetWeather causes all kinds of problems, not least that a failure in one task can stop the whole thing. If, as happens given the nature of the Internet, the FTP side has an unrecoverable problem, the application has to restart. In practice it simply kills itself and waits for the next time it runs, which can be a while because it runs as a Windows Server 2008 scheduled job every 30 minutes.

The new architecture will allow the parser to run every minute or two, see if it has anything to do by looking at some metadata, and do its job if needed. I can change a system setting to stop it from running (for example, because I need to do some database maintenance), while letting the downloader continue to work separately.
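One way to picture a single parser run is the sketch below. It is written in Python for illustration only; `ParserState`, its fields, and `run_parser_once` are hypothetical names, and the list of pending file names is an assumed stand-in for whatever metadata the real parser inspects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParserState:
    """Hypothetical settings and metadata the parser consults on each run."""

    enabled: bool = True                               # the "system setting"
    pending: List[str] = field(default_factory=list)   # names of unparsed files
    parsed: List[str] = field(default_factory=list)


def run_parser_once(state: ParserState) -> str:
    """Return what the parser did on this run: 'disabled', 'idle', or 'parsed'."""
    if not state.enabled:
        return "disabled"   # e.g. paused for database maintenance
    if not state.pending:
        return "idle"       # metadata says there is nothing to do
    while state.pending:
        # Real parsing and storage would happen here.
        state.parsed.append(state.pending.pop(0))
    return "parsed"
```

Pausing the parser leaves the pending list untouched, so the backlog is simply picked up on the first run after the setting is switched back on.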

On the other side, the downloader can run every 5 minutes, snatch the one or two files it needs from NOAA, and shut down cleanly without waiting for the parser. NOAA likes this because the connection is only open for a few seconds, instead of the 27 minutes it stays open right now. And if the NOAA server isn't available, so what? It's a clean shutdown and a clean start a few minutes later.
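The "grab only what's new, and shrug off an unavailable server" behavior might look something like this Python sketch. Everything here is an assumption for illustration: `list_remote` and `fetch` are hypothetical callables standing in for the FTP operations, a set tracks what has already been retrieved, and a dictionary stands in for blob storage.

```python
from typing import Callable, Dict, List, Set


def download_new_files(
    list_remote: Callable[[], List[str]],
    fetch: Callable[[str], str],
    already_retrieved: Set[str],
    store: Dict[str, str],
) -> int:
    """Fetch files not seen before; return how many were stored on this run."""
    try:
        names = list_remote()
    except OSError:
        return 0  # server unavailable: give up quietly, the next run retries
    stored = 0
    for name in names:
        if name in already_retrieved:
            continue
        try:
            store[name] = fetch(name)
        except OSError:
            break  # connection dropped mid-run: keep what we have and stop
        already_retrieved.add(name)
        stored += 1
    return stored
```

A failed run changes nothing, so the next scheduled run starts from exactly the same state and picks up whatever it missed.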

This design also allows me to do something else: manually upload files for parsing and storage. This helps with testing, migration, service interruptions—all things that the current architecture has made nearly impossible.

I'm not entirely done with the application (and while writing this I just thought of an improvement I'll need to make to prevent infinite retries), but it's close. And I'm really pleased with the application so far. Stay tuned; I can now set a tentative public launch date of March 31st.
