The Daily Parker

Politics, Weather, Photography, and the Dog

Things I found while listening to a conference call

Tomorrow afternoon is the Day of the Doctor already, and then in a little more than four days I'm off to faraway lands. Meanwhile, I'm watching a performance test that we'll repeat on Monday after we release a software upgrade.

So while riveted to this Live Meeting session, I am pointedly not reading these articles:

Perhaps more to the point, I'm not finishing up the release that will obviate the very performance test I'm watching right now. That is another story.

The first head rolls

The person most directly responsible for the HealthCare.gov debacle is "retiring":

The chief information officer at the Centers for Medicare and Medicaid Services, whose office supervised creation of the troubled federal website for health insurance, is retiring, the Obama administration said Wednesday.

The official, Tony Trenkle, will step down on Nov. 15 “to take a position in the private sector,” said an email message circulated among agency employees.

As the agency’s top information officer, Mr. Trenkle supervised the spending of $2 billion a year on information technology products and services, including development of HealthCare.gov, the website for the new health insurance marketplace.

Sebelius, though. Why's she still around?

The usability of HealthCare.gov

Jakob Nielsen's company has written a detailed analysis of how the Federal Health Exchange screwed up usability:

The HealthCare.gov team has suffered what most web professionals fear most: launching a broken web application. This is particularly harrowing given the visibility of the website in question. The serious technical and data issues have been covered extensively in the media, so we won’t rehash those. Instead, in this article we focus on how to improve the account setup process. This is a user experience issue, but fixing it will also alleviate the site's capacity problems.

Account Set-up Usability is Mission Critical

Account setup is users’ first taste of a service. A suboptimal account setup can spawn 3 problems:

  • Increased service cost: When people can’t self-service online and you have no competitors, they call you. Call-center interaction is more expensive than web self-service. In 2008, Forrester estimated call-center calls to cost $5.50 per call versus 10 cents for a user who self-services online.
  • Increased cognitive strain: The instructions for creating usernames and passwords in this flow (which we address further along in this article) require a great deal of concentration, and if users don’t understand the instructions, they will need to keep creating usernames and passwords until they are accepted.
  • Halo Effect: Account setup is the first in a series of web-based interactions that users will need to conduct on HealthCare.gov. A poor experience with this first step will impact how people feel not only about subsequent interactions with the site, but how they feel about the service in general and the Affordable Care Act as a whole.

The discussion around our office hinges on two things other than usability. First, give us $2 million (of the $400 million they actually spent) and we'll build a much better site. Second, the biggest problems come from the insurance companies on the back end. Users don't care about that; they just want to get health insurance. As Krugman says, though, there really wasn't a way to get the insurance companies out of the equation, and that, more than anything, is the foundation of all these other problems.

jQuery: Party like it's 1989

Programming languages have come a long way since I banged out my first BASIC "Hello, World" in 1977. We have great compilers, wonderful editors, and strong typing.

In the past few years, jQuery and JSON, both based on JavaScript, have become ubiquitous. I use them all the time now.

jQuery and JSON are weakly typed and late-bound. In practice, that means you can introduce subtle, maddening bugs merely by changing the letter case of a single variable (e.g., from "ID" to "Id").
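
For instance (the payload here is made up, but this is the entire failure mode):

var data = JSON.parse('{"ID": 12345, "name": "Parker"}');

// Wrong case: JavaScript doesn't throw on a missing property; it just
// returns undefined, so the bug surfaces somewhere far from this line.
console.log(data.Id);  // undefined
console.log(data.ID);  // 12345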

I've just spent three hours of a perfectly decent Sunday trying to find exactly that problem in some client code. And I want to punch someone.

Two things from this:

1. Use conventions consistently. I'm going to go through all the code we have and make sure that ID is always ID, not Id or id.

2. When debugging JSON, search backwards. I'll have more to say about that later, but my day would have involved much more walking Parker had I worked backwards from the error symptom to the code, rather than trying to step forward through the code into the error.

OK, walkies now.

healthcare.gov is bad, but the Times should know better

I am agog at a bald impossibility in the New York Times' article today about the ACA exchange:

According to one specialist, the Web site contains about 500 million lines of software code. By comparison, a large bank’s computer system is typically about one-fifth that size.

There were three reporters in the byline, they have the entire Times infrastructure at their disposal, and still they ran an unattributed "expert" opinion that the healthcare.gov codebase is 33 times larger than Linux. 500 MLOC? Why not just say "500 gazillion"? It's a total Dr. Evil moment.

Put in other terms: it's like someone describing a large construction project—a 20-story office building, say—as having 500 million rivets in it. A moment's thought would tell you that the mass of 500 million rivets would approach the steel output of South Korea for last month.

The second sentence is nonsense also. "A large bank's computer system?" Large banks have thousands of computer systems; which one did you mean? Back to my example: it's like comparing the 500-million-rivet office building to "a large bank's headquarters."

I wouldn't be so out of my head about this if it weren't the Times. But if they can't get this right, what hope does any non-technical person have of understanding the problem?

One last thing. We, the people of the United States, paid for this software. HHS needs to disclose the source code of this monster. Maybe if they open-sourced the thing, they could fix it faster.

Small world

The Chicago technology scene is tight. I just had a meeting with a guy I worked with from 2003 to 2004. Back then, we were both consultants on a project with a local financial services company. Today he's CTO of the company that bought it—so, really, the same company. Apparently, they're still using software I wrote back then, too.

I love when these things happen.

This guy was also witness to my biggest-ever screw-up. (By "biggest" I mean "costliest.") I won't go into details, except to say that whenever I write a SQL delete statement today, I do this first:

-- DELETE
SELECT *
FROM MissionCriticalDataWorthMillionsOfDollars
WHERE ID = 12345

That way, I get to see exactly what rows will be deleted before committing to the delete. Also, even if I accidentally hit <F5> before verifying the WHERE clause, all it will do is select more rows than I expect.
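When the SELECT returns exactly the rows I expect, flipping the comments runs the real delete against the same FROM and WHERE clauses:

DELETE
-- SELECT *
FROM MissionCriticalDataWorthMillionsOfDollars
WHERE ID = 12345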

You can fill in the rest of the story on your own.

Unbelievably stupid Windows thing

Fortunately, I'm in an airport with lots of power outlets, because my laptop just warned me that it was down to its last few milliamp-hours, even though the 90 Wh battery I ordinarily lug around can last about 8 hours. What happened? Windows Search decided that consuming 50% of my CPU (i.e., two entire cores) was a good idea while running on battery.

So since I have an hour before boarding, and since I'm now plugged in (which means I don't have any worries about driving my portable HDD), here is a lovely picture of Montréal from earlier today:

Git is not Mercurial

I'm pulling the public repository for Orchard again, because I made a mistake with Git that I can't seem to undo. I've set up my environment to have a copy of the public repository, and then a working repository cloned from it. This allows me to try things out on my own machine, in private branches, while still pulling the public bits without the need to merge them into my working copy.
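
The setup looks something like this (the URL and directory names here are illustrative, not the real ones):

# Read-only mirror of the upstream repository
git clone https://git.example.com/orchard.git public

# Day-to-day clone whose origin is the local mirror
git clone public working
cd working
git checkout -b experiments    # private branch for local changes

# Refreshing the mirror later never touches the working clone
cd ../public && git pull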

Orchard, which will soon (I hope) replace dasBlog as this blog's platform, recently switched from Mercurial to Git, which led to this problem.

I may simply not have grasped all the nuances of Git. Git is extremely powerful, in the sense that it will do almost anything you tell it to do, without regard for the consequences. It reflects the ethos of the C++ programming language, which gave everyday programmers ways to screw up that were previously available only to experts.

My specific screw-up was that I accidentally pushed my local changes back to my copy of the public repository. I had added about six changesets, which I couldn't extract from my copy of public no matter what I tried.
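
The usual prescription is to force the mirror's branch back to match upstream, along these lines (assuming its origin still points at the real public repository):

# Inside the public mirror: throw away the accidentally-pushed changesets
git fetch origin
git checkout 1.1
git reset --hard origin/1.1    # local 1.1 now matches upstream exactly

By then, though, I trusted a clean start more than my own surgery.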

So, while writing this, I just pulled a clean copy of public, checked out the two branches I wanted (1.1 and fw45, for those keeping score at home), and merged with my existing changes.

Now I get to debug that mess...and I may toss it and start over.

Border cases

Just a quick note about debugging. I just spent about 30 minutes tracking down a bug that caused a client to get invoiced for -18 hours of premium time and 1.12 days of regular time.

The basic problem is that an appointment can begin and end at any time, but from 6pm to 8am, an appointment costs more per hour than during business hours. This particular appointment started at 5pm and went until midnight, which should be 6 hours of premium and 1 hour of regular.

The bottom line: I had unit tests, which automatically tested a variety of start and end times across all time zones (to ensure that local time always prevailed over UTC), including:

  • Starts before close, finishes after close before midnight
  • Starts before close, finishes after midnight before opening
  • Starts before close, finishes after next opening
  • Starts after close, finishes before midnight
  • Starts after close, finishes after midnight before opening
  • Starts after close, finishes after next opening
  • ...

Notice that I never tested what happened when the appointment ended at midnight.
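
The missing test would have looked something like this (the names here are made up, not the real API):

// Hypothetical NUnit test for the boundary I missed: an appointment
// ending exactly at midnight, with premium time running from 6pm to 8am.
[Test]
public void EndingExactlyAtMidnightSplitsHoursCorrectly()
{
    var start = new DateTime(2013, 11, 8, 17, 0, 0);   // 5pm local
    var end = new DateTime(2013, 11, 9, 0, 0, 0);      // midnight local

    var bill = AppointmentBilling.Split(start, end);   // hypothetical helper

    Assert.AreEqual(1.0, bill.RegularHours);   // 5pm-6pm
    Assert.AreEqual(6.0, bill.PremiumHours);   // 6pm-midnight
}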

The fix was a single equals sign, as in:

- if (localEnd > midnight & local <= localOpenAtEnd)
+ if (localEnd >= midnight & local <= localOpenAtEnd)

Nicely done, Braverman. Nicely done.

When the Azure emulator is more forgiving than real life

Last night I made the mistake of testing a deployment to Azure right before going to bed. Everything had worked beautifully in development, I'd fixed all the bugs, and I had a virgin Windows Azure affinity group complete with a pre-populated test database ready for the Weather Now worker role's first trip up to the Big Time.

I should have predicted the worker role's first complete and total failure. Just as I do in the brick-and-mortar development world, I create low-privilege SQL accounts for applications to use. So immediately I had a bunch of SQL exceptions, which I resolved with a few GRANT EXEC commands. No big deal.
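
The fixes amounted to a few statements along these lines (the procedure and account names are invented for illustration):

-- Give the low-privilege application account execute rights on the
-- stored procedures the worker role calls
GRANT EXECUTE ON OBJECT::dbo.GetWorkerSettings TO WeatherNowApp;
GRANT EXECUTE ON OBJECT::dbo.SaveWeatherFile TO WeatherNowApp;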

Once I restarted the worker role, it connected to the database, loaded its settings, downloaded a file from NOAA and...crashed:

Inner Drive Weather threw System.Data.Services.Client.DataServiceRequestException
...
OutOfRangeInput

One of the request inputs is out of range.
RequestId:572bcfee-9e0b-4a02-9163-1c6163798d60
Time:2013-02-10T06:05:41.5664525Z

at System.Data.Services.Client.DataServiceContext.SaveResult.d__1e.MoveNext()

Oh no. The dreaded Azure Storage exception that tells you absolutely nothing.

Flash forward fifteen minutes (now past midnight; for context, I'm writing this on the 9am flight to Los Angeles): with Fiddler running against a local instance connected to production Azure storage, I found the XML block that real Azure Storage barfed on but that the Azure storage emulator had passed without a second thought. The offending table entity is metadata that the NOAA downloader worker task stores to let the weather-parsing worker task know it has work to do:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
   <entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" 
   xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" 
   xmlns="http://www.w3.org/2005/Atom">
  <title />
  <author>
    <name />
  </author>
  <updated>2013-02-10T05:55:49.3316301Z</updated>
  <id />
  <content type="application/xml">
    <m:properties>
      <d:BlobName>20130209-0535-sn.0034.txt</d:BlobName>
      <d:FileName>sn.0034.txt</d:FileName>
      <d:FileTime m:type="Edm.DateTime">2013-02-09T05:35:00Z</d:FileTime>
      <d:IsParsed m:type="Edm.Boolean">false</d:IsParsed>
      <d:ParseTime m:type="Edm.DateTime">0001-01-01T00:00:00</d:ParseTime>
      <d:PartitionKey>201302</d:PartitionKey>
      <d:RetrieveTime m:type="Edm.DateTime">2013-02-10T05:55:29.1084794Z</d:RetrieveTime>
      <d:RowKey>20130209-0535-41d536ff-2e70-4564-84bd-7559a0a71d4d</d:RowKey>
      <d:Size m:type="Edm.Int32">68202</d:Size>
      <d:Timestamp m:type="Edm.DateTime">0001-01-01T00:00:00</d:Timestamp>
    </m:properties>
  </content>
</entry>

Notice that the ParseTime and Timestamp values are equal to System.DateTimeOffset.MinValue, which, it turns out, is not a legal Azure table value. Wow, would it have helped me if the emulator horked on those values during development.

The fix was simply to make sure that neither System.DateTimeOffset.MinValue nor System.DateTime.MinValue ever got into an outbound table entity, which took me about five minutes to implement. Also, it turned out that even though my table entity inherited from TableServiceEntity, I still had to set the Timestamp property when using real Azure storage. (The emulator sets it for you.)
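
A sketch of the guard, assuming a small helper runs on every outbound date (Azure table storage only accepts Edm.DateTime values from 1601-01-01 through 9999-12-31):

// Hypothetical helper: clamp any date below the minimum Azure table
// storage accepts.
private static readonly DateTime AzureMinDate =
    new DateTime(1601, 1, 1, 0, 0, 0, DateTimeKind.Utc);

private static DateTime ToTableSafeDate(DateTime value)
{
    return value < AzureMinDate ? AzureMinDate : value;
}

// Just before the save, scrub the entity and set Timestamp explicitly,
// because real Azure storage (unlike the emulator) won't set it for you:
entity.ParseTime = ToTableSafeDate(entity.ParseTime);
entity.Timestamp = DateTime.UtcNow;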

By this point it was 12:30, though, and I needed to get some sleep. So my plan to run an overnight test will have to wait until this evening at my hotel. Then I'll find the other bits of code that work fine against the emulator but fail against real Azure storage, because, for reasons that pass understanding, the emulator gets them completely wrong.