Writing Enterprise Software: Error Checking
“There is no substitute for hard work.” -Thomas A. Edison
Recap: The need for error checking
In my previous article (http://www.riversand.com/blog/writing-enterprise-software/), I introduced the topic of error handling in enterprise software. More accurately, I introduced the topic of error checking. Error handling includes checking for errors and catching exceptions, but it also involves doing something about an error when it happens.
I will set aside the “what to do when an error occurs” topic for now. In this article I will recap why error checking is important, present a common use case where error checking is needed, and then show some code that performs thorough (and thoroughly ugly) error checking. In the next article, I will present a design pattern for making the error checking much cleaner.
In short, the main reasons to thoroughly check for errors (and catch exceptions that can be thrown) are to write code that behaves predictably, and to accurately and usefully report errors. If you are writing code that does not check every return code for an error, then you run the risk that your code fails later in some way that is seemingly unrelated to the original failure. You also lose critical information about the error, information that might help the user of your software solve the problem and avoid a support call.
Many software developers implement the main functionality of their code first, intending to go back and add error checking, reporting, and handling later, but time pressures often get in the way of such follow-up activities.
Performing thorough error checking in your code is hard work. It takes discipline. But it’s well worth the effort.
Aside: A case for useful error messages
I mentioned above that errors should be reported both accurately and usefully. The distinction is important. How many times have you performed some operation on your computer, such as copying a file or updating your personal information on a web site, only to receive an error message along the lines of “Unknown error” or “Error 0xC0081004 occurred”? Both of those may be accurate error messages, but they are far from useful.
Writing an accurate and useful error message (such as “Unable to copy file. Destination is full.” or “Unable to update user information because the account is locked. Please contact customer support.”) requires effort. But think how happy your users will be when they understand why an error happened. They’ll be ecstatic if they have enough information to correct the error themselves.
Case study: Getting a configuration value from JSON
I have made a general, sort of hand-wavy argument for the importance of error checking. Let’s take a look at an example. Suppose we want to store our software’s configuration data in a JSON file. Suppose also that we’ve organized the JSON into a collection of nested objects, so that the stored configuration data reflects the organization of modules and their UI elements in our software.
For example, if we want to store the configuration value for the layout of UI element 1 in module A, we might have a JSON object that looks like the following snippet:
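A sketch of what such an object might look like. The key names "modules", "moduleA", "element1", and "layout" come from the discussion in this article; the fields inside "layout" are placeholders chosen purely for illustration.

```json
{
  "modules": {
    "moduleA": {
      "element1": {
        "layout": {
          "width": 200,
          "height": 100
        }
      }
    }
  }
}
```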
A common approach to accessing this information in Java would be to load it into a hierarchy of Java objects whose classes model the structure of the JSON object. This is easily accomplished using the (very handy) Gson library (https://en.wikipedia.org/wiki/Gson). If you’re not familiar with Gson, that’s okay. All you really need to know is that we can write a set of Java classes that allows us to load the JSON object above into a collection of related Java objects and access the layout for element 1 using code that looks something like:
JsonObject layout = config.getModule("moduleA").getElement("element1").layout;
During development of the software, this works just fine. We control the contents of the JSON configuration object, which we store in a file as part of our software’s configuration. Everything works fine during testing, too. All is good.
Then we ship our software and much happiness ensues. It’s finally time to take that much delayed, and desperately needed, vacation. Margaritaville beckons.
Support call #1
Until we get our first support call, and the customer is irritated because she gets a message about a “NullPointerException” when she tries to run our software. It ran just fine yesterday. This morning, she started our software, just like she does every morning, but now it’s giving her this cryptic message and she can’t do her job. All is no longer good.
Thoughts of Margaritaville are wastin’ away. 
Analysis: The customer did what?
After several long hours on the phone with the irate customer, we finally determine that she had inadvertently changed the JSON configuration file while poking around in our installation directory. If this seems far-fetched, you’ve never supported an enterprise software product.
Even though the error is due to the customer’s action, we don’t win any points for selling her software that gives her a completely useless (albeit accurate) error message.
We can do better. So we roll up our sleeves and get back to work. We decide that we need to add error checking to the code that retrieves the layout information for element 1.
It turns out that the customer accidentally added a space to the name of the “modules” JSON object, so when we parsed the JSON file, we didn’t find a “modules” object, and we didn’t know what to do with an object named “module s”. The end result is that our code that accesses the settings for “moduleA” accesses a nonexistent modules object. That is, the code config.getModule("moduleA") returns a null pointer, so the code config.getModule("moduleA").getElement("element1") generates a NullPointerException.
Fix #1: Checking for missing configuration data
Now that we know what can go wrong with the line of code
JsonObject layout = config.getModule("moduleA").getElement("element1").layout;
we change it to the following:
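A minimal sketch of what such step-by-step checks might look like. This is not the article's original 23-line listing: the classes `Config`, `ModuleConfig`, and `ElementConfig`, the method `getElement1Layout`, and the message wording are all hypothetical, and the layout is held as a `String` rather than a Gson `JsonObject` so that the sketch runs without any external library.

```java
import java.util.HashMap;
import java.util.Map;

class LayoutCheck {
    // Hypothetical stand-ins for the classes that Gson would populate.
    static class ElementConfig {
        String layout; // a JsonObject in the article; a String keeps this sketch dependency-free
    }

    static class ModuleConfig {
        Map<String, ElementConfig> elements = new HashMap<>();

        ElementConfig getElement(String name) {
            return elements == null ? null : elements.get(name);
        }
    }

    static class Config {
        Map<String, ModuleConfig> modules; // stays null if the "modules" key is missing or misspelled

        ModuleConfig getModule(String name) {
            return modules == null ? null : modules.get(name);
        }
    }

    // Checks every level of the hierarchy and reports exactly which piece is missing.
    static String getElement1Layout(Config config) {
        if (config == null) {
            throw new IllegalStateException("Unable to load the configuration file.");
        }
        if (config.modules == null) {
            throw new IllegalStateException(
                "Configuration error: the \"modules\" object is missing. "
                + "Check the configuration file for a misspelled key.");
        }
        ModuleConfig moduleA = config.getModule("moduleA");
        if (moduleA == null) {
            throw new IllegalStateException(
                "Configuration error: \"moduleA\" is missing from \"modules\".");
        }
        ElementConfig element1 = moduleA.getElement("element1");
        if (element1 == null) {
            throw new IllegalStateException(
                "Configuration error: \"element1\" is missing from \"moduleA\".");
        }
        if (element1.layout == null) {
            throw new IllegalStateException(
                "Configuration error: \"element1\" in \"moduleA\" has no \"layout\".");
        }
        return element1.layout;
    }

    public static void main(String[] args) {
        // Simulates a file in which "modules" was renamed to "module s".
        Config broken = new Config();
        try {
            getElement1Layout(broken);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Each check exists only to turn a would-be NullPointerException into a message that names the missing piece of configuration.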
Sheesh! We went from one line of easy-to-read code to 23 lines of code that are, to put it charitably, less than elegant.
Admittedly, we added checks for more than just the missing “modules” JSON object, since we rightly surmised that if the “modules” JSON object can get corrupted, the same can happen to “moduleA”, “element1”, or “element1:layout”. But at least we’re learning from our mistakes, and we’re proactively adding some much-needed error checks along with some useful error messages.
But still, that’s some really ugly code. The whole flow of the code is ruined. Anyone reading this code will be unhappy encountering 23 lines of code, most of which are almost never needed (assuming our configuration file is not usually corrupted).
But wait, there’s more! 
Most likely, we have more than one element in module A whose layout we need to retrieve.
One approach would be to copy the block of 23 lines, paste it in the necessary location(s), change the names to protect the innocent, and get on with our lives. I’ve seen things like that done many times over the years. This practice leads to code bloat, copy/paste errors, and maintenance nightmares.
A much better approach would be to move the 23-line code block to a function that takes parameters for the module name and element name. The 23 lines of code would be even harder to read, since they must be reworked to be generic enough to handle arbitrary module and element names. However, this new function would add some significant code goodness benefits: the error checking logic would be removed from the main flow of the code, any bug fixes or updates to the function would benefit all users of the function, and getting the configuration layout for other elements would be a breeze.
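The refactoring described above could be sketched as follows. The names `ConfigLookup`, `getLayout`, and `sampleConfig`, the stand-in configuration classes, and the message wording are assumptions made for illustration, and the layout is again a plain `String` so the sketch needs no external library.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigLookup {
    // Hypothetical stand-ins for the classes that Gson would populate.
    static class ElementConfig { String layout; }
    static class ModuleConfig { Map<String, ElementConfig> elements = new HashMap<>(); }
    static class Config { Map<String, ModuleConfig> modules; }

    // Returns the layout for the named element, or throws with a message naming what is missing.
    static String getLayout(Config config, String moduleName, String elementName) {
        if (config == null || config.modules == null) {
            throw new IllegalStateException(
                "Configuration error: the \"modules\" object is missing.");
        }
        ModuleConfig moduleConfig = config.modules.get(moduleName);
        if (moduleConfig == null) {
            throw new IllegalStateException(String.format(
                "Configuration error: module \"%s\" is missing.", moduleName));
        }
        ElementConfig element =
            moduleConfig.elements == null ? null : moduleConfig.elements.get(elementName);
        if (element == null) {
            throw new IllegalStateException(String.format(
                "Configuration error: element \"%s\" is missing from module \"%s\".",
                elementName, moduleName));
        }
        if (element.layout == null) {
            throw new IllegalStateException(String.format(
                "Configuration error: element \"%s\" in module \"%s\" has no layout.",
                elementName, moduleName));
        }
        return element.layout;
    }

    // Builds a small valid configuration for the usage example below.
    static Config sampleConfig() {
        ElementConfig element1 = new ElementConfig();
        element1.layout = "grid";
        ModuleConfig moduleA = new ModuleConfig();
        moduleA.elements.put("element1", element1);
        Config config = new Config();
        config.modules = new HashMap<>();
        config.modules.put("moduleA", moduleA);
        return config;
    }

    public static void main(String[] args) {
        Config config = sampleConfig();
        // The main flow is back to one line per lookup.
        System.out.println(getLayout(config, "moduleA", "element1"));
        try {
            getLayout(config, "moduleA", "element2");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the checks in one place, each call site shrinks to a single line, and the error messages stay consistent because the module and element names are filled in from the parameters.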
Still, it seems like we can do better. What’s bothersome is the need to do a thorough check on the JSON configuration object, even though the object is usually valid. It would be nice to have the error checking performed on-demand, which could save processing time. After all, nobody likes waiting for an application that’s slow to start. 
We might not mind paying the processing cost when getting a few layout values from a JSON configuration object during application initialization, but suppose we have an application that communicates via internally-generated messages. Most of the time, these messages are valid (since we are generating them ourselves), but what if we update the application’s modules independently of each other and we’re concerned about changes in message formats across versions of modules?
We want to validate a message’s format, but only if the message is invalid, in which case we need to report the issue accurately so that we can fix it quickly. Performing thorough error checking on every message could be a performance killer. Furthermore, it would be nice if, when the error checking kicks in, it also generated the corresponding error report automatically. Hey, since we’re wishing, why not wish big?
In the next installment, I will describe an error checking/reporting design pattern that I call the Validator Pattern. The goals of this pattern are:
- Perform thorough error checking and error reporting
- Only perform the error checking when there is an error
- Perform the error checking and reporting automatically
- Remove the code that performs error checking and reporting from the main flow of the code
I make no claim that I am the first person to think of this design pattern, but when I was looking for a solution with the above requirements, I didn’t find anything that was exactly what I wanted. So here we are.
On storing configuration in a JSON file: In case you’re getting irritated about the fact that I’m rolling my own configuration mechanism, yes, I’m aware that there are plenty of libraries that assist in managing an application’s persistence. This is simply an illustration of a common scenario where error checking is necessary and useful error reporting is extremely, well, useful.
On Margaritaville: I’m referring to the Jimmy Buffett song “Margaritaville” (https://en.wikipedia.org/wiki/Margaritaville), but I’m also describing a situation known to many software developers where a vacation is interrupted by some urgent customer issue. I have a good friend who has never (to my knowledge) taken a vacation without being required to work on code at some point during the vacation. Talk about vacation interruptus.
On applications that are slow to start: I truly enjoyed playing Sid Meier’s Civilization V, but every time I started it, I had to leave the room and get a cup of coffee. Waiting for it to initialize and be ready for me to play was agonizing.