Effective Proactive Debugging Techniques: It's All About the Tools

Overview

There are two basic ways to debug your program: retroactively (by stepping through it or analyzing core dumps) or proactively (by including code that analyzes and logs the state of your application as it's running). This article will focus on proactive debugging tools and techniques that you can implement on any platform.

Just Say "No" To Debuggers

Don't get me wrong: IDEs and debuggers are extremely helpful for stepping through and analyzing your code. But you don't always have the luxury of an IDE or debugger, such as on a live system that you can't tweak or take down. You may also be working with a networked application that depends on asynchronous systems outside of your control.

 

My point here is that by the time you break out the debugger, the error has already occurred, the program has terminated, and the client is wondering if you really know what you're doing.

 

Wouldn't it be great if your program could:

 

  1. Catch potential errors proactively, i.e. before they happen
  2. Report errors and warnings behind the scenes so the user either isn't bothered with technical jargon or doesn't even see a message if they don't need to
  3. Log info that you want logged, not just a massive core dump that may not include the state of external systems
  4. Recover gracefully whenever possible

 

Let's cover each of these in more detail.

Catching Errors Proactively

In a nutshell, this involves adding code that obsessively checks for error conditions and acts accordingly. First, you'll want to make sure you're testing the return value of every function. This is Programming 101, and should go without saying. For example, here is bad code that doesn't check the return value of a function:

 

int *a = malloc(1024);
a[0] = 'x';

If you don't know what's wrong with this code then you probably shouldn't be reading the rest of this article.

 

Here's an improved version of the above code:

 

int *a = malloc(1024);

if (NULL == a) {
      // fprintf() needs a stream to write to; errors belong on stderr
      fprintf(stderr, "Unable to allocate 1024 bytes\n");
      exit(EXIT_FAILURE);
}

a[0] = 'x';

This is better, but all it does is print an error message and exit. This is good enough for a school project, but won't (or shouldn't) fly in the real world. Let's fix it up a bit:

 

int *a = malloc(1024);

if (NULL == a) {
      // Log the error
      report_fatal_error("Unable to allocate 1024 bytes");
} else {
      a[0] = 'x';
}

What's this report_fatal_error() function and why is it any better than printing an error message and exiting, as in the previous example?

 

In this example, report_fatal_error() is a function written by you. It can be implemented any way you want; in a development environment it may print a message and exit, while in a production environment it may log the error to a log file and continue running (if possible).
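
As a rough illustration, report_fatal_error() might look something like the following. This is only a sketch: the production_mode flag and the error_log destination are hypothetical names invented for this example, and a real application would more likely read them from its configuration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch: 0 in development, nonzero in production. */
int production_mode = 0;

/* Hypothetical log destination; falls back to stderr when NULL. */
FILE *error_log = NULL;

/* Report a fatal error. In development, print it and stop so the problem
   can't be missed. In production, log it and return so the caller can
   try to carry on. */
void report_fatal_error(const char *message)
{
    FILE *out = (error_log != NULL) ? error_log : stderr;

    fprintf(out, "FATAL: %s\n", message);
    fflush(out);

    if (!production_mode) {
        exit(EXIT_FAILURE);
    }
}
```

Because every failure goes through this one function, changing how errors are handled later means changing one place rather than every call site.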

 

Sure, not every error condition is recoverable, but we'll get to that later.

 

Would a "try…catch" block work just as well as "if…then"? Certainly! The point here is that at every possible point of failure you should be able to identify the module that failed, the precise line of code where the error occurred, and any local variables or other relevant data.
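
In C, one way to capture the module and line without typing them by hand is to let the preprocessor fill them in. The sketch below assumes a hypothetical helper, describe_error(), and a wrapper macro, DESCRIBE_ERROR(); neither name comes from a library.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes "file:line (function): message" into buf.
   Returns what snprintf() returns. */
int describe_error(char *buf, size_t size, const char *file, int line,
                   const char *func, const char *message)
{
    return snprintf(buf, size, "%s:%d (%s): %s", file, line, func, message);
}

/* The macro expands at the call site, so __FILE__, __LINE__ and __func__
   describe the place where the error was detected, not this helper. */
#define DESCRIBE_ERROR(buf, size, message) \
    describe_error((buf), (size), __FILE__, __LINE__, __func__, (message))
```

A call such as DESCRIBE_ERROR(buf, sizeof buf, "Unable to allocate 1024 bytes") then records where it was made; relevant local variables can be formatted into the message before the call.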

Reporting Errors

Now that you've caught an error, what do you do with it? Report it immediately! Depending on the severity of the error, this could involve something as simple as ignoring it completely (not likely), writing a diagnostic message to a log file (very likely), or abending immediately (to be avoided if at all possible).

 

Deciding what events could happen and how severe they are is left as an exercise for the reader, since it will depend largely on the nature of your application. At minimum you'll want two general categories:

  1. User errors
  2. System errors

More likely you'll want to break the latter into two separate categories, resulting in:

  1. User errors
  2. Recoverable system errors
  3. Unrecoverable system errors

As for where to report errors, generally two types of people will want to know about them:

  1. The user
  2. The programmer

For user errors you'll obviously want to display a message that the user can understand.

 

For recoverable system errors you may or may not want to alert the user but you'll definitely want to alert the programmer (more on this later).

 

For unrecoverable system errors you will want to alert both the user and the programmer.
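
The routing rules above can be written down as a small lookup. This is a sketch under assumptions: the type and function names are made up for illustration, and where the rules leave a choice open (whether to tell the user about a recoverable system error) the code simply picks "no".

```c
#include <stdbool.h>

/* The three kinds of error discussed above. */
typedef enum {
    ERROR_USER,
    ERROR_SYSTEM_RECOVERABLE,
    ERROR_SYSTEM_UNRECOVERABLE
} error_category;

/* Who should hear about an error. */
typedef struct {
    bool alert_user;
    bool alert_programmer;
} error_audience;

error_audience audience_for(error_category category)
{
    error_audience audience = { false, false };

    switch (category) {
    case ERROR_USER:
        /* Assumption: a user's mistake is shown to the user only. */
        audience.alert_user = true;
        break;
    case ERROR_SYSTEM_RECOVERABLE:
        /* Assumption: stay quiet toward the user; this part is optional. */
        audience.alert_programmer = true;
        break;
    case ERROR_SYSTEM_UNRECOVERABLE:
        audience.alert_user = true;
        audience.alert_programmer = true;
        break;
    }
    return audience;
}
```

Keeping the decision in one function makes it easy to change your mind later, for example to start showing a gentle notice for recoverable errors.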

Logging

What to log

The more information you have about each error, the easier it will be to diagnose. So, what will you want to log?

 

Where to log it

Two places you might want to log errors would be:

  1. A database
  2. A flat (text) file

Which of these options you choose will depend on which ones are available to you.

 

The advantage of logging to a database is that the log is easily searchable and sortable; the advantage of logging to a flat file is that it's usually simpler and the data is available immediately (no DB connection required).

 

Another issue to consider is how to log an error to the database if the error involves not being able to connect to the database to begin with. I prefer a layered approach: first attempt to log to the database; if that doesn't work then attempt to log to a text file; if that doesn't work then attempt to write to the console.
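
The layered approach might be sketched like this. Everything here is illustrative: log_with_fallback(), the log_sink type and the file name error.log are invented for the example, and the database sink is a stand-in that always fails so the fallback path is visible.

```c
#include <stdio.h>
#include <stddef.h>

/* A sink tries to record one message and returns 0 on success. */
typedef int (*log_sink)(const char *message);

/* Try each sink in order. Returns the index of the first sink that
   succeeded, or -1 if every one of them failed. */
int log_with_fallback(const log_sink *sinks, size_t count, const char *message)
{
    for (size_t i = 0; i < count; i++) {
        if (sinks[i] != NULL && sinks[i](message) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Stand-in for a real database logger; always reports failure here. */
int log_to_database(const char *message)
{
    (void)message;
    return -1;
}

/* Append the message to a text file (hypothetical name). */
int log_to_file(const char *message)
{
    FILE *f = fopen("error.log", "a");
    if (f == NULL) {
        return -1;
    }
    int ok = fprintf(f, "%s\n", message) >= 0;
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}

/* Last resort: write to the console. */
int log_to_console(const char *message)
{
    return fprintf(stderr, "%s\n", message) >= 0 ? 0 : -1;
}
```

With the sinks listed as { log_to_database, log_to_file, log_to_console }, a database outage costs you searchability but not the message itself.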

Messages

Writing a helpful error message

Error messages can be incredibly useful, horribly useless, or anywhere in between depending on how they're worded. Useful error messages give as many details as possible about the nature of the error. Here is an example of a useless error message:

 

Error!

 

Some of you reading this will laugh because that's a silly example; others will laugh harder because they've actually seen error messages like this.

 

Let's improve it a bit:

 

Error! $a is 0

 

Okay, this tells us more about the nature of the error, but it doesn't tell us what $a represents or what it should be. Let's improve it some more:

 

Error! $a (number of articles) is '0' but expected an integer > 0

 

The obvious changes here are that I've indicated what $a represents (number of articles) and what the expected value should be. Can you spot the other, less obvious change? I've added single quotes around the value of $a (in this case 0). Why? So that in case $a contained extraneous characters (such as spaces) it would be more obvious. Also, I want to know whether $a is 0, null, false, blank, or something else. Logically there may be no difference between 0, null, false, blank and whitespace, but to the programmer the difference between one and the other can provide valuable clues as to where the error was introduced.

 

All too often, the extra garbage that causes errors is not visible in every context. This includes high ASCII characters, control characters (carriage returns, linefeeds, tabs, nulls), and HTML tags and entities. Programmers who print these values verbatim will often miss the problem because it's masked by filtering along the way, such as when your text editor strips blanks or your Web browser interprets HTML tags. If you want to display the real underlying data you often have to resort to displaying it in non-standard ways.
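
One such non-standard way is to escape anything that isn't a plain printable character before logging it. The function below is a sketch, and show_raw() is a name invented for it; the escape notation (\n, \t, \xNN) is just one reasonable choice.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Render len bytes of in as visible text in out (capacity size).
   Control characters and bytes outside printable ASCII are escaped.
   Returns the number of characters written, not counting the NUL. */
size_t show_raw(const char *in, size_t len, char *out, size_t size)
{
    size_t used = 0;

    if (size == 0) {
        return 0;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        char piece[8];

        if (c == '\n') {
            strcpy(piece, "\\n");
        } else if (c == '\r') {
            strcpy(piece, "\\r");
        } else if (c == '\t') {
            strcpy(piece, "\\t");
        } else if (c == '\\') {
            strcpy(piece, "\\\\");
        } else if (c >= 0x20 && c < 0x7F) {
            piece[0] = (char)c;
            piece[1] = '\0';
        } else {
            snprintf(piece, sizeof piece, "\\x%02X", (unsigned)c);
        }

        size_t n = strlen(piece);
        if (used + n >= size) {
            break;  /* out of room: stop, keeping space for the NUL */
        }
        memcpy(out + used, piece, n);
        used += n;
    }
    out[used] = '\0';
    return used;
}
```

The length is passed explicitly so that embedded null bytes show up in the output instead of silently ending the string.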

 

Anyway, let's continue improving our error message. We're still missing crucial information about the error: when and where.

 

2006-10-22 14:26:45: ERROR: $a (number of articles) is '0' but expected an integer > 0 on line 5 of /foo/bar.baz

 

In this case, "when" is the date and time, and "where" is the file name and line number.
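
A message in this shape could be assembled along the following lines. This is a sketch: format_log_line() is a hypothetical helper, and it formats the time in UTC, which is an assumption made here to keep the output independent of the machine's time zone.

```c
#include <stdio.h>
#include <time.h>

/* Build "<date> <time>: <level>: <message> on line <n> of <file>" in buf.
   Returns 0 on success, or -1 if the time could not be formatted or
   buf is too small. */
int format_log_line(char *buf, size_t size, time_t when, const char *level,
                    const char *message, const char *file, int line)
{
    char stamp[32];
    struct tm *utc = gmtime(&when);  /* assumption: log in UTC */

    if (utc == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", utc) == 0) {
        return -1;
    }

    int written = snprintf(buf, size, "%s: %s: %s on line %d of %s",
                           stamp, level, message, line, file);
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

In real use you would pass time(NULL) for the time, and __FILE__ and __LINE__ for the location.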

 

Of course, your language and application may provide additional data or meta-data that would be useful for debugging that you might want to display; those details are left as an exercise for you, since I obviously don't know what platform you're running.

Attributes of a useful error message

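Your error message should contain, at minimum, the following:

  1. What the offending value represents
  2. The actual value, in quotes
  3. The expected value
  4. When the error occurred (date and time)
  5. Where the error occurred (file name and line number)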

Recovering Gracefully

In case you haven't guessed, this is one of my pet peeves. Too many programmers simply cause their programs to exit when they're unable to perform a required function such as connecting to a database. At best it's ugly; at worst it's rude, inconvenient and unprofessional.

 

For example:

 

$db = mysql_connect() or die(mysql_error());
$result = mysql_query("SELECT SESSION_USER(), CURRENT_USER();");

 

I would fix this code as follows:

 

  1. Wrap the call to mysql_connect() with my own wrapper function that tries several times to connect before failing
  2. Log the error so the programmer can see it and take action

 

Here is my updated code:

 

$db = connect_to_db();

if ($db) {
      $result = mysql_query("SELECT SESSION_USER(), CURRENT_USER();");
      // Do more stuff here
} else {
      report_fatal_error('Unable to connect to DB.');
}
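
The example above is PHP, but the retry idea behind connect_to_db() is language-neutral. Here it is sketched in C to match the earlier examples. The names connect_with_retries() and flaky_connect() are invented for illustration, and the sketch retries immediately; a real wrapper would probably pause between attempts.

```c
#include <stddef.h>

/* A connection attempt returns 0 on success and nonzero on failure.
   The context pointer carries whatever state the attempt needs. */
typedef int (*connect_fn)(void *context);

/* Call try_connect up to max_attempts times. Returns the number of the
   attempt that succeeded (1-based), or 0 if all of them failed. */
int connect_with_retries(connect_fn try_connect, void *context, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_connect(context) == 0) {
            return attempt;
        }
    }
    return 0;
}

/* Stand-in for a real connection routine: context points to an int
   holding how many more times it should fail before succeeding. */
int flaky_connect(void *context)
{
    int *failures_left = context;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}
```

When the function returns 0, the caller reports the failure and carries on as best it can, just as the else branch above does.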

Summary

Instead of waiting for errors to happen and then debugging core dumps, write code that actively looks for problematic situations at runtime, reports them in a way that will help you fix them, and continues executing to the best of its abilities.

 

Copyright © 2024 by Kim Moser (email)
Last modified: Wed 09 January 2008 17:29:54