Formatting Your Code: Why Style Matters

Overview

In a previous article I mentioned that consistency is more important than formatting style. But that doesn't mean style is unimportant; quite the contrary, the style you choose can make a huge difference.

Of course, the pros and cons of different styles have already been discussed in countless books, articles and forums. Rather than jumping into that religious war, I'll simply tell you which style I subscribe to and let you decide for yourself.

Indentation

Spaces suck; tabs rule

Long story short: don't use spaces to indent; instead, use tabs. A tab character represents a virtual indent. Want to indent two levels? Use two tabs. This entirely avoids the headaches of using multiple spaces to indent.

The problem with spaces is that nobody ever uses the same number of spaces for indentation, even within the same file. Some people use eight spaces; others use four; some use two; and still others use some crazy in-between variation. Manually adding or removing indentation is tedious and error-prone because you have to add or delete the proper number of spaces. Sure, you can configure your editor to autoindent but again, it's too easy for some parts of a file to use, say, four spaces for indentation and other parts of the same file to use, say, eight spaces for indentation.

These days you can find decent editors and IDEs that will reformat code to your liking. Problem is, reformatting is an extra step that can easily introduce errors, or even just aesthetically unpleasing source code. Ever had a heredoc statement screwed up by reformatting? Don't you hate it when your nicely-formatted comments, complete with carefully placed ASCII art, custom spacing, indentation and line breaks, are blown to shreds by your "smart" code formatter? Automated or regular reformatting simply isn't worth the effort; it's only worthwhile for one-shot deals to reformat ugly code you've inherited.

It is left as an exercise for the reader to determine how this affects languages in which whitespace (especially leading whitespace) is syntactically significant.

Comments

Where to put them

Since all languages are somewhat different, I'll leave aside the question of whether you should use enclosure-style comments (e.g. /* ... */) or prefix-style comments (e.g. # or //). If your language gives you both options, so much the better; we're going to use them both.

How to write them

Your comments are perhaps the most important part of your code, and yet in practice they are often the most neglected part.

Use proper spelling, grammar and punctuation. In the wild and woolly jungle of source code, purposely misspelled words and abbreviations abound. The most common, obvious example is the ubiquitous variable $i that seems to appear everywhere. In most cases it's far too trivial to describe what $i stands for. But what about that wacky variable called $dx_mnd_frm? Surely it shouldn't be left undescribed.

Write clearly and concisely. I'm talking about all the usual stuff you should have learned in high school and college writing classes. What's that? You picked programming because you don't like to write? In that case you're an idiot who shouldn't be let near a computer. One of my technical writing teachers in college told me a staggering statistic that has stuck with me to this day: engineers write more than authors. Now, this is a vague generalization and I can't find a citation for it, but my point is that what you write about your code (the prose usually called "comments", not to mention documentation, user manuals and the like) is a massively important part of what you write.

Remember the reader. Sure, you're writing comments to help you remember what your code does, but you should also be writing your comments for the next person to look at your code. They might not be as smart as you. Heck, even if they're smarter than you, they simply don't have the same amount of time invested in developing your code as you do. Your job as a writer is to make their job as reader as easy as possible.

Say it in different ways. A single sentence or phrase can be interpreted many ways. As a programmer, you should be able to guess how your words might be (mis)interpreted and phrase them several ways to eliminate all such misinterpretations. (Examples will come later.)

Examples

I know you're dying for examples of what I'm talking about so here you go:

var $dx_mnd_frm = 0;

This code defines a variable. What language is it written in? That doesn't matter; it's just an example. It is the minimum code required to declare a variable and initialize it to zero. The code itself is not the issue here. This code sample contains no comments. This is a bad thing. Why? Because the programmer now has to hunt down this variable in context and divine what it means and what it does.

So, what type of stuff should the comments say about this variable? Hint: I just gave you the answer above.

What does "dx_mnd_frm" stand for?
What is it used for? (Sometimes, but not always, that will be answered by the previous question.)

Let's update our code sample:

var $dx_mnd_frm = 0; // $dx_mnd_frm stands for "Deluxe Mind Form" and contains the number of questions answered so far in this session.

So, what the heck is a "Deluxe Mind Form"? Again, it doesn't matter; this is just an example of how you should explain what variables stand for and what they are used for. I'm assuming here that the hypothetical reader is familiar with my example "Deluxe Mind Form" industry, or whatever. Note, too, how I used a complete sentence ending with a period. That's critical because it lets the reader know that there is nothing further to read (no need to keep scrolling), and that the comment is indeed complete and hasn't accidentally been truncated by a fat finger or automated process. If they were to read a comment that didn't end with a period they would know immediately that something was amiss. (There are exceptions to the "always end your sentences with a period (or other punctuation)" rule but I'll get to them later.)

This is a great time to bring up the question of where you should put your comments. Should they go at the end of the line of code they are describing? Above it? Below it? Elsewhere? For example, here is another way of writing this comment:

// $dx_mnd_frm stands for "Deluxe Mind Form" and contains the number of questions answered so far in this session.
var $dx_mnd_frm = 0;

And the reverse example (comments come after the code):

var $dx_mnd_frm = 0;
// $dx_mnd_frm stands for "Deluxe Mind Form" and contains the number of questions answered so far in this session.

When a comment is relatively short and, more importantly, describes a single line of code, it should appear on that same line. That way it won't be accidentally separated from the associated code if that code is ever moved.

Lest you think this is a trivial example and not worth your time, let me once again rail against programmers who think certain things are too trivial or obvious to comment on. The question isn't "how trivial or obvious is it?" but rather "in how many ways could it be possibly misinterpreted?"

So, how about an example where comments appear above the code? I'm glad you asked:

var $j = 0;
var $a = get_some_stuff();
for ($i=0; $i < $a.length(); $i++) {
	var $x = $a[$i].headline();
	if ($x == 'test') {
		$j++;
	}
}

print "There are $j headlines that contain 'test'\n";

This is a pretty trivial example. Even a newbie should take less than a minute to determine that this code counts the number of elements in $a whose headline is 'test' and prints out a sentence to that effect.

Now, that's wonderful and great, but I have a few bones to pick:

What does "$j" stand for?
Likewise, what is "$a"?
What is "$x"?
"$i" is fine because it's obviously a generic loop control variable, but nothing describes the purpose of the loop.

Let's update our code:

// Count number of articles that contain a headline of "test":
var $headline_count = 0; // Contains number of headlines.
var $articles = get_some_stuff();
for ($i=0; $i < $articles.length(); $i++) {
	var $headline = $articles[$i].headline();
	if ($headline == 'test') {
		$headline_count++;
	}
}

// Print number of headlines:
print "There are $headline_count headlines that contain 'test'\n";

Note the following changes:

I've added a comment ("Count number of articles...") that describes what the following block of code does. "Block" in this case refers not just to the scope delimited by the braces, but rather to the physical, contiguous lines of code up to the next blank line. This is an important distinction, because the code that does the counting also includes two lines of code before the loop.
$j has been renamed to $headline_count
$a has been renamed to $articles
$x has been renamed to $headline
A comment has been added before the next physical block of code that prints the results. This is a (somewhat crude) way of signifying that the block of code described by the previous comment has ended. (Sometimes comments will have to describe blocks of code that contain multiple blank lines, for better legibility; this is inevitable.)
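
For readers who want something they can actually run, here is a rough TypeScript sketch of the same cleanup. The NewsArticle shape, the function names and the sample data are assumptions made for illustration; the original example never says what get_some_stuff() returns.

```typescript
// Hypothetical article shape; the original only implies a headline accessor.
interface NewsArticle {
	headline: string;
}

// Count number of articles that contain a headline of "test":
function countTestHeadlines(articles: NewsArticle[]): number {
	let headlineCount = 0; // Contains number of matching headlines so far.
	for (const article of articles) {
		if (article.headline === "test") {
			headlineCount++;
		}
	}
	return headlineCount;
}

// Build the sentence that reports the number of headlines:
function describeCount(headlineCount: number): string {
	return `There are ${headlineCount} headlines that contain 'test'`;
}

// Made-up sample data standing in for get_some_stuff().
const sampleArticles: NewsArticle[] = [
	{ headline: "test" },
	{ headline: "weather" },
	{ headline: "test" },
];
const sampleCount = countTestHeadlines(sampleArticles);
const sampleSentence = describeCount(sampleCount);
```

With this sample data, sampleCount is 2. The descriptive names do most of the work; the comments only have to say what each block is for.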

Compound statements

Why do programmers have an aversion to explaining their compound statements? It's so much easier to read an English description than to puzzle through nested parens sprinkled with Boolean conditions. For example:

if ((($action == $X_FILE_SAVE) && ('' == $FORM_CANVAS_DATA.$PARAM_FILENAME.value)) || ($action == $X_FILE_SAVE_AS)) {
	// do something
} else {
	// do something else
}

Admittedly it's not that difficult to figure out what conditions are necessary for the first or second part of this block of code to be executed. But remember, this is just a simple example. In any case, why make it harder than necessary for your readers? Use whitespace (and by that I mean tabs!) to clear the air:

// If first time saving the file, or if explicitly asked for the Save As window:
if (
	(
		($action == $X_FILE_SAVE)
		&&
		('' == $FORM_CANVAS_DATA.$PARAM_FILENAME.value)
	)
	||
	($action == $X_FILE_SAVE_AS)
) {
	// do something
} else {
	// do something else
}
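
The same layout can be tried out in a real language. The TypeScript sketch below is only an illustration: FILE_SAVE, FILE_SAVE_AS and needsSaveAsWindow are hypothetical stand-ins for $X_FILE_SAVE, $X_FILE_SAVE_AS and the form lookup, none of which are defined in the original example.

```typescript
// Hypothetical action codes standing in for $X_FILE_SAVE and $X_FILE_SAVE_AS.
const FILE_SAVE = "save";
const FILE_SAVE_AS = "save_as";

// Decide whether the Save As window has to be shown.
function needsSaveAsWindow(action: string, filename: string): boolean {
	// If first time saving the file, or if explicitly asked for the Save As window:
	return (
		(
			(action === FILE_SAVE)
			&&
			(filename === "")
		)
		||
		(action === FILE_SAVE_AS)
	);
}
```

Each operand sits on its own line and each operator stands alone, so the English comment can be checked against the condition one piece at a time.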

Exceptions

Earlier I mentioned that there are exceptions to the "always end your sentences with a period (or other punctuation)" rule. Here they are:

When initializing a variable, it's smart to describe why you selected that particular value. For example:

var $got_it = false;

By now you should be able to tell me how to write a meaningful name and comment for this code. It should look something like this:

var $got_test = false; // Whether we got the test article

Since this is just a made-up example that is lacking a larger context, the exact meaning of the comment is irrelevant for the moment. My point is that the comment accurately describes what the variable is storing and, because it's a Boolean, we can assume the only other valid value would be "true." Thus, at that point in the code we are assuming that we have not gotten the test article.

Every initialization of a value is an assumption that the given value is the default. If the subsequent code doesn't change the value, it will remain as assigned initially. For example:

// Determine whether we got the test article:
var $got_test = false; // assume
var $articles = get_some_stuff();
for ($i=0; $i < $articles.length(); $i++) {
	var $headline = $articles[$i].headline();
	if ($headline == 'test') {
		// We got the test article:
		$got_test = true;
		break;
	}
}

The first comment ("Determine whether we got the test article") describes the subsequent block.

The second comment ("assume") describes why we are assigning the given value. In this case we are assuming failure (false), then testing for success (true). If we never find success, the default value (false) is used.

The third comment ("We got the test article") describes the subsequent block of code. Note that this comment is on a line by itself because the subsequent block of code contains two statements:

assign a new value to $got_test
exit from the loop

It would be misleading to put the comment at the end of the assignment statement because the comment applies to the entire block, not just that line.
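
As a runnable illustration of the assume-then-test pattern, here is a small TypeScript sketch. The function name gotTestArticle and the plain list of headline strings are assumptions; the original works on whatever get_some_stuff() returns.

```typescript
// Determine whether we got the test article:
function gotTestArticle(headlines: string[]): boolean {
	let gotTest = false; // assume
	for (const headline of headlines) {
		if (headline === "test") {
			// We got the test article:
			gotTest = true;
			break;
		}
	}
	return gotTest;
}
```

If the loop never finds a match, the assumed default (false) is what the caller gets back, which is exactly what the "assume" comment promises.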

Brackets/Braces/Parentheses

Another classic religious war is how to style your braces. The only two acceptable variations are:

loop(...) {
	do_something();
}

or:

loop(...)
{
	do_something();
}

There are compelling advantages and disadvantages to each, so I can't say I have a strong opinion. However, here is something you definitely should NOT do:

loop(...)
	do_something();

While this may be syntactically and logically correct, it completely screws you when you attempt to add another statement or block of code within the loop. For example, if you simply do this:

loop(...)
	do_something();
	do_something_else();

you will be screwing yourself because this is equivalent to:

loop(...) {
	do_something();
}
do_something_else();

Why? Because you were stupid enough to use indentation as your sole means of determining the scope in which a statement falls. In this case the compiler can't help you; you've effectively shot yourself in the foot.

So what's the right solution? Stick with the first or second version above.
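
To watch the trap spring, here is a short TypeScript sketch. It models do_something() and do_something_else() as strings pushed onto a list, which is purely an assumption for illustration, so that the number of calls can be counted afterwards.

```typescript
// Brace-less loop: only the first statement belongs to the loop.
function withoutBraces(iterations: number): string[] {
	const calls: string[] = [];
	for (let i = 0; i < iterations; i++)
		calls.push("do_something");
		calls.push("do_something_else"); // Indented like loop body, but runs once, after the loop.
	return calls;
}

// Braced loop: both statements run on every iteration.
function withBraces(iterations: number): string[] {
	const calls: string[] = [];
	for (let i = 0; i < iterations; i++) {
		calls.push("do_something");
		calls.push("do_something_else");
	}
	return calls;
}
```

For three iterations, withoutBraces(3) records four calls (do_something three times, do_something_else once) while withBraces(3) records six. The compiler accepts both without complaint.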

Empty blocks

What's an empty block and when would you use it? First, let's step back and think about conditional statements. Most compound conditional statements look something like this:

if (condition1) {
	// We found several foobars in the database
	// do something...
} else if (condition2) {
	// The foobar was specified by the user
	// do something...
} else if (condition3) {
	// No foobars exist
	// do something...
}

This is great, but it leaves the reader hanging. What happens if none of those conditions are met? Shouldn't there be a final catch-all "else" statement? Obviously the code falls through, but the reader is still clueless about the conditions under which that would happen and, more important, what it means in a larger context, and whether the programmer even considered the catch-all case. Here is my solution:

if (condition1) {
	// We found several foobars in the database
	// do something...
} else if (condition2) {
	// The foobar was specified by the user
	// do something...
} else if (condition3) {
	// No foobars exist
	// do something...
} else {
	// Do nothing; we will email the user later.
}

Here it's perfectly clear that the programmer didn't forget the catch-all "else" statement at the end of the compound conditional statement; in fact, the programmer has made it clear that nothing should happen at this point, but that something will happen later.

As a bonus, if additional code ever does have to be inserted into that final else block, it can be dropped in without changing the surrounding code. Having to add the else block itself later would introduce another opportunity for a typo.
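
Here is one way the empty block might look in TypeScript. Everything concrete in this sketch is hypothetical: the conditions (foundCount, userSpecified) and the strings that stand in for "do something" are invented so that the branches can be exercised.

```typescript
// Returns the steps taken, so each branch can be observed from outside.
function handleFoobars(foundCount: number, userSpecified: boolean): string[] {
	const steps: string[] = [];
	if (foundCount > 1) {
		// We found several foobars in the database.
		steps.push("several");
	} else if (userSpecified) {
		// The foobar was specified by the user.
		steps.push("user");
	} else if (foundCount === 0) {
		// No foobars exist.
		steps.push("none");
	} else {
		// Do nothing; we will email the user later.
	}
	return steps;
}
```

Calling handleFoobars(1, false) falls into the final else and returns an empty list. The comment inside the otherwise empty block tells the reader that this case was considered and deliberately left alone.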

Implied Precedence

Hot-shot programmers often write things like this:

$x = 5 * $j + $y / 6;

Only the most novice programmer would get tripped up by the rules of operator precedence, so on the surface it seems largely unnecessary to add parentheses. But this code leaves open a gaping question: did the programmer intend this statement to be evaluated by the natural operator precedence rules? The answer is important, because if the "+" changes to "*" (e.g. because the formula needs to be corrected) then the grouping of the entire formula changes: 5 * $j * $y / 6 is evaluated left to right as ((5 * $j) * $y) / 6, not as the two separate terms the original author may have had in mind.

I would write this code as follows:

$x = (5 * $j) + ($y / 6);

This makes clear the order in which the programmer intended the statement to be evaluated. There's still one thing missing from this code fragment. That's right: comments! Again, this is left as an exercise for the reader.
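
A quick TypeScript sketch shows why the parentheses earn their keep once a formula gets edited. The sample values for j and y are made up, and the edit tried here (the "+" becoming "/") is a hypothetical correction chosen only because it changes the numeric result in an obvious way.

```typescript
// Hypothetical sample values; the original fragment does not define $j or $y.
const j = 2;
const y = 12;

// The original formula: both spellings group the same way.
const implicitSum = 5 * j + y / 6;
const explicitSum = (5 * j) + (y / 6);

// Hypothetical correction: "+" becomes "/".
// Without parentheses the expression is read left to right: ((5 * j) / y) / 6.
const implicitQuotient = 5 * j / y / 6;
// With parentheses the two original terms survive the edit: (5 * j) / (y / 6).
const explicitQuotient = (5 * j) / (y / 6);
```

Both sums come out to 12. After the edit, explicitQuotient is 5, while implicitQuotient is roughly 0.139: the unparenthesized version silently regrouped itself.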

Summary

Let's recap the general rules:

Always use braces to delimit blocks, even (nay, especially) one-line blocks
Use tabs (not spaces) for indenting
Comment every block
Comments that apply to a single line should go at the end of that line
Try to end comments with periods (or other appropriate punctuation)
Use parentheses to both enforce and express precedence

Conclusion

So what's the upshot? Write your code so it can be easily read and easily understood, not just by you but also by the poor schmuck who has to figure out your formatting style. Esoteric eccentricities are usually bad; conventional practices are good.

Much of your comments and code style should be geared towards hand-holding the next person who will be reading your code. No, you don't have to explain every last detail to them; that would be ridiculous. Rather, you want to insert the equivalent of post-it notes around your apartment for when your friend comes to visit for a few days and you're not there to show them what drawer the silverware is in, what pantry shelf the mayo is on, and where to put away the clean dishes. Rather than force them to figure it out, why not help them? After all, it's the hospitable thing to do.
