Because C imposes few restrictions on how you write a program, it is up to the programmer to format and structure code in a style that will be readable by other programmers. Many programming shops have explicit in-house style guides that are enforced on their programmers; but even in the absence of explicit rules, it is worth adopting a clear and consistent style to avoid confusing later programmers (including yourself after you have forgotten why you wrote your code the way you did).
1. Formatting
The goal of good programming style is that anyone reading a fragment of your code should be able to intuit what the code is doing without having to parse detailed bits of it or look up anything else anywhere else. This means that:
- Variables, functions, and function parameters---with the exception of certain common idiomatic names like i and j for loop variables, c for an input character, or n for the size of an array---should all have meaningful and evocative names. It is much easier to figure out what maximumValue or even maxval means than m. And though FORTRAN programmers have long been cursed by ancient linkers to name functions things like INVTD1, C programmers are free to use much more descriptive names like invertTridiagonalMatrix.
- Anything that is not obvious should be documented in a comment. Nothing that is obvious should be documented in a comment. For example:
- There should be a comment at the top of each file or program describing what the program does.
- There should be a comment before each function (except main, which is already covered by the description of the program) that explains what the function does, what each of its arguments means, and what its return value (if any) means. For externally-visible functions in a file, these comments should appear where the function is declared in the header file (the .h file) rather than with the implementation (the .c file).
- Any variable whose meaning is not obvious from context (i.e., any variable except i, j, etc.) should be documented.
- Particularly confusing or abstruse bits of code that cannot be rewritten for greater clarity should be explained in comments.
- Conversely, uninformative comments like the notorious i = i+1; /* add one to i */ should be avoided. In general, a comment that merely restates what the code already says violates the no separate but equal rule (see below), and runs the danger that the code will change without the comment being updated.
- Whitespace should be used to make the structure of the code visible.
- Use spaces within expressions to make the structure clearer. The expression x = 2*(y+7) + 4*z + sqrt(q); is much easier to parse than x=2*(y+7)+4*z+sqrt(q);.
- Use consistent indentation to show structure. The body of a block should be indented relative to the surrounding code. Opening and closing braces should be on lines indented to the same level. Labels (e.g. case 2: or escape:) should be outdented relative to the code they apply to.
- Use blank lines to separate coherent pieces of long functions (e.g. the declarations from the rest of the function body, or different parts of the function body that do different things).
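A minimal sketch of these formatting rules follows. The function name find_maximum_value and its parameters are invented for illustration and are not taken from the course materials. It uses a descriptive name, a header comment, spaces within expressions, consistent indentation, and blank lines that separate the declarations from the distinct steps of the body:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the largest value in values[0..n-1].
 * The array must contain at least one element (n > 0).
 * Hypothetical example function, not part of any library.
 */
int
find_maximum_value(const int values[], size_t n)
{
    size_t i;
    int maximum_value;  /* largest element seen so far */

    assert(n > 0);

    maximum_value = values[0];

    for (i = 1; i < n; i++) {
        if (values[i] > maximum_value) {
            maximum_value = values[i];
        }
    }

    return maximum_value;
}
```

Note that the loop index i is left uncommented, since its meaning is obvious from context, while maximum_value gets a short comment.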
2. Structure
By "structure" we mean the choices of how to organize your computation into functions, how to pass information between the functions, how to store it, etc. There are usually many choices of how to do this: at one (almost always bad) extreme, one can put an entire program in a single giant main function and have every variable be a global variable. However, it is better to break up a program into small, manageable chunks, since (a) other programmers will be able to understand what each small piece is doing, (b) you can test the small pieces individually, and (c) you may be able to reuse components in other contexts. Some principles of program structure are:
- Every function should do one clearly-defined task and one task only. This task should abstract away the details of what needs to be done to carry out the task.
- If you have to take more than one simple sentence to explain what a function does, it's too big.
- If the function name includes more than one verb (abrogatePeaceTreatyBetrayFormerAlliesAndLaunchMissiles), it's either too big, or being described at the wrong level of abstraction (startWar).
- If the function has an argument that radically changes its behavior (e.g., /* argument sortOrClear, if 0, means to sort the array; otherwise write zeroes in all array locations */), it should be split up into separate functions for the separate cases.
- A function should communicate with the outside world only through its arguments and return value. If you need extra return values, use pointers to pass in locations to write them. If you have too many arguments (six or more is a traditional rule of thumb), consider grouping them together in structs.
- Just as computation should be organized into coherent functions, data should be organized into coherent structures, and the program as a whole should be organized into coherent modules. The same considerations of having a single clear purpose and limited communication with the outside world that apply to functions also apply to data structures (or objects) and modules.
- No separate but equal. No program should ever include the same thing twice in two different places, because this practice creates a risk that one copy will be changed and the other won't. Examples of this rule in action are:
- If you find yourself writing the same lines of code twice (worse: if you are tempted to copy and paste code from one part of your program to another), you are in a state of sin. Figure out a succinct statement of what the repeated lines of code are doing and wrap them up in a function.
- If you find yourself passing around the same bits of data together in different places (e.g. an array that is never used without including its size), use a struct to organize it.
- Beware of any program that stores the same information in two different places or two different formats. A sign that this may be happening is that you are writing a lot of "updating" code that detects when one copy changes and fixes the other copy. It is almost always better to pick one copy and use it. A rare exception to this rule is in caching situations when the same data needs to be available in two different formats for different purposes (e.g. the source code of a program for displaying error messages and an internal "parse tree" that the compiler uses and modifies to generate code) and it is too expensive to generate one from the other as needed. But in these cases the caching mechanism should be carefully isolated from the rest of the code, with procedures provided to update the data without letting the user modify either copy directly.
- As already mentioned, comments should not duplicate information that is trivially observable in the code. Even comments that include information in the code (like argument names) must be ruthlessly policed to make sure they stay up to date.
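Several of these principles can be sketched in one small example. The names struct int_array, int_array_sort, int_array_clear, and int_array_min_max are hypothetical and chosen only for illustration. The array travels together with its size in a struct, sorting and clearing are separate functions rather than one function with a sortOrClear flag, and the extra return values are written through pointers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* An array that always travels with its length (hypothetical type). */
struct int_array {
    size_t size;  /* number of elements in data */
    int *data;    /* the elements; storage is owned by the caller */
};

/* Comparison callback for qsort: orders ints ascending. */
static int
compare_ints(const void *left, const void *right)
{
    int a = *(const int *) left;
    int b = *(const int *) right;

    return (a > b) - (a < b);
}

/* Sort the elements of a into ascending order. */
void
int_array_sort(struct int_array *a)
{
    assert(a != NULL);

    qsort(a->data, a->size, sizeof(a->data[0]), compare_ints);
}

/* Write zero into every element of a. */
void
int_array_clear(struct int_array *a)
{
    size_t i;

    assert(a != NULL);

    for (i = 0; i < a->size; i++) {
        a->data[i] = 0;
    }
}

/*
 * Find the smallest and largest elements of a.
 * On success, store them in *min and *max and return 1.
 * If a is empty, return 0 and leave *min and *max unchanged.
 */
int
int_array_min_max(const struct int_array *a, int *min, int *max)
{
    size_t i;

    assert(a != NULL && min != NULL && max != NULL);

    if (a->size == 0) {
        return 0;
    }

    *min = *max = a->data[0];

    for (i = 1; i < a->size; i++) {
        if (a->data[i] < *min) {
            *min = a->data[i];
        }
        if (a->data[i] > *max) {
            *max = a->data[i];
        }
    }

    return 1;
}
```

A caller would fill in a struct int_array once and pass its address to each function, so the size can never drift out of step with the array it describes.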
For fanatically high-performance code, it may be necessary to violate some of these principles (but maybe not: the cost of calling a simple function can often be optimized away by a smart enough compiler). But this should be done only in cases of dire need, only on the parts of the code that are actually time-critical, and with scrupulous documentation of why you are doing it. Never forget that premature optimization is the root of all evil.
3. Details
Below are some more detailed guidelines that I wrote for the 2002 version of CS223.
This is a partial list of rules of thumb for good programming style in C. These will be used as the basis of style grading for homework assignments, though we reserve the right to penalize egregiously awful code even if it follows all of the principles in this document. More rules may be added as the semester goes on and we notice more things we don't like, so it is worth looking at the on-line version of this document from time to time. A more comprehensive and general overview of the issues of programming style in C can be found in Chapter 1 of Kernighan and Pike.
3.1. Compilation and behavior
These are rules that are enforced explicitly or implicitly by the automated test scripts.
- Programs should compile without warnings using gcc -Wall -std=c99 -pedantic.
- Programs should produce only the output demanded by their specification. In particular, programs should not prompt for input, emit extraneous debugging output, or otherwise attempt to chat with the user---most of the time, the user will be another program.
- Programs should use Unix-style line terminators (newline) on input and output, and not MSDOS-style terminators (carriage return followed by newline).
3.2. Comments
- Every source or header file should begin with a brief comment describing its purpose. Complex modules should get longer comments describing how they work. For modules with a public interface defined in a header file, the header file should contain all the documentation needed by a user of the module; comments within the source file should primarily be used to document details of the implementation that are not intended to be visible to users.
- Every public function except main should be documented with a comment sufficiently detailed to allow someone to use the function without reading its code. This comment should explain:
- The purpose and meaning of each argument to the function,
- The meaning of the value returned by the function (if any),
- Whether the function modifies any of its arguments in a way that would be visible to the caller,
- What the function does in case of an error, and
- Whether the function examines or modifies any global or static local variables.
- A function's comment should appear either in the header file that declares the function (for functions that are intended for public use by other source files), or immediately preceding the definition of the function (for functions that are not declared in header files). The comment should not be duplicated in both of these places, as this violates the no separate but equal rule.
- Any source file that provides a main function should include a comment explaining what arguments and inputs the resulting program will expect, and what output it will produce.
- Declarations of structs, unions, and enums should include comments describing the purpose and meaning of each component.
- The purpose and meaning of global constants defined using #define or other constructions should be documented where these constants are first defined.
- Comments should be used to cite the origin of any significant piece of code or algorithm that is based on the work of others, including material from lectures, the books, or any other source.
- Comments should be used to explain particularly horrible or complicated bits of code that otherwise would not be understandable to a competent but hurried reader, though this is not a substitute for writing readable code.
- Clearly superfluous comments should be avoided. See Kernighan and Pike, page 23, for some examples.
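As an illustration of the function-comment rules above, here is a sketch of a documented function. The function replace_character is made up for this example. Its comment covers the arguments, the return value, the visible modification of an argument, the error behavior, and the use of global or static state:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Replace every occurrence of target in the null-terminated
 * string s with replacement.
 *
 * Arguments:
 *   s            string to modify; may be NULL
 *   target       character to search for; must not be '\0'
 *   replacement  character to store instead; must not be '\0'
 *
 * Returns the number of characters replaced.
 *
 * Modifies s in place; the change is visible to the caller.
 *
 * If s is NULL, returns -1 and modifies nothing.  Passing '\0'
 * as target or replacement is a programming error and causes
 * an assertion failure.
 *
 * Does not examine or modify any global or static variables.
 *
 * (Hypothetical example function, not part of any library.)
 */
int
replace_character(char *s, char target, char replacement)
{
    int count = 0;  /* replacements made so far */

    assert(target != '\0' && replacement != '\0');

    if (s == NULL) {
        return -1;
    }

    for (; *s != '\0'; s++) {
        if (*s == target) {
            *s = replacement;
            count++;
        }
    }

    return count;
}
```

If this function were declared in a header file, the comment would move to the declaration there and would not be repeated above the definition.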
3.3. Naming
- Functions should be given names that accurately and evocatively represent their purpose and effects. Function names should be active verbs or verb phrases---be suspicious of any function that claims to do nothing. The following are good function names: stack_pop, launch_missiles, fill_array, initialize_garbage_collector. The following are not: f, julius_caesar, cleanup_a_bunch_of_stuff, initializer.
- Variable names should be descriptive in inverse proportion to their scope; a variable only used in a few lines (like a loop index) can have a short, idiomatic name like i or j, while a variable used throughout the body of a 100-line function should be called something descriptive like average_accumulator or number_of_segments.
- All else being equal, long names are better than short names. If you find yourself typing out a lot of long names, you should get in the habit of using your favorite editor's automated expansion commands (e.g. meta-/ in Emacs, control-P and control-N in Vim).
- Multiword names should be written to clearly separate the words. Two popular styles are CapWordsStyle and underscores_pretending_to_be_spaces. I personally prefer the underscores style, but whichever style you pick, you should stick with it consistently.
- Capitalization should be used consistently to distinguish things like variables (segment_number), #defined constants (SEGMENT_NUMBER), and typedef names (SegmentNumber).
- A general naming rule of thumb: if you can look at any five-line section of your code and make a reasonably good guess about what it does without having to look at anything else, your names are good enough.
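The capitalization conventions above might look like this in practice. This is only a sketch: SEGMENT_NUMBER_MAX, its value, and next_segment_number are assumptions introduced for the example.

```c
#include <assert.h>

/* Largest valid segment number (value chosen arbitrarily for this sketch). */
#define SEGMENT_NUMBER_MAX (255)

/* Index of a segment, in the range 0..SEGMENT_NUMBER_MAX. */
typedef int SegmentNumber;

/*
 * Return the segment that follows segment_number,
 * wrapping around to 0 after the last segment.
 */
SegmentNumber
next_segment_number(SegmentNumber segment_number)
{
    assert(segment_number >= 0 && segment_number <= SEGMENT_NUMBER_MAX);

    if (segment_number == SEGMENT_NUMBER_MAX) {
        return 0;
    }

    return segment_number + 1;
}
```

The case of each name tells the reader what kind of thing it is: all capitals for the #defined constant, CapWords for the typedef, and lowercase with underscores for the variable and the function.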
3.4. Whitespace
- Indentation should emphasize the control structure of the program. The bodies of all blocks, including function bodies and struct definitions, should be indented relative to their contexts. Indentation should be consistent both within and between files.
- Spaces should be used within expressions to make them more readable.
- Blank lines should be used within functions to visually separate different parts of the body, such as the variable declarations, initialization code, non-trivial loops, etc.
3.5. Macros
- There should be a symbolic constant name defined using #define or some equivalent construct for any nontrivial constant used in your program, especially if that constant is used more than once. Exceptions are most uses of 0, 1, and -1, and string constants (such as format strings or error messages) that only appear once and whose replacement by a macro would only cause confusion.
- Macros that expand to expressions should be parenthesized.
- Constants whose values depend on other constants should be defined in terms of those constants. For example, if you are writing a program that supports decks of 52 cards consisting of 4 suits of 13 ranks each, the deck size should be #defined as the product of the number of suits and the number of ranks rather than as the literal 52.
- Parameterized macros should be avoided if possible, but if used, the following rules should be observed:
- Names should be capitalized to distinguish them from functions.
- Parameters should be parenthesized when used.
- The macro definition should get the same level of documentation as an actual function.
- Parameters that are expanded more than once should be avoided if possible, and documented where they cannot be avoided.
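The macro rules above can be sketched using the card-deck example. All names here (NUM_SUITS, NUM_RANKS, DECK_SIZE, NUM_PLAYERS, HAND_SIZE, CARDS_AFTER_DEAL, CARD_INDEX, card_rank) are assumptions made up for illustration, as are the player count and hand size:

```c
#include <assert.h>

#define NUM_SUITS (4)    /* suits in a deck */
#define NUM_RANKS (13)   /* ranks in each suit */

/* Dependent constant: written in terms of the constants it depends on. */
#define DECK_SIZE (NUM_SUITS * NUM_RANKS)

#define NUM_PLAYERS (4)  /* players at the table */
#define HAND_SIZE (5)    /* cards dealt to each player */

/*
 * Cards left in the deck after dealing.  The outer parentheses
 * keep an expression like 2 * CARDS_AFTER_DEAL from being parsed
 * as (2 * DECK_SIZE) - NUM_PLAYERS * HAND_SIZE.
 */
#define CARDS_AFTER_DEAL (DECK_SIZE - NUM_PLAYERS * HAND_SIZE)

/*
 * CARD_INDEX(suit, rank): position of a card in a deck sorted
 * by suit and then by rank.
 *   suit  in the range 0..NUM_SUITS-1
 *   rank  in the range 0..NUM_RANKS-1
 * Each parameter is parenthesized and expanded exactly once, so
 * arguments like 1 + 1 or i++ behave as they would in a function call.
 */
#define CARD_INDEX(suit, rank) ((suit) * NUM_RANKS + (rank))

/* Return the rank (0..NUM_RANKS-1) of the card at position index. */
int
card_rank(int index)
{
    assert(index >= 0 && index < DECK_SIZE);

    return index % NUM_RANKS;
}
```

Changing NUM_RANKS in one place now updates DECK_SIZE, CARDS_AFTER_DEAL, and CARD_INDEX automatically, which is the point of defining dependent constants as expressions.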
3.6. Global variables
- Global variables should be used only in rare cases of dire necessity. Such cases do not include passing values to or from functions, or maintaining state that would be better represented in an abstract data type allocated at run time.
3.7. Functions
- A function should have a single logical purpose. If you can't explain what a function does in a simple sentence, consider writing more than one function.
- Functions with no useful return value should be declared void.
- Any function with more than six arguments or so has too many arguments. This is usually a sign that you need to package related arguments together, or that your function is trying to do too much.
3.8. Code organization
- Programs should be written in a modular fashion, grouping similar things together. Even in a program consisting of a single source file, different components (such as #includes, function declarations, and struct and typedef declarations) should be grouped according to some logical scheme instead of being scattered randomly.
- Any function that is used in more than one source file should be declared in a single header file that is #included in any source file that uses or defines the function. Any function except main that is not used in more than one source file should be declared static.
- Data structures should be rendered opaque using typedefs when possible. The contents of structs should not be exported in header files unless they are expected to be modified by code outside the module that defines them.
- No file should ever #include another source file or any file containing executable code.
- Nontrivial blocks of code that appear more than once should be refactored out as functions.
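A sketch of how these rules fit together for a small module is shown below. The module name counter and all of its functions are hypothetical. The two parts are shown in one listing so that it compiles on its own; in a real program the first part would live in counter.h and the second in counter.c.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What would go in counter.h (hypothetical module) ---- */

/*
 * A counter that starts at zero.  Only the typedef is public;
 * the contents of struct counter are private to counter.c.
 */
typedef struct counter *Counter;

/* Allocate a new counter set to zero.  Returns NULL if allocation fails. */
Counter counter_create(void);

/* Free all storage used by c.  Passing NULL is allowed and does nothing. */
void counter_destroy(Counter c);

/* Add one to c and return its new value.  c must not be NULL. */
int counter_increment(Counter c);

/* ---- What would go in counter.c, which would include counter.h ---- */

struct counter {
    int value;  /* current count */
};

/* Reset c to zero.  Used only inside this file, so declared static. */
static void
counter_reset(Counter c)
{
    c->value = 0;
}

Counter
counter_create(void)
{
    Counter c = malloc(sizeof(*c));

    if (c != NULL) {
        counter_reset(c);
    }

    return c;
}

void
counter_destroy(Counter c)
{
    free(c);
}

int
counter_increment(Counter c)
{
    assert(c != NULL);

    c->value++;

    return c->value;
}
```

Because the count lives in an object allocated at run time rather than in a global variable, a program can have as many independent counters as it needs, and code outside the module cannot modify the count except through counter_increment.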
4. Style checklist
Below is the style checklist used for style grading in CS223. Note that it includes the option to remove points for particularly ugly code not covered by specific guidelines.
Style grading checklist
Score is 20 points minus 1 for each box checked (but never less than 0)

Comments
[ ] Undocumented module.
[ ] Undocumented function other than main.
[ ] Underdocumented function: return value or args not described.
[ ] Undocumented program input and output (when main is provided).
[ ] Undocumented struct or union components.
[ ] Undocumented #define.
[ ] Failure to cite code taken from other sources.
[ ] Insufficient comments.
[ ] Excessive comments.

Naming
[ ] Meaningless function name.
[ ] Confusing variable name.
[ ] Inconsistent variable naming style (UgLyName, ugly_name, NAME___UGLY_1).
[ ] Inconsistent use of capitalization to distinguish constants.

Whitespace
[ ] Inconsistent or misleading indentation.
[ ] Spaces not used or used misleadingly to break up complicated expressions.
[ ] Blank lines not used or used misleadingly to break up long function bodies.

Macros
[ ] Non-trivial constant with no symbolic name.
[ ] Failure to parenthesize expression in macro definition.
[ ] Dependent constant not written as expression of earlier constant.
[ ] Underdocumented parameterized macro.

Global variables
[ ] Inappropriate use of a global variable.

Functions
[ ] Kitchen-sink function that performs multiple unrelated tasks.
[ ] Non-void function that returns no useful value.
[ ] Function with too many arguments.

Code organization
[ ] Lack of modularity.
[ ] Function used in multiple source files but not declared in header file.
[ ] Internal-use-only function not declared static.
[ ] Full struct definition in header files when components should be hidden.
[ ] #include "file.c"
[ ] Substantial repetition of code.

Miscellaneous
[ ] Other obstacle to readability not mentioned above.