Coding Guidelines for CS-701

Spring 2003

Introduction

This document gives the rules you are to follow for all the programs you write for CS-701. The goal is for you to write code that is correct, robust, efficient, and easy to understand. Most of the detailed specifications given in this document apply to the "easy to understand" criterion because code that is easy to understand is also most likely to be correct, robust, and efficient!

Programming Language

All code for this course is to be written in C++ and compiled using the GNU g++ compiler.

All code must compile and link with no warnings and no errors when compiled with g++ using the `-g -Wwrite-strings -Wall` command line options.

There is a separate web page on writeable strings that explains the significance of the -Wwrite-strings option.
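As a rough illustration of what the option is about (this sketch is not taken from the web page mentioned above, and greetingLength() is a made-up name): string literals are read-only, so a pointer to one should be declared const char * rather than char *.

```cpp
#include <string.h>

size_t greetingLength( void );

size_t greetingLength( void )
{
  //  OK: the literal is read-only, and the pointer says so.
  const char *greeting = "hello";

  //  With -Wwrite-strings, g++ warns about a line like the one
  //  below, because it would let code try to modify the literal:
  //
  //    char *writable = "hello";

  return strlen( greeting );
}
```

Uncommenting the char * line should be enough to see the warning for yourself.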

Because you are programming in C++ you must supply a function prototype for every function you define except main(). (The C language does not require prototypes, but C++ does.)

Your function prototype must be in a header file if and only if the function is defined in one source module and referenced from another module. If the function is referenced only from within the same module in which it is defined and if the definition precedes all references to the function, the definition itself may be used as the prototype.
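The three cases described above might look like this within one source module. This is only a sketch: the function names and the header name powers.h are hypothetical, chosen for the example.

```cpp
//  Would live in a header file (say, powers.h, a hypothetical
//  name) because other modules call sumOfPowers().
int sumOfPowers( int value );

//  Needed in this module only: cube() is defined after its
//  first reference below.
int cube( int value );

//  square() is defined before every reference to it, so the
//  definition doubles as its prototype.
int square( int value )
{
  return value * value;
}

int sumOfPowers( int value )
{
  return square( value ) + cube( value );
}

int cube( int value )
{
  return value * value * value;
}
```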

You may use most, but not all, of the features of the C++ language in your code. Parts of the language not to use include:

Do not use I/O Streams, including cin and cout. Use the standard C I/O library instead (printf(), fgets(), etc.) or, when an assignment makes it explicit that you should do so, use Unix system calls (write(), read(), etc.). This means you will use the header files stdio.h, stdlib.h, and unistd.h rather than iostream.
Do not use new and delete for dynamic memory management. Use malloc(), realloc(), and free() instead.
In general, do not define classes and do not use the C++ standard template library. The vast majority of the code written for this course is not object-oriented by nature. However, there will be some simple data structures used in the assignments (lists and tables) where it would be entirely appropriate to encapsulate the structure and its programming interface into a class, and you are encouraged to do so. If you do define any classes, of course, you will have to use new and delete to manage the memory for the objects, despite the previous item in this list. Just be careful not to try to free() memory that was allocated with new nor to delete memory that was allocated with malloc().
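Here is a minimal sketch of code written under these restrictions: standard C I/O for messages, and realloc() and free() for memory. The function appendText() is a hypothetical helper invented for this example, not something supplied with the course.

```cpp
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *appendText( char *buffer, const char *text );

//  Returns buffer grown to hold its old contents followed by
//  text, or NULL (after freeing buffer) if memory runs out.
//  Pass NULL as buffer to start a new string.
char *appendText( char *buffer, const char *text )
{
  size_t oldLength = ( buffer == NULL ) ? 0 : strlen( buffer );
  size_t newSize = oldLength + strlen( text ) + 1;

  //  realloc( NULL, n ) behaves like malloc( n ).
  char *grown = ( char * ) realloc( buffer, newSize );
  if ( grown == NULL )
  {
    fprintf( stderr, "appendText: out of memory\n" );
    free( buffer );
    return NULL;
  }
  strcpy( grown + oldLength, text );
  return grown;
}
```

The caller owns the returned memory and releases it with free(), never with delete.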

Project Management

An important goal of this course is for you to learn how to use the Revision Control System (RCS) and the make utility to manage programming projects. There are separate web pages on Using RCS and Using make that will be assigned at the appropriate time in the course. In the meantime, the following material is to help you get your projects set up correctly in general.

Set up a separate project directory for each assignment, and put only the files that actually are part of that assignment in the project directory. For assignments that are continuations of previous assignments, reuse the same project directory.

Create a subdirectory named RCS (must be all capital letters) in the project directory. You will practice using rcs to keep track of program versions in this course.

Once you have learned how to use rcs and make (or gmake as the case may be), you will be required to make the project directory "clean" so it contains nothing but the RCS subdirectory and possibly a text file named README (spelled and capitalized just like that) before submitting an assignment.

To submit an assignment, make the project directory "clean," change to the directory above the project directory, and create a tar file of the project directory using a command like the following:

    tar cvf Assignment_1.tar Assignment_1

This example assumes the project directory name is Assignment_1. After you create the tar file, submit it to me as an attachment to an email message. Warning! Tar files are "binary" (not text) files, and will be corrupted if you are not careful when you copy them from one computer to another. If you use ftp to transfer a tar file, for example, you have to be sure to use "binary mode" for the file transfer. To be sure your tar file is in good shape, you should email a copy to yourself using exactly the same technique you will use to send it to me. Then create a temporary directory separate from your project directory, save the tar file there, and be sure you can extract the contents of the received copy successfully using a command like the following:

    tar xvf Assignment_1.tar

Text Editing

You must use a Unix programmer's editor to prepare all source files for this course. Two free editors that will satisfy this requirement are vim ("Vi Improved") and emacs. Both of these editors normally come preinstalled on all Unix systems. There are other editors available for various Unix systems, but these two are industry standards, and you should pick one of them unless you are already very familiar with a different one. Of the two, vim is easier to use and perhaps more universally available, so that is the assumed editor of choice for the course. Examples of editors that are not programmer's editors are pico and the editors packaged with Linux desktop environments such as KDE or Gnome. And, of course, Notepad, which is a Windows program anyway.

What makes an editor a "programmer's editor?" Any programmer's editor will include at least the following features to help you produce easy to read and well-formatted source code:

Syntax Highlighting. A programmer's editor recognizes the syntax rules for the programming language you are working with, and colors various elements of the code, such as comments, keywords, variable names, etc. to indicate what is what. Editors like vim come with built-in support for dozens of programming languages, and can be customized to support others as well.
Automatic Indentation. A programmer's editor recognizes language elements like curly braces and automatically indents your code to reflect the program's syntactic structure as you type it. When you change the structure of the code, a programmer's editor gives you commands to shift blocks of code left or right to fix up indentation changes.
Pair Matching. If you put the cursor on any sort of "pair character" such as curly, round, or square braces ( {}, (), or [] ), the cursor should automatically jump to the matching brace or bracket to help you see that you have your code matched up properly. With vim, this cursor jumping can happen automatically as you type, when you type a % character with the cursor positioned on a brace, or both.
Note that pair matching eliminates the need for comments that indicate the structure of the code, like "} // end while". These comments are useful when you use a plain text editor, but are poor programming style in general because it is too easy for the comments and the braces to get out of synch with each other over the lifetime of a large project.
Tab Expansion. A sad fact of life is that all tabs are not created equal. You might set tab stops every 4 columns in your code, but the next programmer might prefer tab stops every 2 columns. Or a file may get printed, where tab stops are set for every 8 columns, or perhaps totally ignored. The bottom line is that you cannot control the format of code that has tab characters in it, and the only solution is to eliminate them completely! (I've seen books that claim you should always use tabs and set stops every 8 characters. You can do whatever you like after this course is over, but for now the rule is NO TABS!)
Tab stops should be set for either every two or four characters, neither more nor less. Less makes indentation too difficult to see, and more leads to code that gets so deeply indented that it disappears off the right end of the screen.
Set your programmer's editor so that it uses two or four character tab stops for its automatic indenting feature, and set it up so that it always expands tab characters to the correct number of spaces in your code to simulate the behavior of tab characters. Note that the editors that come with Integrated Development Environments (IDEs) often provide all the programmer's editor features listed here except for control over tab stops.

No text file you submit in this course is to contain any tab characters.
Recognize Different File Formats. Unix, Windows, and Macintosh all follow different rules for how lines are terminated in text files. Unix files use the ASCII linefeed character (0x0A) at the end of each line, Macintosh uses the ASCII carriage return character (0x0D) at the end of each line, and Windows uses both carriage return and linefeed at the end of each line. Generally, the difference isn't significant to compilers, and good programmer's editors recognize what kind of file they are editing and adjust their behavior accordingly. But you can get strange behavior if you edit the same file on two different platforms with different editors. If you do want to do editing on different platforms, the best route is to use vim, which is available for both Unix and Windows, and does a good job of keeping files compatible with each other if you move them across platforms.

Setting Up Your Editor

There is a web page here to help you install and set up the current version of vim on your computer. The current version of vim should work well as soon as you install it. But that web page includes a sample initialization file (.vimrc) that you should copy to your home directory to be sure all features, such as tab expansion, are set up properly.

Coding Style

The remainder of this document tells you how to structure your C++ code so that it meets the course requirements for Correctness, Robustness, and Efficiency.

A correct program is one that does what it is supposed to do when all inputs take on values anywhere within their expected range. Be sure to test your programs for correctness before submitting them. Pay special attention to situations on the edge of the expected range of values. For example, if your program has to read lines of text from a file, it should not have a problem dealing with empty lines.

A robust program is one that behaves in a "reasonable manner" when it encounters input values that are not within their expected range or if expected parameters are missing. The reasonable thing to do, depending on the severity of the error and the nature of the program, is to issue an error message and continue processing (recovery) or to issue an error message and terminate (abort). Be sure to test your program's behavior in response to "bad input" before submitting it too. Instead of "garbage in, garbage out" your code should operate on the "garbage in, explanation out" principle. Explanations should be meaningful, but terse.
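As one possible sketch of "garbage in, explanation out," the function below validates a number supplied as text. The name parseCount() and the valid range of 1 to 100 are assumptions made up for this example; a real assignment will define its own limits.

```cpp
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

bool parseCount( const char *text, long *count );

//  Stores the value of text in *count and returns true if text
//  is a whole number from 1 to 100 (a hypothetical range).
//  Otherwise prints a terse explanation and returns false,
//  leaving *count unchanged.
bool parseCount( const char *text, long *count )
{
  char *end = NULL;

  errno = 0;
  long value = strtol( text, &end, 10 );
  if ( end == text || *end != '\0' )
  {
    fprintf( stderr, "\"%s\" is not a whole number\n", text );
    return false;
  }
  if ( errno == ERANGE || value < 1 || value > 100 )
  {
    fprintf( stderr, "count must be from 1 to 100\n" );
    return false;
  }
  *count = value;
  return true;
}
```

The caller decides whether to recover (ask again, use a default) or to abort with exit( EXIT_FAILURE ). When testing, try the edges of the range (1 and 100) as well as clearly bad input such as an empty string.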

An efficient program is one that performs only those computations necessary to accomplish the work at hand. Aside from the obvious advantage of executing quickly, efficient code is typically much easier for someone else to read and understand than code which performs extraneous operations, which the reader has to understand in order to know that they can safely be ignored!

Source File Structure

As mentioned in the Introduction, making code easy to read is one important way to achieve the goals of correctness, robustness, and efficiency. To make your code easy to read, all source modules (.cc files and .h files) must contain the following sections in the order listed here:

File Introduction

The file must begin with a block of comments that introduce the file, called the File Introduction. The first line of the file introduction must contain the RCS keyword, $Id$ , which will be expanded by RCS to give the file's name, date of modification, and some other information. Be sure to punctuate and capitalize the keyword exactly as shown, or RCS won't recognize it. Using the RCS utility for project management will be covered in class.

The File Introduction then continues with comments that give a Summary of the file's contents.

If there is more than one function definition in the file, follow the file summary with a Functions section, which is a list of the names of all the functions defined in the file, along with a brief phrase identifying each. Each item in this list should normally fit on a single line. The list must be in the same order as the sequence of function definitions in the file. Do not list functions referenced from within the file, only functions defined in the file.

The Revision History for the file is the section of the file introduction in which you list the changes made to the file and the dates the changes were made. The good news is that you can generate the contents of this section completely automatically by putting the RCS keyword $Log$ inside your comments.

Include the Author's Name in the Author section of the File Introduction. When more than one programmer works on a file, the authors' names go in the revision history. For this course, only one person works on a file, so the author's name goes in its own section. If I give you some code to use as a basis for part of a project, put your name underneath mine with a comment to the effect that you modified the code. (Don't put in my name unless I provided a significant part of the code. Sample code given in class, for example, is not "significant.")

Each of the File Introduction sections ("Summary," "Function Names," "Revision History," and "Author") should be preceded by its own sub-heading name, but you may omit these names if you think it makes your code easier to read.

Here is a template you can use for your File Introductions:

     //  $Id$
     /*
      *  Summary
      *
      *    [A sentence or two giving the role of this module in the
      *    overall structure of the project.]
      *
      *  Function Names
      *
      *    [A list of the names of all functions defined in this
      *    module in the order in which they are defined.  Follow each
      *    name with a phrase summarizing what the function does.]
      *
      *  Revision History
      *
      *    $Log$
      *
      *  Author:  [Your Name]
      *
      */

Notice that the first line uses the // type of comment, but that the other lines use the /* ... */ style. It's important to use the /* ... */ style for the part of the comment block that includes the RCS $Log$ keyword because RCS will expand $Log$ into a list of comments designed to fit inside /* ... */ comment blocks, not in // comment lines.

All Makefiles and man page files must also have a file header with comments giving the $Id$, Summary, Revision History, and Author sections.

Include Files, Manifest Constants, and Macro Definitions

All #include and #define statements follow the File Introduction section. Put #define statements that are used in multiple source modules into a header file. Although ANSI C does not require it, putting a header file name in angle brackets (< and >) conventionally means the header file is one that is supplied with the compiler or part of a standard programming package that is installed on the development system. Putting the name of the header file in quotes (") conventionally means that the header file is specific to the current project and is located in the project directory.

Function Definitions

Every function definition (including main() !!!) begins with a Function Introduction, followed by the function definition itself.

Function Introduction

A function introduction is a block of comments that contains the following information in the order listed here:

Name
The name of the function.
Summary
A statement of the purpose of the function. Use one or more complete sentences. Write in the present tense.
Arguments
List the arguments to be passed to the function using the names used in the function definition. Give a phrase telling what each is used for and any assumptions made about the valid range of values the argument may meaningfully take. If any arguments are pointers to values that are modified by the function, say so.
Return Value
Tell what values are returned by the function. If no value is returned, say so.
Global Variables
List any global variables that are referenced or modified by the function.
Algorithm
List the steps the function executes. Use imperative sentences.

Here is a template you could use for Function Introductions. As with the File Introduction, the heading names are recommended, but not required.

  /*  functionName()
   *  ------------------------------------------------------------
   *
   *  Summary
   *
   *  Arguments
   *
   *  Return Value
   *
   *  Global Variables Referenced
   *
   *  Global Variables Modified
   *
   *  Algorithm
   *
   *    1.
   *
   */

Note the comment line with dashes under the function name. The idea is to make it clear where each function definition begins. The next programmer to read your code (the one who has to add a new feature or fix a bug) will be eternally grateful for this textual landmark. Likewise, notice that all comments need to have two spaces between the comment leader (the //, /*, or the * at the beginning of the line) and the textual part of the comment. This whitespace is an important part of making your code easy to read.

Omit the Global Variables sections if your program doesn't use global variables.

The Algorithm section does not need to be at all elaborate. The idea here is to provide the reader with a guide to the code that makes up the function definition. The code itself will give the details of the algorithm; the comments here, which should parallel the lines of comments in the function body, should give the reader an overview of the algorithm being implemented.

Writing Function Definitions

Use meaningful variable names. (However, "anonymous" variable names like i, j, and k are OK for integers used to index arrays.) In general, you do not need to comment your variable declarations. Use a consistent style of capitalization and underscores for variable names. Manifest constants (set up using #define) are normally all capitals. Variable and function names should be mixed upper/lower case, with underscores or internal capitals to separate words inside a name (num_commands or numCommands, for example). Data types that you define using struct, enum, or typedef conventionally have names that end in "_t", for example:

    struct node_t { ...
    };
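Put together, these naming conventions might look like the sketch below. Every name in it (MAX_NAME_LENGTH, the fields of node_t, listLength(), numNodes) is hypothetical and chosen only to show the style.

```cpp
#include <stddef.h>

//  Manifest constant: all capitals.
#define MAX_NAME_LENGTH 32

//  Programmer-defined type: name ends in _t.
struct node_t
{
  char name[ MAX_NAME_LENGTH ];
  struct node_t *next;
};

int listLength( const struct node_t *head );

//  Function and variable names: mixed case, internal capitals.
int listLength( const struct node_t *head )
{
  int numNodes = 0;

  while ( head != NULL )
  {
    numNodes++;
    head = head->next;
  }
  return numNodes;
}
```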

Use a consistent indentation style that shows the lexical structure of your code. Blank lines and other whitespace generally improve your code's legibility. Choose an indent increment of either 2 or 4 spaces. Anything larger will result in lines of code that get pushed over to the right margin and are hard to read.

No code or comment line may be more than 72 characters long.

Remember the following feature of the language when breaking up long pieces of code so they will meet this requirement:

    printf( "This is a very long line of text that I want to print\n" );

can be rewritten as:

    printf( "This is a very long "
            "line of text that I "
            "want to print\n" );

The only comments you have to write in your function definitions are ones that correspond to the steps you listed in the Algorithm section of the Function Introduction. These comments go on lines by themselves just before the code that implements each step of the algorithm. Of course, you should add other comments if a piece of code is difficult to understand, but try to write clear code so the need for these extra comments is minimized.
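Here is a sketch of how the Algorithm section of a Function Introduction and the comments in the function body can line up. The function countWords() is a hypothetical example, not part of any assignment.

```cpp
#include <ctype.h>

int countWords( const char *line );

/*  countWords()
 *  ------------------------------------------------------------
 *
 *  Summary
 *
 *    Counts the whitespace-separated words in a line of text.
 *
 *  Arguments
 *
 *    line  Points to a null-terminated string, which is not
 *          modified.  Must not be NULL.
 *
 *  Return Value
 *
 *    The number of words found; zero for an empty line.
 *
 *  Algorithm
 *
 *    1.  Repeat the steps below until the end of the string.
 *    2.  Skip whitespace that precedes the next word.
 *    3.  If a word starts here, count it and skip to its end.
 *
 */
int countWords( const char *line )
{
  int numWords = 0;

  //  Repeat the steps below until the end of the string.
  while ( *line != '\0' )
  {
    //  Skip whitespace that precedes the next word.
    while ( isspace( ( unsigned char ) *line ) )
    {
      line++;
    }

    //  If a word starts here, count it and skip to its end.
    if ( *line != '\0' )
    {
      numWords++;
      while ( *line != '\0' && !isspace( ( unsigned char ) *line ) )
      {
        line++;
      }
    }
  }
  return numWords;
}
```

Each comment in the body repeats one step from the Algorithm section, so a reader can move between the overview and the code without getting lost.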

There are only two acceptable arguments to the exit() function: EXIT_SUCCESS and EXIT_FAILURE. These two constants are defined in stdlib.h, along with the function prototype for exit(). These two constants normally have the values of 0 and 1 respectively, and it is all right to use these values explicitly rather than the constant names. But using any other values makes your code "unconventional" and thus more difficult for other programmers to understand easily.

Avoid flag variables wherever possible. If you must have one, give it a meaningful name, and use type bool rather than int for it. Use true and false as boolean literals; they are part of the language, but other forms, such as TRUE and FALSE are not defined on all systems.
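One common way to avoid a flag variable, sketched here with a hypothetical function named containsName(), is to return from a search as soon as the answer is known instead of setting a "found" flag and testing it later. The return type is bool, and the literals are true and false.

```cpp
#include <string.h>

bool containsName( const char *names[], int numNames,
                   const char *wanted );

//  Returns true if wanted matches one of the first numNames
//  strings in names, and false otherwise.
bool containsName( const char *names[], int numNames,
                   const char *wanted )
{
  for ( int i = 0; i < numNames; i++ )
  {
    if ( strcmp( names[ i ], wanted ) == 0 )
    {
      //  Returning here removes the need for a "found" flag.
      return true;
    }
  }
  return false;
}
```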

Notes About This Style

This coding style discourages the use of many small functions to implement a program, which is intentional. First of all, small functions generally are utilities. In general, the standard C library provides all the utility functions you need. Be sure you are familiar with all the utilities available to you, especially the string manipulation functions, before you decide you need your own new one. Writing code that re-implements a standard function is very distracting to the people who look at your code after you are finished with it, and should be avoided. If you do need to write a utility function, this coding style encourages you to think about it carefully. It is often the case that the same amount of code can provide general utility instead of satisfying just a single need, which increases the probability that the function can be used in other applications. For this reason, utility functions should be in separate source modules so they can be incorporated into other applications easily.
Believe it or not (and I know you don't), it really makes more sense to write the comments before you write the code than the other way around. Listing the steps of the algorithm performed by a function and then creating the sectional comments for each step means that writing the code is just a matter of filling in the blanks. It is a much more effective use of your time to think and then write code than it is to "debug a program into existence." On top of that, the world is full of programmers who lament, "the program stopped working when I added the comments!" Editing a file opens up the possibility of inadvertent changes, changes that always seem particularly difficult to debug.
It could be argued that this style discourages program modularity by increasing the overhead on the programmer for each new function defined. Actually, good modularity demands that the interfaces to each module should be carefully designed and that each module should be a self-contained entity as much as possible. The style presented here encourages this type of modularity.

Christopher Vickery
Queens College of CUNY