[FrontPage] [TitleIndex] [WordIndex

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

1. Bureaucratic part

This part you will not be graded on, but you should do it anyway.

Send me email. My address is <aspnes@cs.yale.edu>. In your message, include:

  1. Your name.
  2. Your status: whether you are an undergraduate, grad student, auditor, etc.
  3. Whether you have ever taken a class that used Grade-o-Matic before.1

  4. Anything else you'd like to say.

2. Technical part

This part you will be graded on.

Run-length encoding is simple compression technique designed for texts with many repeated characters. For this assignment, you are to implement a run-length encoder in a file rle.c. Your encoder should follow these rules:

1. Any repeating sequence of 2 to 9 characters is replaced by a count (one digit) followed by the repeated character. So, for example, ooooo becomes 5o and xx becomes 2x.

2. Repeating sequences of more than 9 characters are split into blocks of 9 characters starting with the first character, possibly leaving some characters left over. For example, eeeeeeeeeeeeee becomes 9e5e.

3. Single characters are passed through intact if they are not digits. A digit x is encoded as 1x. So for example, the encoding of abc123444 would be abc11121334.

You may find it useful to use the standard library function isdigit to test if a character is a digit. You will need to put #include <ctype.h> at the top of your program to use this.

Your program should read characters from standard input (e.g., using getchar) and write them to standard output (e.g. using putchar). It should continue processing all characters until it encounters EOF. You do not need to write a decoder, but your encoder must encode all inputs correctly according to the above rules.

Here is a longer example of an input:

{{{"Wheeeeee!" said bumblebee number THX-1138.

"Wheeeeeeeeeeeeeeeee!"}}}

and the corresponding output, generated by running the command ./rle < input > output using the sample solution:

{{{"Wh6e!" said bumbleb2e number THX-211318.3 "Wh9e8e!"}}}

Note that the example shows how the encoding rules apply to all characters, including newlines: the three newlines in the input become a 3 followed by a single newline in the output. Your program should not treat any characters except digits specially.

3. Submitting your assignment

To submit your assignment, use the program /c/cs223/bin/submit on any machine in the Zoo, like this:

{{{$ /c/cs223/bin/submit 1 rle.c Copying rle.c $}}}

Here the 1 is the assignment number and rle.c is the name of your source file (which must be exactly rle.c). There is no limit to how many times you can submit your assignment, but only the last version submitted will be graded.

You can check that the file was copied correctly by running /c/cs223/bin/check. Additional tools for dealing with submissions are described in HowToUseTheComputingFacilities.

4. Testing your assignment

A public test script is available on the Zoo in /c/cs223/Hwk1/test.rle. You can run this with no arguments in a directory containing a file rle.c, and it will compile your program and run various cryptic tests on it. The tests used for grading may be different, so the score you get from test.rle may not accurately predict your final grade.

Note that test.rle first tries compiling your program with the line gcc -Wall -ansi -pedantic -o rle rle.c, which turns on all of gcc's warnings about errors or use of non-standard C extensions (e.g., features from C++, which gcc will often permit by default). You should make sure that your program compiles without warnings with these flags for full credit.

After submitting your assignment, you can run the test.rle script on the submitted files with the command

/c/cs223/bin/testit 1 rle

This can be useful for detecting if there is something wrong with your submission before it's too late.

5. Printing integer values

The assignment asks you to print integer values, which we haven't specifically discussed in class. There are two ways to do this:

5.1. Using printf

   1     printf("%d", number);

This is generally the better approach. The first argument to printf tells it how to format the remaining arguments; the particular code %d means to print an int formatted in decimal. For more details on printf and what you can do with it see KernighanRitchie.

5.2. Using putchar

If you happen to have a number in the range 0 through 9, you can print it using putchar and some arithmetic on ASCII character values:

   1     putchar('0' + number);

This is significantly faster than using printf since there is no need to parse a format string or check if the number is greater than 9. However, it may have bizarre results if the number is not in the range 0 through 9. Unless you are very worried about speed you should use printf.

6. Sample solution

   1 #include <stdio.h>
   2 #include <ctype.h>
   3 
   4 /* Run-length encoder for CS223 HW1 */
   5 
   6 /*
   7  * Encoding:
   8  *
   9  *   A repeating string of up to 9 characters is replaced by
  10  *   a digit followed by the character.
  11  *
  12  *   A repeating string of more than 9 characters is broken up into
  13  *   9-character blocks, e.g.
  14  *    
  15  *     xxxxxxxxxxxxxx -> 9x5x
  16  *
  17  *   Single non-digit characters are not replaced.
  18  *
  19  *   A single digit is replaced by 1 followed by the digit.
  20  */
  21 
  22 #define NO_PREVIOUS_CHARACTER (EOF)   /* initial dummy value for accumulator */
  23 #define MAX_COUNT (9)                 /* largest count we can emit */
  24 
  25 int
  26 main(int argc, char **argv)
  27 {
  28     int current;        /* character we are accumulating */
  29     int count;          /* how many of them we have so far */
  30     int next;           /* next character read */
  31 
  32     /* Basic idea: keep track of the last character we saw and how many
  33      * of them we have seen so far.  When we see EOF or a different character,
  34      * or we hit the maximum count, spit out the appropriate representation
  35      * of what we've accumulated. */
  36     /* NO_PREVIOUS_CHARACTER acts as a default value we will never see */
  37     current = NO_PREVIOUS_CHARACTER;
  38     count = 0;
  39 
  40     do {
  41         next = getchar();
  42         if(next == EOF || next != current || count == MAX_COUNT) {
  43             /* spit out the accumulated characters */
  44             switch(count) {
  45             case 0:
  46                 /* nothing to emit */
  47                 break;
  48             case 1:
  49                 /* prefix with 1 if it is a digit */
  50                 if(isdigit(current)) {
  51                     putchar('1');
  52                 }
  53                 putchar(current);
  54                 break;
  55             default:
  56                 /* print count followed by character */
  57                 printf("%d", count);
  58                 putchar(current);
  59                 break;
  60             }
  61             /* reset the accumulator */
  62             current = next;
  63             count = 1;
  64         } else {
  65             /* still accumulating */
  66             count = count + 1;
  67         }
  68     } while(next != EOF);
  69 
  70     return 0;
  71 }
rle.c
  1. Ulterior motive: The information in your email will be used to create an account for you in Grade-o-Matic. (1)


2014-06-17 11:58