1. Bureaucratic part
This part you will not be graded on, but you should do it anyway.
Send me email. My address is <aspnes@cs.yale.edu>. In your message, include:
- Your name.
- Your status: whether you are an undergraduate, grad student, auditor, etc.
- Anything else you'd like to say.
2. Technical part
This part you will be graded on.
Steganography is the technique of hiding a message inside another, seemingly-innocuous message. For this assignment, your are to implement a program that extracts messages hidden in texts as described below.
2.1. The encoding
To encode a message, each character is expanded into a sequence of eight bits representing its ASCII code, with most-significant bit first. For example, the character 'R' (ASCII code 82) would be represented as the bit string 01010010. Each bit is then encoded by mapping each one to one of the first ten letters of the alphabet ('a' through 'j' or 'A' through 'J') and each 0 to some other character. This gives the first 8 characters of a line of text that is then used to encode the original character (any characters after the first 8 are ignored). So, for example, 'R' might be encoded by
Misinterpretation of this important message is encouraged!
A new line is started for each character in the original plaintext. So
Of more delight than hawks and horses be; The more I hear and see just cause of hate? That they behold, and see not what they see? The dedicated words which writers use The dear respose for limbs with travel tir'd;
encodes the bit sequences
01000010 01100001 01100011 01101111 01101110
which represent the ASCII codes for the characters Bacon, a notorious Elizabethan-era steganographer.
2.2. Your task
Write a program decode.c that reads a hidden message encoded as above from stdin, and writes the message to stdout.
2.3. Details
Your program should handle lines of arbitrary length (including lines that are too short). Missing characters should be treated as zeros, so that
We do think about things we do think about.
decodes to PchPc.
- If end-of-file occurs before reaching a newline, you should not output anything for that line, even if the line has 8 or more characters.
Your program should be able to handle arbitrary characters in its input and output, whether printable or not. Sending non-printable characters directly to your terminal window can have odd effects. It's safer to either redirect the output to a file (./decode < input-file > output-file) and look at the result in a text editor; or pipe the output to a program that will convert it to a more readable format (./decode < input-file | od -t x1).
2.4. Extra hints added 2012-01-18
If you want to compute the integer value 2x where x is an integer, don't use the pow() function from the math library. Use (1<<x) instead. The problem with pow() is that it returns a floating-point value that may not turn out to be exactly 2x when converted back to an integer type (although this should not be an issue when x is small). The more practical problem is that the test script compiles your program without linking in the math library, so you will get errors about the linker not being able to find pow even if you remember to include <math.h>. (If you do want to link the math library, you need to add an extra argument to gcc after your source file, as in gcc -o program program.c -lm.)
Watch out for truncating the return value of getchar. The getchar routine returns an int, which may take any of the values 0..255 or the special value -1. If you assign this to a variable declared as a char, all but the last 8 bits will be thrown away. This makes the perfectly legitimate eight-bit character '\xff' look exactly like -1, which is EOF. Since the random.in test file contains many '\xff' characters, you may find yourself with a much shorter output than expected on this test.
One of the tests in test.decode prints the command line gcc -std=c99 -o decode decode.c || echo compile failed. This is the test script telling you what command it is running, which tries to run gcc and executes the command echo compiled failed (which prints compiled failed) only if gcc returns a non-zero exit code (this is because of the ||, which acts the same way on the command line as it does in C). As long as there is an OK in the left-hand column, this scary-looking string does not mean that your compile actually failed.
3. Submitting your assignment
To submit your assignment, use the program /c/cs223/bin/submit on any machine in the Zoo, like this:
$ /c/cs223/bin/submit 1 decode.c Copying decode.c $
Here the 1 is the assignment number and decode.c is the name of your source file (which must be exactly decode.c). There is no limit to how many times you can submit your assignment, but only the last version submitted will be graded. Note that submitting your assignment after the deadline will incur a lateness penalty.
You can check that the file was copied correctly by running /c/cs223/bin/check. Additional tools for dealing with submissions are described in HowToUseTheComputingFacilities.
4. Testing your assignment
A public test script is available on the Zoo in /c/cs223/Hwk1/test.decode. You can run this with no arguments in a directory containing a file decode.c, and it will compile your program and run various cryptic tests on it. The tests used for grading may be different, so the score you get from test.decode may not accurately predict your final grade.
If you want to see the inputs for each test and the corresponding expected outputs, look in /c/cs223/Hwk1/testfiles. Many of these files contain non-printing characters, so looking at them with cat is not recommended.
Note that test.decode first tries compiling your program with the line gcc -Wall -std=c99 -pedantic -o decode decode.c, which turns on all of gcc's warnings about errors or use of non-standard C extensions (e.g., features from C++, which gcc will often permit by default). You should make sure that your program compiles without warnings with these flags for full credit.
After submitting your assignment, you can run the test.decode script on the submitted files with the command
/c/cs223/bin/testit 1 decode
This can be useful for making sure you have in fact submitted everything you were supposed to.
5. Sample solution
1 /*
2 * Decode steganographic messages
3 *
4 * For each line, interpret characters a..j and A..J as 1 bits
5 * all others as 0 bits,
6 * representing a single character, most significant bit first.
7 *
8 */
9
10 #include <stdio.h>
11
12 #define MOST_SIGNIFICANT_BIT (0x80) /* value of most significant bit in an 8-bit char */
13
14 int
15 main(int argc, char **argv)
16 {
17 int c;
18 int output = 0;
19 int mask = MOST_SIGNIFICANT_BIT;
20
21 while((c = getchar()) != EOF) {
22 if(c == '\n') {
23 /* flush and reset output */
24 putchar(output);
25 output = 0;
26 mask = MOST_SIGNIFICANT_BIT;
27 } else if((('a' <= c) && (c <= 'j')) || (('A' <= c) && (c <= 'J'))) {
28 /* one bit */
29 output += mask;
30 mask >>= 1;
31 } else {
32 /* zero bit */
33 mask >>= 1;
34 }
35 }
36
37 return 0;
38 }