C Review - levbrie/self GitHub Wiki

This repository contains the recitation notes for Columbia's Advanced Programming Class, COMSW3157, as taught by Jae Woo Lee. For more information about the class, visit the course homepage.

These recitations are held weekly by the various TAs, generally using these notes as the basis for their sections. Issues, patches, and comments, especially by current and former students, are welcome.

Contents:

  • Recitation 1: Introduction to UNIX, compile your first C program. Be sure to have a CLIC account from CRF before the recitation.
  • Recitation 2: Makefiles. Bitwise operations. Configuring and using Git for version control and file tracking.
  • Recitation 3: C Basics: data types, complex expressions and statements.
  • Recitation 4: Pointers, arrays, and heap memory (i.e. malloc).
    • See also the very useful recitation-6-code directory, especially pointerfun.c and its output pointerfun-output for some interesting experiments with pointers and memory addresses.
  • Recitation 5: Function pointers. Taking Makefiles to the next level. File input/output.
  • Recitation 6: The UNIX stack. Users, permissions, file attributes. Processes, forking, and signals.
    • See also jsh, the Jae shell, in the recitation-6-code directory for a fun example of forking.
  • Recitation 7: size_t. File IO, including reading, writing, and seeking in files.
  • Recitation 8: Midterm(s) review.
  • Recitation 9: C++. Classes/Structs. Stack vs heap allocation in C++. The basic 4. Implicit conversion and operator overloading.
  • Recitation 10: C++ templates and containers
  • Recitation 11: Smart Pointers

RECITATION 1

Back to Contents

Logging in

Logging in is fairly simple if you're in the clic lab. Just use your UNI and password on any of the machines when prompted. You may have to back out once to be able to enter your UNI.

Remember that your password isn't necessarily the same as your UNI password, its the one you used when creating your clic account.

If you're on a mac:

  • Open up Terminal
  • type ssh [email protected] You will be prompted for your clic password. Enter it.
  • NOTE: if you want to enable X11 forwarding (this will allow you to share graphical programs as well as the shell with the remote machine) append the X (must be capital) flag to your ssh command. ssh [email protected] -X
  • You will now be in a remote shell session on a random machine in the clic cluster.

If you're on Windows

  • Download and install either Putty or MobaXterm
  • Each program has a fairly simple to use GUI for connecting, so the relevant information is just:
    • Username: your_uni
    • Password: your clic password
    • Remote server: clic-lab.cs.columbia.edu
  • Click connect and you'll be in a remote shell session on a random machine in the clic cluster.

Also note, if you ever need to connect to a specific machine in the clic cluster (this will be necessary if you want multiple sessions on the same machine) just use ssh [email protected]

Basic UNIX

Paths

A good place to start with UNIX is the filesystem structure. Our clic machines run Ubuntu linux. Unlike Windows machines, UNIX uses forward slashes to denote the break between directories and files. An example file path in UNIX might be /usr/bin/dict. The path / by itself denotes the highest level directory. If you start any path with a forward slash, it will assumed to be a path relative to the root directory /. Alternatively, your current directory is represented by the notation ./. In most cases you can leave this off. You can find out your current directory using the command pwd. If you're current directory is /usr/bin and you wish to reference the directory /usr in a path, you can do so using either /usr (this is known as an absolute path), or you can use ../ (this is a relative path. ../ denotes the directory above the current directory. You could reference the root directory / from the directory /usr/bin using the relative path ../../.

Basic Navigation

When you log into clic, your current (working) directory will be /home/your_uni. Check this by typing:

pwd

This is what's known as your home directory. You can do pretty much whatever you want to the files in this directory. You own the place. Let's make a new directory here for your work in this class. Use the mkdir command to make a new directory:

mkdir cs3157
mkdir cs3157/learning

Now let's move into your working directory. Use the change directory command, cd. All three commands below will do the same thing (which ones are the relative paths?)

cd cs3157/learning
cd ./cs3157/learning
cd /home/your_uni/cs3157/learning

Let's create a new text file using the touch command. This will create a new empty file if one does not exist, or update the last modified date if a file exists.

touch testing.txt
touch .hidden.txt

Let's see if it worked. The ls command lists all files in the current directory. Some commands in UNIX take flags. These are special arguments preceded by a dash. Usually it will make the most sense to attach the -l and the -a flags to our ls calls. -l will include the permissions of each file (more on this later) and -a will include files that are hidden (in UNIX, these are files that start with a .)

ls
ls -l
ls -la //we can combine flags for ls

Notice the differences in output each time the command is run. Now let's delete our hidden file using the remove command, rm.

rm .hidden.txt
ls -la

Alright, now lets get rid of the directory we created here. rm accepts flags too and won't let you delete directories unless you specify to run it recursively (repeatedly go into each subdirectory and delete all files. It will also ask for confirmation that you want to delete files unless you tell it to force the delete. For force delete, use -f and for recursive use -r:

cd ../
rm learning -rf
ls -la

One last thing as a side note: if you ever don't know how to use a command in UNIX (or even a function in the standard C library) you can use the man command. It will bring up the manual pages for the command you ask it for. Try to learn more about the ls command:

man ls

Use q to quit out of a man page.

Useful Tricks

Tab Complete

When in bash, try to make use of tab-complete as often as possible. This just means pressing tab after typing the first few letters of a command. For example typing tou followed by the tab key will complete to touch. Tab complete also works for directory and file names.

Previous command

You can go through your history by pressing the up and down arrows in terminal. This will navigate between previously used commands so that you can easily use the same commands over and over.

back i search

For long complicated commands that you only use every so often you can use the reverse search to locate them in your history. Pressing ctrl r in terminal will bring up a backward search through your history. Start typing until you find the command you're looking for.

Text Editors

There are two main text editors that you can use from inside terminal: emacs and vim. Which you use will ultimately be your decision (you could even write everything in pico if you really wanted, but this would be difficult).

Vim

Vim is a difficult to use text editor and very confusing at first. It's goal is to be incredibly efficient by preventing unnecessary movement of your hands around the keyboard. It operates in different modes, the most important of which will be edit mode and insert mode.

Open vim, editing a new file "vimtest".

vim vimtest

When you open vim, it will be in Normal mode. Typing will cause a variety of different operations to happen. For now switch to Insert Mode by pressing i. You should see -- INSERT -- appear at the bottom of the screen. At this point anything you type will appear as text in the text file. This is fine for basic editing. Now let's switch back to Normal mode. Press the esc key to switch back.

Most vim commands execute as soon as you type them. Here are some basic commands that will execute immediately:

  • h j k and l are how you move while in Normal mode. They are, respectively, left, down, up, right. Notice that this will save you time in moving to the arrows keys.
  • dd will delete the current line
  • D will delete from the current location to the end of the line
  • yy will copy the current line
  • p will paste whatever is the buffer (kind of like a clipboard)
  • 0 jumps to the beginning of the line
  • $ jumps to the end of the current line
  • w jumps to the beginning of the next word (W uses a broader definition of word)
  • b jumps to the end of the previous word (likewise B)
  • u undoes the last change
  • Ctl-r redoes the last change

Some vim commands will not be executed until you press enter. These begin with a colon.

  • :w [optional filename] This will save the current file if no file name is passed or write the current file to specified location.
  • :x This will save and quit
  • :e filename will open the filename specified
  • :q will quit vim and take you back to terminal
  • :[line numer] will jump to that line

That should be enough for basic vim navigation. If you want to learn to be a real vim ninja, get used to switching between modes first. Then try to expand your Normal mode vocabulary one command at a time.

In vim, most commands work with some sort of combination between prepositions and actions. For example, gg=G would indent the entire file, as gg takes you to the beginning of the file, = auto-indents a line, and G jumps to the end of the file.

You can also use vimtutor to really learn the ins and outs. In terminal, just type

vimtutor

Alternatively, check out Open Vim's Tutorial for another interactive vim lesson. Or play a little NES Zelda type game while learning vim, with Vim Adventures.

After learning vim, you might want to configure it. This is done by editing the .vimrc file in your home directory. Let's check out our current settings.

vim ~/.vimrc

There should be default settings there already, but you can look to make changes here in the future. Jae and the TAs will send out their configurations later, but a good starting point is to enable line numbers by adding this line to your vimrc:

set number

Emacs

May be skipped for time

Emacs is an easier to pick up text editor but has less efficient keyboard shortcuts compared to vim.

Let's start by editing a new file in emacs

emacs emacstest

As soon as emacs starts running, you will be able to type into it. There is no special insert mode like in vim. You can backspace at any time without having to switch between modes.

Emacs has much of the functionality that vim has and we present the basics below:

  • 'Ctrl-f' will move your cursor forward, 'Ctrl-b' will move it back, 'Ctrl-p' will move it up, 'Ctrl-n' will move it down
  • 'Ctrl-k' will delete the current line
  • 'Ctrl-s' will search for a word forward, 'Ctrl-r' will search for a word backward
  • 'Ctrl-a' goes to beginning of line, 'Ctrl-e' goes to end
  • 'Ctrl-spacebar' to select text to manipulate
  • 'Esc-w' to copy text, 'Ctrl-w' to cut text, Ctrl-y' will paste your most recently copied/deleted text
  • 'Esc-g g' then enter a line number to jump to a particular line in the buffer

To exit and save we will use Ctrl-X + Ctrl-C. If you just want to save then use Ctrl-X + Ctrl-S.

Just like vim, emacs also has a configuration file that you can edit. This is .emacs file within your home directory. Let's check out our emacs settings.

emacs ~/.emacs

There should be default settings there already, but feel free to add more for shortcuts.

*Note that backspaces can be a little funky when ssh-ing into CLIC and your *backspace button might actually be sending "Ctrl + H" instead! To fix this you *will have to add the following lines to your .emacs file.

;; make sure backspace deletes backwards
(normal-erase-is-backspace-mode 1)
;; make sure your backspace is mapped correctly
(global-set-key "\C-h" 'backward-delete-char)

Compiling and linking a C Program

There are many steps to compiling a program in C. They occur in the following order:

  1. Pre-processing: This is when the compiler processes lines that start with a hash-mark (#), most significantly #include.
  2. Compiling: This converts a source code file (foo.c) into an object file (foo.o) which contains a system dependent compiled representation of the program as described in the source file. This code may contain symbols (like variables and function names) that are not defined in the individual source files.
  3. Linking: This step links code in various object files together, linking up the pieces that are required in all the .o files. This will produce an executable file.

We'll be using gcc to compile our programs. gcc is a shell command that accepts a few parameters that we'll be making use of often.

  • -c [files] This will compile a list of .c files into .o files without going through the linking stage. This helps separate compilation from linking, and is very useful for Makefiles.
  • -o [file] specifies what gcc's output should be. If none is specified this will default to either: foo.o when compiling foo.c; or a.out when linking to create an executable.

Additionally, you should always use two optional flags to make gcc more helpful:

  • -g This flag will include debugging information when you compile and link. Debugging information helps you when you use tools like valgrind and gdb, for example by indicating line numbers in the source code.
  • -Wall This will turn on all compiler warnings. Warnings are likely problems with your code, but they aren't so severe as to be errors that mean it won't run at all. These can be small problems now that cause big crashes later, so it's best to turn this on when compiling and fix all warnings.

Let's take a look at this process in an actual program: myadd. We'll create a basic program to add two numbers, using a custom addition function.

Let's begin with our main function:

main.c

#include <stdio.h>
#include "myadd.h"

int main(int argc, char **argv) {
  printf("The sum is: %d \n", add(1, 2));
  return 0;
}

myadd.h

#ifndef __MYADD_H__
#define __MYADD_H__
int add(int a, int b);
#endif

Now let's try to compile myadd. First we'll build the object file for main.c. Notice the compiler directive #include. This tells the compiler to just copy paste the specified file into the current file at that location. The reason we include this line in main.c is so that if we reference a function in either of these files before it is defined, the compiler can know its header.

As an example, in main.c we have add(1, 2);. The compiler wants to make sure that this is a valid function call, but knows nothing of the function "add", what type it will return, or what its explicit parameters are. Including myadd.h will tell the compiler that "add" returns type int, and accepts two integer parameters.

Let's compile main.c (using our standard -g -Wall options):

gcc -g -Wall -c main.c
ls

You should see that you now have a main.o in your directory. There was one other set of directives that we've used now. The #ifndef #define and #endif directives. The first and the last define a block of code that should only be executed if a pre-processor variable is not defined. This will prevent multiple header files from conflicting. If myadd.h is included more than once, the first time the pre-processor will define _MYADD_H and each time thereafter will skip over the entire file.

Pause here for a moment, and think: What haven't we done yet?

We haven't even written the add() function yet. It's nowhere at all. However, gcc let us compile main.c without even giving us a warning! That's because we included myadd.h, which gives a prototype for the add function, so gcc knows that the function call in main() is valid. That's all the compiler needs, it doesn't care how add() works, just that it will exist and is being validly used.

However if we try to link it into an executable, we get an error because during linking, it actually needs the code for add():

gcc -g main.o -o main

So let's go ahead and write myadd.c. We need to include myadd.h in myadd.c, although it doesn't seem (and isn't strictly) necessary in this simple example. In general including the relevant header file helps the compiler catch any mistakes (for example, if you change a function to return a long instead of an int, but don't fix it everywhere), and makes sure that every function in the file knows about every other function. So always include the relevant header file in the corresponding c file.

myadd.c

#include "myadd.h"
int add(int x, int y)
{
    return x + y;
}

Now let's compile myadd.c

gcc -g -Wall -c myadd.c
ls

And then finally link our two object files into the executable main.

gcc -g myadd.o main.o -o main
ls

You should now have an executable file in your directory main. Calling ./main will run your program. In this scenario, you must use ./ to note to the shell that you want to execute the program main in the current directory. Otherwise it will go looking in all the places it searches for system programs like ls and touch to find main.

For more on compiling, linking, and debugging, see this article.

In this class you'll be automating compilation and linking with Make, which is described in the next recitation.

RECITATION 2

Back to Contents

Makefiles

Make is a UNIX utility that follows a blueprint you create for compiling programs. Calling make will automatically search your current directory for a file called "Makefile" and use it to call various compiler commands according to the rules outlined therein.

Jae's myadd Makefile

Take Jae's Makefile piece by piece. It can be found in this git repository as sample-makefile

CC  = gcc
CXX = g++

Make has a some pre-configured rules for how to compile programs. For example itknows how to specify files as arguments to a compiler. However, you should tell it what compiler to use for C files and C++ files. Here, we set the special make variables CC and CXX to gcc, the C-compiler, and g++, the c++ compiler.

INCLUDES =

CFLAGS   = -g -Wall $(INCLUDES)
CXXFLAGS = -g -Wall $(INCLUDES)

Here we define our own variable, INCLUDES, which we can use for directories thatwe wish to include at the compilation step. An example value for INCLUDES couldbe -I/home/jae/cs3157 which would tell the compiler to look in /home/jae/cs3157 during the compilation step for missing header files and other sorts of relevant files.

After defining INCLUDES, we define the flags that we want each compiler to be run with. In this case we include the -g flag for debugging and -Wall flag to display all warnings. Lastly, we reference our variable INCLUDES to add those flags as well.

LDFLAGS = -g

LDFLAGS are the flags that are appended to the compiler when using it for linking. In this case we just want the debugging info to be included.

LDLIBS =

LDLIBS will automatically be appended to the linker commands. These are flags like -lm and function similarly to our INCLUDES variable but are added at a different step. m denotes the math library.

That's about it for our variable declarations. The next step is to define compile order and dependencies. The very first "target" or rule in your makefile gets built when you type make in this case the first target is:

main: main.o myadd.o

Note that we did not specify the linking rule, because make follows an implied linking rule:

$(CC) $(LDFLAGS) <all-dependent-.o-files> $(LDLIBS)

Also note that make assumes that main depends on main.o, so we could omit it:

main: myadd.o

Basically what this rule says is make should produce an executable called "main" by linking myadd.o and main.o. This declares main.o and myadd.o as dependencies of main, meaning that if any of the dependencies (or their dependencies) change between the last time this target was run, it should re-run the outdated targets as well as this one.

The next target we declare is main.o:

main.o: main.c myadd.h

This says that main.o depends on main.c (assumed) as well as myadd.h. See last week's recitation notes to understand why main.o depends on myadd.h. We could omit main.c as follows:

main.o: myadd.h

Either way, we do not specify a rule here because make assumes the implicit rule:

$(CC) -c $(CFLAGS) <the-.c-file>

Lastly, we specify the target for myadd.o:

myadd.o: myadd.c myadd.h

We'll include two phony targets. We tell make that they're "phony" so that it doesn't attempt to use implicit rules or try to compile them. The first target we make is "clean" which should remove all intermediate files. Always include a clean so that make clean can be used to remove intermediate files like object files, compiled code, etc. This should return your directory to just its source code that can generate all the other files. Be careful: Using rm -f will not prompt you to remove files. This is customary for make clean but it also means if you make a mistake in designing your rule it could remove files that youdidn't want to. There is no "trash" in UNIX - they'll be gone forever.

Lastly, we define a phony "all" target that just depends on the main and clean targets. This will always remove all intermediate files and compiled files, forcing make to recompile everything when main is called.

.PHONY: clean
clean:
        rm -f *.o a.out core main

.PHONY: all
all: clean main

Git

For this part of the recitation, we will follow Jae's "git-tutorial" which can be found on the mailing list and CourseWorks. Here's a quick run through of the operations reviewed in the tutorial, along with some bonus operations:

Configuration:

git config --global user.name "Your Full Name"
git config --global user.email [email protected]
git config --global --add color.ui true

And set your editor globally (here vim, if you prefer emacs use that). Note that using the graphical version, gvim, is trickier, so we recommend you stick to the command line version.

echo "EDITOR=vim" >> ~/.bashrc

Getting Started

git init
git clone remote

Working with repositories

git status
git add file1 file2
git add -p    #individually pick for each set of changes whether to stage it
git commit
git commit -m "some message"

Checking up on your changes

git status
git diff
git diff file1
git diff --cached
git log
git log --stat --summary
git log -p

Removing and renaming

git rm file1
git mv oldfilename newfilename

Undoing

git checkout -- [filename]
git reset HEAD [filename]

Going back in time

git checkout <commit hash>

Tools

git grep [pattern]
git help
git help commit
man git
man gittutorial

Patches are rarely necessary, but this is how the submit script works

git format-patch --stdout origin > mywork.mbox
git am path/to/mywork.mbox

Remotes

git remote add
git pull
git fetch && git merge
git push

Important: statuses of files

  1. Untracked
  2. Tracked, unmodified
  3. Tracked, modified, unstaged
  4. Tracked, modified, staged

gitignore

Git wants to track everything, but you don't want it to track everything. In particular you want to ignore all your object .o files, and your compiled executables. You can tell git to ignore certain files by using a gitignore file. It's a list of file names (including wildcards) for git to ignore when running commands like status.

In your repository create a file named .gitignore, where each line is a pattern of filenames git should ignore.

a.out
*.o
*.a
main
*.mbox
/labN-2013*

You can add the .gitignore file itself to the .gitignore, or you can add it to the repository. You may also create a global ignore file so you don't have to copy it to each repository. More details about that are in Github's help on ignoring files.

Bonus

All of these recitation notes are tracked using git and hosted on github. If we have time we'll come back to this during recitation, but here's some github 101.

  1. Create an account by going to github.com and signing up. Then, configure git for use with remote servers.

  2. Add your SSH keys to github. They have a handy tutorital to help out. All you should need to do is Step 4 - adding ssh keys.

  3. Try forking this repository. Pull your fork to your local machine.

    Digression: One of the reasons git is so great for working in distributed teams is a feature called branching. Branches are subsections of git commits that don't affect other branches. For example "master" is the branch that you'll do all your work on for this class. Let's say though you want to add more unix commands to recitation-1.md. You could create a branch called improve_recitation1_unix like so:

     git checkout -b improve_recitation1_unix
    

    This would create a new branch, and switch to it. On this branch you would make and commit your changes. When finished, you could switch back to the master branch and merge your changes from the feature branch as follows:

     git checkout master
     git merge improve_recitation1_unix
    

    The reason branching is so useful is that it allows for multiple people to work on their own issues, and then merge their changes in only after they are certain their changes will not cause problems to the master branch. In this way, the master branch always represents a completely functioning project, while the branches may have broken code.

    Anyway, all this was a bit of a digression to discuss branching, but now that you have a fork of my respository, you can make changes on the master branch. When you're done, use git push origin master to push your changes back up to your fork, and then go to github.com to pull-request your changes. If I like what you've done, I'll definitely accept your pull request.

And that's about it for github. Forking and branching are crucial to working on teams, both private and open-source. Github and git are great tools for managing all sorts of things, even notes, so make sure you're familiar with them. Proficiency in git and github is a desirable trait to have when job-hunting.

Other useful tutorials:

Bits, Bytes and Binary

Let's just refresh our memory about memory:

  • A bit is a single digit in binary; on or off; 1 or 0
  • 8 bits form a single byte: 11111111 = 2^8 - 1 = 255
  • Hexadecimal is another notation to count even higher in fewer places
    • Two hexadecimal places express 1 byte
    • FF in Hexadecimal is 255
  • Two's complement
    • Most modern computers use this notation for signed integers
    • Most significant bit: Usually the leftmost, but generally the bit with the highest value: If 1000 = 8, then 1 is the most significant bit. If we were using a different notation such as 1010 = 5, then the rightmost 0 is the most significant bit.
    • If the most significant bit is 1, then in two's complement, you're looking at a negative number
    • To convert: 1010 (read as ten if unsigned), first note that it is negative. Then find its magnitude by flipping all the bits (0101, 5) and then adding 1 (0110) meaning the value is -6.
    • Consult the following table to see something interesting: (note the wraparound effect
Binary usigned decimal two's complement decimal
000 0 0
001 1 1
010 2 2
011 3 3
100 4 -4
101 5 -3
110 6 -2
111 7 -1

Be aware of some important boundaries as well:

  • 0x00000000 = 0
  • 0x7FFFFFFF = 2147483647
  • 0x80000000 = -2147483648
  • 0xFFFFFFFF = -1

Bitwise Operators

While we're on the subject of binary representation, let's take a moment to examine C's bitwise operators. They're a bit tricky, but perform extremely fast low level operations and learning them well now will help you with more complex concepts later on.

Bitwise AND

The bitwise AND operator, &, takes two integers as operands and returns a new integer with a bit pattern consisting of ones only in the positions that both operands contain bits set to 1.

int x = 5;  // 0101 in binary
int y = 12; // 1100 in binary
x & y;      // 0100 (4)

Note: x & y == 4, but x && y == 1. Can you explain why?

This provides a handy way of checking the bit value at a given position in a number:

int mask = 8; // 1000 in binary, for checking the 4th bit
x & mask;     // 0, since 5 doesn't contain a 1 in the 4th bit
y & mask;     // 1000 == 8 > 0, since 12 contains a 1 in the 4th bit

Bitwise OR

The bitwise OR, |, behaves like the bitwise AND but the returned integer's bit pattern consists of ones where either operand has a 1.

int x = 5;  // 0101 in binary
int y = 12; // 1100 in binary
x | y;      // 1101 (13)

Bitwise XOR and Complement

The bitwise XOR, ^, sets 1 in each bit where its operands differ and 0 where they are the same. The bitwise complement, ~, performs the one's complement on its operatand by flipping each bit.

Bit Shifting

The bitwise shift operators, << and >>, shift their left operand by the number of digits specified by the right operand. Left shifting always fills vacated bits by zero. Right shifting varies from machine to machine and whether or not we're talking unsigned or signed.

int x = 2;  // 000010 in binary
x << 2;     // 001000 (8)
8 >> 1;     // 000100 (4)

RECITATION 3

Back to Contents

Data Types: Numbers

Integer Types:

  • char
  • int

Modifiers (and sugar):

  • short
  • long

Memory size for each type depends on system, and only restrictions are that

char <= short <= int <= long <= long long

Clic machines follow

  • char = 1 byte
  • short = 2 bytes
  • int = 4 bytes
  • long = 8 (these last two vary from system to system a lot)
  • long long = 8

Test it out for yourself:

#include <stdio.h>

 int main(int argc, char** argv) {
   printf("char: %d\n
     short: %d\n
     int: %d\n
     long: %d\n
     longlong: %d\n",
     sizeof(char), sizeof(short), sizeof(int), sizeof(long), sizeof(long long));
   return 0;
}

Here are some declarations to help you understand what really happens when we're talking characters and integers. Definitely take a look at The Ascii Table and understand the relationships in theorder of the first 128 ascii characters. The C language is built on a subset of 7-bit ascii (0-127) so its important to know what the table is like, not to memorize it. Also note that in C, single quotes means a character.

Declaration x (dec) y (dec)
int x; NULL -
int x, y; NULL NULL
int x = 0, y; 0 NULL
char x = 'x'; 120 -
char x = '\n'; 10 -
char x = '\13'; 11 -
char x = '0'; 48 -
char x = '\0'; 0 -
char x = 0; 0 -
long x = 0L; 0 -

Preceding a constant with 0x denotes hexadecimal notation:

(0xFFFFFFFF == -1); //returns 1 (which is true, but C doesn't have true)
(037777777777 == -1); //returns 1 (true)
sizeof(1234L); //returns 8
sizeof(1234); //returns 4
0xFFFFU; //returns 65535
0177777U; //returns 65535

Also not that converting a signed value to an unsigned value or vice verse preserves the bit pattern:

  char c = -1; //0xFF
  unsigned char uc = c; //0xFF
  int i = uc; //i == 255

Float and double are the two floating point types (decimal) and can be expressedwith a decimal point or as scientific notation:

float miles = 1.8;
double big = 1e10;

The only implementation constraint is that

float <= double <= long double

so they could all be one size, or be three distinct sizes.

In C there is no such thing as a string, just an array and pointers. Essentially, a bunch of single characters located consecutively in memory will make up a string, but more on this later.

Expressions

char *myString = "Here's a string!" //string literal
int x = 10; //variable declaration and assignment
int x; //variable declaration
x = 30; //variable assignment: lvalue = rvalue

The order of the increment/decrement operators in C matters:

int x = 1;
int y = x++; // y==1
y = ++x; // y==3

Valid binary operators in order of increasing precedence:

    • / %

The positive and negative operators are more tightly binding than any of the above operators (+/-).

Comparison operators:

  • < > <= >=

All of these comparison operators have the same precedence, and are more tightly binding than the equality and inequality operators (==/!==).

The logical operators are || for or, && for and, and ! for not. The "or" and "and" operators short-circuit.

Bitwise operators are tricky and can be used for a variety of purposes. See the examples in the recitation 2 notes for a refresher.

Ternary operator:

x = a ? b : c;
//or
if(x)
  b;
else
  c;

Note that any integer is also a boolean!! 0 is false, any other number is true!

Statements

If-Else Statements

if(condition)
  if(condition2) {
    printf("conditions met");
    return 0;
  }
  else
    printf("no conditions met");

Switch Statements

switch(v)
  case 1: // v == 1
    printf("v is 1");
    break;
  case 2:
    printf("v is 2");
    break;
  default:
    printf("v is neither 1 nor 2");
end

Loops

int i;
for(i = 0; 1 < 10; i++) {
    // do things here
}
while(i) { //condition checked at the beginning
    //other things
    i--;
}

do {
    //more things
} while(i < 5); //condition is checked at the end

Using break; inside a loop will break out of the innermost loop. Using continue; will stop executing the current iteration of the loop and skip to the next iteration. for(;;) is an idiom for an infinite loop. goto label will jump to a line beginning with label: . Be careful with gotos.

RECITATION 4

Back to Contents

Variables

Stack Variables

When you declare a variable in C, it is defined for the current scope and will be released at the end of the scope. If you redeclare a variable inside a scope within a scope (see below) you won't be able to change the outer variable.

int x;
x = 0;
//do things with x
{
    int x;
    x = 1;
    //x is now 1 within here
}
//x is still 0 out here

The above are automatic variables or stack variables. Their scope is local to a block (code enclosed by curly braces as shown above) - they are created when entering the block and destroyed upon exit.

Static Variables

Many different meanings depending on where you declare:

int global_static = 0;

static int file_static = 0;

int foo(int auto_1) {
    static int block_static = 0;
}

In general, global/static variables are created when the program runs and persist until the program ends. This means they will not be re-declared or re-initialized.

Global Variables

Global variables are like a special case of static variables. They are accessible from all files in the program, and if they are declared within the current file already you don't need to use the extern. See below:

In one file:

int global_static = 0;

int main() {
    int global_static;
    global_static++;
    magicPrint();
}

In another file:

void magicPrint() {
    extern int global_static;
    printf("%d", global_static); //prints 1
}

Address Space

Every process gets 512G of virtual memory space. The stack grows downward (see Jae's notes for a diagram) starting from 512G while the program code, static variables, and heap variables are all at the bottom (0). Basically, this means when functions are called, space for them is built up on the stack and cleared as they complete. Heap variables (you'll learn more about these later) will be allocated on the heap and therefore, like static variables, will not be cleared after each function call.

Pointers

Pointers are what will get you a job. Understanding pointers is crucial and using them naturally will make you stand out as a programmer. Let's start with the basics.

Pointer: A variable that stores a memory address. That's it.

Pointers can have types as well so you know what kind of address you're dealing with. They're pretty nifty. An asterisk denotes pointer types.

Here's an example:

int x = 5;
int *p; // integer pointer
int* p2; // also integer pointer
p = &x; // operator
*p == 5;

Note:

  • spacing of asterisk doesn't matter, but *p is generally preferable as it makes declarations clearer. int* p1, p2; would lead you to assume that both p1 and p2 are being declared as type int* (a pointer to an integer), but in reality the compiler interprets this statement as if it was written int *p1; int p2; - declaring p1 as a pointer to int and p2 as a normal int. Writing the declaration as int *p1, p2 will avoid confusion in such cases.
    • is also an operator that will dereference a pointer (get you its value)
  • & is an operator that will reference a value (get you its address)

Why use pointers?

C is a call-by-value language which means if you want a function to modify a value that you have, you'll have to tell the function where to find the memory, not just the value, otherwise C will just make a copy:

void increment(int x) {
  x++;
}
void actually_increment(int *x) {
  (*x)++
}
int main() {
  int x = 1;
  increment(x); // x is still 1
  actually_increment(&x); // x is now 2
}

Note not only the difference in the function, but how the parameters are passed. Passing a pointer is fundamentally a different type than passing a value

For more pointer examples, see recitation-4-code/basicpointers.c

Arrays (jk they're the same thing)

C has arrays, but they're basically just pointers. An array has a type, just like a pointer, and represents multiple instances of enough space for that type in contiguous memory.

int a[10];

This would allocate space for 10 integers in contiguous memory locations. Declaring multidimensial arrays is also possible.

int matrix[50][50];

Because arrays are just contiguous blocks of memory, you can access them using pointer arithmetic, or natural array notation:

int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
int *p = &a[0]; //p points to 0
*(p+5) == a[5]; //5 == 5
magic_function(a); (passes a pointer to the first element of a)
p+5 == a+5;
p++; //legal
a++; //illegal!

In most cases, just calling an array by its variable name without the square brackets will be interpreted as a pointer to the first element of the array. Incrementing a pointer will always move it ahead not one byte, but the size of the type to which it's pointing. In this case we're dealing with an int so it will likely move it ahead 4 bytes to the next int. Unlike a pointer, though, an array is a constant variable.

Strings in C

Strings in C are just a special case of arrays. C strings are arrays of characters with a null terminating character at the end.

char c[] = "abc";
char c[] = {'a', 'b', 'c', '\0'};
char *s = "my string"; // modifiable pointer
"my string"[0] == 'm' //true!

There's a slight difference between these two definitions. c is an array which means you can't move where it points to, its always going to point to a. s on the other hand can be incremented and decremented and moved around however you like. "my string", however, can't be modified; it's a string literal!

Some useful string functions (need to #include string.h):

char d[20];
char c[] = "abc";
strcpy(d, "123");
strcat(d, c);

printf("%s\n", d); //prints 123abc
printf("%lu\n", strlen(d)); //prints 6

//to only copy/cat the first n chars:
strncpy(d, "456", 2);
strncat(d, "def", 2);

printf("%s\n", d); //what does this print?

For a closer look at the strcpy function, see recitation-4-code/strcpy.c

So how about an array of strings? Well that would be an array of arrays.

char *a[] = {"hello", "world" };
char **p = a;
char a[][10] = {"hello", "world" }; //what's the difference here?

You can find more examples of this in recitation-4-code/ptrtoptrs.c

Heap allocations

Sometimes you need memory to persist across function calls (recall pseudo-pass-by-reference using pointers). Recall also that variables declared within a scope will be cleared once the scope's stack frame collapses. In other words, if you're trying to pass back a pointer to a variable that was declared within the function as a return value, it won't be there when you try to access it. To alleviate this you can allocate space on the heap using malloc.

(From Jae's notes)

int *p = (int *) malloc(100 * sizeof(int));
// malloc returns NULL if it cannot allocate the requested memory
if (p == NULL) {
  perror("malloc failed");
  exit(1);
}
// initialize all elements to 0
for (int i = 0; i < 100; i++)
  p[i] = 0;
// another way to do the same thing
memset(p, 0, 100 * sizeof(int));
//free() deallocates the memory block previously returned by malloc.
free(p);

Memory Errors

You'll be testing your code with valgrind for this class to make sure you don't have any memory errors in your code. This can include forgetting to free allocated memory, accessing memory that doesn't exist, etc. To run valgrind call:

valgrind --leak-check=yes ./your_executable

Recall from other classes that if valgrind doesn't return, it means your program isn't returning (this is a case of the halting problem). If your valgrind isn't giving you line numbers (and is giving you hex codes) then you're not compiling with the debugging flag -g.

The following are excerpts from recitation-4-code/invalidwrite.c and recitation-4-code/leak.c.

Valgrind+Makefile=Good

Remembering to run valgrind, and retyping the command, is annoying. A clever way to more easily run valgrind repeatedly as part of your normal edit/compile/test loop is to add valgrind to your makefile. Remember how you can include phony targets in your Makefile? We can use that to have it run Valgrind for us.

For example, using Jae's Makefile template from lecture note 1, you can add a stanza at the end:

.PHONY: valgrind
valgrind: main
    valgrind --leak-check=yes ./main

Then instead of running make followed by ./main you can just run make valgrind and it will compile your code and run it under valgrind.

Invalid Write

This is pretty easy to do but hard to catch in your code. Like many memory errors its usually caused by an off-by-one error. Imagine this:

int *p = (int *)malloc(sizeof(int));
// checking that malloc worked
if (p == NULL) {
  perror("malloc failed");
  exit(1);
}

You'll have a pointer, p, to one integer worth of space. Now imagine we move our pointer ahead one integer

p++;

p will now point past the space it was allocated. We know nothing about this space. It could be accessible, it could be protected. It could be someone else's variable that we're about to change. This is terrible. But let's mess around with it.

*p = 5;

What happens? You've got an invalid write. What about

int x = *p

Now you've got an invalid read. Valgrind will tell you about these and where they're happening. So long as you know what you're looking for you should be able to find it.

Memory leaks

Calling malloc without free-ing the memory you've allocated is awful. If you don't free space that's malloced when you're finished, it will persist even after your program exits. To correct for this, when you're finished, just call free() on the pointer to the memory that was malloced.

free(--p);

Now aside from those invalid read/writes, our program will run through valgrind pretty happily. The trick to memory leaks isn't just in free-ing that integer though. Imagine you've malloced space for an array of arrays, each of which was also malloc'ed. You'll have to go back through, freeing each individual array, and then when you're finished with that, free the higher order array.

Lab 2

Tips:

  • Test all your code with valgrind. Just do it.
  • Watch out for fence post errors when it comes to invalid read/writes. You're probably just one outside of your bounds
  • Watch out for being just inside your bounds on freeing. If you have a leak, its probably because you forgot to free one last element.
  • Don't forget that in C, strings are characters arrays followed by a null character. Without the null character, C has no idea where you string ends!
  • ALWAYS check the return value of malloc to make sure you were actually given allocated memory.
  • Name your executables properly. For part1, isort and for part2, twecho.

RECITATION 5

Back to Contents

lecture note 9, and useful information for the practice midterm.

Function Pointers (K&R 5.11)

It is often the case in programming that you'll want your program to change its behavior for the person using your program but have no idea how they may want to do this. For example, when writing a sort function, you could just specify a parameter that let's them choose between sorting ascending and descending. However, what if they wanted to sort characters by their unicode value instead of lexicographically? For this, the user has to supply their own functionality. That's where you need function pointers. Some higher level (read lambda-programming languages) support this is in a more intuitive way, but C does a pretty good job itself.

Accepting functions as a parameter to your function

Function pointers allow you to accept as a parameter to your function, another function. Let's start out with accepting functions as a parameter:

void notifier(int (*fn)()) {
  printf("Starting\n");
  fn();
  printf("Finished\n");
}

Let's examine this more closely. We have a function called notifier whose only parameter is another function. How do we do this? we specify what kind of function it accepts. The basic layout for a function pointer type is returntype (*functionName)(parameter1type, parameter2type, ...). Why? Well the returntype denotes the type of the function. The name of the function being preceded with an asterisk tells us its a pointer. The function is surrounded by parentheses so that the compiler doesn't think we've got a variable returntype *functionName. The parenthesis following the declaration are necessary as well, even if the function we want to accept doesn't have any arguments. The parameter's only need their types declared so that the compiler can check these. Since you won't have access to the code you don't need to worry about what to call them.

Passing a function as a parameter

int wasteTime() {
  int i = 0;
  while(i < 1000000)
    i++;
  return i;
}

int main(int argc, char **argv) {
  wasteTime();
  notifier(wasteTime);
  return 0;
}

Notice how we have a function wasteTime. And when we follow it with parentheses it gets called. When we don't, it is automatically a function pointer. Why is this?

Well everything in the program is stored in memory, even the functions, which means even they have an address. So wasteTime has an address in memory. When you follow wasteTime with parentheses (wasteTime()), C goes to the address and executes the function. If you think of wasteTime as storing a pointer to a function, then you can think of wasteTime() as dereferencing the pointer. Therefore, when we call notifier and pass it wasteTime without parentheses, it passes the address to wasteTime to the notifier function.

Calling functions from pointers

We could further use the functions above as follows:

int main(int argc, char **argv) {
  int (*f1)();
  char *(*f2)(char *, const char *);
  char arr[5] = "bann"

  f1 = wasteTime;
  f2 = strcpy;

  notifier(f1);
  f2(arr, "hihi");
}

As you can see from the above code, you can declare variables of "function pointer" type and assign function pointers to their values. Then you can call those functions simply by adding parentheses to the end of the variable name. Make sure to check out Jae's notes (lecture 7) for more complicated examples of this. Notice that we did the same thing when calling fn() in our notifier function.

Functions vs Pointers to Functions

Functions aren't quite like regular variables, so function pointers aren't quite like regular pointers either. Think about this small program:

void myFunc(int x) {
  x++; //this function is irrelevant...
}

int functionAcceptor(void (*f)(int), void (*g)(int)) {
  return f == g;
}

int main() {
  char *s = functionAcceptor(myFunc, &myFunc) ? "They're the same thing" :
    "They're totally different.";
  printf("%s", s);
}

What will this program print?

They're the same thing

Why? Because referencing a pointer to a function just gets you the same pointer back. It has no effect. Too bad. Boring.

Makefiles

Makefiles aren't as hard as they look so long as you understand the three steps to compilation. (Source here)

  1. Pre-processing: This is when special isntructions like #include are processed.
  2. Compilation: This is when your c code is processed into an object file in binary form. At this point, undefined symbols are okay so long as they're declared. That's why you #include header files - to declare functions and variables that you don't define in that particular file. gcc lets you stop the entire process at this point using the -c flag. Object files generated in this step can be linked together to form an executable or put in a special archive called a static library. Most error checking takes place at this stage.
  3. Linking: The linker takes object files with binary symbols and produces either a shared (AKA dynamic) library, or an executable. At this point it replaces all the references to undefined symbols with the proper memory address. Most often at this stage the only errors left come from double-defined symbols or undefined symbols.

Now let's take a look at the Makefile's default rules and variables:

Compilation (includes pre-processing):

n.o:
  $(CC) -c $(CPPFLAGS) $(CFLAGS) n.c

Linking:

n:
  $(CC) $(LDFLAGS) n.o $(LOADLIBES) $(LDLIBS)

Let's take a look at some of the variable definitions we've used in the past:

CC  = gcc
INCLUDES =
CFLAGS   = -g -Wall $(INCLUDES)
LDFLAGS = -g
LDLIBS =

And here are some entries from running man gcc:

gcc [-c|-S|-E] [-std=standard]
           [-g] [-pg] [-Olevel]
           [-Wwarn...] [-pedantic]
           [-Idir...] [-Ldir...]
           [-Dmacro[=defn]...] [-Umacro]
           [-foption...] [-mmachine-option...]
           [-o outfile] [@file] infile...
-g
           Produce debugging information in the operating system's native format (stabs, COFF, XCOFF, or DWARF 2).
           GDB can work with this debugging information.

           On most systems that use stabs format, -g enables use of extra debugging information that only GDB can
           use; this extra information makes debugging work better in GDB but will probably make other debuggers
           crash or refuse to read the program.

-Wall
           Turns on all optional warnings which are desirable for normal code.  At present this is -Wcomment,
           -Wtrigraphs, -Wmultichar and a warning about integer promotion causing a change of sign in "#if"
           expressions.  Note that many of the preprocessor's warnings are on by default and have no options to
           control them.

-I dir
           Add the directory dir to the list of directories to be searched for header files.  Directories named by -I
           are searched before the standard system include directories.  If the directory dir is a standard system
           include directory, the option is ignored to ensure that the default search order for system directories
           and the special treatment of system headers are not defeated .  If dir begins with "=", then the "=" will
           be replaced by the sysroot prefix; see --sysroot and -isysroot.

-Ldir
           Add directory dir to the list of directories to be searched for -l.

-llibrary
-l library
           Search the library named library when linking.

Given what we know about compilation, where might we want to use these flags and for what purposes? The order we use these flags and the variables we insert them in is very important.

Important Things to Review

  • Arrays, pointers, and the structure of argv. Understanding how these data types and structures work will be crucial to your understanding of C as this class progresses. Try practice problem 1 for an understanding of this.
  • Order of operations. This stuff will make the exam much easier if you're not trying to figure out whether the increment or the dereference happens first when there are no parentheses.
  • Structs and Unions. You've already had to use structs on numerous occassions. Though unions weren't on any labs, but are definitely in the lecture notes and therefore fair game.
  • Stack and Heap allocation. Make sure you know how to use malloc and free. Try to determine if there are memory leaks in some sample code.
  • libmylist.a. Make sure you understand how the linked list is supposed to function
  • Makefiles and git. These questions are always fair game as well. You've been using makefiles and git long enough that you should know what tracked, untracked, staged, and modified mean in the context of git. You should also know about what rules are implicit to make, and how to write a makefile with pen and paper.

Try to work on pen and paper as much as possible leading up to the midterm. Compile-and-check methods aren't going to get you far when you're sitting down to take the test, so don't use them to practice. Try to really understand what you're writing so that you can be confident it works on your own without the compiler checking it for you.

File I/O

3 channels

A C program is automatically given 3 channels for input and output. They can all be redirected, but the basic three streams are:

  • (0) stdin (Standard input) This stream is for incoming characters which normally come from the keyboard but can also be from other sources.
  • (1) stdout (Standard output) This stream is for outgoing characters, and normally goes to the terminal screen but does not necessarily have to. (see below) This stream is buffered which means it is not sent to the terminal until a new line character is sent. This means if you use printf("hello") you likely will not see it until the end of your program is reached.
  • (2) stderr (Standard error) This stream is for error messages and is not buffered, meaning any characters written to it will immediately be flushed to their destination. This destination is normally the terminal screen but can be other locations as well.

If you wish to interact with these buffers you will need to #include <stdio.h> which is a library that defines standard operations such as printf() scanf() and others which you may or may not have already used.

Redirecting I/O

While most input and output to/from programs will go to the shell, it is possible to redirect the source of stdin or the destination of stderr and stdout. The < and > characters are used to denote redirection at the console. 2> will redirect stderr whereas > will redirect stdout. 2>&1 will redirect stderr to the same location as stdout. >> will append the output to a file instead of overwriting the file. You can use other programs or files on either side of most of these operators.

 [1] $ cat myfile.c
 [2] $ cat < myfile.c
 [3] $ cat myfile.c > cat
 [4] $ cat myfile.c > myfilecopy.c
 [5] $ cat myfile.c >> myfilecopy.c
 [6] $ valgrind ./myprogram 2> myerrors
 [7] $ valgrind ./myprogram > myoutput
 [8] $ valgrind ./anotherprogram 2>> myerrors
 [9] $ valgrind ./anotherprogram >> myoutput
[10] $ valgrind ./myprogram 2>&1 > ALLthethings
[11] $ valgrind ./anotherprogram 2>&1 >> ALLthethings

Each of the above expressions build on each other. If you can tell what the effect of each expression above is, then you're set. If not, try them out and see what happens.

Formatting

printf and scanf both use format strings to specify what how to format their output. They also both accept variable arguments. All arguments to scanf must be pointers whereas arguments to printf should be values (in the case of numbers) or char * in the case of strings. Pages 153-154 in the K&R explain how to format your format strings for printf() and 157-158 explain formatting for scanf(). Make sure you can identify the following two format strings:

printf("%-15.10s", "hello, world");
sscanf("25 Dec 1988", "%d %s %d", &day, month, &year);

Functions with Variable Arguments

printf() and scanf() family of functions accept a variable number of arguments. You can do this too! Once you've enumerated all the required arguments, you can specify that you would like to also accept variable arguments with ...:

int myFWithVarArgs(int a, int b, ...);

The declaration means that the number and types of all arguments after the integer b can vary. If you want to be able to actually access these arguments you'll need to #include <stdarg.h> whose implementation is system dependent, but interface is the same. To access the values, you will have to do the following:

  1. Declare a variable of type va_list that will point to each argument.
va_list my_arg;
  1. Call va_start() with the last named argument, and your variable that will point to each argument.
va_start(my_arg, b);
  1. Call va_arg() with the variable that will point to the argument as well as the type of the argument (you'll need some way to figure out the type of this argument from your other arguments). Assign the return value to a variable. Repeat this step for each argument you want to read.
int myvarInt = va_arg(my_arg, int);
  1. Clean up by calling va_end() with the variable list of args variable.
va_end(my_arg);

File I/O

Think of file I/O as writing/reading from stdin or stdout and your life will be much simpler. It all starts with File descriptors which you use to reference a file. In order to work with files, you'll need to #include <stdio.h>.

FILE *fp; //Declare a file pointer
fp = fopen("README.txt", "r");

File descriptors are just fancy pointers for files. By default, all C programs are given three to start with: stdin, stdout, and stderr so note that any functions you can use file descriptors with you can use on the I/O streams we've already mentioned. A file descriptor is a pointer to a special struct that stores important information about where you currently are in a file, whether you can read/write to it, and what the file is. You don't need to know how it works, just accept that it does and you'll need to pass the File descriptor to functions that work with files.

fopen() is how you'll open files. It takes two arguments, both strings. The first is a string representing the path to the file you want to open, and the second is the mode with which you will open it. The mode tells whether or not you are going to be reading, writing, or appending to the file and also how you want to read the file in. Make sure you know the difference between "r", "w", "a", "r+", "w+", "a+", and all of the above with a "b" on the end. If fopen() fails it will return a NULL pointer. This can happen because a file doesn't exist (in the case of r's and a's) or because you don't have permissions to access the file.

fclose() will close the file when you're done. In general you will use fgets() and fputs() to read from and write to files. You can also use the variants of printf and scanf, fprintf() and fscanf() to write to and read from files. getc() and putc() are the lower level versions of these functions, and have macros for getchar() and putchar() which interact with stdin and stdout respectively. The functions fread() and fwrite() work in blocks instead of characters. These can be far more efficient than fgets() and fputs()

What's Buffering?

Buffering determines how often the contents of a stream are sent to their destination. There's some low level stuff going on at this point, but just understand that its not very efficient to send data one character at a time, so buffering happens. Unbuffered streams are constantly flushed to its destination. Line-buffered streams are only flushed to its destination after a newline character is written. Block-buffered streams are flushed when they reach a certain size. You can use fflush(fp) to manually flush the buffer for any file pointer.

  • stderr is unbuffered (why?)
  • stdout is line-buffered when it's connected to terminal
  • everything else is block-buffered

RECITATION 6

Back to Contents

*nix Systems

The Stack

  1. Hardware
  2. OS Kernel
  3. System Calls
  4. Library Functions
  5. Applications

Software is built in many layers that depend on each preceding layer. Understanding what level each piece of functionality exists at is imperitave to strong programming.

Users and Permissions

Unix systems have three different ways to allocate permissions: owner (a user), group, and everyone else.

  • Owner The owner is always a user and the owner can own both files and directories.
  • Group A file or directory is assigned to a group, usually but not always the user group that the owner belongs to.
  • Others Those who are not in the assigned group, nor an owner, belong to the others.

The owner can always assign owner permissions to another user.

There are three UNIX file permissions. Read, write, and execute. Always think of them in this order. Read is the ability to access or look at the contents of the file. Write is the ability to modify the contents of the file. Execute is the ability to execute a file.

When set for directories, read allows for listing the files in the directory; write allows for creating, deleting, and renaming files; and execute allows for accessing files and metainfo if their name is known, but not listing the files in the directory.

Permissions are conventially represented as combinations of 3 binary digits. Each digit represents one of read write and execute. So for example, 100 would represent the ability to only read and is a value of 4. 110 is the ability to write and read and is a value of 6. 111 would grant read, write, and execute and is a value of 7. Conventially, you could combine these in the order owner-group-other in a string like "644" which would mean the owner can write/read, and the group and others and only read. chmod ### path can be used to set permissions for a path. Note that you can view permissions (among other things) of files in your current directory by entering ls -al. The -a option includes files that begin with a dot and the -l option displays each file in long format, which is what shows you the file permissions and all that good stuff. See the ls man page for more information.

Some Notes About Lab 4

If you plan on doing the pair option and would like to procure a private GitHub repo, please request one ASAP, because it takes a few days to process. Additionally, if you're new to GitHub/BitBucket, you may find the Git Documentation, the GitHub Help page, and the BitBucket Documentation handy.

Whether you choose to do the pair option or the solo option, you may wish to review each of the I/O functions we've learned in lecture note 9. Be sure that you understand the differences between them and note that there are certain situations when some are more appropriate than others, and vice versa (ex., fgets() vs. fread()).

Processes

one-to-one programs-processes is not a thing! A program is a packaged set of instructions, where as a process is an instance of a program. A program can have many processes associated with it.

fork() will split your process into two new processes, each one executing forward from the point of the fork. fork() will return a pid of 0 if it is executing within the child process, or the pid of the child process if it is executing within the parent process. This means you can identify which process you are in simply by checking the return of fork().

exec(char *command) will turn your process into an instance of command. This means it will cease executing your code and execute as the program you have decided to execute. If the program reaches its end, the process will exit and will never return to your code. It will only execute code following an exec call if exec fails.

Fork and executing is how the entire operating system works. The kernel starts an init process and everything is fork/exec'ed from there.

Signals

Signals are an OS's way of communicating with a process outside of the I/O streams. They can be sent at any time and a process has three options upon receiving them:

  1. Let the default action take place
  2. Explicitly ignore the signal (not necessarily possible)
  3. Catch the signal and do something special (not necessarily possible)

There are tons of signals and each one has a number associated with it. Ctrl-C for example is a signal, SIGINT for signal interrupt. You can use the signal() function to set your program up to handle these signals. signal() takes two arguments: the signal you want to handle, and a function pointer to a function of what to do if that signal is received. Not all signals can always be caught with this method, so it is convention to wrap the call to signal as follows:

if(signal(SIGINT, &myHandler) == SIG_ERR) {
  perror("call to signal() failed");
  exit(1);
}

In this way, if the call to signal is not accepted by the operating system you will be aware and can respond appropriately. In this case we simply exit the program with an error code but this might not be the best behavior. Note that to use these constants and functions you must #include <signal.h>. Also be aware that you can only catch the first reception of the signal. The call to signal will set the action to be taken to "myHandler" but afterwards it will be reset to the default action for that signal. Page 255 in the K&R has a little bit more information on this.

RECITATION 7

Back to Contents

size_t

size_t shows up all over the place in memory operations. malloc expects its parameters to be of this form, and certain file operation functions will return the number of bytes read in this formed. What you need to know about size_t is that its an unsigned integer type. This means it works like an integer but cannot represent negative values. So, while the following is okay:

size_t x = 5;
size_t y = 3;
int z = x + y;

The following is not (and wouldn't be a good idea even if the value was positive):

size_t x = 5;
size_t y = 3;
int z = y - x;

Other than that you can treat size_t as any other integer type.

File operations

All of the following are defined in stdio.h and therefore to use them you must #include <stdio.h> to use them.

FILE *

FILE is a typedef'ed structure in stdio.h. Whenever you use it, you'll use a FILE * though because you'll always be getting a value back from/passing it to common file operating functions. Why a pointer? Because these functions will modify the internal values of the FILE value. So while you could copy them because its a struct and C would be fine with passing it by value, things like your place in the file would not be maintained.

fopen and fclose

FILE *fopen(char *name, char *mode);
int fclose(FILE *fp);

The above structure should look a tiny bit familiar. Knowing what we know about FILE *s you might see the likeness here to:

void *malloc(size_t size);
int free(void *p);

That's because fopen and fclose create a new FILE structure on the heap so that the status of the open file can be maintained. This means that if you don't fclose your FILE *s, you'll have memory leaks. Be careful about this.

fgets and fputs

THESE FUNCTIONS ARE FOR LINE INPUT AND LINE OUTPUT

What's that you ask? It means they're really good at reading in lines, but bad for everything else. Only use these functions if lines are a logical way to delimit chunks of the file you're reading.

char *fgets(char *line, int maxline, FILE *fp);
int fputs(char *line, FILE *fp);

fgets reads the next line from the input file in fp into the memory location pointed to by line. If all is successful, it reads at most maxline-1 characters out of the file, and returns line as well. If something goes wrong (on end of file, or error) it returns NULL. It will keep the newline character it reads if it gets to one before it reaches maxline-1 characters. It also ALWAYS appends the null character to the end of the string.

fputs returns EOF if there's an error and 0 otherwise. This will not append a newline to the file, nor does your string need to contain a newline character.

Watch out! gets and puts work very similarly for stdin and stdout but gets will not give you the newline character.

fread and fwrite

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

Each of these functions accept pointers of type void. So if you want each item to be read into the right size memory space, you'll need to tell it the size of each item, and the number of items nmemb to read/write from/to the stream. fwrite promises not to modify the data that ptr references as well. The return the number of bytes read/written. If its anything less than what you expected, you should check what happened using ferror or feof.

fseek

int fseek(FILE *stream, long offset, int origin);

This handy dandy function lets you hop through a file without doing anything other than changing the position in the FILE structure. You can use any stream of your choosing, but pay attention to whether or not its a binary stream. If its a binary stream, offset can be any number of bytes/characters from origin which should be set to either SEEK_SET (a constant representing the beginning of the file), SEEK_CUR (a constant representing the current position), or SEEK_END (the end of the file).

If you're reading a text stream, offset must either be zero or the current position as returned by a call to ftell (see below). In this case, always set origin to SEEK_SET.

FILE *text = fopen("myfile", "r");
FILE *binary = fopen("myfile", "rb");

fseek(text, ftell(text), SEEK_SET);
fseek(binary, -100, SEEK_END);

feof and ferror

int feof(FILE *stream);
int ferror(FILE *stream);

So you didn't get what you were expecting from one of the above functions. What do you do? You call feof or ferror. These two functions let you know what happened. feof returns true if the end of the stream has been reached, and ferror returns true if there was an error reading the stream.

TCP/IP

TCP/IP is the protocol that makes the world go 'round. The internet is powered by IP, and almost all the traffic is TCP (a small amount is UDP, mostly games and video/voice calls where every millisecond matters, and a small amount of data loss is acceptable).

You may have heard of the 7 layer OSI stack, well similarly TCP/IP has a 4 layer stack which means at higher layers you generally don't need to think too many layers lower. At the bottom are simply bits on physical wires aggregated into frames, in the physical and data layer. Above that IP deals with packets, and routing packets to the correct IP address. Above that TCP and UDP use IP packets to manage sending meaningful amounts of data, for example connecting to a specific port on a server. Finally on top of TCP you use specific protocols like HTTP, FTP, IMAP, DNS, and SSH.

IP sends packets towards IP addresses. However you don't want to use IP addresses, such as 160.39.63.50, but instead you want to use a hostname, like paris.clic.cs.columbia.edu. DNS, Domain Name Service, is a protocol and server setup that translates hostnames into IP addresses. Every time your laptop joins a network, it gets its own IP address, the IP address of 1-3 DNS servers, and the IP address of a router that can forward packets on to the rest of the world.

Want to find out your IP address and DNS info? On OSX and Linux run ifconfig in a terminal. You should see something like the below:

en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    ether 58:b0:35:7d:33:7e
    inet 192.168.1.183 netmask 0xffffff00 broadcast 192.168.1.255
    media: autoselect
    status: active

In this case 192.168.1.183 is my IP address.

RECITATION 8

Back to Contents

Signed and Unsigned Integers

From Recitation 3:

Bits Bytes and Binary

Let's just refresh our memory about memory:

  • A bit is a single digit in binary; on or off; 1 or 0
  • 8 bits form a single byte: 11111111 = 2^8 - 1 = 255
  • Hexadecimal is another notation to count even higher in fewer places
    • Two hexadecimal places express 1 byte
    • FF in Hexadecimal is 255
  • Two's complement
    • Most modern computers use this notation for signed integers
    • Most significant bit: Usually the leftmost, but generally the bit with the highest value: If 1000 = 8, then 1 is the most significant bit. If we were using a different notation such as 1010 = 5, then the rightmost 0 is the most significant bit.
    • If the most significant bit is 1, then in two's complement, you're looking at a negative number
    • To convert: 1010 (read as ten if unsigned), first note that it is negative. Then find its magnitude by flipping all the bits (0101, 5) and then adding 1 (0110) meaning the value is -6.
    • Consult the following table to see something interesting: (note the wraparound effect
Binary usigned decimal two's complement decimal
000 0 0
001 1 1
010 2 2
011 3 3
100 4 -4
101 5 -3
110 6 -2
111 7 -1

Data Types: Numbers

Integer Types:

  • char
  • int

Modifiers (and sugar):

  • short
  • long

Memory size for each type depends on system, and only restrictions are that

char <= short <= int <= long <= long long

Clic machines follow

  • char = 1 byte
  • short = 2 bytes
  • int = 4 bytes
  • long = 8 (these last two vary from system to system a lot)
  • long long = 8

Test it out for yourself:

#include <stdio.h>

 int main(int argc, char** argv) {
   printf("char: %d\n
     short: %d\n
     int: %d\n
     long: %d\n
     longlong: %d\n",
     sizeof(char), sizeof(short), sizeof(int), sizeof(long), sizeof(long long));
   return 0;
}

Here are some declarations to help you understand what really happens when we're talking characters and integers. Definitely take a look at The Ascii Table and understand the relationships in the order of the first 128 ascii characters. The C language is built on a subset of 7-bit ascii (0-127) so its important to know what the table is like, not to memorize it. Also note that in C, single quotes means a character.

Declaration x (dec) y (dec)
int x; NULL -
int x, y; NULL NULL
int x = 0, y; 0 NULL
char x = 'x'; 120 -
char x = '\n'; 10 -
char x = '\13'; 11 -
char x = '0'; 48 -
char x = '\0'; 0 -
char x = 0; 0 -
long x = 0L; 0 -

Preceding a constant with 0x denotes hexadecimal notation:

(0xFFFFFFFF == -1); //returns 1 (which is true, but C doesn't have true)
(037777777777 == -1); //returns 1 (true)
sizeof(1234L); //returns 8
sizeof(1234); //returns 4
0xFFFFU; //returns 65535
0177777U; //returns 65535

Float and double are the two floating point types (decimal) and can be expressed with a decimal point or as scientific notation:

float miles = 1.8;
double big = 1e10;

The only implementation constraint is that

float <= double <= long double

so they could all be one size, or be three distinct sizes.

In C there is no such thing as a string, just an array and pointers. Essentially, a bunch of single characters located consecutively in memory will make up a string, but more on this later.

Bitwise Operators

Also from Recitation 3:

Bit-wise operators are tricky and can be used for a variety of purposes:

  • & can be used to "mask" or turn off all bits except certain ones. For example:

    n = n & 0177; // n and 00000001111111
    
  • | can be used to set on all bits:

    n = n | 0177; // n or 00001111111
    
  • Note:

    int x = 1;
    int y = 2;
    x & y; // 0
    x && y; // 1
    
  • ^ Sets 1 in each bit where its operators differ and 0 where they are the same.

  • << and >> shift their left operator by the number of digits specified by the right operator. Left shifting always fills vacated bits by zero. Right shifting varies from machine to machine and whether or not we're talking unsigned or signed.

    int x = 2;
    x = x << 2; // x == 8
    x = x >> 1; // x == 4
    
  • ~ just does the one's complement, ie. flipping all the bits.

Declarations

It seems in order to prepare for the midterm, it might be a good idea to go over some existing programs and declarations.

  • For integer expressions, write the actual number value in decimal notation. (Remember char is integer!)
  • For non-integer expressions, write the type name, in the format that you use to declare a variable of that type.
  • Write "invalid" if a given expression is not a valid C expression

Note:

  • Boolean expressions are integer expressions in C. Do not write true or false.
  • You can assume that the program is run with no command line argument

On the front of the exam, you'll find useful information (sometimes) like:

  • Language: C
  • Compiler: gcc
  • Platform: Ubuntu Linux 12.04, 64-bit version
  • Primitive type sizes: sizeof(int) is 4 and sizeof(int *) is 8

Now you know tons about the other types!

struct Node {
  void *data;
  struct Node *next;
};

int main(int argc, char **argv)
{
  int a = 3;
  int b[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  struct Node arr[5] = {&a, arr+1, b, arr+2, "hello", arr+3};

Some tricky (maybe not) questions now:

  • b[2]
  • sizeof(b)
  • sizeof(b[2])
  • sizeof(&b[0])
  • "hello"
  • arr[1].data[4]
  • arr[0].data[4]
  • sizeof(arr[3].next)

Git and Makefiles

Some key git commands to understand the basic usage of:

  • git clone
  • git status
  • git add
  • git commit

Make sure you know what levels of git tracking exist:

  • untracked
  • tracked, unmodified
  • tracked, modified, but unstaged
  • tracked, modified, staged
  • tracked, committed (??? What's wrong here?)

Also know how dependencies work in Make. Jae's sample makefile is really great to use to understand this. I've included the relevant sections from recitation 2 below:

Take Jae's Makefile piece by piece. It can be found in this git repository as sample-makefile

CC  = gcc
CXX = g++

Make has a some pre-configured rules for how to compile programs. For example it knows how to specify files as arguments to a compiler. However, you should tell it what compiler to use for C files and C++ files. Here, we set the special make variables CC and CXX to gcc, the C-compiler, and g++, the c++ compiler.

INCLUDES =

CFLAGS   = -g -Wall $(INCLUDES)
CXXFLAGS = -g -Wall $(INCLUDES)

Here we define our own variable, INCLUDES, which we can use for directories that we wish to include at the compilation step. An example value for INCLUDES could be -I/home/jae/cs3157 which would tell the compiler to look in /home/jae/cs3157 during the compilation step for missing header files and other sorts of relevant files.

After defining INCLUDES, we define the flags that we want each compiler to be run with. In this case we include the -g flag for debugging and -Wall flag to display all warnings. Lastly, we reference our variable INCLUDES to add those flags as well.

LDFLAGS = -g

LDFLAGS are the flags that are appended to the compiler when using it for linking. In this case we just want the debugging info to be included.

LDLIBS =

LDLIBS will automatically be appended to the linker commands. These are flags like -lm and function similarly to our INCLUDES variable but are added at a different step. m denotes the math library.

That's about it for our variable declarations. The next step is to define compile order and dependencies. The very first "target" or rule in your makefile gets built when you type make in this case the first target is:

main: main.o myadd.o

Note that we did not specify the linking rule, because make follows an implied linking rule:

$(CC) $(LDFLAGS) <all-dependent-.o-files> $(LDLIBS)

Also note that make assumes that main depends on main.o, so we could omit it:

main: myadd.o

Basically what this rule says is make should produce an executable called "main" by linking myadd.o and main.o. This declares main.o and myadd.o as dependencies of main, meaning that if any of the dependencies (or their dependencies) change between the last time this target was run, it should re-run the outdated targets as well as this one.

The next target we declare is main.o:

main.o: main.c myadd.h

This says that main.o depends on main.c (assumed) as well as myadd.h. See last week's recitation notes to understand why main.o depends on myadd.h. We could omit main.c as follows:

main.o: myadd.h

Either way, we do not specify a rule here because make assumes the implicit rule:

$(CC) -c $(CFLAGS) <the-.c-file>

Lastly, we specify the target for myadd.o:

myadd.o: myadd.c myadd.h

We'll include two phony targets. We tell make that they're "phony" so that it doesn't attempt to use implicit rules or try to compile them. The first target we make is "clean" which should remove all intermediate files. Always include a clean so that make clean can be used to remove intermediate files like object files, compiled code, etc. This should return your directory to just its source code that can generate all the other files. Be careful: Using rm -r will not prompt you to remove files. This is customary for make clean but it also means if you make a mistake in designing your rule it could remove files that you didn't want to. There is no "trash" in UNIX - they'll be gone forever.

Lastly, we define a phony "all" target that just depends on the main and clean targets. This will always remove all intermediate files and compiled files, forcing make to recompile everything when main is called.

.PHONY: clean
clean:
        rm -f *.o a.out core main

.PHONY: all
all: clean main

Function Pointers

From Recitation 5:

It is often the case in programming that you'll want your program to change its behavior for the person using your program but have no idea how they may want to do this. For example, when writing a sort function, you could just specify a parameter that let's them choose between sorting ascending and descending. However, what if they wanted to sort characters by their unicode value instead of lexicographically? For this, the user has to supply their own functionality. That's where you need function pointers. Some higher level (read lambda-programming languages) support this is in a more intuitive way, but C does a pretty good job itself.

Accepting functions as a parameter to your function

Function pointers allow you to accept as a parameter to your function, another function. Let's start out with accepting functions as a parameter:

void notifier(int (*fn)()) {
  printf("Starting\n");
  fn();
  printf("Finished\n");
}

Let's examine this more closely. We have a function called notifier whose only parameter is another function. How do we do this? we specify what kind of function it accepts. The basic layout for a function pointer type is returntype (*functionName)(parameter1type, parameter2type, ...). Why? Well the returntype denotes the type of the function. The name of the function being preceded with an asterisk tells us its a pointer. The function is surrounded by parentheses so that the compiler doesn't think we've got a variable returntype *functionName. The parenthese following the declaration are necessary as well, even if the function we want to accept doesn't have any arguments. The parameter's only need their types declared so that the compiler can check these. Since you won't have access to the code you don't need to worry about what to call them.

Passing a function as a parameter

int wasteTime() {
  int i = 0;
  while(i < 1000000)
    i++;
  return i;
}

int main(int argc, char **argv) {
  wasteTime();
  notifier(wasteTime);
  return 0;
}

Notice how we have a function wasteTime. And when we follow it with perentheses it gets called. When we don't, it is automatically a function pointer. Why is this?

Well everything in the program is stored in memory, even the functions, which means even they have an address. So wasteTime has an address in memory. When you follow wasteTime with perentheses (wasteTime()), C goes to the address and executes the function. If you think of wasteTime as storing a pointer to a function, then you can think of wasteTime() as dereferencing the pointer. Therefore, when we call notifier and pass it wasteTime without parentheses, it passes the address to wasteTime to the notifier function.

Calling functions from pointers

We could further use the functions above as follows:

int main(int argc, char **argv) {
  int (*f1)();
  char *(*f2)(char *, const char *);
  char arr[5] = "bann"

  f1 = wasteTime;
  f2 = strcpy;

  notifier(f1);
  f2(arr, "hihi");
}

As you can see from the above code, you can declare variables of "function pointer" type and assign function pointers to their values. Then you can call those functions simply by adding parentheses to the end of the variable name. Make sure to check out Jae's notes (lecture 7) for more complicated examples of this. Notice that we did the same thing when calling fn() in our notifier function.

Order of Operations

Binary Operators, Increasing Order of Precedence:

  • < > <= =>
    • / %
  • &
  • ^
  • |
  • &&
  • ||
  • ternary
  • += -= *= /= <<= >>= 7= ^= |= %= >>= =
  • ,

Unary Operators, Increasing Order of Precedence:

  • (expression), [], ->, .
  • !, ~, ++, --, (type), sizeof, +, -, *, &, (right to left)

On Memory Errors

Source: http://www.cprogramming.com/tutorial/memory_debugging_parallel_inspector.html

Invalid Memory Access

char *pStr = (char*) malloc(25);
free(pStr);
strcpy(pStr, "parallel programming"); // Invalid write to deallocated memory in heap

Missing Allocation

char* pStr = (char*) malloc(20);
free(pStr);
free(pStr); // results in an invalid deallocation

Uninitialized Memory Access

char *pStr = (char*) malloc(512);
char c = pStr[0]; // the contents of pStr were not initialized
int a;
int b = a * 4; // uninitialized read of variable a

For more information, reference Recitation 4.

Good to Know Additional Topics

Just some things it might pay to know well:

  • What mdbrec's look like and how you interacted with them
  • How the functions you implemented in lab3 should work, what they return, etc. etc.
  • What the pipeline for lab5 looked like and how it worked
  • The ASCII Table (See Recitation 3)
  • malloc and free...this is tough
  • PLEASE PLEASE PLEASE PLEASE FOLLOW DIRECTIONS The exam will be graded in the same way your labs are graded if not less lenient. Read all the directions. If it says "write the line number and the code that should be on that line," do just that!

RECITATION 9

Back to Contents

Introducing C++

Strings in C

At this point, you're all quite familiar with C's quirks and kinks. Perhaps one of the most notable ones is the absence of a String datatype!

Recall that strings in C are just special cases of arrays. C strings are arrays of characters with a null terminating character at the end.

char buf[50]; //stack allocation
strcpy(buf, "hello");

char *s = (char *) malloc(50); //heap allocation
strcpy(s, "hello");

//don't forget to free s when you're done!

But finally, after months of dealing with those pesky invalid reads, memory leaks, and so on, we can transition into a new language that permits a much friendlier way to handle strings: C++!

Classes/Structs in C++

Before we go into strings in C++, let's talk about the basics of classes and structs first. Although we can define a struct in C++ the same way we do in C, structs tend to get much more complicated in C++ than their C predecessors. But let's start with the C++ version of the struct Pt example from Jae's lecture notes:

struct Pt
{
	double x;
	double y;
}

Stack Allocation in C++

Here's how you allocate space for a struct Pt in C++:

struct Pt myPt;

//in C++, you can omit the 'struct' part:
Pt myPt2;

It looks exactly like how we allocated it in C! Boooring. But is it really doing the same thing?

Classes/structs in C++ have constructors whose purpose is to initialize all data members and base classes. When we declared our struct Pt, myPt, the default Pt constructor was called under the covers. If we don't define any constructors for a class/struct, the C++ compiler implicitly defines a default constructor for us. The synthesized default constructor will call the default constructors of all members of class type and leave members of built-in type undefined.

What if we attempted to call the default constructor like this?

Pt myPt3();

Oftentimes, the synthesized default constructor is not what we want, so we define our own constructors. Let's say we want the user to be able to specify values for Pt members x and y. We could do so as follows:

struct Pt
{
	double x;
	double y;

	Pt(double myX, double myY);
};

//we could have also made this inline
Pt::Pt(double myX, double myY)
{
	x = myX;
	y = myY;
}

And we can call it like this:

Pt myPt2(4, 4);

When we define member functions for our classes, we can choose to define them within the class definition (the part enclosed by the curly braces of struct Pt), or just declare them within the class definition and define them elsewhere. Member functions defined within a class definition are implicitly inline. Inline functions are kind of like macros in that calls to them are replaced with the body of the function. Inline functions should be short. If we don't define functions inside the class definition, we need to use the scope resolution operator, ::, to indicate that it's a member of the class.

A word of caution, though: if we define any constructors ourselves, we can no longer rely on the synthesized default constructor. Now that we've defined a constructor for Pt that takes two doubles, we lose our synthesized default constructor. A declaration like this would be illegal now:

struct Pt myPt; //error!

So if we want our class to have a default constructor, we'd better make sure to define it along with our other constructors, lest the compiler complain.

A convenient way to account for two constructors in one definition is to use default arguments. If no arguments are supplied to the constructor (as is what occurs when we call the default constructor), default values are assigned to the arguments, as if we had passed these values ourselves:

/*this constructor deals with calls to the default
constructor and calls to the constructor taking
two doubles*/
Pt::Pt(double myX = 0, double myY = 0)
{
	//if no arguments are passed, myX == 0
	//and myY == 0
	x = myX;
	y = myY;

	//note that we'd have to update our function
	//prototype to Pt(double myX = 0, double myY = 0);
}

Hand-in-hand with constructors are destructors. The purpose of the destructor is to deallocate all data members. If our constructor allocated space on the heap for its data members, our destructor should free up that space. If we don't explicitly define a destructor, the compiler will synthesize one for us, which just calls the destructors of all class type members and does nothing for the built-in type members. A stack-allocated variable's destructor is called when the variable falls out of scope, after which the stack shrinks accordingly.

Heap Allocation in C++

Although you can still use malloc to allocate space on the heap, using malloc doesn't allow you to call the constructor of a class type object. A preferred way to do heap allocation in C++ is via the new operator:

Pt *myPt = new Pt; //myPt is a pointer to sizeof(Pt) allocated bytes
Pt *myPt2 = new Pt(4, 4);

//heap-allocated array of Pt's:
Pt *myPtArray = new Pt[10];

The new operator not only allocates space for myPt on the heap, but it also calls Pt's constructor, in this case, the default constructor. Like with malloc, we must remember to free up the heap space we allocated with new:

delete myPt;
delete myPt2;
delete [] myPtArray; //deleting a heap-allocated array

delete calls the Pt destructor of myPt and frees up the heap space it was using. new goes hand-in-hand with delete, and malloc goes hand-in-hand with free. But don't try to mix the four, i.e., calling new and freeing the memory with free.

classes vs. structs

There is a subtle, yet important difference between classes and structs in C++. In a struct, the members defined prior to the first access specifier (e.g., public, private, etc.) are public. In a class, they are private. We want our members to be private, so we'll be writing classes in our labs.

The "Basic 4" and Jae's MyString Class

There are four elements of a C++ class that you should always consider:

  1. Constructor
  2. Destructor
  3. Copy Constructor
  4. Operator=()

The Copy Constructor

We've already discussed constructors and destructors, but what about the other two? The copy constructor specifies how to construct a class type variable using an argument of that class type, i.e.:

string myString("hello");
string myStringCopy(myString); //invoking the copy constructor explicitly

The copy constructor is called implicitly in a couple other scenarios: passing by value and returning by value.

string call_copy(string myString) //copy constructor is called to create temporary copy of myString local to call_copy
{
	return myString; //copy constructor is called to return myString by value
}

If you don't explicitly define a copy constructor, the compiler implicitly defines one for you, which just sets all data members of the newly constructed object equal to all data members of the argument object. This may or may not be what you want. Consider this segment of the MyString class from Jae's lecture notes:

class MyString
{
	char *data;
	int len;

	MyString(char *p);
	...

}

...

MyString::MyString(char *p)
{
	if (p)
	{
		len = strlen(p);
		data = new char[len+1];
		strcpy(data, p);
	}

	else
	{
		data = new char[1];
		data[0] = '\0';
		len = 0;
	}

}

MyString::~MyString()
{
	delete[] data;
}

If we didn't define a copy constructor for MyString, initializing a new MyString from another MyString would make the new MyString's data pointer point to the same heap-allocated space as the old MyString's data pointer. Upon destruction of either MyString, the other MyString would have a data pointer pointing to a freed piece of memory: hello, memory errors!

As a rule of thumb, if your class necessitates explicit definition of a destructor, as MyString does, chances are that your class necessitates explicit definition of a copy constructor.

So how do we define a copy constructor? We know that the copy constructor is called when we try to pass an object by value, so we can't pass a MyString object as a parameter to our copy constructor; how can we define a copy constructor while relying on the existence of a copy constructor? We need a C++ construct known as a reference. You can think of a reference as a dereferenced pointer to something. For example:

int x = 5;
int& y = x; //y is a reference to x

x = 6; //y is now 6
y = 7; //x is now 7

The reference construct allows us to pass an argument by reference, without the need for all that pointer business we've seen before. For example, this function will increment the integer passed as an argument, since the variable it's working with isn't a temporary copy of the argument--it's an alias for the argument itself!

//note that to use iostreams this way, we need to have:
using namespace std;
#include <iostream>

void increment(int& x)
{
	x++;
}

...
int y = 5;
increment(y);
cout << y << endl; //prints 6

Using this new reference construct, we are ready to define our copy constructor:

MyString::MyString(const MyString& s)
/*note that the const indicates that we will not modify
 the contents of s within this constructor*/
{
    len = s.len;

    data = new char[len+1];
    strcpy(data, s.data);
}

The Assignment Operator

If your class necessitates the definition of a copy constructor, your class necessitates the definition of an assignment operator for the same reasons. We can write our assignment operator almost entirely the same way we wrote our copy constructor, except now we also have to deal with the existing data of the lvalue. We can examine the contents of the lvalue via the C++ this pointer. this is a pointer to the object on which we're currently operating. The C++ this is not to be confused with the Java this, as the Java this is the object itself, rather than a pointer to it.

Now that we have this at our disposal, we can write our assignment operator:

MyString& MyString::operator=(const MyString& rhs)
{
    if (this == &rhs) {
	return *this;
    }

    // first, deallocate memory that 'this' used to hold

    delete[] data;

    // now copy from rhs

    len = rhs.len;

    data = new char[len+1];
    strcpy(data, rhs.data);

    return *this; //returns the MyString on which we are calling the assignment operator
}

Note that our assignment operator should return a MyString& so that we can chain calls to it:

MyString MS("hello");
MyString MS2("world");
MyString MS3("yay");

(MS = MS2) = MS3; //MS is now "yay"

Make sure you understand the importance of the Basic 4 and always consider whether they're necessary for your program. Hint: they're almost always necessary.

Other Nifty C++ Features

Implicit Conversions

Recall from your lovely memories with C that there are some automatic type conversions that occur when intermixing types, for example:

int y = 5;
double z = y; //y is coerced into double type

Automatic type conversions can occur for class type variables in C++. For our MyString class, we can do something like this:

MyString s("hello");
s += "world";
/*compiler uses the MyString constructor that
takes a char* to create a temporary MyString whose
value is "world", which allows us to call our
+= operator on two MyString objects*/

//note that the lifetime of our temporary MyString is the expression in which it was created

If we want automatic type conversion to occur, we need to make sure that the compiler is able to make the connection between the first type and the other, via our constructors (i.e., we can construct a MyString from a char*, but not from an int.)

Operators

C++ allows us to define our own operators for the classes we write. This means that we can do things like:

MyString MS("hi");
MyString MS2("dude");
MS = MS + MS2; //MS reads "hidude"

You can find a list of all overloadable operators in C++ on page 553 of Lippman, 5th ed. For our MyString class, we'll be overloading +, =, <<, >>, and [] (plus the ones you'll overload in lab 9).

You may find yourself running into the question of member versus nonmember implementation of your operators. Symmetric operators, operators that should allow implicit conversion of either operand, should be nonmember functions. Two examples of these are the + and - operators. Operators whose left-hand operand isn't of the class type shouldn't be members of the class, for example, the << and >> operators of our MyString class. Operators that change the state of their object should be members. The assignment (=), subscript ([]), call (()), and member access arrow (->) operators must be members.

Friend Declarations

If we want to have nonmember operators that access our nonpublic data members, we need to declare them as friends, as we do in mystring.h.

	// operator+
	friend MyString operator+(const MyString& s1, const MyString& s2);

	// put-to operator
	friend ostream& operator<<(ostream& os, const MyString& s);

	// get-from operator
	friend istream& operator>>(istream& is, MyString& s);

Remember this joke: "Only you and your friends can touch your private parts."

Const Member Functions

Note that the const version of the subscript operator in MyString has const at the end of its prototype. What's that about? A const member function is a member function that promises not to modify the object on which the function is being called. As such, the const version of the MyString subscript operator promises to not modify the contents of the MyString that it's subscripting. Note that we can cast away the constness of *this so that we can reuse our nonconst subscript operator:

// operator[] const - in real life this should be inline

const char& MyString::operator[](int i) const
{
    // illustration of casting away constness
    return ((MyString&)*this)[i];
}

RECITATION 10

Back to Contents

Recitation 10

C++ templates and containers

At this point hopefully you've gotten used to writing basic classes in C++ from our experience with MyString. At this point we can move in one of two directions. First, we could take a more typical Object Oriented track and discuss inheritance, polymorphism, and those sorts of things that companies love to ask about during tech interviews. However if you've used Java at length you're probably already somewhat comfortable with this, even though it's incredibly complex and confusing (multiple inheritance = head asplode).

Instead, in 3157, we're going to discuss templates and generics. The interaction between generics and C++ is very interesting, and they work differently than in most other languages. Consequently they're incredibly powerful, and often very useful for programming fast and flexible libraries.

Recall how we created executables in C (and C++ so far)

If you go back to the beginning of the course, we discussed compilation as the following steps:

  1. Preprocessing (#include, #define, other # commands)
  2. Compiling (turning raw .c files into .o object files)
    • Recall, we could also create a library .a file from our .o files, if we wanted. This would be an extra step after compiling, before linking
  3. Linking (combining 1+ .o files into a single executable)

In this step we can throw away the source code after 2, and as long as we have the .o and .a files we can link successfully.

Drawbacks to void *

This approach is feasible in C, because it's a small language with relatively few types. The prominence of pointers and casting - you can't write basically any C without using them - means that using void * as a way of handling unknown types is fairly natural. This is what we did in our linked list in lab 3.

However it has some problems: it doesn't provide any type safety, so you have to constantly cast your void pointers to whatever type they really should be. This means that the compiler is essentially helpless - it can't provide the static checking that catches errors at compile time. Additionally, the list can't provide any guarantees about the lifetime of things to which it points. As you may have seen, it's very easy to end up with invalid pointers.

Templates

Templates are (one way) C++ "solves" the problems with void * pointers. Templates let you write code that accepts generic data types and provide compile time type checks. In 3157 we'll primarily focus on class templates (like vector), not function templates.

Before we proceed into talking about templates, a word of warning. Templates break, or at least abuse, the standard mental model of compiling and linking that we've established so far. They do things that "feel" like linking, but it's actually happening during compiling. So just be aware that you'll need a new mental bucket for thinking about templates. (We'll discuss that more soon.)

Templates are kind of like madlibs. They contain almost the whole story, but leave certain things blank. Whatever you put in the blank will complete the story. Using templates is generally quite easy. Say we want a vector (discussed more below): we just define a vector and specify the type of object it holds. Let's say we want to store a bunch of ints and then a bunch of MdbRecs. That's easy:

#include <vector>
//...

vector<int> ivec;
vector<MdbRec> mdbvec;

But there's no need to stop there! What if we want a vector of vectors of MyStrings?

vector<vector<MyString> > msvecvec;

So, what's the type of ivec? vector? Nope! It's type vector<int>. The type of the template'd thing includes the type of the things being used in the template. This is because templates are handled in the compilation step, and basically generate a new type. This isn't exactly what happens, but mentally you can think of the compiler as seeing your vector<int> and creating a normal, non-template type vector_int; then it goes through and replaces every instance of the vector<int> template type with the vector_int normal type. If elsewhere you use vector<MyString>, it creates an entirely separate class called vector_mystring.

It helps to think about how the implementation is written.

template <typename T>
class vector {
  public:
    //......
    void push_back(const T& x);
        //......

  private:
    T *a;
    size_t size;
    size_t capacity;
    void grow();
};

Note that push_back takes a constant reference to type T (const T&). If you've done lab 9, you'll know that it's generally better to take constant references for a couple of reasons:

  1. By taking const, you increase the range of things you can take (both const and non-const)
  2. By taking a reference, you avoid the overhead associated with copy constructors, temporaries, etc.

Templates: under the hood

So we recalled above how compiling worked in C. We create a struct, like struct List list, and call a function like initList(&list). The compiler looks at our source code and our included headers to see if the type struct List and the function initList(struct list *) are defined anywhere. It doesn't care about the implementation at this point, only whether they're defined as valid, legal types. If so, it creates an object file. Later the linker goes through and links the separately compiled objects together into a single executable.

This all breaks down with templates. As we discussed previously, there is no type vector. It's code that can work with any type. So whenever you use a vector, the compiler needs to see the entire definition because it has two jobs to do: 1) check that you're using it validly, and 2) actually generate the code. When you use the template with a type, the compiler is actually creating entirely new types, functions, etc. To do that it needs to have the full code available. This is why we put templates entirely in the .hfiles.

A comment from Jae about this:

Template compilation is very tricky. The crux of the problem is that when a piece of code uses a template, the compiler needs to see the whole definition of the template because it needs to generate a typed instance. This is different from C, where the compiler does not need to see the body of a function when it's compiling a piece of code that calls it. Basically, the C model of separating interface (.h) and implementation (.c) doesn't really work in C++. This is one of the things in C++ that is fundamentally incompatible with C, and no matter what you do you'll end up with a kludge.

We take the easiest way (as most people do): each compilation unit (i.e. each cpp file that uses a template function or a template class) will include the generated typed instances of a template. That is, if foo.cpp and bar.cpp both use Stack, foo.o and bar.o will each have the generated Stack machine code in there. The linker will throw away the duplicates at link time. Wasteful, but simple. This is known as the "Borland" model. (Borland is a compiler vendor that most of you are too young to know. I actually remember installing Borland C++ compiler on my PC from 3.5-inch floppy disks, 22 of them.)

This gcc manual page explains the Borland model and other more fancier alternatives: http://gcc.gnu.org/onlinedocs/gcc/Template-Instantiation.html

See the end of Lippman 5th ed 16.1.1, "Template Compilation" for a more thorough explanation.

Anecdote: Template metaprogramming to an extreme

One really nice feature about templates is that, although they take a long time to compile, they're incredibly fast running. There's literally no overhead during execution for a templated class, because they're just turned into the code that you would have written yourself anyways.

There's a really cool Bayesian statistics project at Columbia run by Andrew Gelman's group called Stan http://mc-stan.org/. It's very fast to run complicated statistical models (under a second for moderate sizes, where competitors would be several minutes), in part because the whole thing very extensively uses C++ templates. However it comes at the cost of 30-60 second compilations every time you want to create a new model (even just a tiny tweak). This is also because of the C++ templates, which mean it has to recompile a huge amount of code every time. By using templates the code is a little trickier to write, longer to compile, but the end result is execution times that are as fast as hand written code, with significantly more flexibility for the end user. This lets them have end users write whatever models they want, at basically no cost in slowness.

Alternate Template Motivation

If you'd like another reason for templates, the canonical example is writing these two functions:

// returns 0 if the values are equal, -1 if v1 is smaller, 1 if v2 is smaller
int compare(const string &v1, const string &v2)
{
    if (v1 < v2)
        return -1;
    if (v2 < v1)
        return 1;
    return 0;
} 
int compare(const double &v1, const double &v2)
{
    if (v2 < v1)
        return 1;
    if (v1 < v2)
        return -1;
    return 0;
}

You might realize that functions are nearly identical, but still have two implementations: one for strings and one for doubles. And all of this doesn't help you at all if you want to write a compare for ints! If only there was a way to do the same logic, and use the already existing operator overloading to just call whatever the appropriate operator< function...luckily ,templates do this!

The template'd version of this is:

template <typename T>
int compare(const T &v1, const T &v2)
{
    if (v2 < v1)
        return 1;
    if (v1 < v2)
        return -1;
    return 0;
}

What's with that weird double template stuff for ostream?

http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Flanguage%2Fref%2Ffriends_and_templates.htm

Sometimes you need to define FUNCTION templates in association with a CLASS template. This is a friend function that depends on the specific type of the class. If we look at Jae's stack example from lecture note 22, he declares

template <typename T>
ostream& operator<<(ostream& os, const Stack<T>& rhs)

There's one function for each type used, ie there's a operator<<_for_stack_of_int, and operator<<_for_stack_of_strings. Those are different functions, and each needs to be created as separate function templates. However they're friends, they aren't member functions, hence they need to be declared as separate templates.

Containers

The C standard library, STL, provides containers. Containers are a standard set of library classes for containing a bunch of objects of some type. They're implemented with templates, and are pretty much the canonical example of templates in the language.

Value Semantics

One key feature of containers is that they provide value semantics. That is, containers at least appear to make copies of the things you store into them, and manage the lifetime of those copies. They guarantee that even if you destruct the original, the copy it stored is safe. And once you destruct the vector, the copies it made are destructed as well.

Understanding value semantics is key for lab 10, and helps explain why they're so handy. Something about STL containers could easily be a final question.

Note that Jae's lecture note 22 does a good job of explaining containers and iterators.

What can you do with containers?

Iterators, of course, but also: things like sorting, inserting, swapping, splicing, merging, etc, etc. You can pass them into functions that will sort, or print, or whatever you could want.

Vector

Vector is the prototypical container in this class. It's a sequential container, supporting O(1) random access to elements, and O(1) amortized appends (push_back()). Under the hood it's an array with some blank spaces on the right; whenever it runs out of blanks it creates a new (larger) array, and copies the data from the old to new. It manages that underlying array internally.

List

List is a linked list, like we did in lab 3 but fancier. In particular the list class is doubly linked, so every node points not just to next, but also to previous (i.e., the node that points to it). It also maintains a tail pointer in addition to head, so it can append in O(1) constant time.

This means that list does NOT define operator[] or anything similar - it has no random access, and it must do some pretty fancy stuff in its iterator.

Deque

Also known as deque, and pronounced "deck", it's a doubly ended queue: like a vector with space on both the left and the right. (Deque is actually implemented differently, but it's a good mental model.)

Derived Containers

Based on vector, list, and deque, there are several derived containers that use one of those as the underlying container. They include queue (uses either list or deque), priority queue (using vector or deque), and stack (using any).

Set

Sets provide very efficient lookup to see if an element is in the set, along with mathematical set operations like union, intersection, etc. They keep the elements sorted using the operator< function. (Unimportant detail: internally they use some type of self balancing binary search tree.)

There's also an unordered set (unimportant detail: that uses a hash table).

Pair and Map

We discuss pair and map because it's useful to understand the flexibility and total genericness of the STL containers. Absolute understanding of how they work isn't important for 3157.

A pair is a class that lets you group two objects together, potentially of different types. A simple implementation is just:

    template <class T1, class T2>
    struct pair {
        T1 first;
        T2 second;
    };

    pair<string,int>  prez("obama", 1961);
    pair<string,int>  veep("biden", 1942);

Pairs are useful because they're often a natural extension of generic containers. We want to store an int plus some value instead of just an int. Pairs provide a convenient way to do that.

One use is in map, which maps keys to values. It will return pairs when it wants to return (key, value) tuples.

  map<string,int>  word_count;
  string word;
  while (cin >> word)
      word_count[word]++;

  map<string,int>::iterator it;
  for (it = word_count.begin(); it != word_count.end(); ++it) {
      cout << it->first << " occurs ";          //recall that it->first is (*it).first
      cout << it->second << " times." << endl;
  }

Iterators

Iterators are the key feature that makes containers so useful as a group. Every container specifies an iterator type, providing a consistent way to access every element stored in the container. Compare the same iterator code, one written for a vector, and the other for a dequeue:

for (vector<string>::iterator it = v.begin(); it != v.end(); ++it)
    *it = "";

for (dequeue<string>::iterator it = v.begin(); it != v.end(); ++it)
    *it = "";

iterator is a type member, which is a typedef defined inside a class.

iterators act like pointers: they must provide the basics that pointers provide (some do provide other features). In particular you must be able to get the beginning, v.begin() and one past the end with v.end(). Both of these functions return iterators. However it's only valid to dereference the returned value of begin(), because end() points past the end of the container.

The iterator has to define three functions to be useful: *, ++ and !=. With only those three we can do our entire iteration with any container, even ones like trees that aren't strictly sequential.

  • operator* returns a reference to the object to which the iterator is currently pointing. In a vector the iterator is actually a pointer, so it works without any code. In another class, say a linked list, the code has to do more work to figure out what the object being stored is, and return that.

  • operator++ advances the iterator to the next element. Again, how it actually happens depends entirely on what the container holds.

  • operator!= is important, because for some container types the idea of operator< doesn't really make sense. Hashmaps, for example, are entirely unordered, so we can only really test whether two iterators are not equal, not whether one is less than the other.

There is also a const_iterator, which gives you const references, and behaves exactly the same way conceptually.

Clang

Clang is a newer alternative to gcc. It was designed primarily by Apple to be a drop in replacement to gcc, but with a license apple prefers, and to support better debugging and static analysis. Apple uses it for XCode, and it's available on clic as well. So if you're getting crazy error messages you don't understand, you can try changing g++ to clang++ in your Makefile, and seeing if you get more easily interpreted error messages.

RECITATION 11

Back to Contents

Recitation 11

Smart Pointers

Or, If C++ is so smart, why do I have to remember to delete?

C is bare bones, so the fact that we have to manually malloc/free makes sense. But if everyone makes fun of C++ for including every language construct ever, why do I still have to call delete?

Well, good news! With SmartPtr you don't have to!

What is it? A C++ class that behaves like a pointer. That is, it points to a chunk of memory on the heap, and can be dereferenced to get to it. It will let us pass around pointers to objects, instead of the objects themselves.

Why?

  • We want automatic life-time management of heap-allocated objects.
  • We want value semantics without having to copy large objects.
  • In short, we want behavior that mimics Java references!

SmartPtr:

  • Is a light-weight handle for heap-allocated objects.
  • Manages the object life-time using reference counting.
  • Provides value semantics for the pointer, thus can be put into standard containers.

Reference Counting

Reference Counting is a way to automatically manage the lifetime of heap allocated objects.

Key Behaviors

The key behaviors that SmartPtr must provide are fairly simple:

  • It stores a real memory address of some type
  • It can be dereferenced to return that memory address when asked
  • When everybody is done with the object this smart pointer is pointing to, it will be deleted.

That last one is the key that drives the definition.

It means that a group of smart pointers all pointing to the same object need to coordinate their efforts.

As a result, let's think about the private data members that a SmartPtr must have. Obviously first is the underlying pointer we're wrapping - T *ptr;.

But there's a second component to a SmartPtr, the count of

Note: See 24-smartptr.pdf for the actual definition.

Diagram SmartPtr

SmartPtr definition

template class SmartPtr {

private:
T *ptr;  // the underlying pointer

int *count;  // the reference count

public:

// constructor
//
// - p is assumed to point to an object created by "new T(...)"
// - we hold the pointer and initialize ref count to 1.
//
//   note: explicit keyword
//   note: default argument
//
explicit SmartPtr(T *p = 0) 
{
    ptr = p;
    count = new int(1);
}

// copy constructor
//
// - copy the data members and increment the reference count 
//
//   note: member initialization syntax
//
SmartPtr(const SmartPtr<T>& sp)
    : ptr(sp.ptr), count(sp.count)
{
    ++*count;
}

// destructor
//
// - delete the underlying object if this was the last owner
//
~SmartPtr()
{
    if (--*count == 0) {
	delete count;
	delete ptr;
    }
}

// assignment operator
//
// - detach this SmartPtr from the underlying object and
//   attach to the object that sp is pointing to.
//
SmartPtr<T>& operator=(const SmartPtr<T>& sp)
{
    if (this != &sp) {
	// first, detach.
	if (--*count == 0) {
	    delete count;
	    delete ptr;
	}
	// attach to the new object.
	ptr = sp.ptr;
	count = sp.count;
	++*count;
    }
    return *this;
}

// operator*() and operator->() make SmartPtr class behave
// just like a regular pointer.

T& operator*() const { return *ptr; }

T* operator->() const { return ptr; }

// access to the underlying pointer for those cases when you
// need it.

T* getPtr() const { return ptr; }

// operator void*() makes "if (sp) ..." possible.

operator void*() const { return ptr; }

};

Reference Cycles

Imagine you have a doubly linked list. Each node points to the next node, but also to the previous node. You also have a List object, that points to the head and tail nodes.

Because SmartPtrs are awesome, you use SmartPtrs for all of the above head/next.

Unfortunately, there's a problem! When the list is destructed, it destructs its pointers to head and tail. That reduces their counts by 1. What we want to happen is that the head node is destructed, causing the node it's pointing to to be destructed.

However, the count of references to head node doesn't become 0, because there is still a reference to it -- the next node's previous pointer. So we have an orphaned cycle. One thing is pointing to the other, which is pointing back to the first. Thus neither will go to zero, and therefore won't be destructed.

This is called a reference cycle, which is the big problem with simple reference counting.

shared_ptr solves this with weak_ptr. Another option is to use some sort of garbage collection very occasionally to try to identify these objects (mark and sweep being a common basic one). Regardless, it needs to be addressed.

Warnings

SmartPtrs can't point to objects on the stack. Think about what would happen if you tried to make that happen?

An object managed by a SmartPtr must be managed exclusively by SmartPtr. If you mix a SmartPtr and a

Don't forget the above discussion on Reference Cycles.

shared_ptr

The C++11 standard includes a smart pointer template class (finally!) called "shared_ptr":

  • Works basically the same way as our SmartPtr, but more powerful.
  • Atomic reference counting for thread safety.
  • Can attach "weak_ptr", which does not participate in ref counting. These are used to break the cycles mentioned above.
  • Delete operation customizable.
⚠️ **GitHub.com Fallback** ⚠️