Summary
This summary is the "Quick Introduction to C" from the CS2100 website, based on a write-up by Colin Tan.
Introduction
This document gives a very quick introduction to the C Programming Language. It is assumed that the reader is already reasonably proficient in programming methodology, and hence this document does not explain basic programming concepts like data-types and functions. It will however explain concepts that are unique to C, like pointers. The objective of this document is not to teach how to program in C, but to make the relevant sections of the CS2100 lecture notes comprehensible to programmers who are unfamiliar with the C programming language.
Data Types in C
Unlike Python and JavaScript, C is a statically typed language: every variable has a fixed type, and all variables must be declared before they are used. The code fragment below shows how to declare a signed integer and a floating point number:
Listing: Variable Declaration
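A minimal sketch of such declarations follows. The names `count`, `ratio`, and the helper `declare_demo` are illustrative assumptions, not taken from the original notes.

```c
#include <stdio.h>

/* Hypothetical helper: declares two variables and returns their sum. */
double declare_demo(void)
{
    int count = 42;       /* signed integer; its type is fixed at declaration */
    float ratio = 0.5f;   /* single-precision floating point number */

    printf("count = %d, ratio = %f\n", count, ratio);
    return count + ratio; /* count is converted to float before the addition */
}
```

Using `count` or `ratio` before the line that declares it would be rejected by the compiler.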
Numerical Data Types
The common C numerical data types are:
Type | Meaning | Size | Encoding | Range |
---|---|---|---|---|
`char` | Character/signed byte | 8 bits/1 byte | 2s complement | -128 to 127 |
`unsigned char` | Character/unsigned byte | 8 bits/1 byte | Unsigned | 0 to 255 |
`short` | Signed short integer | 16 bits/2 bytes | 2s complement | -32768 to 32767 |
`unsigned short` | Unsigned short integer | 16 bits/2 bytes | Unsigned | 0 to 65535 |
`int` | Signed integer | 32 bits/4 bytes | 2s complement | -2,147,483,648 to 2,147,483,647 |
`unsigned int` | Unsigned integer | 32 bits/4 bytes | Unsigned | 0 to 4,294,967,295 |
`long` [1] | Signed long integer | 64 bits/8 bytes | 2s complement | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
`unsigned long` | Unsigned long integer | 64 bits/8 bytes | Unsigned | 0 to 18,446,744,073,709,551,615 |
`float` | Signed single-precision floating point | 32 bits/4 bytes | 32-bit IEEE 754 | ±1.2e-38 to 3.4e+38 |
`double` | Signed double-precision floating point | 64 bits/8 bytes | 64-bit IEEE 754 | ±2.3e-308 to 1.7e+308 |
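Because these sizes depend on the platform (on some systems `long` is only 4 bytes), it can help to ask the compiler directly. The sketch below uses the `sizeof` operator; the helper name `print_type_sizes` is a hypothetical one chosen for illustration.

```c
#include <stdio.h>

/* Prints the size in bytes of several types on the current platform.
   Returns sizeof(long) so that callers can inspect it. */
size_t print_type_sizes(void)
{
    printf("char:   %zu\n", sizeof(char));   /* always 1 by definition */
    printf("short:  %zu\n", sizeof(short));
    printf("int:    %zu\n", sizeof(int));
    printf("long:   %zu\n", sizeof(long));
    printf("float:  %zu\n", sizeof(float));
    printf("double: %zu\n", sizeof(double));
    return sizeof(long);
}
```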
Alphanumeric Data
All data in C is (or can be) represented as integers.
Characters are represented by 8-bit `char` integers based on the ASCII table.
Strings are then represented as:
- Array of `char`.
- Terminated by a null character (i.e., `'\0'` or `0`).
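The sketch below illustrates the two points above, assuming an ASCII platform: a string literal and an array of small integers can hold exactly the same bytes. The function name `string_layout_demo` is hypothetical.

```c
#include <stdio.h>

/* Returns 1 if both spellings of "Hi" hold identical bytes, 0 otherwise. */
int string_layout_demo(void)
{
    char a[] = "Hi";            /* the compiler appends the null terminator */
    char b[] = { 72, 105, 0 };  /* the same bytes written as ASCII integers */

    printf("%s %s\n", a, b);    /* both arrays print as the same text */

    /* "Hi" needs three bytes: 'H', 'i', and the terminating 0. */
    return sizeof(a) == 3 && a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
}
```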
Boolean Values
C does not have distinct Boolean values.
Instead, `true` and `false` (provided by `<stdbool.h>` since C99) are simply other names for 1 and 0 respectively:
Listing: Boolean
However, any value that is non-zero is also treated as true. Hence:
Listing: Non-Zero vs Zero
In C we do not assume that "true" is always equivalent to 1.
It can be any non-zero value.
However, the name `true` is always 1.
This can cause problems when a non-zero value other than 1 is compared against `true`.
The statement below shows the correct way to do comparisons:
Listing: Correct Comparison
BUT:
Listing: Incorrect Comparison
NOTE:
Listing: Keyword Comparison
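The difference between testing a value directly and comparing it against `true` can be sketched as follows. The function names `is_truthy` and `equals_true` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Correct: any non-zero value takes the "true" branch. */
int is_truthy(int x)
{
    if (x) {
        return 1;
    }
    return 0;
}

/* Fragile: true is exactly 1, so this only matches when x is 1. */
int equals_true(int x)
{
    return x == true;
}
```

With these definitions, `is_truthy(5)` yields 1 while `equals_true(5)` yields 0, even though 5 counts as true in a condition.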
C Statements
Statement Blocks
Similar to JavaScript (but different from Python), C statement blocks are marked with curly brackets (also called braces, shown as `{ ... }`).
For example, the following statement forms a block in C:
Listing: C Block
Unlike in Python, indentation inside blocks is optional, but it should be included for readability.
Also unlike JavaScript and Python, all C statements must be terminated by a semicolon (i.e., `;`).
In fact, line breaks in C are also optional.
C separates statements solely with the semicolon.
So the block above can be rewritten as:
Listing: C One-Line Block
Warning
Just because you could, doesn't mean you should. Please do write your code such that it is easy to read.
Because of this property, it is possible to write very fancy-looking C source code. In fact, the "International Obfuscated C Code Contest" is held almost every year to celebrate creativity in writing C programs. The 2019 winner is shown below. This program converts text to sound using fonts as a spectrogram.
Let me reiterate the warning:
Warning
Just because you could, doesn't mean you should. Please do write your code such that it is easy to read.
Iterations
C supports several types of iterations with slight differences:
For-Loop
The most basic C iteration is the for loop. It consists of three parts: an initializer, a continuation condition, and a modification operation, separated by semicolons. To count from 0 to 9 we would do:
Listing: For-Loop (Count 0 to 9)
Due to its flexibility, the for statement is very powerful; we can for example count downwards from 9 to 0:
Listing: For-Loop (Count 9 to 0)
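As a sketch of both directions, the two hypothetical functions below store the counted values in a caller-supplied array of ten integers rather than printing them; that storage choice is an assumption made for illustration.

```c
/* Writes 0, 1, ..., 9 into out[0..9]. */
void count_up(int out[10])
{
    for (int i = 0; i < 10; i++) {   /* initializer; condition; modification */
        out[i] = i;
    }
}

/* Writes 9, 8, ..., 0 into out[0..9]. */
void count_down(int out[10])
{
    int n = 0;                       /* next free slot in out */
    for (int i = 9; i >= 0; i--) {
        out[n++] = i;
    }
}
```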
Each part is optional. If we don’t want to initialize `i` in the loop header, we can do:
Listing: For-Loop (Optional Init)
You can also implement a loop that counts infinitely from any initial value:
Listing: For-Loop (Optional Init + Condition)
If we had a string `mystr`, we could count how many characters are in the string using:
Listing: For-Loop (String)
When this for loop ends, `ctr` will contain the number of characters in `mystr`.
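One way such a one-line counting loop might look is sketched below, wrapped in a hypothetical function `count_chars` so that it can be called with any string.

```c
/* Counts the characters in mystr, not including the null terminator. */
int count_chars(const char *mystr)
{
    int ctr;
    for (ctr = 0; mystr[ctr] != 0; ctr++)
        ;   /* empty body: all the work happens in the loop header */
    return ctr;
}
```

The loop stops at the first element equal to 0, which is the terminator that marks the end of every C string.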
Lastly the for loop can also be used for infinite loops, simply by leaving out every part:
Listing: For-Loop (All Optional)
While-Loop
The while loop works similarly to the for loop, except that the `while` statement itself contains only the continuation condition; initialization and update are done separately.
Our "string count" operation would be written as:
Listing: While-Loop (String)
Since any non-zero value is true, we can write an infinite loop with while as follows:
Listing: While-Loop (Infinite Loop)
Do-While-Loop
The do-while loop has no direct counterpart in Python. Since the continuation condition is tested only at the end of the body, the body is always executed at least once:
Listing: Do-While-Loop (String)
Here `ctr` will be incremented at least once, even if `mystr[0]` is 0.
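The two hypothetical functions below sketch the same counting operation with `while` and with `do`-`while`, to show where they differ. One caution that follows from the at-least-once behaviour: on an empty string the do-while version inspects `mystr[1]`, so this sketch assumes the buffer has at least two readable bytes.

```c
/* Tests the condition first: returns 0 for an empty string. */
int count_while(const char *mystr)
{
    int ctr = 0;
    while (mystr[ctr] != 0) {
        ctr++;
    }
    return ctr;
}

/* Runs the body first: ctr is incremented before any test is made.
   Assumes mystr has at least two readable bytes. */
int count_do_while(const char *mystr)
{
    int ctr = 0;
    do {
        ctr++;
    } while (mystr[ctr] != 0);
    return ctr;
}
```

For a non-empty string both functions return the same count; they disagree only when the first character is already the terminator.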
Conditional Statements
If-Else Statement
The if-else statement in C is fairly straightforward:
Listing: If-Else Statement
C, however, does not have a dedicated `elif` keyword as Python does; the same effect is achieved by following `else` with another `if`:
Listing: Else If
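A small sketch of such a chain is shown below. The function `sign_of` is a hypothetical example, not taken from the original notes.

```c
/* Returns 1, -1, or 0 according to the sign of x. */
int sign_of(int x)
{
    if (x > 0) {
        return 1;
    } else if (x < 0) {   /* an else whose body is another if statement */
        return -1;
    } else {
        return 0;
    }
}
```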
Conditional Expression
This is also called the ternary if-else operation. Just as in JavaScript, we can use it in C:
Listing: Conditional Expression
So if we did:
Listing: Conditional Example
The `y` variable would be set to 3 if `x` is greater than 0, and 2 otherwise.
Note that this is an "expression" and not a "statement". As such, it can be used as the condition of an if-else statement.
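The behaviour described above (3 when `x` is greater than 0, 2 otherwise) can be sketched as follows; the wrapper function `pick` is a hypothetical name used for illustration.

```c
/* Returns 3 if x is greater than 0, and 2 otherwise. */
int pick(int x)
{
    int y = (x > 0) ? 3 : 2;   /* condition ? value-if-true : value-if-false */
    return y;
}
```

Because the whole construct is an expression, it could equally be written directly as `return (x > 0) ? 3 : 2;`.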
Switch Statement
The C Programming Language has the switch statement, which can handle multiple choices without having to deeply nest if-else statements. The statement takes the form:
Listing: Switch Statement
The switch statement only works with integer variables (including `char`, which if you recall is a 1-byte integer).
You also need a `break` statement after each case, to force execution to exit from the switch statement, instead of falling through all the other cases.
The `default` keyword is used to catch all other cases that are not explicitly stated.
You can use switch to implement a menu system:
Listing: Menu System
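A menu of this kind might be sketched as below. The menu keys (`'a'`, `'d'`, `'q'`), the returned strings, and the function name `menu_action` are all assumptions made for illustration.

```c
/* Hypothetical menu: maps a one-character choice to an action name. */
const char *menu_action(char choice)
{
    const char *action;

    switch (choice) {      /* char is an integer type, so it is allowed here */
    case 'a':
        action = "add";
        break;             /* without break, execution would fall into 'd' */
    case 'd':
        action = "delete";
        break;
    case 'q':
        action = "quit";
        break;
    default:               /* any choice not listed above */
        action = "unknown";
        break;
    }
    return action;
}
```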
Pointers
The idea of pointer variables is arguably the most difficult concept in C to grasp. Every memory location in a computer is identified by an address. All variables in C must be stored in memory, and a pointer variable simply stores the address of another variable. So if we had:
Listing: C Pointer
This declares a pointer variable `x` that will point to another variable of type `int`.
Let’s suppose variable `y` is stored in location 0x5004.
The table below shows memory contents from addresses 0x5000 to 0x5008.
Address | Value | Description |
---|---|---|
0x5000 | 55 | Variable z |
0x5001 | | |
0x5002 | | |
0x5003 | | |
0x5004 | 72 | Variable y |
0x5005 | | |
0x5006 | | |
0x5007 | | |
0x5008 | | |
... | | |
Notice firstly that every memory location has an "address", and stores "values".
The "Description" column tells us what the memory holds, but is not actually stored anywhere – it is there just for our information.
Also notice that each variable occupies four memory locations; this is because each `int` is 32 bits long, which is four bytes, and each address refers to an individual byte in memory (byte-addressable memory).
Now if we did:
Listing: Address
The `&` operator, called the "address-of" operator, returns the address of `y`.
This is stored into `x`, which is a pointer variable and stores addresses of other variables.
Now if we did:
Listing: Address Printing
The program will tell us 0x5004, which is the address of `y`.
Now what can we do with this? We could make two pointer variables point to exactly the same variable:
Listing: Aliasing
This is very useful in "call-by-reference" [2] parameter passing, which we will see shortly.
Now that `x` (and `a`) point to `y`, can we use them to access (and change) `y`’s value?
Yes, using the "de-referencing" operation.
In a fit of ingenuity, the designers of C decided to use the same `*` operator (the one used to declare pointer variables) to also access the value pointed to.
So if we did:
Listing: Dereferencing
We would get:
Listing: Dereferencing Output
This is the value that is in `y`.
Now remember that `a` is also pointing to `y`.
If we do:
Listing: Aliasing Update
Our memory will now look like this:
Address | Value | Description |
---|---|---|
0x5000 | 55 | Variable z |
0x5001 | | |
0x5002 | | |
0x5003 | | |
0x5004 | 123 | Variable y |
0x5005 | | |
0x5006 | | |
0x5007 | | |
0x5008 | | |
... | | |
This is because C uses the `*` de-referencing operator to access `a`, getting the address 0x5004, then going to that address and storing 123.
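The whole sequence of steps in this section can be sketched in one function. The variable names follow the text (`y`, `x`, `a`); the wrapper `pointer_demo` is hypothetical, and the address printed on a real machine will differ from the 0x5004 used in the tables.

```c
#include <stdio.h>

/* Returns the value of y after it has been modified through an alias. */
int pointer_demo(void)
{
    int y = 72;
    int *x = &y;   /* x stores the address of y */
    int *a = x;    /* a stores the same address, so a and x both point to y */

    printf("address of y: %p\n", (void *)x);  /* platform-specific value */
    printf("value via x:  %d\n", *x);         /* de-reference: reads y */

    *a = 123;      /* de-reference: writes to y through a */
    return y;
}
```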
Functions
As in other languages, a function in C is a unit that takes inputs, performs some sort of transformation on the input and produces an output. However, as with other languages, a function may not necessarily take inputs (called arguments) and may not necessarily produce a value.
A C function is declared as follows:
Listing: C Function Syntax
The function below returns the sum of two integers:
Listing: Sum
The function below halves the argument:
Listing: Halve
The function below doesn't take any arguments nor does it return a value:
Listing: No Arguments, No Return Value
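The three kinds of function described above might be sketched as follows. Only `sum` is named in the text; `halve`, `greet`, the parameter names, and the use of `double` for halving are assumptions made for illustration.

```c
#include <stdio.h>

/* Takes two arguments and returns a value. */
int sum(int a, int b)
{
    return a + b;
}

/* Takes one argument and returns half of it. */
double halve(double value)
{
    return value / 2.0;
}

/* Takes no arguments (void parameter list) and returns nothing. */
void greet(void)
{
    printf("Hello from C\n");
}
```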
Call-by-Value vs Call-by-Pointer
All arguments are "passed-by-value" to functions (i.e. in C, all function arguments are "call-by-value"). To understand what this means, let’s look at our earlier sum function:
Listing: Sum
Now let's declare two variables `x` and `y` and call `sum` with them:
Listing: Sum Call
The diagram on the right shows what happens.
Here 5 is copied from the argument `x` into the parameter `a`, and 6 is copied from the argument `y` into the parameter `b`.
This means that `a` and `b` are second copies of `x` and `y`.
This has an implication.
Suppose we modify `a` within the body:
The picture below shows what happens:
Notice that while `a` is changed to 10, `x` remains 5.
Therefore we cannot use this as a means of passing back values through the parameters.
This is where call-by-pointer comes in. We rewrite our function as:
Listing: Sum (Call-by-Pointer)
Now we call it with:
Listing: Sum Call (Call-by-Pointer)
What happens now is that the addresses of `x` and `y` are copied "by value" into `a` and `b`.
When we de-reference `a` (which points to `x`) in this statement:
Listing: Dereferencing
C will look at the address in `a`, go to that address and write in 10.
Since `a` is pointing to `x`, this causes `x` to be modified to 10.
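The contrast between the two calling styles can be sketched side by side. The function names `sum_by_value` and `sum_by_pointer` are hypothetical, chosen so that both versions can coexist in one file; each sets its first parameter to 10 as in the discussion above.

```c
/* Call-by-value: a and b are copies, so the caller's variables are untouched. */
int sum_by_value(int a, int b)
{
    a = 10;            /* changes only the local copy */
    return a + b;
}

/* Call-by-pointer: a and b hold the addresses of the caller's variables. */
int sum_by_pointer(int *a, int *b)
{
    *a = 10;           /* writes through the pointer into the caller's variable */
    return *a + *b;
}
```

Starting from `int x = 5, y = 6;`, the call `sum_by_value(x, y)` leaves `x` at 5, whereas `sum_by_pointer(&x, &y)` leaves `x` at 10. Both calls return 16.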
[1] This is actually dependent on the architecture in question, but `long` is guaranteed to be at least as wide as `int`. In some architectures, `long` and `long long` are the same size, and in other architectures `long long` is twice as long as `long`.
[2] This is not a true call-by-reference, since that is only available in C++.