ASCII Code
Representing characters is difficult because most characters have no apparent ordering. How do we even compare $ and # sensibly? As such, any representation of characters is simply based on convention.
The convention that we use (which is also used by C and many other modern programming languages) is the ASCII convention. ASCII stands for the American Standard Code for Information Interchange. Each character is represented using 7 bits, plus one extra bit, called the parity bit, that is used to check for simple errors. For simplicity, we will often ignore the parity bit and simply put a 0 in front. So, unless specified explicitly, we do not use the parity bit.
Parity Bit
A parity bit is used to check whether there is a simple error in the representation. This is done because the connection may be unstable and bit errors may occur. There are two kinds of parity bit, called the odd parity bit and the even parity bit.
- Odd: In this scheme, the total number of 1 bits, after adding the parity bit, must be odd.
- Even: In this scheme, the total number of 1 bits, after adding the parity bit, must be even.
For instance, if our message is 1001101 and we use an odd parity bit, then we append a 1 to the end as the parity bit (the message has four 1 bits, so the parity bit makes it five) and we send 10011011. If the receiver receives an even number of 1 bits, the receiver knows that there is definitely an error and can discard the entire message.
Alternatively, we may put the parity bit at the front. For instance, if our message is 0110101 and we use an even parity bit, then we send 00110101.
The parity bit is a primitive error-checking scheme: the receiver does not know which bit is erroneous (and hence cannot correct it, only discard the message), and it may be oblivious to errors that affect more than one bit. In particular, an error that flips an even number of bits leaves the parity unchanged and goes undetected.
ASCII Table
Example:
- A: 1000001
- MSBs = 100
- LSBs = 0001
- As int: 65
- r: 1110010
- MSBs = 111
- LSBs = 0010
- As int: 114
You do not really have to remember the entire ASCII table. Most of the time, what we need is the relative ordering:
digits < uppercase < lowercase
For best results, you may want to remember the base number of certain sequences:
- Digits: 0 is 48.
- Uppercase: A is 65.
- Lowercase: a is 97.
Char and Int
In C, small integers (between 0 and 127, inclusive of both) and characters are "somewhat" interchangeable.
CharAndInt.c
Quick Quiz
- What is the binary representation of F?
- What is the value of F as int?
- 1000110
- 70
Extended Unicode
As you may have noticed, the ASCII table only has the Latin alphabet. It is missing a lot of other characters that are used worldwide. To accommodate all these characters, we have the extended Unicode characters, of which the first 128 come from the ASCII table.
We will not be using extended Unicode. But note that Unicode includes 👉 emoji such as these eyes 👀.