Strings
A string is a representation of words, sentences, etc. Since these are sequences of characters, a string is represented as an array of characters. However, it is a special kind of character array that has to satisfy an additional constraint.
Definition
String
A string has to satisfy two rules:
1. It is an array of characters.
2. It ends with a null character (i.e., '\0', corresponding to binary 00000000 or simply 0).
Based on the definition above, an array of characters that does not satisfy the second rule is not a string.
A String
The following arrays are strings:
char code[7] = {'c', 's', '2', '1', '0', '0', '\0'};
char name[] = {'C','o','m','p',' ','O','r','g',0};
char who[] = {65, 100, 105, 0};
Note the use of the integer 0 in the second example as opposed to the null character '\0' in the first example. This is accepted because an int (typically 4 bytes) is implicitly converted to a 1-byte char, and the two are interchangeable for small integers. This is taken to the extreme in the third example, where every element is written as an integer (65, 100 and 105 are the ASCII codes of 'A', 'd' and 'i').
Not A String
The following arrays are not strings:
char code[7] = {'c', 's', '1', '0', '1', '0', 'e'};
int name[] = {'C','o','m','p',' ','O','r','g',0};
int who[] = {65, 100, 105};
The first example violates the second rule because it does not end with a null character. The second example violates the first rule because it is not an array of characters. We take this to the extreme again with the third example that violates both rules.
Using the definition, we can then declare a string as an array of characters with a certain maximum size [1]. We can then either initialise it during declaration using an array initialiser {elem, elem, ...} or assign each character to an index of the array. Luckily for us, C provides a simpler way to initialise a string with a syntax you are familiar with: double quotes.
No Single-Quote String
If you are coming from Python or JavaScript, you may have developed a kind of muscle memory to simply use single quotes because then you do not have to press the Shift key. If that is the case, you need to undo that muscle memory for C. In C, single quotes enclose a single character (e.g., 'a'), while a string must be enclosed in double quotes (e.g., "a").
CreateString.c
In the example above, the string str is of particular interest. We allocated 6 slots in the array, but only use 4 of them. What would the visualisation look like? In memory, we would see something like the image on the right.
Input/Output
The strings that we have declared above are currently fixed by the program text. What if we want to read user input? A similar problem is: how do we print a string?
The format specifier for a string is %s. So now, we can use the scanf() and printf() functions to read and print a string, respectively.
StrIO.c
This immediately poses a problem. How do we know that we have allocated enough space for the string? Remember, a string is an array of characters, and an array has to be declared with a maximum size. However, the number of characters being read may be larger than this maximum size.
As a side note, the size of the array should be the maximum number of characters to be read +1. The +1 is important because we need to accommodate the terminating null character '\0', which takes up one slot in the array.
For safer input reading, C provides an alternative function to read user input that takes the maximum size of the array as an argument. This function also comes with a corresponding function to print a string that automatically adds a newline '\n'.
String I/O
fgets Prototype
char *fgets(char *str, int n, FILE *stream);
puts Prototype
int puts(const char *str);
Important Notes
- fgets()
  - str is the string to store the characters (i.e., array/pointer of characters).
  - n is the maximum size of str. The number of characters to be read is at most n-1.
  - stream is the input stream. For the keyboard, it is the standard input stdin.
  - The return value, of type char*, is the same as the str parameter on success (and NULL if nothing could be read). We typically ignore this result.
  - Note that the function also stops reading input when a newline is read. This newline is added to the string.
- puts()
  - str is the string to be printed.
  - The return value is a non-negative integer if the operation is successful. Otherwise, the function returns EOF. We typically ignore this result.
  - Note that a newline character is automatically added to the printed output.
gets(str)
There is another function called gets(str) to read a string interactively. However, for security reasons (it has no way to limit how many characters are stored), we avoid it and use the fgets() function instead.
Due to the newline character '\n' potentially being read by the function fgets(), we will need to remove this newline character if it is actually read. To do that, the typical procedure is to check the last character that was read. If the last character is '\n', then we replace it with '\0' to terminate the string there instead.
In the template below, we use the function strlen(str) to find the number of characters in the string. This will be explained when we talk about string functions. For now, it is sufficient to note that the function strlen(str) returns the number of characters in the string str excluding '\0'.
Reading with fgets
Differences Between scanf/printf and fgets/puts
StringIO1.c
StringIO2.c
Try the examples above with the following input:
My book
You will find the difference in behaviour for the two programs. The outputs are shown below:
Output of StringIO1.c (input is 'My book')
Output of StringIO2.c (input is 'My book')
Newlines
Notice how we have two newline characters when we use the combination of fgets() and puts(). This is because the function fgets() also reads the newline character. As such, the string being read is already str = "My book\n". When we print using puts(), another newline is then added. Thus, the actual output is "My book\n\n". This is the source of the two newline characters being printed.
Remove Vowels
Write a program RemoveVowels.c to remove all vowels in a given input string. You may assume that the input string has at most 100 characters.
Sample Run
RemoveVowels.c
- To use the function strlen(), we need to include the header with #include <string.h>.
- To use the function toupper(), we need to include the header with #include <ctype.h>.
String Functions
C provides a library of string functions. To use them, we must include the header with #include <string.h>. Some of the commonly used string functions are listed below:
- strlen(s) (string length)
  - Returns the number of characters in s, excluding the terminating '\0'.
- strcmp(s1,s2) (string compare)
  - Compares the ASCII values of the corresponding characters in strings s1 and s2 pairwise.
  - The return value satisfies the following conditions:
    - Negative: if string s1 is lexicographically less than string s2.
    - Positive: if string s1 is lexicographically greater than string s2.
    - Zero: if string s1 is lexicographically equal to string s2.
- strncmp(s1,s2,n) (string compare up to n)
  - Compares at most the first n characters of string s1 and string s2.
- strcpy(dst,src) (string copy)
  - Copies the string pointed to by src into the array pointed to by dst.
  - The return value is dst.
  - Important Note
    - The following assignment statement does not work.
      Invalid Assignment
      char name[10];
      name = "Matthew";
      The reason is the same as the reason that we cannot assign a whole array after its declaration. Since a string is still an array, this is also not allowed.
    - If the string to be copied is too long, the copying will simply overwrite whatever memory comes after the array. This causes undefined behaviour.
      Too Long
      char name[10];
      strcpy(name, "A very long name");
      The visualisation will look like the image below.
- strncpy(dst,src,n) (string copy up to n)
  - Copies at most the first n characters of the string pointed to by src to dst.
Importance of Null Character
The two rules in the definition of a string above are strict and they affect all the string functions above. In particular, the string functions as well as printf() will not work properly without the null character. In many cases, a string that is not properly terminated with '\0' will result in illegal access of memory.
To make this clearer, we will describe the functions above, except the "up to n" variants, using pseudo-code. This will also explain certain (possibly) weird return values in string comparisons.
Pseudo-Codes of String Functions
String Length
String Compare
String Copy
WithoutNullChar.c
If you simply run it, Clang will actually initialise the array with all 0s. Since 0 is the null character, you will not be able to see the problem. To see it, you need to compile the program with GCC:
- Click on the "Shell" tab.
- Compile with GCC using gcc main.c.
- Execute the code using ./a.out.
You may see the following output (results may vary).
Possible Output
[1] A maximum size is needed because an array declaration requires us to specify the size of the array.