
C-style strings


Under regular C (and hence also C++), it is possible to use arrays to represent strings. A string is a sequence of chars that are interpreted as a piece of text. You have already seen string literals:
cout << "This is a string literal";
In C and C++, strings are typically represented as char arrays that have a null terminator. A null terminator means that the string ends with a '\0' character (which has ASCII code 0). Arrays that are null terminated in this manner are often named using the Hungarian Notation prefix “sz”.
To declare a C-style string, simply declare a char array and assign it a value:
char szString[] = "string";
Although “string” is only 6 letters, this actually declares an array of length 7. The following program prints out the size of the array (which counts the null terminator), and then the ASCII values of all of the characters:
cout << sizeof(szString) << endl;
for (std::size_t nChar = 0; nChar < sizeof(szString); nChar++) // sizeof yields an unsigned value
    cout << static_cast<int>(szString[nChar]) << " ";
This produces the result:
7
115 116 114 105 110 103 0
That 0 is the ASCII code of the null terminator that has been appended to the end of the string.
Just like with normal arrays, once an array is declared to be a particular size, it cannot be changed. Our szString above is of length 7, which means it can fit 6 chars of our choice and the null terminator. If you try to stick more than 6 chars in the array, you will overwrite the null terminator and write past the end of the array (which is undefined behavior), so code that works with the string won’t know where it ends. If you try to print a string with no null terminator, you’ll not only get the string, you’ll also get whatever is in the adjacent memory until the output code happens to hit a 0.
When declaring strings in this manner, it is always a good idea to use [] and let the compiler calculate the size of the array. That way if you change the string later, you won’t have to manually adjust the size.
It is important to realize that a single char (e.g. ‘a’) occupies one byte, but the equivalent string (e.g. “a”) occupies two bytes: one for the char, and one for the null terminator.
Since C-style strings are arrays, you can use the [] operator to change individual characters in the string:
char szString[] = "string";
szString[1] = 'p';
cout << szString;
This snippet prints:
spring
One important point to note is that strings follow ALL the same rules as arrays. This means you can initialize the string upon creation, but you cannot assign values to it using the assignment operator after that!
char szString[] = "string"; // ok
szString = "rope"; // not ok!
This would be the conceptual equivalent of the following nonsensical example:
int anArray[] = { 3, 5, 7, 9 };
anArray = 8; // what does this mean?
Buffers and buffer overflow
You can read text into a string using cin:
char szString[255];
cin >> szString;
cout << "You entered: " << szString << endl;
Why did we declare the string to be 255 characters long? The answer is that we don’t know how many characters the user is going to enter. We are using this array of 255 characters as a buffer. A buffer is memory set aside temporarily to hold data. In this case, we’re temporarily holding the user input before we write it out using cout.
If the user were to enter more characters than our array could hold, we would get a buffer overflow. A buffer overflow occurs when the program tries to store more data in a buffer than the buffer can hold. Buffer overflow results in other memory being overwritten, which often causes a program crash, but can cause any number of other issues, including security vulnerabilities. By making our buffer 255 characters long, we are guessing that the user will not enter this many characters. Although this is commonly seen in C/C++ programming, it is poor programming.
The recommended way of reading strings using cin is as follows:
char szString[255];
cin.getline(szString, 255);
cout << "You entered: " << szString << endl;
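This call to cin.getline() will read up to 254 characters into szString (leaving room for the null terminator!). If the user enters a longer line, getline() stops after 254 characters, sets the stream’s failbit, and leaves the excess characters in the input stream, so nothing is written past the end of the array. In this way, we guarantee that buffer overflow will not occur.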
Manipulating C-style strings
C++ provides many functions to manipulate C-style strings, declared in the cstring header. For example, strcpy() allows you to make a copy of a string.
char szSource[] = "Copy this!";
char szDest[50];
strcpy(szDest, szSource);
cout << szDest; // prints "Copy this!"
However, strcpy() can cause buffer overflows! In the following program, szDest isn’t big enough to hold the entire string, so buffer overflow results.
char szSource[] = "Copy this!";
char szDest[4];
strcpy(szDest, szSource); // buffer overflow!
cout << szDest;
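It is better to use strncpy(), which takes a length parameter to prevent buffer overflow. Note that strncpy() does not add a null terminator when the source is at least as long as the given length, so we add one ourselves: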
char szSource[] = "Copy this!";
char szDest[50];
strncpy(szDest, szSource, 49); // copy at most 49 characters (indices 0-48)
szDest[49] = 0; // ensures the last character is a null terminator
cout << szDest; // prints "Copy this!"
Other useful functions:
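strcat(): Appends one string to another (dangerous: it does not check whether the destination has room)
strncat(): Appends at most a given number of characters from one string to another, then adds a null terminator (the count limits how much is appended; it is not the size of the destination)
strcmp(): Compares two strings (returns 0 if equal)
strncmp(): Compares two strings up to a specific number of characters (returns 0 if equal)
strlen(): Returns the length of a string (excluding the null terminator)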
Here’s an example program using some of the concepts in this lesson:
// Ask the user to enter a string
char szBuffer[255];
cout << "Enter a string: ";
cin.getline(szBuffer, 255);
 
int nSpacesFound = 0;
// Measure the string once, then loop through all of the characters the user entered
const std::size_t nLength = strlen(szBuffer);
for (std::size_t nChar = 0; nChar < nLength; nChar++)
{
    // If the current character is a space, count it
    if (szBuffer[nChar] == ' ')
        nSpacesFound++;
}
 
cout << "You typed " << nSpacesFound << " spaces!" << endl;
std::string
It is important to know about C-style strings because they are used in a lot of code. However, we recommend avoiding them altogether whenever possible!
A better idea is to use the string class in the standard library (std::string), which lives in the string header. std::string lets you work with strings in a way that is much more intuitive. You can assign strings to them using the assignment operator and they will automatically resize to be as large or small as needed.
Here is a quick example using std::string:
#include <string> // for std::string
#include <iostream>
 
int main()
{
    using namespace std; // for both cout and string
    cout << "Enter your name: ";
    string strString;
    cin >> strString;
    cout << "Hello, " << strString << "!" << endl;
 
    cout << "Your name has: " << strString.length() <<
            " characters in it" << endl;
    cout << "The 2nd character is: " << strString[1] << endl;
 
    strString = "Dave";
    cout << "Your name is now " << strString << endl;
    cout << "Goodbye, " << strString << endl;
 
    return 0;
}
One extremely useful function to use with std::string is getline(). Whereas cin >> stops reading at the first whitespace character, getline() reads an entire line, even if it includes whitespace:
cout << "Enter your full name: ";
 
string strName;
getline(cin, strName);
 
cout << "You entered: " << strName << endl;
For example:
Enter your full name: John Smith
You entered: John Smith
The nice thing about std::string is that you don’t have to guess how large the input string is likely to be in advance!
We will talk more about std::string in future lessons. But feel free to experiment with it in the meantime.
