String and Character Manipulation - JohnHau/mis GitHub Wiki

Objectives

While engaging with this module, you will...

- see the variety of differences between C++ strings and C ntcas (null-terminated character arrays)
- learn how each type stores and manipulates whole words
- explore storing and manipulating all the way down to the individual letters

Introduction

As you have now seen, there are two ways to handle strings of characters in C++. What we have just discussed is the old C-style strings I like to call null-terminated character arrays. The existence of the null character as a data delimiter is the crux of the manner in which a programmer successfully manipulates these types of arrays. The second type is the type you learned at the beginning of the semester. At the time we called that type string and you were told to treat it as a basic, built-in C++ type. Well, it really isn’t. It is more formally known as the standard string class. Since you don’t yet know what a class is, I cannot fully explain what it is “made of”. But that time will come soon enough.

Whether you choose to use the standard string class or the null-terminated character arrays is a matter of personal preference – unless the situation dictates that you use one over the other. Thus, you should be prepared to use either. If you keep in mind that both of these types are some form of an array of characters, you can manipulate their contents in much the same manner. In this lesson, I will talk about some of the character manipulation tools available to you. But first, I need to discuss each of the aforementioned types in a little more detail. I’ll begin with the standard string class.

The Standard String Class (strings)

We will usually refer to these things as strings, and the other type as c-strings or ntcas. One fact that you want to keep in mind is that they are indeed of different types, so you cannot expect the compiler to understand you if you start to mix them. There is a set of functions available to the programmer for the manipulation of standard strings. They are part of the C++ standard library, so any standard-conforming compiler, the GNU compiler included, provides them. In order to access them, you will need to include the system file string. Thus, when using string class functions, put the preprocessor command

#include <string>

at the top of your program with other system includes. I will leave it up to you to discover the different functions allowing you to manipulate standard strings by looking them up in the text or on-line. They are numerous and varied. But remember, an older compiler may not have the newer ones available, and then you will have to work around them.

string name = "Clayton";
cout << name.length(); // 7 is output to the screen

There is one particular issue that needs special attention with strings. As you might have experienced and were warned about, reading into a string variable using a cin statement will only read up to the first space. This can be quite inconvenient indeed. Now we will learn how to get around that. (We will deal with the same issue using c-strings, but the fix is a tiny bit different.) We need to employ a special function and a special way to call that function that I cannot fully explain at this juncture. Here’s how to use it. The name of the function is getline. The syntax is

getline(cin,string_var,delimiter_character);

This line of code will read a string of characters from the keyboard (cin) until the delimiting character is reached and that string is stored in string_var.

string name;
getline(cin, name, '\n'); // user enters Clayton Price
cout << name; // outputs Clayton Price

The delimiter character defaults to the newline character, \n. Thus, the above getline statement is equivalent to

getline(cin,name);

The delimiting character is discarded when read.

Very Important

You must understand a subtlety about mixing getline calls and cin >> statements. Remember: cin >> stops at the newline character, \n, and leaves it in the stream, while getline reads the newline and discards it. And how does this affect you? Consider this code:

string name;
int n;
cout << "enter a number: ";
cin >> n;
cout << "enter your name: ";
getline(cin, name);
return 0;

Suppose the first prompt is answered with 5 being entered. The variable n takes the value 5. What happens next can be mystifying if you are not aware of what is in the stream. The second prompt is printed to the screen and in an instant, before you have a chance to blink, the program ends. Why?! Remember, cin >> left the newline character on the stream. After the second prompt was issued, the getline function started to read the stream, read the newline, considered itself done, discarded the newline, and returned with name left empty. The program then reached return 0 and ended. Thus, you see that the problem was that information (a newline character in this case) was left on the stream. In order to make this code function correctly, you must be sure that the stream is empty before using a getline call. This can be done in more than one way, but I think the simplest is to use the ignore function.

cin.ignore(int_value, delim_char);

This function call will take characters from the stream and discard them until int_value characters are discarded or delim_char is read, whichever comes first. If the latter, then the delim_char is also discarded.

cin.ignore(500, '\n'); // reads and discards up to 500 chars from the stream

Include this line of code after a cin >> and before the getline that follows it, and your problem will most likely be solved.

Null-Terminated Character Arrays (C-strings)

Functions to deal with ntcas have already been discussed. Anything that hasn’t already been stated you can build yourself. But, once again, we have the problem of reading a string of characters into a ntca using a cin statement and having it read only to the first space. And, once again, we will solve the problem using a call to a getline function. Notice I said “a getline” function, not “the getline”. There are two versions of the function, one for standard strings and one for ntcas. For ntcas the syntax is

cin.getline(ntca, max_num_chars);

The dot in the middle of this function call is the same dot used to access members of a struct. However, I cannot relate to you at this time exactly what is going on here. But, after a few more lessons, you will completely understand. Also, you will have the very same problem with this getline and the \n character as you did with the other getline. You will handle it exactly the same way.

char address[80]; // holds an address of 79 chars (or fewer) + null
cout << "enter address: "; // suppose the user enters: 245 N. Oak St. apt. 1B
cin.ignore(500, '\n'); // clear the stream, assuming an earlier cin >> left a newline behind
cin.getline(address, 80); // stores at most 79 chars; the count already includes the null char
cout << address; // -> 245 N. Oak St. apt. 1B


Character Manipulation

Using the ntca address from the example in the last section, I will demonstrate some of the character manipulation functions.

There are several built-in functions that can help you manipulate individual characters. This can be important when trying to understand the contents of a ntca or string. You may have to #include <cctype> in order to use these. (I say “may” because you can never know until you ask just what is necessary for a particular compiler. It isn’t necessary with our compiler.) You can find a more complete list of these functions in most C++ text books or on-line. Here are some of them:

- toupper(char): returns the uppercase of the argument sent; toupper('a'); -> 'A'
- tolower(char): similar
- isupper(char): returns bool: true if uppercase; isupper('a'); -> false
- islower(char): similar
- isalpha(char): similar
- isdigit(char): similar
- ispunct(char): returns bool: true if punctuation; ispunct('!'); -> true
- isspace(char): returns bool: true if whitespace (space, newline, tab)

int i = 0, count = 0;

while (ntca[i] != '\0')
{
  if (ispunct(ntca[i]))
    count++;
  i++;
}

cout << count << endl;

If we run this code on the ntca address from the last section, the output would be 3 (the three periods). Change the function call to isspace() and the output will be 5 (the five spaces).

int i = 0;

while (ntca[i] != '\0')
{
  ntca[i] = toupper(ntca[i]);
  i++;
}

cout << ntca << endl;

And if we run this code on the ntca address from the last section, the output will be: 245 N. OAK ST. APT. 1B

So you see that you can use these functions to “tear down” a string and discover what is in it, manipulate it, change it, etc. You should not overlook the ease with which you can write these functions yourself. Let’s see.

Write a function that will accept a char and return true if it is a digit, false otherwise.

bool IsDigit (const char input)
{
  bool digit = false;
  if (input >= 48 && input <= 57) // 48 through 57 are the ASCII codes for '0' through '9'
    digit = true;
  return digit;
}

Challenge: write this function as one line of code. Answer is at the end of this page.

Input/Output Character Manipulation

The stream library provides functions that allow individual character input and output. Let’s take a look. The functions I want to mention here are get, peek, putback, put, and ignore. They are fairly simple to understand. I will give most attention to the function get(). The syntax is

cin.get(char_variable);

The parameter is a reference, so what is sent will be changed by the function. get() will pull the next character off the input stream and store its value in the char_variable sent as an argument. Unlike cin >>, it does not skip whitespace, so spaces and newlines are read like any other character. Thus, you have the ability to input information from the input stream character-by-character.

char next;
cout << "enter your poem: ";

do
{
  cin.get(next);
  cout << next;
} while (next != '\n');

This code will simply echo to the screen what has been input at the keyboard. It is equivalent to

char poetry[500];
cout << "enter your poem: ";
cin.getline(poetry, 500);
cout << poetry;

except that the second is limited to poems of at most 499 characters and the first isn’t limited at all. It is also different because the first is reading character-by-character (and so echoes the final newline too) while the second is reading an entire line.

At this juncture I want to emphasize to you that you have now learned three ways to read in information from the keyboard:

- line-by-line using getline()
- word-by-word using cin>>
- character-by-character using get()

Each of these methods has an advantage, but it depends on the requirements of the problem. If you need to process each character, then read char-by-char. If you need to process each word, then use cin>>.

The put() function is actually fairly useless.

cout.put(char_var);

is equivalent to cout<<char_var;. End of discussion!

The putback() function will allow you to put a character back into the input stream:

cin.putback(char_var);

The peek() function will allow you to know what the next character in the input stream is without extracting from that stream.

char_var = cin.peek();

All these functions will allow a certain degree of manipulation of the input stream and its contents for single characters.

Answer to challenge:

bool IsDigit (const char input) { return (input >= 48 && input <= 57); }
