6.2 Comparing strings

Before we can take a closer look at how to compare strings in Turing, we need to consider how characters are stored in memory. All data on computers, whether it be mp3s, graphics or an essay are stored internally as numbers using the binary system (that's where the adjective digital comes from). Electrical signals are used to represent all those 1s and 0s. So characters are also stored as numbers. Computers use some kind of code to represent each character. There are a number of different systems depending on what kind of computer and languagae you are using. IBM mainframe computers use a system known as EBCDIC. Turing and most personal computers use ASCII. Recently a system known as Unicode is being more widely used, most notably by the Java programming language. Let's look at the ASCII system since that's what we'll see in Turing.

Here is a table that summarizes the ASCII codes for the characters you can type on the keyboard.

      0  1  2  3  4  5  6  7  8  9
30  |          !  "  #  $  %  &  '
40  | (  )  *  +  ,  -  .  /  0  1
50  | 2  3  4  5  6  7  8  9  :  ;
60  | <  =  >  ?  @  A  B  C  D  E
70  | F  G  H  I  J  K  L  M  N  O
80  | P  Q  R  S  T  U  V  W  X  Y
90  | Z  [  \  ]  ^  _  `  a  b  c
100 | d  e  f  g  h  i  j  k  l  m
110 | n  o  p  q  r  s  t  u  v  w
120 | x  y  z  {  |  }  ~
The first character actually in the table is code 32 which is a space character. Codes 30 and 31 are for characters the keyboard doesn't generate. The last code is 126. The last 3 in the chart are characters they keyboard doesn't generate. As some examples of reading this table, the letter 'A' has a code of 65 and the letter 'a' has a code of 97. Notice the codes for the upper case letters are all smaller than the codes for the lower case letters. Notice also that the codes for the digits are consecutive and come before both the lower and upper case letters. This will help us figure out how to order strings.

You do not need to memorize these codes. You just need to know some basic properties about the code as mentioned above. If you need to know a particular code you can look it up. You can also use some Turing functions.

ord

The ord function will tell you the ASCII value for any character or a string of length 1.

chr

The chr function will tell you the character that is represented by a particular code. ASCII values are only defined from 0 to 255. If you pass chr an int less than 0 or greater than 255 you will get an error.

Now we can look at how strings are ordered. The same approach is used that would be used to find a word in a dictionary with one exception. Upper case and lower case letters are different and upper case letters come first. The ASCII code is used to determine the order of characters. It is often a good idea to convert strings to all upper case or all lower case before comparing them. We'll look at how to do this a bit later in this section.

To convert strings strings to all lower case letters we can use the approach in the following program. It uses string concatenation and the ord and chr functions.

var s : string := "Caitlin Smith"
var t : string := "caitlin jones"
var newString : string := ""
const difference := ord('a') - ord('A') % used to convert upper to lower

if s < t then
   put s, " comes before ", t
else
   put t, " comes before ", s
end if

% convert s to all lower case
for i : 1 .. length(s)
   % if the current character is an upper case letter convert it
   if s(i) >= 'A' and s(i) <= 'Z' then
      newString := newString + chr(ord(s(i)) + difference)
   else
      newString := newString + s(i)
   end if
end for

s := newString
put "s is now -> ", s

if s < t then
   put s, " comes before ", t
else
   put t, " comes before ", s
end if 

The program produces this output:

Caitlin Smith comes before caitlin jones
s is now -> caitlin smith
caitlin jones comes before caitlin smith


Exercise 6.2

  1. State which of the following comparisons are true.
    1. "Robert" <= "Rob"
    2. "Thomas" > "magnum"
    3. "27" > "134"
    1. "2 * 3" <= "2 + 3"
    2. "#$!*&" < "$@?!%"
    3. "principal" < "principle"
  2. State the output of the following statements. If any will generate errors state why.
    1. put ord('Z')
    2. put ord("#")
    3. put ord (82)
    1. put chr(-10)
    2. put chr(82)
    3. put chr(ord(chr(65)))

  3. Write a program, that reads a string. It reverses the positions of the characters in the string and then prints it. For example, if the string starts with the value "ping pong", it will end having the value "gnop gnip".
  4. Write a program that reads a string and prints it. Convert the string to upper case and print it.