Python compare strings

Introduction

Nearly every real-world application compares strings at some point: database lookups, searching for files on disk, sorting contacts, and many other scenarios. String comparison in Python is not hard, as we will demonstrate in today’s code snippets. Let us get started…

Python string identity vs string equality

Like any other object, a string occupies a memory location, so we need to differentiate between its content and its address. A string in Python has a value and an identity, which the built-in id() function returns as an integer (in CPython, this is the object's memory address). We can compare strings by value or by identity. In other words, two strings can be equal without being the same object, meaning they are stored in two different memory locations. Let us clarify that…
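A minimal sketch of the difference, assuming CPython. The values are illustrative, and the names x, y and z are chosen to match the discussion that follows:

```python
# Illustrative values; x, y and z mirror the names used in the text.
x = "python"
y = x                         # y is bound to the very same object as x
z = "".join(["py", "thon"])   # built at runtime, so CPython creates a new object

# The actual id numbers differ from run to run.
print(id(x), id(y), id(z))

print(x == y, x is y)         # equal in value and the same object
print(x == z, x is z)         # equal in value, but (in CPython) a different object
```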

If you run the code snippet above, you should get something like…

As you can see, x and y both have the same id. That means both point to the same memory location. In other words, they are both equal in value and they refer to the same object. On the other hand, x and z are equal but they are not identical. They point to two different memory locations. In short…

  • Use (is) for identity testing
  • Use == for equality testing

Python string comparison operators

Python provides various operators to perform string comparison. Let us review some of those…

  • Equal ==
  • Not equal != (Python 2.x also accepts <>, which was removed in Python 3.x)
  • Greater than >
  • Less than <
  • Greater than or equal >=
  • Less than or equal <=

But wait a minute, how can we compare strings as if they were numbers? The answer is that Python compares strings lexicographically, character by character, using each character's Unicode code point (which matches the ASCII value for ASCII characters). Take a look at the following examples…
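A short sketch with made-up values. The numbers in the comments are the character codes that decide each comparison:

```python
# Illustrative values only; strings are compared character by character.
print("apple" < "banana")    # True: "a" (97) comes before "b" (98)
print("apple" == "Apple")    # False: comparison is case sensitive
print("Zebra" < "apple")     # True: uppercase "Z" (90) sorts before lowercase "a" (97)
print("app" < "apple")       # True: a prefix sorts before the longer string
print("10" < "9")            # True: "1" (49) comes before "9" (57), not numeric order

# Sorting relies on the same ordering.
words = ["banana", "Zebra", "apple", "app", "10", "9"]
ordered = sorted(words)
print(ordered)               # ['10', '9', 'Zebra', 'app', 'apple', 'banana']
```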

In Python, you can use the function ord(character) to get the code of a given character, and chr(code) to get the character for a given code. Here is an example…
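A small sketch of both functions in Python 3, using illustrative characters:

```python
code = ord("a")              # character -> integer code
char = chr(97)               # integer code -> character
print(code, char)            # 97 a

print(ord("A"), ord("a"))    # 65 97: uppercase letters have lower codes

next_char = chr(ord("a") + 1)
print(next_char)             # b

# Characters outside ASCII have codes above 127 (here the euro sign).
euro_code = ord("\u20ac")
print(euro_code)             # 8364
```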

Modifying a string

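Python strings are immutable, so a string is never modified in place. An operation that appears to modify a string creates a new string object, and the variable then refers to that new object, typically with a different id. Take a look at the following example…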
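A minimal sketch with illustrative values. The extra name original is an assumption added here: it keeps the first object alive so the two ids can be compared reliably:

```python
s = "hello"
original = s             # second reference to the first string object
print(id(s))

s += " world"            # builds a new string and rebinds the name s
print(id(s))

print(original)          # hello: the first object was never changed
print(s is original)     # False: s now refers to a different object
```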

If you run the code snippet above, you should get two different string ids.

Let us now talk about unicode strings…

Strings are cool until we deal with Unicode strings; at that point we need to be careful. Please note that Unicode is beyond the scope of this article, and I advise the reader to refer to the Python reference for more details. For the sake of this article, we are going to include the bare minimum to get you started…

Python unicode string comparison

Let us define a few terms…

  • Encoding is converting a sequence of characters into a special format for efficient processing, storage and transmission. Decoding is the opposite process. It is the conversion of encoded data back to the original form
  • There are various ways to convert text into a byte stream. Each particular way of doing this conversion is called an encoding scheme (e.g. ASCII, UTF-8)
  • Unicode defines a unique integer number (called a code point) for every character regardless of platform, device, application or language. Unicode has its own encoding schemes. The most popular one is UTF-8
  • There is a big difference in handling unicode strings between Python 2.x and 3.x. A string in Python 2.x can be of type str (bytes or ASCII) or unicode. Here is an example…

If that is the case, then how can we compare strings in Python 2.x? Take a look at the following example…

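In Python 2.x, try not to compare strings of type str with strings of type unicode. Python 2.x implicitly decodes the str operand as ASCII, and if it contains non-ASCII bytes the comparison issues a UnicodeWarning and evaluates to False. Make sure you are comparing strings of the same type.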

  • On the other hand, a string in Python 3.x is unicode by default. Let us take an example…
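A minimal Python 3 sketch, assuming UTF-8 as the encoding and an illustrative value. It shows that str holds Unicode text, that its encoded bytes form is a different type, and that both sides must be the same type before comparing:

```python
text = "caf\u00e9"                    # str: a sequence of Unicode code points
data = text.encode("utf-8")           # bytes: the UTF-8 encoded form

print(type(text), type(data))
print(len(text), len(data))           # 4 code points, 5 bytes

print(text == data)                   # False: str and bytes never compare equal
print(text == data.decode("utf-8"))   # True once both sides are str
```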

Python multiline string comparison

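A multiline string is a regular string that happens to contain newline characters, so you can compare it directly with == and the other operators. If two texts that look the same compare as different, the cause is usually an invisible difference such as line endings or trailing whitespace. In that case, split both strings on new lines and compare them line by line in a loop. Here is an example…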
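One possible sketch of the line-by-line approach. The helper name first_difference and the sample texts are made up for illustration, and splitlines() is used here so that both "\n" and "\r\n" line endings are treated alike:

```python
from itertools import zip_longest


def first_difference(a, b):
    """Return (line_number, line_a, line_b) for the first differing line, or None."""
    # zip_longest pads the shorter text with None, so extra lines count as differences.
    pairs = zip_longest(a.splitlines(), b.splitlines())
    for number, (line_a, line_b) in enumerate(pairs, start=1):
        if line_a != line_b:
            return number, line_a, line_b
    return None


first = "line one\nline two\nline three"
second = "line one\r\nline two\r\nline 3"

print(first == second)                    # False: line endings and line 3 differ
print(first_difference(first, second))    # (3, 'line three', 'line 3')
```

Because the lines are split before comparing, texts that differ only in their line endings are reported as having no difference.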

Time to summarize…

Summary

  • String comparison is a commonly used Python language feature. It can be used in database lookup, searching for files on disk, sorting contacts, etc.
  • Strings can be equal in value but stored in different memory locations. To test for equality use ==, and to test whether they are the same object use the (is) operator
  • Be extra careful when comparing unicode strings. Python 2.x and Python 3.x are different in handling unicode. For example, in Python 2.x it is not a good practice to compare a string of type str with a string of type unicode

That is it for today, thanks for visiting. Please leave a comment if you have a question.
