Difference between re.search and re.match in Python

Introduction

Welcome to a new Python code snippets post. Today, we are going to talk about the regular expressions module (i.e. the RE module). The goal of this post is to clarify the main difference between the re.search and re.match operations. We will also briefly discuss other operations provided by the RE module. Let us get started…

What is a regular expression?

A regular expression is a text string that describes a search pattern. You are probably familiar with wildcards when listing or searching for files on Unix or Windows. A regular expression is a similar concept, but it is more powerful than plain wildcards. Regular expressions can be used for searching (e.g. extracting information from log files) as well as for validation. For example, one of the most popular scenarios is email validation on a web form. Before we dive into the topic, we will explain what a Python raw string is, as we are going to use raw strings in our examples.

What is Python raw string?

When dealing with regular expression patterns in Python, you may encounter patterns that start with the letter (r) as in the following example:
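As a minimal sketch, the pattern and sample strings below are illustrative assumptions. The raw-string pattern describes text that ends with a digit and is compared with the same pattern written as a regular string:

```python
import re

# Raw string: the backslash reaches the regex engine untouched.
raw_pattern = r".*\d$"
# Same pattern as a regular string: the backslash has to be doubled.
plain_pattern = ".*\\d$"

same = raw_pattern == plain_pattern                        # both spell the same pattern
ends_with_digit = re.match(raw_pattern, "room 7") is not None
no_digit = re.match(raw_pattern, "room seven") is None

print(same, ends_with_digit, no_digit)  # True True True
```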

The pattern above refers to any text that ends with a digit, but what does the letter (r) stand for? Well, both Python string literals and regular expressions use the backslash (\) to escape special (i.e. meta) characters. (r) marks a raw string in Python. Raw strings do not apply any special treatment to backslashes. This Python feature is convenient for regular expressions as it makes the patterns easier to read and less error prone. For example, instead of writing the pattern as "\\d" in a regular string, we write it as is in a raw string: r"\d". Very handy! Let us proceed…

Python RE module

The Python RE module offers primitive regular expression operations for text matching and searching. It provides more than just matching and searching, as we will see later. re.match and re.search are two of the most important operations. re.match checks for a match only at the beginning of a string, while re.search checks for a match anywhere in the string.

Python regular expressions syntax summary

Regular expression patterns use special characters (metacharacters) and escape sequences to indicate special meaning. If a metacharacter needs to be matched literally, we escape it with a backslash. Here is a short list of the most common ones. You can check the reference section for more details…

  • \d a digit
  • \D a non-digit
  • \s a whitespace character (space, tab, new line)
  • \S a non-whitespace character
  • \w a word character (letter, digit or underscore)
  • \W anything but a word character
  • . any character except a newline
  • \b a word boundary
  • + 1 or more
  • ? 0 or 1
  • * 0 or more
  • $ end of string
  • ^ beginning of string
  • | either or
  • [] a set or range of characters
  • {x} exactly x repetitions of the preceding element
  • \n new line
  • \t tab
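To see a few of these in action, here is a small sketch. The sample text and variable names are hypothetical and chosen only for illustration:

```python
import re

log_line = "user_42 logged in at 09:15"   # hypothetical sample text

digits = re.findall(r"\d+", log_line)     # runs of one or more digits
words = re.findall(r"\w+", log_line)      # runs of word characters
# Two digits, a colon, two digits, delimited by word boundaries.
time_of_day = re.search(r"\b\d{2}:\d{2}\b", log_line).group()
starts_with_user = re.match(r"^user", log_line) is not None

print(digits)            # ['42', '09', '15']
print(words)             # ['user_42', 'logged', 'in', 'at', '09', '15']
print(time_of_day)       # 09:15
print(starts_with_user)  # True
```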

Regular expressions flags

When searching or matching, regular expression operations can take optional flags or modifiers. The following two modifiers are the most used ones…

  • re.M multiline
  • re.I ignore case
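A short sketch of both flags, using a made-up three-line string. Flags are combined with the | operator:

```python
import re

text = "first line\nSecond line\nthird LINE"   # hypothetical sample text

# re.I: letter case is ignored.
case_insensitive = re.findall(r"line", text, re.I)
# re.M: ^ also matches right after each newline.
line_starts = re.findall(r"^\w+", text, re.M)
# Both flags together.
combined = re.findall(r"^second", text, re.M | re.I)

print(case_insensitive)  # ['line', 'line', 'LINE']
print(line_starts)       # ['first', 'Second', 'third']
print(combined)          # ['Second']
```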

For the full list, you may check the reference section. Let us now begin with the first regular expression operation, re.match…

re.match

re.match matches an expression at the beginning of a string. If a match is found, a match object is returned; otherwise, None is returned. A multiline input string (i.e. one that contains newline characters) does not change this behavior: re.match always tries to match at the beginning of the string only, even when the re.M flag is given. In regular expression syntax, the metacharacter (^) matches the beginning of a string, so using it with re.match is redundant. The call takes the form re.match(pat, str, flags=0), where

  • pat: regular expression pattern to match
  • str: string in which to search for the pattern
  • flags: one or more modifiers, for example re.M|re.I
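The behavior described above can be sketched as follows. The sample text is an assumption made for illustration:

```python
import re

text = "2024 was a leap year\n2025 is not"    # hypothetical sample text

at_start = re.match(r"\d{4}", text)           # matches "2024" at index 0
with_caret = re.match(r"^\d{4}", text)        # ^ adds nothing here
not_at_start = re.match(r"leap", text)        # None: "leap" is not at index 0
# Even with re.M, re.match does not try the start of the second line.
second_line = re.match(r"2025", text, re.M)   # None

print(at_start.group(), with_caret.group())   # 2024 2024
print(not_at_start, second_line)              # None None
```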

re.search

re.search attempts to find the first occurrence of the pattern anywhere in the input string, as opposed to only at the beginning. If the search is successful, re.search returns a match object; otherwise, it returns None. The call takes the form re.search(pat, str, flags=0), where

  • pat: regular expression pattern to search for
  • str: string in which to search for the pattern
  • flags: one or more modifiers, for example re.M|re.I
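A minimal sketch with a made-up log message. Note that only the first occurrence is reported, and that the result should be checked for None before it is used:

```python
import re

text = "error at line 12, warning at line 40"   # hypothetical sample text

first = re.search(r"line (\d+)", text)    # first occurrence only
missing = re.search(r"fatal", text)       # None: the pattern does not occur

if first is not None:
    print(first.group(), first.start())   # line 12 9
print(missing)                            # None
```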

Why use re.match?

Now we know the difference between re.match and re.search, but the question is: why do we need re.match if we can achieve the same result using re.search with a pattern anchored at the start of the string? There is no single answer to this question; re.match is provided as a convenience, and it states the intention of the operation explicitly.

Regular expression compilation

Regular expression compilation produces a Python pattern object that can be used to do all sorts of regular expression operations. What is the benefit of that, as long as we can use re.match and re.search directly? This technique is convenient when we want to use a regular expression more than once: the pattern is written and compiled in one place, which can make our code more efficient and more readable. The call takes the form re.compile(pat, flags=0), and the returned object offers match, search and the other operations as methods.

Here is an example…
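A minimal sketch of compiling a pattern once and reusing it. The pattern, the variable names and the sample strings are illustrative assumptions:

```python
import re

# Compile once, reuse many times.
python_word = re.compile(r"python", re.I)

m1 = python_word.match("Python is fun")    # found at the beginning
m2 = python_word.search("I like python")   # found later in the string

print(m1 is not None, m2 is not None)      # True True
```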

If you run the code snippet above, you should get a match in both cases. Now, let us jump into more match and search examples…

re.match and re.search examples
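As a side-by-side sketch (the sample strings are made up for illustration), the same patterns are passed to both functions, first on a single line and then on a multiline string with re.M:

```python
import re

text = "the quick brown fox"                         # hypothetical sample text

match_quick = re.match(r"quick", text)               # None: not at the start
search_quick = re.search(r"quick", text)             # found at span (4, 9)
match_the = re.match(r"the", text)                   # found at span (0, 3)

multi = "one\ntwo"
match_second = re.match(r"^two", multi, re.M)        # None: match only tries index 0
search_second = re.search(r"^two", multi, re.M)      # found at the start of line 2

print(match_quick, search_quick.span(), match_the.span())
print(match_second, search_second.span())
```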

re.fullmatch

The re.fullmatch function was added in Python 3.4 to match the entire string. If the pattern matches the whole input string, a match object is returned; otherwise, None is returned. We can easily implement re.fullmatch in terms of re.match (for example, by ending the pattern with an end-of-string anchor); however, Python provides this function for convenience. It is useful in validating user input, and it makes the goal of the match explicit. Let us take an example…
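A minimal sketch of input validation, assuming a hypothetical rule that a valid code is exactly five digits. re.match is shown next to re.fullmatch for comparison:

```python
import re

five_digits = re.compile(r"\d{5}")   # hypothetical validation rule

results = {}
for candidate in ["12345", "12345-6789", "abc12345"]:
    full = five_digits.fullmatch(candidate) is not None   # whole string must match
    start = five_digits.match(candidate) is not None      # only the beginning must match
    results[candidate] = (full, start)
    print(candidate, full, start)

# 12345 True True
# 12345-6789 False True
# abc12345 False False
```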

If you run the code snippet above, the output should look like…

Match objects

If a match is found when using re.match or re.search, we can use some useful methods provided by the match object. Here is a short list of such methods; you may check the reference section for more details…

  • group() returns the part of the string matched by the entire regular expression
  • group(1) returns the text matched by the first capturing group
  • start() and end() return the indices of the start and end of the substring matched by the entire regular expression (or by a given capturing group, if its number is passed)

Here is an example…
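A small sketch of these methods. The pattern and the input string are hypothetical:

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "contact: alice@example.com")  # hypothetical input

whole = m.group()          # text matched by the entire pattern
user = m.group(1)          # first capturing group
domain = m.group(2)        # second capturing group
bounds = (m.start(), m.end())
user_span = m.span(1)      # (start, end) of the first group

print(whole, user, domain)   # alice@example.com alice example
print(bounds, user_span)     # (9, 26) (9, 14)
```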

re.findall

re.findall returns a list of all non-overlapping matches in a string. The call takes the form re.findall(pat, str, flags=0), where

  • pat: regular expression pattern to search for
  • str: string in which to search for the pattern
  • flags: one or more modifiers, for example re.M|re.I

Here is an example…
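A minimal sketch with made-up data. When the pattern contains capturing groups, re.findall returns the groups (as tuples when there is more than one) instead of the whole match:

```python
import re

text = "apples 3, pears 12, plums 7"            # hypothetical sample text

numbers = re.findall(r"\d+", text)              # whole matches
pairs = re.findall(r"(\w+) (\d+)", text)        # one tuple per match
non_overlapping = re.findall(r"aa", "aaaa")     # matches never overlap

print(numbers)           # ['3', '12', '7']
print(pairs)             # [('apples', '3'), ('pears', '12'), ('plums', '7')]
print(non_overlapping)   # ['aa', 'aa']
```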

re.finditer

Instead of returning a complete list of all matches (as in re.findall), re.finditer returns an iterator object that allows us to go through all match object instances one by one. The string is scanned left to right, and matches are returned in the order found. Returning a complete list versus an iterator is a separate topic in Python. You can read more about this in the following article.

re.finditer is called as re.finditer(pat, str, flags=0), where

  • pat: regular expression pattern to search for
  • str: string in which to search for the pattern
  • flags: one or more modifiers, for example re.M|re.I

Here is an example…
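A short sketch that walks through the match objects one by one. The sample text is an assumption:

```python
import re

text = "cat bat rat"   # hypothetical sample text

found = []
for m in re.finditer(r"[a-z]at", text):
    # Each m is a full match object, so positions are available too.
    found.append((m.group(), m.start(), m.end()))
    print(m.group(), m.start(), m.end())

# cat 0 3
# bat 4 7
# rat 8 11
```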

If you run the code snippet above, the output should look like…

re.split

We can use re.split to split a string into tokens based on a pattern. The call takes the form tokens = re.split(pattern, string), where

  • pattern: the regular expression to search for, used as the delimiter
  • string: the string to split
  • tokens: the output list of tokens

Here is an example…
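A minimal sketch, assuming a made-up list of colors separated by commas or semicolons with optional spaces. The maxsplit argument limits the number of splits:

```python
import re

colors = "red, green;blue,  yellow"               # hypothetical sample text

tokens = re.split(r"[,;]\s*", colors)             # split on , or ; plus trailing spaces
first_split = re.split(r"[,;]\s*", colors, maxsplit=1)

print(tokens)        # ['red', 'green', 'blue', 'yellow']
print(first_split)   # ['red', 'green;blue,  yellow']
```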

re.sub

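We can use re.sub to search and replace in a string. The call takes the form re.sub(pattern, repl, string, count=0), where

  • pattern: the regular expression to search for
  • repl: the replacement text
  • string: the string to search in
  • count: the maximum number of replacements; with the default (0), all occurrences are replaced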

Here is an example…
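A small sketch with made-up dates. It shows a plain replacement, a replacement limited with count, and a replacement that reuses capturing groups through backreferences:

```python
import re

text = "2024-01-15 and 2025-12-31"                # hypothetical sample text

masked = re.sub(r"\d", "#", text)                 # replace every digit
first_only = re.sub(r"\d{4}", "YYYY", text, count=1)
# \1, \2 and \3 refer to the captured year, month and day.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(masked)       # ####-##-## and ####-##-##
print(first_only)   # YYYY-01-15 and 2025-12-31
print(swapped)      # 15/01/2024 and 31/12/2025
```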

Summary

  • A regular expression is a text string that describes a search pattern
  • Raw strings in Python do not apply any special treatment to backslashes
  • Python RE module offers regular expression primitive operations for text matching and searching
  • re.match matches an expression at the beginning of a string
  • re.search attempts to find the first occurrence of the pattern anywhere in the input string
  • Regular expression compilation produces a Python object that can be used to do all sorts of regular expression operations
  • re.fullmatch function was added in Python 3.4 to match the entire string
  • re.findall returns a list of non overlapping matches in a string
  • re.finditer returns an iterator object that allows us to go through all match object instances
  • re.split is used to split a string into tokens based on a pattern
  • re.sub is used to search and replace in a string

References

Thanks for reading. Please use the comments section for feedback.
