A Quick Guide to Regex in Python

Introduction

In this article, we explore regular expressions (RegEx) and how to work with them using Python's re module, with the help of examples. A regular expression is a sequence of characters that defines a search pattern. Python ships with a built-in module called re for working with regular expressions, and we must import it before use. The module defines a number of functions and constants, which we will look at one by one along with the associated code.

The table below highlights all the important regex rulesets.

re.findall()

The re.findall() method returns a list of strings containing all non-overlapping matches. If the pattern is not found, re.findall() returns an empty list.

Code to extract numbers from a string:

import re

string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'

result = re.findall(pattern, string)
print(result)


# Output: ['12', '89', '34']
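
The empty-list case described above can be checked with a short sketch. The variable names and the second input string are illustrative assumptions, not part of the original example.

```python
import re

text = 'hello 12 hi 89. Howdy 34'

# r'\d+' matches runs of one or more digits
numbers = re.findall(r'\d+', text)

# This string contains no digits, so findall returns an empty list
no_numbers = re.findall(r'\d+', 'no digits here')

print(numbers)     # ['12', '89', '34']
print(no_numbers)  # []
```

Because an empty list is falsy, a plain `if no_numbers:` check is enough to tell whether anything matched.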

re.split()

The re.split() function splits the string wherever the pattern matches and returns a list of the resulting substrings. If the pattern is not found, re.split() returns a list containing the original string as its only element. The function also accepts a maxsplit parameter, the maximum number of splits to perform. It defaults to 0, which means no limit: the string is split at every match.

Example 1

import re

string = 'Twelve:12 Eighty nine:89.'
pattern = r'\d+'

result = re.split(pattern, string)
print(result)


# Output: ['Twelve:', ' Eighty nine:', '.']

Example 2

import re

string = 'Twelve:12 Eighty nine:89 Nine:9.'
pattern = r'\d+'

# maxsplit=1
# split only at the first occurrence
result = re.split(pattern, string, maxsplit=1)
print(result)


# Output: ['Twelve:', ' Eighty nine:89 Nine:9.']
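
To put the three behaviours of re.split() side by side (all splits, a limited number of splits, and no match at all), here is a minimal sketch. The variable names and the digit-free input string are assumptions made for illustration.

```python
import re

text = 'Twelve:12 Eighty nine:89 Nine:9.'

# Default maxsplit=0: split at every run of digits
all_parts = re.split(r'\d+', text)

# maxsplit=1: split at the first run of digits only
first_only = re.split(r'\d+', text, maxsplit=1)

# No digits to split on: the result is a one-element list
unsplit = re.split(r'\d+', 'no digits here')

print(all_parts)   # ['Twelve:', ' Eighty nine:', ' Nine:', '.']
print(first_only)  # ['Twelve:', ' Eighty nine:89 Nine:9.']
print(unsplit)     # ['no digits here']
```

Passing maxsplit by keyword keeps the call readable and avoids confusing it with the flags argument.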

re.sub()

Syntax -> re.sub(pattern, replace, string)

The method returns a string in which every match of the pattern is replaced with the contents of the replace variable. If the pattern is not found, re.sub() returns the original string unchanged. re.sub() also accepts count as an optional fourth argument, the maximum number of matches to replace. If omitted, it defaults to 0, which replaces all matches.

Code to remove all whitespace:

import re

# the trailing backslash joins the two source lines;
# only \n puts a real newline in the string
string = 'abc 12\
de 23 \n f45 6'

# matches one or more whitespace characters
pattern = r'\s+'

# empty string
replace = ''

new_string = re.sub(pattern, replace, string)
print(new_string)


# Output: abc12de23f456

Example 2

import re

# the trailing backslash joins the two source lines
string = 'abc 12\
de 23 \n f45 6'

# matches one or more whitespace characters
pattern = r'\s+'
replace = ''

# count=1 replaces only the first match
new_string = re.sub(pattern, replace, string, count=1)
print(new_string)

# Output:
# abc12de 23
# f45 6
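
The effect of count, and the no-match case, can be compared in one place. This is a small sketch on a simplified single-line input. The strings and variable names are assumptions chosen for illustration.

```python
import re

text = 'abc 12 de 23'

# count omitted (defaults to 0): every whitespace run is removed
removed_all = re.sub(r'\s+', '', text)

# count=2: only the first two whitespace runs are removed
removed_two = re.sub(r'\s+', '', text, count=2)

# Pattern not found: the original string comes back unchanged
unchanged = re.sub(r'\d+', '#', 'no digits here')

print(removed_all)  # abc12de23
print(removed_two)  # abc12de 23
print(unchanged)    # no digits here
```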

re.subn()

The re.subn() function works like re.sub(), except that it returns a tuple of two items: the new string and the number of replacements made.

Code to remove all whitespace:

import re

# the trailing backslash joins the two source lines;
# only \n puts a real newline in the string
string = 'abc 12\
de 23 \n f45 6'

# matches one or more whitespace characters
pattern = r'\s+'

# empty string
replace = ''

new_string = re.subn(pattern, replace, string)
print(new_string)


# Output: ('abc12de23f456', 4)
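
Since re.subn() returns a tuple, it is often convenient to unpack it straight into two names. A minimal sketch follows. The names new_text, replacements, and untouched are illustrative assumptions.

```python
import re

# Unpack the (new_string, number_of_replacements) tuple
new_text, replacements = re.subn(r'\s+', '', 'abc 12 de 23')

# When nothing matches, the string is unchanged and the count is 0
untouched = re.subn(r'\d+', '', 'abc')

print(new_text)      # abc12de23
print(replacements)  # 3
print(untouched)     # ('abc', 0)
```

The count is a handy way to find out whether a substitution changed anything without comparing the two strings.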

re.search()

The re.search() function takes a pattern and a string as its two arguments. It scans the string for the first location where the RegEx pattern produces a match. re.search() returns a match object if the search succeeds and None if it does not.

Syntax -> match = re.search(pattern, str)

Example

import re

string = "Python is fun"

match = re.search(r'\APython', string)

if match:
  print("pattern found inside the string")
else:
  print("pattern not found")

# Output: pattern found inside the string
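
The match object returned by re.search() carries more than a yes/no answer. The sketch below reads the matched text and its position with the match object's group(), start(), and span() methods. The pattern 'is' and the variable names are assumptions picked for illustration.

```python
import re

text = 'Python is fun'

# Search anywhere in the string for the literal text 'is'
match = re.search(r'is', text)

if match is not None:
    found = match.group()     # the text that matched
    position = match.start()  # index where the match begins
    extent = match.span()     # (start, end) pair
    print(found, position, extent)  # is 7 (7, 9)

# \A anchors at the start of the string, so this search fails
missing = re.search(r'\AJava', text)
print(missing)  # None
```

Checking `is not None` before calling methods on the result avoids an AttributeError when the search fails.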

Using the r Prefix Before RegEx

When the letter r or R is placed before a string literal, it denotes a raw string. For instance, '\n' is a newline character, whereas r'\n' is two characters: a backslash followed by n. The backslash is normally used to escape characters, including all metacharacters. With the r prefix, however, Python treats the backslash as an ordinary character and passes it to the regex engine unchanged.

Example

import re

string = '\n and \r are escape sequences.'

result = re.findall(r'[\n\r]', string)
print(result)

# Output: ['\n', '\r']
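
The difference between a normal and a raw string literal is easiest to see by comparing lengths. This is a small sketch, and the variable names are illustrative assumptions.

```python
import re

normal = '\n'  # one character: a newline
raw = r'\n'    # two characters: a backslash and the letter n

print(len(normal))    # 1
print(len(raw))       # 2
print(raw == '\\n')   # True: same as escaping the backslash by hand

# The regex engine itself understands the two-character sequence \n
# as "match a newline", so the raw pattern still finds the newline
newlines = re.findall(r'\n', 'a\nb')
print(newlines)       # ['\n']
```

This is why patterns such as r'\d+' are usually written as raw strings: the backslash reaches the regex engine intact instead of being interpreted by Python first.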
