
Regular expression in Python

A regular expression, often shortened to “regex,” is a compact way to describe patterns in text. At its simplest, a regex matches a single character or a fixed sequence of characters, but it can also describe more sophisticated patterns that span multiple lines or contain optional or repeated elements.

Functions in Regular expression in Python

Python supports regular expressions through the built-in re module. The most commonly used functions in the module are:

  • re.compile: Compiles a regular expression pattern into a pattern object. The object’s match(), search(), and other methods, described below, can then be used for matching.
  • re.search: Scans the string and returns a Match object for the first location where the pattern matches, or None if there is no match.
  • re.match: Matches the pattern only at the start of the string.
  • re.fullmatch: Returns a Match object only if the entire string matches the pattern.
  • re.split: Splits the string at each occurrence of the pattern.
  • re.findall: Returns all non-overlapping matches as a list of strings, or a list of tuples if the pattern contains more than one group.
  • re.finditer: Returns an iterator that yields a Match object for each non-overlapping match of the pattern in the string.
  • re.sub: Returns a new string in which the matches of the pattern are replaced by a replacement.
  • re.subn: Same as re.sub, but returns a tuple of the new string and the number of replacements made.
  • re.escape: Escapes the special characters in a string so that it can be used as a literal inside a pattern.
  • re.purge: Clears the regular expression cache.
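The functions listed above can be tried out on a short string. The snippet below is only a minimal sketch: the sample text and all variable names are made up for illustration and are not part of the re module.

```python
import re

text = "apple, banana;cherry  date"

# re.compile builds a reusable pattern object.
separator = re.compile(r"[,;\s]+")

# split() breaks the string wherever the pattern matches.
fruits = separator.split(text)

# re.search finds the first match anywhere; re.match only looks at the start.
found_anywhere = re.search(r"banana", text)
found_at_start = re.match(r"banana", text)  # None, the text starts with "apple"

# re.fullmatch succeeds only if the whole string fits the pattern.
whole = re.fullmatch(r"[a-z]+", "cherry")

# re.sub returns the new string; re.subn also returns the replacement count.
replaced = re.sub(r"a", "A", "banana")
replaced_with_count = re.subn(r"a", "A", "banana")

# re.escape makes a literal string safe to embed in a pattern.
literal = re.escape("1+1=2")

print(fruits)
print(found_anywhere.span(), found_at_start)
print(whole.group())
print(replaced, replaced_with_count)
print(re.fullmatch(literal, "1+1=2") is not None)
```

Compiling a pattern once is mainly useful when the same pattern is applied many times; for one-off matches the module-level functions do the same job.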

Metacharacters

Metacharacters are characters that carry a special meaning inside a regular expression instead of matching themselves. They are used to describe the character patterns to look for in a string. The most common metacharacters are:

Character   Description
[]          A set of characters
\           Introduces a special sequence, or escapes a special character
.           Any character except a newline
^           Start of the string
$           End of the string
*           Zero or more occurrences
+           One or more occurrences
?           Zero or one occurrence
{}          Exactly the specified number of occurrences
|           Either or
()          Capture and group
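To see how a few of these metacharacters behave, the sketch below runs several small patterns against a handful of sample words. The word list and the helper name matching are assumptions chosen purely for illustration.

```python
import re

samples = ["cat", "cot", "coat", "ct", "cats", "dog"]

def matching(pattern):
    """Return the sample words in which the pattern is found."""
    return [word for word in samples if re.search(pattern, word)]

patterns = {
    r"c[ao]t": "set: 'a' or 'o' between c and t",
    r"^c": "starts with c",
    r"s$": "ends with s",
    r"co*t": "zero or more 'o'",
    r"co+t": "one or more 'o'",
    r"coa?t": "optional 'a'",
    r"^.{3}$": "exactly three characters",
    r"cat|dog": "either cat or dog",
}

for pattern, meaning in patterns.items():
    print(f"{pattern:10} {meaning:35} {matching(pattern)}")
```

Because re.search looks anywhere in the word, a pattern such as cat|dog also reports "cats"; adding ^ and $ restricts the match to the whole word.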

Special Sequences

Special sequences consist of a backslash followed by a character and match a particular kind of character or position. The most common special sequences are:

Character   Description
\A          Matches only at the start of the string.
\b          Matches at a word boundary, that is, at the start or end of a word.
\B          Matches at any position that is not a word boundary.
\d          Matches any digit.
\D          Matches any character that is not a digit.
\s          Matches any whitespace character.
\S          Matches any character that is not whitespace.
\w          Matches any word character (a-z, A-Z, 0-9, _).
\W          Matches any character that is not a word character, that is, anything other than (a-z, A-Z, 0-9, _).
\Z          Matches only at the end of the string.

These special sequences make patterns shorter and more precise, because each one stands for a whole class of characters or a specific position in the string.
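The short sketch below exercises several of the special sequences. The sample strings and variable names are invented for illustration only.

```python
import re

text = "Order 66 shipped to unit_7 at 10:45"

digits = re.findall(r"\d+", text)          # runs of digits
non_digits = re.findall(r"\D+", "a1b22c")  # runs of non-digit characters
words = re.findall(r"\w+", text)           # runs of word characters
pieces = re.split(r"\s+", "a  b\tc\nd")    # split on any whitespace

# \b anchors at a word boundary, so the "at" inside "cat" is not matched.
whole_word = re.findall(r"\bat\b", "cat sat at the mat")

# \A and \Z anchor at the very start and very end of the string.
starts_with_order = bool(re.search(r"\AOrder", text))
ends_with_time = bool(re.search(r"\d{2}:\d{2}\Z", text))

print(digits, non_digits, pieces)
print(words)
print(whole_word, starts_with_order, ends_with_time)
```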

Finding the digits from a given string in python

In Python, regular expressions can be used to find every digit in a given text. Here is an illustration of how to use regular expressions in Python to extract every digit from a given string:

import re

string = 'I was born in the year 1999 at 2:06pm on 16th of June.'
pattern = r'\d+'  # matches each run of one or more digits

result = re.findall(pattern, string)
print(result)

OUTPUT:
['1999', '2', '06', '16']
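If the position of each number matters, or the digits are needed as integers, a variation along the following lines may help. It is a sketch that reuses the same sentence; the names positions and numbers are illustrative.

```python
import re

string = 'I was born in the year 1999 at 2:06pm on 16th of June.'

# finditer yields Match objects, so the start and end index of
# every run of digits is available alongside the matched text.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', string)]

# findall returns strings; convert them when numeric values are needed.
numbers = [int(digits) for digits in re.findall(r'\d+', string)]

for matched_text, start, end in positions:
    print(matched_text, start, end)
print(numbers)
```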

Searching for a keyword in a string

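The search function scans a string for the first place where the pattern matches. It returns a Match object whose span gives the start and end index of the match, or None if the pattern is not found.

import re

txt = "The summer is hot in India"
# ^ and $ anchor the pattern: the string must start with "The" and end with "India".
x = re.search(r"^The.*India$", txt)
print(x)

OUTPUT:
<re.Match object; span=(0, 26), match='The summer is hot in India'>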
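In practice the Match object is usually inspected rather than printed, and the None case has to be handled. A minimal sketch, using the same sentence and two illustrative keywords:

```python
import re

txt = "The summer is hot in India"

match = re.search(r"hot", txt)
if match:
    # group() is the matched text; start(), end() and span() give its position.
    print(match.group(), match.start(), match.end(), match.span())

# search returns None when the keyword does not occur in the string.
missing = re.search(r"cold", txt)
print(missing)
```

Checking the result with if before calling group() avoids an AttributeError when nothing matched.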

Email validation in python

There are several ways to validate an email address with regular expressions in Python. One method is to use the re module to compare the address against a pattern that describes the general format of an email address. The example below searches a block of text for strings that have this format:

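import re

emails = '''
abc@mail.com
xyz@companymail.com
a1b2c3@collegemail.edu.in
'''

# Local part, "@", domain name, ".", then the rest of the domain.
pattern = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+')

# finditer yields one Match object per address found in the text.
matches = pattern.finditer(emails)

for match in matches:
    print(match)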
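To check a single address rather than search a block of text, the same pattern can be applied with fullmatch so that the whole string has to fit. This is only a sketch: is_valid_email is a hypothetical helper name, and the pattern is a simplification that does not cover every address the email standards allow.

```python
import re

# Simplified pattern (an assumption, not a complete email grammar):
# local part, "@", domain name, ".", then the rest of the domain.
EMAIL_PATTERN = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+')

def is_valid_email(address):
    """Return True if the whole string has the shape of an email address."""
    return EMAIL_PATTERN.fullmatch(address) is not None

for candidate in ["abc@mail.com", "a1b2c3@collegemail.edu.in", "abc@mail", "not an email"]:
    print(candidate, is_valid_email(candidate))
```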

Phone number validation in Python

A phone number can be validated by checking it against a pattern that describes the expected format. The example below accepts only numbers written as (XXX) XXX-XXXX:

import re

def is_valid_phone_number(phone_number):
    # Matches the format (XXX) XXX-XXXX, anchored at both ends of the string.
    pattern = re.compile(r'^\(\d{3}\) \d{3}-\d{4}$')
    match = pattern.search(phone_number)
    return match is not None

phone_number = "(123) 456-7890"
print(is_valid_phone_number(phone_number))  # True

phone_number = "123-456-7890"
print(is_valid_phone_number(phone_number))  # False

OUTPUT:
True
False

The pattern above accepts only United States phone numbers written in the (XXX) XXX-XXXX format. To validate other formats, change the pattern accordingly. For example, the following function checks for a ten-digit Indian phone number:

import re

def is_valid_indian_phone_number(phone_number):
    # Ten digits, written as two groups of five, with nothing before or after.
    pattern = re.compile(r'^\d{5}\d{5}$')
    match = pattern.search(phone_number)
    return match is not None

phone_number = "9839957123"
print(is_valid_indian_phone_number(phone_number))

OUTPUT:
True
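The ten-digit check can be tightened in several ways. The sketch below rests on two assumptions that are not part of the example above: that mobile numbers begin with a digit from 6 to 9, and that an optional "+91" country prefix should be accepted. The function name is hypothetical.

```python
import re

# Assumptions (not taken from the example above): the number starts with 6-9
# and may be preceded by "+91", optionally followed by a space or a hyphen.
INDIAN_MOBILE = re.compile(r'(?:\+91[\s-]?)?[6-9]\d{9}')

def is_valid_indian_mobile(phone_number):
    """Return True if the whole string fits the assumed mobile format."""
    return INDIAN_MOBILE.fullmatch(phone_number) is not None

for candidate in ["9839957123", "+91 9839957123", "1234567890", "98399571234"]:
    print(candidate, is_valid_indian_mobile(candidate))
```

Using fullmatch removes the need for ^ and $ in the pattern and also rejects a trailing newline, which $ alone would tolerate.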

Applications of regular expression

  • Regular expressions can be used in text processing to find certain text patterns and replace them with new text. For instance, you could use a regular expression to extract the URLs from a collection of web pages or to search for all occurrences of a phone number or email address in a document.
  • Regular expressions can be used to check user input, including passwords, phone numbers, and email addresses. This guarantees that the input follows a particular format and can be properly processed.
  • Regular expressions are an effective method for deciphering log files and extracting important data. They can be used to extract crucial information, look for specific text patterns, and summarise the log data.
  • To match and filter IP addresses, hostnames, and other network-related data, regular expressions can be employed in networking.
  • Regular expressions can be used to match and filter data in databases. For instance, they can be used to look for specific patterns in a field or to locate all the entries that satisfy given criteria.
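As an illustration of the log-file and network-data use cases, the sketch below parses a few log lines with named groups, counts errors per IP address, and masks the addresses. The log format, the sample lines, and all names are invented for this example.

```python
import re
from collections import Counter

# Made-up log lines in an assumed "date time LEVEL ip message" format.
log_lines = [
    "2024-01-05 10:15:02 ERROR 192.168.1.10 Disk quota exceeded",
    "2024-01-05 10:15:09 INFO 192.168.1.11 Login succeeded",
    "2024-01-05 10:16:44 ERROR 192.168.1.10 Disk quota exceeded",
    "malformed line",
]

# Loose IPv4 shape: four groups of one to three digits (values are not range-checked).
IP = r"\d{1,3}(?:\.\d{1,3}){3}"

LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<ip>" + IP + r") (?P<message>.*)"
)

# Keep only the lines that fit the format, as dictionaries of named groups.
records = [m.groupdict() for m in map(LINE.fullmatch, log_lines) if m]

# Summarise: number of ERROR entries per IP address.
errors_per_ip = Counter(r["ip"] for r in records if r["level"] == "ERROR")

# Search and replace: hide the addresses before sharing the log.
masked = [re.sub(IP, "<ip>", line) for line in log_lines]

print(errors_per_ip)
print(masked[0])
```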

Pros and cons

Pros:

  • Regular expressions are succinct and expressive: a few lines of code are enough to match intricate patterns of characters in a string.
  • Regular expressions are supported by many programming languages and tools, so the skill carries over from one environment to another.
  • Regular expressions are versatile: the same technique serves for checking email addresses and phone numbers, looking for patterns in log files, and extracting data from web pages.

Cons:

  • When used to match vast volumes of data or when the pattern is very complicated, regular expressions can be slow to run.
  • Due to the potential for unexpected matches or non-matches caused by even minor mistakes in the pattern, regular expressions can be challenging to debug and manage.
  • If written carelessly, regular expressions are susceptible to ReDoS (Regular Expression Denial of Service) attacks.
  • Structures nested to an arbitrary depth, such as nested brackets or balanced delimiters, cannot be matched by the regular expressions that Python’s re module provides.
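The limitation with nested structures can be seen in a small sketch. A pattern can find the innermost bracket pairs, but deciding whether brackets are balanced to any depth is easier with a plain counter. The function name is_balanced and the sample strings are illustrative.

```python
import re

# A regex can pick out the innermost pairs, which contain no other brackets...
innermost = re.findall(r"\([^()]*\)", "f(a, g(b), h(c(d)))")
print(innermost)

# ...but checking balance at arbitrary depth is simpler without a regex.
def is_balanced(text):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0

print(is_balanced("f(a, g(b), h(c(d)))"), is_balanced("(()"), is_balanced(")("))
```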

Conclusion

In this tutorial, we covered the main functions of the re module, the common metacharacters and special sequences, and several usage examples. You can apply regular expressions in your own projects, especially when working with large bodies of text whose contents you do not know in advance. They are practical and simple to experiment with, and regular practice is the most effective way to learn them.