Regular Expressions

Search, validate, and extract text patterns with Python’s re module — one of the most useful tools in any programmer’s kit.

"A regular expression is a mini-language for describing text patterns. Once you speak it, you can find, validate, or extract almost anything from a string in one line."

— ShurAI

What is a Regular Expression?

A regex (regular expression) is a pattern that describes a set of strings. Python’s re module lets you search, match, and replace text using these patterns:

python
import re

text = "Call us at 98765-43210 or 91234-56789"

# Find all phone numbers matching the pattern NNNNN-NNNNN
phones = re.findall(r"\d{5}-\d{5}", text)
print(phones)
# ['98765-43210', '91234-56789']

The Pattern Language — Quick Reference

Pattern   Matches                                Example
\d        Any digit 0-9                          \d\d → "42"
\w        Word char (letter, digit, _)           \w+ → "hello_123"
\s        Any whitespace (space, tab, newline)   \s+ → spaces
.         Any character except newline           h.t → "hat", "hit"
^         Start of string                        ^Hello → must start with Hello
$         End of string                          \.com$ → must end with .com
*         0 or more of previous                  ab* → "a", "ab", "abbb"
+         1 or more of previous                  \d+ → "1", "42", "999"
?         0 or 1 (optional)                      colou?r → "color", "colour"
{n}       Exactly n times                        \d{6} → "560001"
[abc]     One of a, b, or c                      [aeiou] → any vowel
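
To make the reference table concrete, here is a small sketch that runs several of the patterns through re.findall and checks the results. The sample strings and the names `samples`, `loose`, and `strict` are made up for illustration. The last two lines show why a literal dot is best written as `\.`: an unescaped `.` matches any character, so `.com$` also accepts "telecom".

```python
import re

# Each entry: (pattern, sample text, list that re.findall should return)
samples = [
    (r"\d\d", "room 42", ["42"]),
    (r"\w+", "hello_123 world", ["hello_123", "world"]),
    (r"h.t", "hat hit hot", ["hat", "hit", "hot"]),
    (r"colou?r", "color colour", ["color", "colour"]),
    (r"ab*", "a ab abbb", ["a", "ab", "abbb"]),
    (r"\d{6}", "PIN 560001", ["560001"]),
    (r"[aeiou]", "regex", ["e", "e"]),
]

for pattern, text, expected in samples:
    found = re.findall(pattern, text)
    print(f"{pattern!r:12} on {text!r:20} -> {found}")
    assert found == expected

# An unescaped dot matches any character; an escaped dot matches only ".".
loose = bool(re.search(r".com$", "telecom"))     # True
strict = bool(re.search(r"\.com$", "telecom"))   # False
print(loose, strict)
```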

The Four Main re Functions

python
import re
text = "My email is riya@shurai.com and backup is r.backup@mail.in"

# re.search() — find FIRST match anywhere in string
m = re.search(r"\w+@\w+\.\w+", text)
print(m.group())     # riya@shurai.com

# re.findall() — find ALL matches, returns a list
emails = re.findall(r"[\w.]+@[\w.]+\.\w+", text)
print(emails)        # ['riya@shurai.com', 'r.backup@mail.in']

# re.match() — match only at the START of the string
m = re.match(r"My", text)
print(bool(m))       # True

# re.sub() — find and REPLACE
hidden = re.sub(r"[\w.]+@[\w.]+\.\w+", "[HIDDEN]", text)
print(hidden)
# My email is [HIDDEN] and backup is [HIDDEN]
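
Two details of these functions tend to trip people up, so here is a short sketch of them. It assumes a shortened version of the sample text above; the names `first_number` and `replaced_once` are illustrative only. re.match() only tries at position 0, and both re.match() and re.search() return None when nothing matches, so it is safer to check the result before calling .group().

```python
import re

text = "My email is riya@shurai.com"

# re.match() only tries at position 0, so a pattern that occurs later fails.
print(re.match(r"email", text))            # None
print(re.search(r"email", text).group())   # email

# Both functions return None when nothing matches; check before calling
# .group() to avoid an AttributeError.
m = re.search(r"\d+", text)
first_number = m.group() if m else None
print(first_number)                        # None

# re.sub() takes an optional count that limits the number of replacements.
replaced_once = re.sub(r"i", "I", text, count=1)
print(replaced_once)                       # My emaIl is riya@shurai.com
```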

Compiled Patterns — Reuse for Speed

If you use the same pattern many times, compile it once:

python
import re

pin_pattern = re.compile(r"^\d{6}$")   # 6-digit PIN

for pin in ["560001", "1234", "ABCDEF", "400001"]:
    if pin_pattern.match(pin):
        print(f"{pin} — valid PIN")
    else:
        print(f"{pin} — invalid")

# 560001 — valid PIN
# 1234   — invalid
# ABCDEF — invalid
# 400001 — valid PIN
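
A compiled pattern object offers the same operations as the module-level functions, as methods. The sketch below reuses the phone-number pattern from the first example; `phone_pattern` and the "[PHONE]" placeholder are illustrative names. Note that the re module also caches recently used patterns internally, so the speed gain from compiling is often modest; giving the pattern a readable name is usually the bigger benefit.

```python
import re

# Compile once; the resulting object exposes search, findall, sub, and more.
phone_pattern = re.compile(r"\d{5}-\d{5}")

text = "Call us at 98765-43210 or 91234-56789"

first = phone_pattern.search(text).group()
all_phones = phone_pattern.findall(text)
masked = phone_pattern.sub("[PHONE]", text)

print(first)       # 98765-43210
print(all_phones)  # ['98765-43210', '91234-56789']
print(masked)      # Call us at [PHONE] or [PHONE]
```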

Real Example — Form Validator

python
import re

def validate_form(name, email, phone):
    errors = []

    if not re.match(r"^[A-Za-z ]{2,50}$", name):
        errors.append("Name: letters and spaces only, 2-50 chars")

    if not re.match(r"^[\w.]+@[\w.]+\.[a-z]{2,}$", email):
        errors.append("Email: invalid format")

    if not re.match(r"^\+?[0-9]{10,13}$", phone):
        errors.append("Phone: 10-13 digits")

    return errors if errors else ["All fields valid!"]

print(validate_form("Riya",     "riya@shurai.com",    "9876543210"))
print(validate_form("R1ya!!!", "not-an-email",       "123"))
output
['All fields valid!']
['Name: letters and spaces only, 2-50 chars', 'Email: invalid format', 'Phone: 10-13 digits']
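
One caveat when validating with `^...$` and re.match(): `$` also matches just before a trailing newline, so an input such as "560001\n" would pass. A possible alternative, sketched below with made-up variable names, is re.fullmatch(), which succeeds only when the entire string matches and makes the anchors unnecessary.

```python
import re

pin_with_newline = "560001\n"

# "$" also matches just before a trailing newline, so this input is accepted.
accepted_by_match = bool(re.match(r"^\d{6}$", pin_with_newline))
print(accepted_by_match)       # True

# re.fullmatch() requires the whole string to match, so it is rejected.
accepted_by_fullmatch = bool(re.fullmatch(r"\d{6}", pin_with_newline))
print(accepted_by_fullmatch)   # False

# A clean 6-digit PIN still passes.
print(bool(re.fullmatch(r"\d{6}", "560001")))   # True
```

The same substitution would work for the name, email, and phone checks in the validator if trailing newlines should be rejected.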
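
"Always use raw strings (r"...") for regex patterns in Python. Without the r prefix, Python processes the backslashes before the regex engine sees them: "\b" becomes a backspace character instead of a word boundary, and unrecognized escapes like "\d" trigger a warning in recent Python versions."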

— ShurAI
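
A quick way to see the difference is the word-boundary escape `\b`. The sketch below uses a made-up sample string and the illustrative names `with_raw` and `without_raw`; it compares the same pattern written with and without the r prefix.

```python
import re

# In a normal string literal "\b" is the backspace character (one char);
# in a raw string it stays as a backslash followed by "b" (two chars).
print(len("\b"), len(r"\b"))   # 1 2

text = "cat concat cat"

# Raw string: \b reaches the regex engine and means "word boundary".
with_raw = re.findall(r"\bcat\b", text)
print(with_raw)                # ['cat', 'cat']

# Normal string: the pattern looks for literal backspace characters.
without_raw = re.findall("\bcat\b", text)
print(without_raw)             # []
```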

🧠 Quiz — Q1

What does \d match in a regex pattern?

🧠 Quiz — Q2

What does re.findall(pattern, text) return?

🧠 Quiz — Q3

Why should regex patterns in Python be written as raw strings (e.g. r"\d+")?

🧠 Quiz — Q4

What does re.sub(pattern, replacement, text) do?