Regular Expressions
Search, validate, and extract text patterns with Python’s re module — one of the most useful tools in any programmer’s kit.
"A regular expression is a mini-language for describing text patterns. Once you speak it, you can find, validate, or extract almost anything from a string in one line."
— ShurAI
What is a Regular Expression?
A regex (regular expression) is a pattern that describes a set of strings. Python’s re module lets you search, match, and replace text using these patterns:
import re
text = "Call us at 98765-43210 or 91234-56789"
# Find all phone numbers matching the pattern NNNNN-NNNNN
phones = re.findall(r"\d{5}-\d{5}", text)
print(phones)
# ['98765-43210', '91234-56789']
The Pattern Language — Quick Reference
| Pattern | Matches | Example |
|---|---|---|
| \d | Any digit 0-9 | \d\d → "42" |
| \w | Word char (letter, digit, _) | \w+ → "hello_123" |
| \s | Any whitespace (space, tab, newline) | \s+ → spaces |
| . | Any character except newline | h.t → "hat", "hit" |
| ^ | Start of string | ^Hello → must start with Hello |
| $ | End of string | \.com$ → must end with .com |
| * | 0 or more of previous | ab* → "a", "ab", "abbb" |
| + | 1 or more of previous | \d+ → "1", "42", "999" |
| ? | 0 or 1 (optional) | colou?r → "color", "colour" |
| {n} | Exactly n times | \d{6} → "560001" |
| [abc] | One of a, b, or c | [aeiou] → any vowel |
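To see a few rows of the table in action, here is a small sketch. The sample strings and variable names are made up for illustration and are not part of the original lesson.

```python
import re

# ? makes the preceding "u" optional
spellings = re.findall(r"colou?r", "color or colour")

# A character class followed by + matches runs of one or more vowels
vowel_runs = re.findall(r"[aeiou]+", "queueing")

# ^ and $ with {n}: the whole string must be exactly six digits
is_pin = bool(re.search(r"^\d{6}$", "560001"))
not_pin = bool(re.search(r"^\d{6}$", "5600012"))

# . matches any single character except a newline
dot_hits = re.findall(r"h.t", "hat hit hot h\nt")

print(spellings)   # ['color', 'colour']
print(vowel_runs)  # ['ueuei']
print(is_pin, not_pin)  # True False
print(dot_hits)    # ['hat', 'hit', 'hot']
```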
The Four Main re Functions
import re
text = "My email is riya@shurai.com and backup is r.backup@mail.in"
# re.search() — find FIRST match anywhere in string
m = re.search(r"\w+@\w+\.\w+", text)
print(m.group()) # riya@shurai.com
# re.findall() — find ALL matches, returns a list
emails = re.findall(r"[\w.]+@[\w.]+\.\w+", text)
print(emails) # ['riya@shurai.com', 'r.backup@mail.in']
# re.match() — match only at the START of the string
m = re.match(r"My", text)
print(bool(m)) # True
# re.sub() — find and REPLACE
hidden = re.sub(r"[\w.]+@[\w.]+\.\w+", "[HIDDEN]", text)
print(hidden)
# My email is [HIDDEN] and backup is [HIDDEN]
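The functions above return whole matches, but extraction often needs the pieces. The following is a minimal sketch of capture groups, assuming the same sample text; the names user, domain, pairs, and missing are illustrative. It also shows that re.search() returns None when nothing matches, so the result should be checked before calling .group().

```python
import re

text = "My email is riya@shurai.com and backup is r.backup@mail.in"

# Parentheses create capture groups: (user)@(domain)
m = re.search(r"([\w.]+)@([\w.]+\.\w+)", text)
if m:  # search() returns None when there is no match
    user, domain = m.group(1), m.group(2)
    print(user, domain)  # riya shurai.com

# When the pattern has groups, findall() returns tuples of the groups
pairs = re.findall(r"([\w.]+)@([\w.]+\.\w+)", text)
print(pairs)  # [('riya', 'shurai.com'), ('r.backup', 'mail.in')]

# No phone number in this text, so the result is None
missing = re.search(r"\d{5}-\d{5}", text)
print(missing)  # None
```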
Compiled Patterns — Reuse for Speed
If you use the same pattern many times, compile it once:
import re
pin_pattern = re.compile(r"^\d{6}$")  # 6-digit PIN
for pin in ["560001", "1234", "ABCDEF", "400001"]:
    if pin_pattern.match(pin):
        print(f"{pin} — valid PIN")
    else:
        print(f"{pin} — invalid")
# 560001 — valid PIN
# 1234 — invalid
# ABCDEF — invalid
# 400001 — valid PIN
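One subtlety with anchoring a validation pattern using ^ and $: in Python, $ also matches just before a trailing newline, so a string like "560001\n" passes. A hedged alternative is fullmatch(), which requires the entire string to match. The names pin_anchored, pin_strict, and is_valid_pin below are hypothetical, chosen for this sketch.

```python
import re

pin_anchored = re.compile(r"^\d{6}$")  # same pattern as the lesson's PIN check
pin_strict = re.compile(r"\d{6}")      # no anchors needed with fullmatch()

def is_valid_pin(value):
    """Return True only if value is exactly six digits and nothing else."""
    return pin_strict.fullmatch(value) is not None

line = "560001\n"  # e.g. a line read from a file, newline still attached

print(bool(pin_anchored.match(line)))  # True: $ tolerates the trailing newline
print(is_valid_pin(line))              # False
print(is_valid_pin("560001"))          # True
```

If input may carry stray whitespace, calling value.strip() before validating is another reasonable option.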
Real Example — Form Validator
import re
def validate_form(name, email, phone):
    """Return a list of error messages, or a success message if every field passes."""
    errors = []
    # Name: letters and spaces only, 2 to 50 characters
    if not re.match(r"^[A-Za-z ]{2,50}$", name):
        errors.append("Name: letters and spaces only, 2-50 chars")
    # Email: simplified check, not a full address validator
    if not re.match(r"^[\w.]+@[\w.]+\.[a-z]{2,}$", email):
        errors.append("Email: invalid format")
    # Phone: optional leading +, then 10 to 13 digits
    if not re.match(r"^\+?[0-9]{10,13}$", phone):
        errors.append("Phone: 10-13 digits")
    return errors if errors else ["All fields valid!"]
print(validate_form("Riya", "riya@shurai.com", "9876543210"))
print(validate_form("R1ya!!!", "not-an-email", "123"))
# ['All fields valid!']
# ['Name: letters and spaces only, 2-50 chars', 'Email: invalid format', 'Phone: 10-13 digits']
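Every pattern in this lesson is written with the r prefix. A small sketch of why that matters, using \b (word boundary) as the example; the sample sentence and variable names are assumptions for illustration.

```python
import re

# In a normal string, "\b" is ONE character: a backspace (ASCII 8).
# In a raw string, r"\b" stays as backslash + b, which re reads as a word boundary.
normal = "\bcat\b"
raw = r"\bcat\b"

sentence = "cat concat cat"

print(len(normal), len(raw))         # 5 7
print(re.findall(normal, sentence))  # []  (looks for literal backspace characters)
print(re.findall(raw, sentence))     # ['cat', 'cat']
```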
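"Always use raw strings (r"...") for regex patterns in Python. Without the r prefix, Python processes backslashes before the re module sees them: "\b" becomes a backspace character instead of a word boundary, and sequences Python does not recognize, like "\d", trigger a warning in recent versions."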
— ShurAI
🧠 Quiz — Q1
What does \d match in a regex pattern?
🧠 Quiz — Q2
What does re.findall(pattern, text) return?
🧠 Quiz — Q3
Why should regex patterns in Python be written as raw strings (e.g. r"\d+")?
🧠 Quiz — Q4
What does re.sub(pattern, replacement, text) do?