Regular Expressions in Python
Regular expressions (regex) are a powerful mini-language for describing text patterns. Python's built-in re module gives you everything you need to search, extract, replace, and split strings.
Raw Strings for Patterns
Always write regex patterns as raw strings with the r prefix.
The r prefix tells Python not to process backslash escape sequences, so r"\d" is literally a backslash followed by d, which is exactly what the regex engine expects.
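The effect of the r prefix described above can be seen with a small sketch. The sample text and variable names are illustrative assumptions; the example uses \b because it means backspace to Python but word boundary to the regex engine.

```python
import re

# Without the r prefix, "\b" is a single backspace character.
plain = "\bcat\b"
# With the r prefix, r"\b" is two characters (backslash, b),
# which the regex engine interprets as a word boundary.
raw = r"\bcat\b"

text = "the cat sat"  # illustrative sample text
plain_match = re.search(plain, text)  # looks for backspace + "cat" + backspace
raw_match = re.search(raw, text)      # looks for the whole word "cat"

print(len(plain), len(raw))  # 5 7
print(plain_match)           # None
print(raw_match.group())     # cat
```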
re.match vs re.search
These two functions look similar but behave very differently:
re.match(pattern, string): only matches at the start of the string (anchored)
re.search(pattern, string): scans the entire string and returns the first match anywhere
Both return None when there is no match, so always check the result before calling .group().
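A minimal sketch of the difference, including the None check mentioned above. The sample string is an assumption chosen so that the digits do not appear at the start.

```python
import re

text = "order 66 shipped"  # illustrative sample text

at_start = re.match(r"\d+", text)   # None: the string starts with "order"
anywhere = re.search(r"\d+", text)  # finds "66" later in the string

for label, m in (("match", at_start), ("search", anywhere)):
    # Check for None before touching .group() to avoid AttributeError.
    if m is None:
        print(label, "-> no match")
    else:
        print(label, "->", m.group())
```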
re.findall: All Matches as a List
re.findall returns all non-overlapping matches as a list of strings.
When the pattern contains a capturing group, findall returns a list of the captured text instead of full matches. With multiple groups it returns a list of tuples.
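The three return shapes described above can be compared side by side. The key=value sample text is a made-up illustration.

```python
import re

text = "alice=30, bob=25"  # illustrative sample text

no_group = re.findall(r"\w+=\d+", text)        # no groups: full matches
one_group = re.findall(r"\w+=(\d+)", text)     # one group: captured text only
two_groups = re.findall(r"(\w+)=(\d+)", text)  # several groups: tuples

print(no_group)    # ['alice=30', 'bob=25']
print(one_group)   # ['30', '25']
print(two_groups)  # [('alice', '30'), ('bob', '25')]
```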
re.finditer: Iterator of Match Objects
re.finditer is like findall but returns an iterator of Match objects, giving you position information too.
Use finditer instead of findall whenever you need match positions or other Match object attributes.
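As a brief sketch of reading positions from each Match object (the sample text is an assumption):

```python
import re

text = "cat bat cat"  # illustrative sample text

# Each item yielded by finditer is a Match object with start/end offsets.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"cat", text)]

print(positions)  # [('cat', 0, 3), ('cat', 8, 11)]
```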
re.sub: Replace Matches
re.sub(pattern, replacement, string) replaces every match with replacement.
You can also pass a callable as the replacement. It receives the Match object and returns the replacement string.
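Both forms of replacement can be sketched as follows. The helper name double and the sample text are hypothetical, chosen only for illustration.

```python
import re

text = "2 apples and 10 pears"  # illustrative sample text

# String replacement: every run of digits becomes "#".
masked = re.sub(r"\d+", "#", text)

def double(m):
    """Hypothetical helper: receives a Match, returns the replacement text."""
    return str(int(m.group()) * 2)

# Callable replacement: the function is called once per match.
doubled = re.sub(r"\d+", double, text)

print(masked)   # # apples and # pears
print(doubled)  # 4 apples and 20 pears
```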
re.split: Split on a Pattern
re.split splits a string on every occurrence of the pattern.
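A small sketch of splitting on mixed separators, which str.split cannot do in one call. The sample strings are assumptions; the maxsplit argument is a standard re.split parameter that limits the number of splits.

```python
import re

text = "a, b;c  d"  # illustrative sample text with mixed separators

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", text)

# maxsplit limits how many splits are made; the remainder stays intact.
head_tail = re.split(r"\s+", "one two three", maxsplit=1)

print(parts)      # ['a', 'b', 'c', 'd']
print(head_tail)  # ['one', 'two three']
```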
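re.compile: Reuse Patterns
Compiling a pattern once and reusing it saves repeated pattern lookups when the same pattern is used many times, and gives the pattern a single named home. The re module also caches recently used patterns internally, so the speed gain is usually modest.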
A compiled pattern has the same methods as the re module (.match, .search, .findall, etc.).
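A minimal sketch of reusing one compiled pattern across several calls. The name WORD and the sample lines are illustrative assumptions.

```python
import re

# Compile once; the resulting Pattern object is reused below.
WORD = re.compile(r"[A-Za-z]+")

lines = ["hello world", "42", "spam eggs ham"]  # illustrative sample data

# Pattern methods mirror the module functions, minus the pattern argument.
counts = [len(WORD.findall(line)) for line in lines]
first = WORD.search("42 is the answer")

print(counts)         # [2, 0, 3]
print(first.group())  # is
```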
Capturing Groups
Parentheses () create capturing groups that you can extract individually.
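For instance, a date pattern with three groups might be read like this. The date string is a made-up sample.

```python
import re

m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released 2024-03-15")

if m:
    whole = m.group(0)   # group 0 is always the full match
    year = m.group(1)    # groups are numbered from 1, left to right
    parts = m.groups()   # all captured groups as a tuple
    print(whole)  # 2024-03-15
    print(year)   # 2024
    print(parts)  # ('2024', '03', '15')
```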
Named Groups: (?P<name>...)
Named groups make patterns self-documenting and let you access captures by name.
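A short sketch of access by name. The group names year and month and the sample text are illustrative choices.

```python
import re

m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "due 2024-03")

if m:
    year = m.group("year")  # access by name
    month = m["month"]      # subscript form, available on Match objects
    fields = m.groupdict()  # all named groups as a dict
    print(year, month)  # 2024 03
    print(fields)       # {'year': '2024', 'month': '03'}
```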
Non-Capturing Groups: (?:...)
Use (?:...) when you need grouping for alternation or repetition but don't need to capture.
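The difference is easiest to see with findall, which returns group contents when a capturing group is present. This is a sketch on a made-up sample string.

```python
import re

text = "ababab ab"  # illustrative sample text

# Capturing group: findall returns the group (its last repetition), not the match.
captured = re.findall(r"(ab)+", text)

# Non-capturing group: repetition still applies, but findall returns full matches.
full = re.findall(r"(?:ab)+", text)

print(captured)  # ['ab', 'ab']
print(full)      # ['ababab', 'ab']
```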
Flags
Flags modify how the pattern engine behaves:
| Flag | Short | Effect |
|---|---|---|
| re.IGNORECASE | re.I | Case-insensitive matching |
| re.MULTILINE | re.M | ^ and $ match at the start/end of each line |
| re.DOTALL | re.S | . matches newlines too |
| re.VERBOSE | re.X | Allow whitespace and comments in the pattern |
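The flags in the table can be tried on a small multi-line sample. The log-like text and the name LINE are assumptions for illustration; flags are combined with the | operator.

```python
import re

text = "Error: disk full\nerror: retry\nok"  # illustrative sample text

# IGNORECASE + MULTILINE: ^ matches at each line start, in any letter case.
both = re.findall(r"^error", text, re.IGNORECASE | re.MULTILINE)
# Without MULTILINE, ^ only matches at the very start of the string.
start_only = re.findall(r"^error", text, re.IGNORECASE)

# By default . does not match a newline; DOTALL lets it.
dot_default = re.search(r"full.error", text)
dot_all = re.search(r"full.error", text, re.DOTALL)

# VERBOSE: whitespace and # comments inside the pattern are ignored.
LINE = re.compile(r"""
    (\w+)   # level
    :\s*    # separator
    (.+)    # message
""", re.VERBOSE)
level_message = LINE.match(text).groups()

print(both)           # ['Error', 'error']
print(start_only)     # ['Error']
print(dot_default)    # None
print(level_message)  # ('Error', 'disk full')
```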
Common Patterns
Greedy vs Non-Greedy
By default, quantifiers (*, +, ?, {n,m}) are greedy: they match as much as possible. Add ? after a quantifier to make it non-greedy (lazy), as in *?, +?, ??, and {n,m}?.
Non-greedy quantifiers are essential when parsing structured text like HTML, XML, or log lines where you need the shortest possible match.
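A minimal sketch of the difference on an HTML-like sample string (an illustrative assumption, not a recommendation to parse real HTML with regex):

```python
import re

html = "<b>one</b> and <b>two</b>"  # illustrative sample text

# Greedy: .* runs to the last </b>, swallowing everything in between.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first </b> it can, giving one match per tag pair.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```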
Knowledge Check
What is the key difference between re.match() and re.search()?
What does the pattern r'(?P<year>\d{4})' create in a regex?
Given text = 'aaa bbb aaa', what does re.findall(r'a+', text) return?