๐Ÿ”re module: match, search, findall, groups, flagsLESSON

Regular Expressions in Python

Regular expressions (regex) are a powerful mini-language for describing text patterns. Python's built-in re module gives you everything you need to search, extract, replace, and split strings.

Raw Strings for Patterns

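Always write regex patterns as raw strings with the r prefix.

The r prefix tells Python not to process backslash escape sequences, so r"\d" is a two-character string (a backslash followed by d), which is exactly what the regex engine expects. Without the prefix, sequences such as "\b" are turned into control characters before the regex engine ever sees them.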
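To see why the prefix matters, here is a minimal sketch comparing the same pattern written with and without r. The sample text is made up; \b is used because, outside a raw string, Python turns it into a backspace character instead of a word boundary.

```python
import re

plain = "\bcat\b"   # \b here is Python's backspace escape (\x08)
raw = r"\bcat\b"    # \b here reaches the regex engine as a word boundary

print(len(plain))  # 5: backspace, c, a, t, backspace
print(len(raw))    # 7: backslash, b, c, a, t, backslash, b

text = "the cat sat"
print(re.search(plain, text))  # None: text contains no backspace characters
m = re.search(raw, text)
print(m.group() if m else None)  # cat
```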

re.match vs re.search

These two functions look similar but behave very differently:

  • re.match(pattern, string): only matches at the start of the string (anchored)
  • re.search(pattern, string): scans the entire string and returns the first match anywhere

Both return None when there is no match, so always check the result before calling .group().
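A minimal sketch of the difference, using a made-up order string: match fails because the string does not begin with a digit, while search finds the first run of digits further along.

```python
import re

text = "Order #1234 shipped"

# match only tries position 0, where "O" is not a digit.
anchored = re.match(r"\d+", text)
print(anchored)  # None

# search scans forward and returns the first match anywhere.
found = re.search(r"\d+", text)
if found:  # guard against None before calling .group()
    print(found.group())  # 1234

# match succeeds when the pattern fits at the very start.
start = re.match(r"Order", text)
if start:
    print(start.group())  # Order
```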

re.findall: All Matches as a List

re.findall returns all non-overlapping matches as a list of strings.

When the pattern contains exactly one capturing group, findall returns a list of the captured text instead of the full matches. With multiple groups, it returns a list of tuples, with one element per group.
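The sketch below illustrates the three return shapes of findall. The email-like pattern is deliberately simplistic and is not a real address validator.

```python
import re

text = "alice@example.com, bob@test.org"

# No groups: each item is the full match.
full = re.findall(r"\w+@\w+\.\w+", text)
print(full)   # ['alice@example.com', 'bob@test.org']

# One group: each item is that group's text.
users = re.findall(r"(\w+)@\w+\.\w+", text)
print(users)  # ['alice', 'bob']

# Two groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)@(\w+\.\w+)", text)
print(pairs)  # [('alice', 'example.com'), ('bob', 'test.org')]
```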

re.finditer: Iterator of Match Objects

re.finditer is like findall but returns an iterator of Match objects, which also carry position information.

Use finditer instead of findall whenever you need match positions or other Match object attributes.
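A short sketch of finditer on an illustrative string, printing each match with its start and end offsets and then collecting the spans.

```python
import re

text = "cat bat rat"

# Each item is a Match object, so positions are available alongside the text.
for m in re.finditer(r"\w+at", text):
    print(m.group(), m.start(), m.end())
# cat 0 3
# bat 4 7
# rat 8 11

# .span() returns (start, end) as a tuple.
spans = [m.span() for m in re.finditer(r"\w+at", text)]
print(spans)  # [(0, 3), (4, 7), (8, 11)]
```

The iterator is lazy, so matches are produced one at a time; this can help with very large inputs.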

re.sub: Replace Matches

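re.sub(pattern, replacement, string) returns a new string in which every match is replaced by replacement. The replacement string can refer to captured groups with backreferences such as \1 or \g<name>.

You can also pass a callable as the replacement: it receives each Match object and returns the replacement string.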
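A minimal sketch of both replacement styles, plus a group backreference. The phone-number format and the helper name double are illustrative only.

```python
import re

text = "Call 555-1234 or 555-9876"

# Plain string replacement.
masked = re.sub(r"\d{3}-\d{4}", "XXX-XXXX", text)
print(masked)   # Call XXX-XXXX or XXX-XXXX

# \1 and \2 in the replacement refer to the pattern's groups.
swapped = re.sub(r"(\d{3})-(\d{4})", r"\2-\1", text)
print(swapped)  # Call 1234-555 or 9876-555

# A callable receives each Match and returns the text to insert.
def double(m):
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```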

re.split: Split on a Pattern

re.split splits a string on every occurrence of the pattern and returns a list of the pieces.
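A few illustrative re.split calls on made-up strings. The second shows a detail worth knowing: when the pattern contains a capturing group, the matched separators are included in the result list.

```python
import re

# Split on commas or semicolons, absorbing any surrounding whitespace.
parts = re.split(r"\s*[,;]\s*", "a, b;c ,  d")
print(parts)      # ['a', 'b', 'c', 'd']

# A capturing group keeps the separators in the output.
with_seps = re.split(r"([,;])", "a,b;c")
print(with_seps)  # ['a', ',', 'b', ';', 'c']

# maxsplit limits how many splits are performed.
limited = re.split(r",", "a,b,c,d", maxsplit=2)
print(limited)    # ['a', 'b', 'c,d']
```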

re.compile: Reuse Patterns

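Compiling a pattern once with re.compile and reusing the resulting Pattern object avoids re-parsing it each time. The module-level functions also cache recently used patterns, so the speed gain is often modest; much of the benefit is having a named, reusable pattern.

A compiled pattern exposes the same operations as methods (.match, .search, .findall, .sub, etc.), which take the string to search instead of a pattern argument.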
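A sketch of compiling once and reusing the Pattern object across several lines. The date format and the name date_re are illustrative assumptions, not part of the re API.

```python
import re

# Compile once; the Pattern object can be reused as often as needed.
date_re = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

lines = ["created 2024-01-15", "no date here", "updated 2024-03-02"]

found = []
for line in lines:
    m = date_re.search(line)
    if m:  # skip lines without a date
        found.append(m.groups())
print(found)  # [('2024', '01', '15'), ('2024', '03', '02')]

# Pattern methods mirror the module functions, minus the pattern argument.
all_dates = date_re.findall("2023-12-31 and 2024-01-01")
print(all_dates)  # [('2023', '12', '31'), ('2024', '01', '01')]
```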

Capturing Groups

Parentheses () create capturing groups whose text you can extract individually from the Match object.
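A minimal sketch of reading groups from a Match object on a made-up string: group(0) is the whole match, and numbered groups are counted by their opening parentheses from left to right.

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "contact: jane@example.com")
if m:
    print(m.group(0))     # jane@example.com (the whole match)
    print(m.group(1))     # jane
    print(m.group(2))     # example
    print(m.groups())     # ('jane', 'example')
    print(m.group(1, 2))  # ('jane', 'example')
```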

Named Groups: (?P<name>...)

Named groups make patterns self-documenting and let you access captures by name.
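A sketch using named groups for an ISO-style date (the sample date is made up). Besides group("name"), captures can be read through groupdict() or referenced in a re.sub replacement as \g<name>.

```python
import re

date_pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

m = re.search(date_pattern, "Released on 2024-07-19.")
if m:
    print(m.group("year"))  # 2024
    print(m["month"])       # 07 (Match objects support indexing)
    print(m.groupdict())    # {'year': '2024', 'month': '07', 'day': '19'}

# \g<name> refers to a named group in a replacement string.
reordered = re.sub(date_pattern, r"\g<day>/\g<month>/\g<year>", "2024-07-19")
print(reordered)  # 19/07/2024
```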

Non-Capturing Groups: (?:...)

Use (?:...) when you need grouping for alternation or repetition but don't need to capture.
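A sketch contrasting a capturing and a non-capturing group around the same alternation. Because findall changes its output when groups are present, the difference is easy to see; the URLs are placeholders.

```python
import re

text = "http://a.com https://b.org ftp://c.net"

# Capturing group: findall returns only the group's text.
schemes = re.findall(r"(https?|ftp)://\w+\.\w+", text)
print(schemes)  # ['http', 'https', 'ftp']

# Non-capturing group: still groups the alternation, but captures nothing,
# so findall returns the full matches.
urls = re.findall(r"(?:https?|ftp)://\w+\.\w+", text)
print(urls)     # ['http://a.com', 'https://b.org', 'ftp://c.net']
```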

Flags

Flags modify how the pattern engine behaves:

Flag            Short   Effect
re.IGNORECASE   re.I    Case-insensitive matching
re.MULTILINE    re.M    ^ and $ match at the start and end of each line
re.DOTALL       re.S    . also matches newline characters
re.VERBOSE      re.X    Whitespace and # comments in the pattern are ignored
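A sketch showing each flag in use on made-up log text. Flags can be combined with the | operator and passed as the flags argument of functions such as re.findall, re.search, and re.compile.

```python
import re

text = "Error: disk full\nwarning: low memory\nERROR: timeout"

# IGNORECASE | MULTILINE: ^ and $ anchor at every line, case is ignored.
errors = re.findall(r"^error: (.*)$", text, re.IGNORECASE | re.MULTILINE)
print(errors)  # ['disk full', 'timeout']

# Without DOTALL, . stops at newlines; with re.S it can span lines.
no_dotall = re.search(r"Error.*timeout", text)
with_dotall = re.search(r"Error.*timeout", text, re.S)
print(no_dotall)          # None
print(bool(with_dotall))  # True

# VERBOSE: whitespace is ignored and # starts a comment inside the pattern.
phone = re.compile(r"""
    (\d{3})   # area code
    -
    (\d{4})   # line number
""", re.VERBOSE)
parts = phone.search("call 555-1234").groups()
print(parts)  # ('555', '1234')
```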

Common Patterns

Greedy vs Non-Greedy

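By default, quantifiers (*, +, ?, {n,m}) are greedy: they match as much as possible. Add ? after a quantifier to make it non-greedy (lazy), so it matches as little as possible.

Non-greedy quantifiers are especially useful when extracting delimited text, such as tags in a small HTML or XML snippet or fields in a log line, where each match should stop at the first closing delimiter rather than the last.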
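A minimal sketch of greedy versus non-greedy matching on a small HTML-like string (illustrative input only).

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs as far as it can, so one match spans from the first < to the last >.
greedy = re.findall(r"<.*>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first > it reaches, giving each tag separately.
lazy = re.findall(r"<.*?>", html)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```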

Knowledge Check

What is the key difference between re.match() and re.search()?

What does the pattern r'(?P<year>\d{4})' create in a regex?

Given text = 'aaa bbb aaa', what does re.findall(r'a+', text) return?