Lesson (~15 min): chain, product, combinations, permutations, groupby
Python itertools
The itertools module provides memory-efficient building blocks for working with iterators. Its functions produce results lazily: they generate values on demand rather than building entire lists in memory. This matters most when the data is large or unbounded, where a fully built list would be slow to create or might not fit in memory at all.
Why Itertools?
Instead of building a list like [(x, y) for x in range(1000) for y in range(1000)] (1,000,000 tuples in memory), you can iterate over itertools.product(range(1000), range(1000)) and produce one pair at a time.
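A minimal sketch of that difference, using the same 1000 x 1000 grid. The variable names are illustrative only.

```python
import itertools

# product() returns a lazy iterator: no pairs are built until requested.
pairs = itertools.product(range(1000), range(1000))

first = next(pairs)   # (0, 0)
second = next(pairs)  # (0, 1)

# Walk through the rest one pair at a time; the full set of
# 1,000,000 tuples never exists in memory at once.
remaining = sum(1 for _ in pairs)

print(first, second, remaining)  # (0, 0) (0, 1) 999998
```

Note that the iterator is consumed as you go: once a pair has been produced, it will not be produced again.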
itertools.chain
Chains multiple iterables together into a single stream:
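A short sketch of chaining, assuming a few small sample inputs. It also shows chain.from_iterable, which takes a single iterable of iterables instead of separate arguments.

```python
from itertools import chain

letters = ["a", "b"]
numbers = (1, 2)
more = range(3, 5)

# The inputs can be of different types; chain reads them one after another.
combined = list(chain(letters, numbers, more))
print(combined)  # ['a', 'b', 1, 2, 3, 4]

# chain.from_iterable flattens exactly one level of nesting.
nested = [[1, 2], [3], [4, 5]]
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]
```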
itertools.product
Computes the Cartesian product, that is, every combination of one element from each input iterable:
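As an illustration, here is a sketch with two small made-up lists, plus the repeat argument for taking the product of an iterable with itself.

```python
from itertools import product

sizes = ["S", "M"]
colors = ["red", "blue"]

# The rightmost iterable advances fastest, like nested for loops.
variants = list(product(sizes, colors))
print(variants)
# [('S', 'red'), ('S', 'blue'), ('M', 'red'), ('M', 'blue')]

# repeat=2 is equivalent to product([0, 1], [0, 1]).
bits = list(product([0, 1], repeat=2))
print(bits)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```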
itertools.combinations
All combinations of r elements from an iterable (order doesn't matter, no repetition):
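A small sketch with r = 2 over three sample letters:

```python
from itertools import combinations

items = ["a", "b", "c"]

# Each pair appears once: ('a', 'b') is included, ('b', 'a') is not.
pairs = list(combinations(items, 2))
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Results follow the order of the input, so a sorted input yields sorted tuples.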
itertools.permutations
All orderings of r elements (order matters):
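The same three sample letters with permutations show the contrast: both ('a', 'b') and ('b', 'a') appear.

```python
from itertools import permutations

items = ["a", "b", "c"]

# Order matters, so each pair appears in both directions.
ordered_pairs = list(permutations(items, 2))
print(ordered_pairs)
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

# Omitting r gives full-length orderings: 3! = 6 of them here.
full = list(permutations(items))
print(len(full))  # 6
```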
itertools.groupby
Groups consecutive elements by a key function. Important: the input must be sorted by the same key, or you'll get multiple groups for the same key value:
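The sketch below shows the pitfall and the fix, using a made-up word list and a hypothetical first_letter key function.

```python
from itertools import groupby

def first_letter(word):
    """Key function: group words by their first character."""
    return word[0]

words = ["apple", "bob", "avocado", "cherry", "banana"]

# Without sorting, 'a' and 'b' each show up as more than one group.
unsorted_keys = [key for key, _ in groupby(words, key=first_letter)]
print(unsorted_keys)  # ['a', 'b', 'a', 'c', 'b']

# Sorting by the same key first puts equal keys next to each other.
sorted_words = sorted(words, key=first_letter)
grouped = {
    key: list(group)  # copy each group before moving on to the next one
    for key, group in groupby(sorted_words, key=first_letter)
}
print(grouped)
# {'a': ['apple', 'avocado'], 'b': ['bob', 'banana'], 'c': ['cherry']}
```

Each group is itself a lazy iterator that shares the underlying input, which is why the sketch copies it into a list before advancing.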
itertools.islice
Slices any iterable lazily (like s[start:stop:step] for sequences, but without support for negative indices):
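A brief sketch, assuming a hypothetical infinite generator named squares. Converting it to a list would never finish, but islice can still take the first few values.

```python
from itertools import count, islice

def squares():
    """Hypothetical infinite generator: 1, 4, 9, 16, ..."""
    for n in count(1):
        yield n * n

# Take only the first five values; the generator is never exhausted.
first_five = list(islice(squares(), 5))
print(first_five)  # [1, 4, 9, 16, 25]

# start, stop, and step work like slice notation.
every_other = list(islice(range(10), 2, 8, 2))
print(every_other)  # [2, 4, 6]
```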
itertools.cycle
Cycles through an iterable indefinitely:
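Because cycle never ends, it needs something else to bound it. The sketch below uses islice and zip for that; the task and worker names are made up.

```python
from itertools import cycle, islice

colors = cycle(["red", "green", "blue"])

# Bound the infinite stream with islice.
first_seven = list(islice(colors, 7))
print(first_seven)
# ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']

# zip stops at the shortest input, so this gives a round-robin assignment.
tasks = ["t1", "t2", "t3", "t4", "t5"]
assignments = list(zip(tasks, cycle(["alice", "bob"])))
print(assignments)
# [('t1', 'alice'), ('t2', 'bob'), ('t3', 'alice'), ('t4', 'bob'), ('t5', 'alice')]
```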
itertools.accumulate
Produces running totals (or any running aggregate):
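A sketch with a small sample list, showing the default running sum and two other aggregates passed as the second argument:

```python
from itertools import accumulate
import operator

values = [3, 1, 4, 1, 5]

# With no function given, accumulate adds.
running_total = list(accumulate(values))
print(running_total)  # [3, 4, 8, 9, 14]

# Any two-argument function works, such as max or operator.mul.
running_max = list(accumulate(values, max))
print(running_max)  # [3, 3, 4, 4, 5]

running_product = list(accumulate(values, operator.mul))
print(running_product)  # [3, 3, 12, 12, 60]
```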
itertools.takewhile and itertools.dropwhile
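takewhile yields elements until the predicate first fails, then stops. dropwhile skips elements until the predicate first fails, then yields everything that remains: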
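The two functions split a sequence at the same point. The readings and the is_small predicate below are illustrative.

```python
from itertools import dropwhile, takewhile

def is_small(x):
    """Hypothetical predicate used for both functions."""
    return x < 6

readings = [1, 3, 5, 8, 2, 4]

# Stops at 8, the first element that fails the predicate.
head = list(takewhile(is_small, readings))
print(head)  # [1, 3, 5]

# Starts at 8 and keeps everything after it, including 2 and 4:
# the predicate is not checked again once it has failed.
tail = list(dropwhile(is_small, readings))
print(tail)  # [8, 2, 4]
```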
Combining Itertools: A Data Pipeline
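One possible pipeline, sketched under the assumption that sales records arrive from two sources as (category, amount) tuples. The data and all names here are hypothetical.

```python
from itertools import accumulate, chain, groupby, islice
from operator import itemgetter

# Hypothetical input: (category, amount) records from two sources.
store_a = [("books", 12), ("games", 30)]
store_b = [("books", 8), ("music", 5), ("games", 10)]

# 1. Merge the sources without copying them into a new list.
all_sales = chain(store_a, store_b)

# 2. Sort by category so groupby sees each category as one run.
#    (sorted() does build a list; this step is not lazy.)
by_category = sorted(all_sales, key=itemgetter(0))

# 3. Sum the amounts within each group.
totals = [
    (category, sum(amount for _, amount in group))
    for category, group in groupby(by_category, key=itemgetter(0))
]
print(totals)  # [('books', 20), ('games', 40), ('music', 5)]

# 4. Running total across categories.
running = list(accumulate(amount for _, amount in totals))
print(running)  # [20, 60, 65]

# 5. Take the two largest categories.
ranked = sorted(totals, key=itemgetter(1), reverse=True)
top_two = list(islice(ranked, 2))
print(top_two)  # [('games', 40), ('books', 20)]
```

Each stage consumes the previous one, so only the sorting steps hold the full data set at once.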
Knowledge Check
What is the key requirement for `itertools.groupby` to produce correct groups?
What is the difference between `itertools.combinations(['a','b','c'], 2)` and `itertools.permutations(['a','b','c'], 2)`?
Why is `itertools.islice(my_generator, 5)` preferred over converting the generator to a list and slicing?