โฑ๏ธcProfile, timeit, line_profiler conceptsLESSON

Profiling and Performance in Python

Before optimizing code, measure. Guessing at bottlenecks wastes time and often makes code worse. Python provides several tools for measuring where time actually goes.


timeit — Micro-Benchmarking

timeit runs a small snippet many times and reports wall-clock time. It disables garbage collection by default so GC pauses don't skew results.

The setup argument runs once before the timed loop — use it to import modules or create data that the snippet needs without including that time in the measurement:
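A minimal sketch (the snippet and data are illustrative):

```python
import timeit

# Time a set-membership test; the setup string builds the data once,
# outside the timed loop, so only "999 in data" is measured.
elapsed = timeit.timeit(
    stmt="999 in data",
    setup="data = set(range(1000))",
    number=100_000,
)
print(f"{elapsed:.4f} s for 100,000 iterations")
```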

timeit.repeat()

repeat(stmt, repeat=5, number=1000) runs number iterations, repeated repeat times, and returns a list of totals (one per repetition). Use min() of the list — the minimum is the most stable "best case" estimate, since it reflects the run with the least interference from the OS and other processes:
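For example (the sorting workload is illustrative):

```python
import timeit

# Five repetitions of 1000 runs each; each entry is one repetition's total.
times = timeit.repeat(
    stmt="sorted(data)",
    setup="import random; data = [random.random() for _ in range(100)]",
    repeat=5,
    number=1000,
)
best = min(times)  # most stable best-case estimate
print(f"best of 5: {best:.4f} s per 1000 runs")
```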

timeit from the Command Line
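The same measurements can be run from a shell. -s supplies setup code run once, while -n and -r override the loop and repeat counts (the snippets below are illustrative):

```shell
# Time a snippet directly; timeit picks a loop count automatically.
python3 -m timeit "'-'.join(str(n) for n in range(100))"

# -s runs setup once; -n and -r fix the loop and repeat counts.
python3 -m timeit -s "data = set(range(1000))" -n 100000 -r 5 "999 in data"
```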


cProfile — Full Program Profiling

cProfile is a deterministic profiler — it instruments every function call. Use it to find which functions consume the most time in a real program.

Sample output:
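A minimal, self-contained run that produces a table of this shape (fib is an illustrative workload; the Profile API is used so the stats object can also be inspected programmatically):

```python
import cProfile
import pstats

def fib(n):
    # Deliberately slow recursive workload for the profiler to measure.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

profiler = cProfile.Profile()
profiler.enable()
fib(20)
profiler.disable()

# Prints the ncalls / tottime / percall / cumtime columns, sorted by cumtime.
stats = pstats.Stats(profiler)
stats.sort_stats("cumtime").print_stats(5)
```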

Columns explained:

Column         Meaning
ncalls         Total calls (recursive shown as 2692537/1 = total/primitive)
tottime        Time spent inside this function, excluding subcalls
percall        tottime ÷ ncalls
cumtime        Total time including all subcalls
percall (2nd)  cumtime ÷ primitive calls

Look at tottime to find functions that are slow in themselves; look at cumtime to find the most expensive call trees.


pstats — Sorting Profile Output

pstats.Stats lets you load, sort, and filter profile data programmatically:
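A sketch, assuming the profile was first saved to a file (the filename prof.out and the work function are arbitrary):

```python
import cProfile
import pstats

def work():
    return sum(i * i for i in range(10_000))

# Save raw profile data to a file instead of printing it.
cProfile.runctx("work()", globals(), locals(), filename="prof.out")

stats = pstats.Stats("prof.out")
stats.strip_dirs()            # shorten file paths in the report
stats.sort_stats("tottime")   # hottest functions first
stats.print_stats(10)         # show only the top 10 rows
```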

Sort keys: "tottime", "cumtime", "ncalls", "filename", "name".


memory_profiler — Concept (Not in Stdlib)

memory_profiler is a third-party package (pip install memory-profiler) that reports per-line memory usage. It uses the @profile decorator (which is injected by the runner, not imported):
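A sketch of such a script, meant to be run as python -m memory_profiler script.py. The try/except fallback is my addition, not part of memory_profiler; it keeps the script runnable when the injecting runner isn't used:

```python
# The memory_profiler runner injects `profile` into builtins.
# Fallback no-op decorator so the script also runs normally:
try:
    profile  # noqa: F821 - injected by memory_profiler's runner
except NameError:
    def profile(func):
        return func

@profile
def build():
    big = [0] * 1_000_000    # per-line memory deltas reported here
    small = [1] * 1_000
    del big                  # freed memory shows as a negative delta
    return small

if __name__ == "__main__":
    build()
```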

For a quick size estimate without installing anything, use sys.getsizeof:
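For example:

```python
import sys

print(sys.getsizeof([]))           # size of an empty list's header
print(sys.getsizeof([0] * 1000))   # header plus 1000 pointer slots

# getsizeof is shallow: for a container, add the elements' sizes too
# (this only goes one level deep).
data = [0] * 1000
total = sys.getsizeof(data) + sum(sys.getsizeof(x) for x in data)
print(total)
```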


Practical Python Performance Tips

1. Use built-in functions over manual loops

Built-ins like sum(), map(), filter(), min(), max() are implemented in C and run faster than equivalent Python for-loops:
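For example, summing a list both ways — identical results, but the built-in runs a single C-level loop:

```python
data = list(range(1_000_000))

total = sum(data)      # C-implemented built-in

manual = 0             # equivalent Python loop: same result, slower
for x in data:
    manual += x
```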

2. Avoid global variable lookups — cache as local

Python looks up global names in the module's __dict__ on every access. Inside a tight loop, binding a global to a local variable eliminates that overhead:
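A sketch using math.sqrt as the cached global (function names are illustrative):

```python
import math

def distances_global(points):
    # math.sqrt is resolved via globals (then the math module) on every pass.
    return [math.sqrt(x * x + y * y) for x, y in points]

def distances_local(points):
    sqrt = math.sqrt   # one lookup; the loop then uses a fast local
    return [sqrt(x * x + y * y) for x, y in points]
```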

3. List comprehensions vs for-loops

List comprehensions are compiled to a faster bytecode path than equivalent for-loops with .append():
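For example — same result, but the comprehension avoids the repeated .append lookup on each iteration:

```python
squares_loop = []
for n in range(10):
    squares_loop.append(n * n)   # attribute lookup + call every iteration

squares_comp = [n * n for n in range(10)]   # specialized bytecode path
```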

4. Avoid repeated attribute lookup

Each obj.method expression traverses the attribute chain. Cache it:
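A sketch caching a bound method before a hot loop:

```python
items = []

append = items.append    # bind the method once, outside the loop
for n in range(1000):
    append(n)            # plain local call; no per-iteration attribute lookup
```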

5. __slots__ for memory-dense objects

By default, Python instances store attributes in a __dict__. Declaring __slots__ replaces that with a fixed-size array, cutting memory by ~40-50% for many small objects:
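A sketch with two illustrative point classes; the slotted version has no per-instance __dict__ at all (and rejects attributes not listed in __slots__):

```python
class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    __slots__ = ("x", "y")   # fixed storage; no per-instance __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

p = PointDict(1, 2)
q = PointSlots(1, 2)
print(hasattr(q, "__dict__"))   # False
# q.z = 3 would raise AttributeError: no such slot
```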

6. lru_cache for expensive pure functions

functools.lru_cache memoizes calls so repeated inputs return cached results without re-computing:
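For example, memoizing the classic recursive Fibonacci turns exponential work into linear:

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # unbounded cache; use a limit for unbounded inputs
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(100)                   # fast: each value is computed only once
print(fib.cache_info())    # hit/miss counters for the cache
```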

7. NumPy for numeric work

For array arithmetic, NumPy's vectorized operations run in C and are 10-100× faster than Python loops:
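A sketch, assuming NumPy is installed (it is not in the stdlib):

```python
import numpy as np

a = np.arange(1000, dtype=np.float64)

# One vectorized expression: C loops over the whole array at once.
vectorized = float((a * 2.0 + 1.0).sum())

# Equivalent pure-Python loop for comparison (much slower at scale):
looped = 0.0
for x in range(1000):
    looped += x * 2.0 + 1.0
```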


Big-O of Common Operations

Operation          Complexity       Notes
list.append()      O(1) amortized   Occasional resize is O(n)
list.insert(i, x)  O(n)             Shifts all elements after index
list[i]            O(1)             Index access
list.pop()         O(1)             Pop from end
list.pop(0)        O(n)             Use collections.deque instead
x in list          O(n)             Linear scan
x in set           O(1) avg         Hash lookup
dict[key]          O(1) avg         Hash lookup
sorted(list)       O(n log n)       Timsort
bisect.bisect      O(log n)         Binary search
heapq.heappush     O(log n)         Heap sift
heapq.heapify      O(n)             Build heap from list

Knowledge Check

In cProfile output, `tottime` represents:

Why does `timeit` disable the garbage collector by default during measurement?

What is the time complexity of `x in some_set` for a Python set?