Python Performance Tips
Measure bottlenecks with timeit and cProfile, then apply targeted fixes: sets, generators, join(), lru_cache, and local variable tricks.
"The #1 rule of optimisation: don't. The #2 rule: don't yet. Measure first, find the real bottleneck, then fix only that. Premature optimisation produces complex code that solves the wrong problem."
— ShuraiThe Performance Workflow
1. Make it correct
2. Make it readable
3. Measure (profiling)
4. Fix only the proven bottleneck
Measuring — timeit and cProfile
import timeit

# Run each statement 10,000 times to get a stable measurement
t_comp = timeit.timeit(
    "[x**2 for x in range(1000)]",  # list comprehension
    number=10_000,
)
t_loop = timeit.timeit(
    "r = []\nfor x in range(1000): r.append(x**2)",  # manual loop
    number=10_000,
)
print(f"List comprehension : {t_comp:.3f}s") # 0.38s
print(f"Manual loop+append : {t_loop:.3f}s") # 0.56s
print(f"Comprehension is {t_loop/t_comp:.1f}x faster")
python -m cProfile -s cumulative my_script.py
# Output shows:
# ncalls tottime cumtime filename:lineno(function)
# 1000 2.412 3.001 my_script.py:15(slow_func)
# 500 0.003 0.003 my_script.py:8(fast_func)
# Fix slow_func — it takes 3s. Ignore fast_func — it takes 0.003s.
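The same profile data is available from inside a script via cProfile.Profile and pstats, which is handy when you only want to profile one code path rather than the whole program. The slow_func below is a made-up stand-in for your own hot function:

```python
import cProfile
import io
import pstats

def slow_func():
    # Stand-in for real work: ~100k multiplications
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
slow_func()
profiler.disable()

# Print the five most expensive entries, sorted like `-s cumulative`
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

The printed table has the same ncalls/tottime/cumtime columns as the command-line output above.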
Tip 1 — Use Sets for Membership Tests
List lookup scans every element: O(n). Set lookup hashes the value: O(1) on average, regardless of size:
# Slow: checks every element in the list one by one
valid_roles_list = ["admin", "editor", "viewer", "moderator"]
if "admin" in valid_roles_list:  # O(n) — gets slower as list grows
    pass

# Fast: hash lookup — stays fast no matter how many items
valid_roles_set = {"admin", "editor", "viewer", "moderator"}
if "admin" in valid_roles_set:  # O(1) on average — same speed at any size
    pass
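A quick timeit check makes the gap concrete. The collection size and target value below are arbitrary; the target is the last element so the list scan hits its worst case:

```python
import timeit

items = list(range(10_000))
as_set = set(items)
target = 9_999  # last element: worst case for the linear scan

t_list = timeit.timeit(lambda: target in items, number=1_000)
t_set = timeit.timeit(lambda: target in as_set, number=1_000)

print(f"list membership: {t_list:.4f}s")
print(f"set membership : {t_set:.4f}s")
```

On any CPython build the set lookup should come out far ahead, and the gap widens as the collection grows.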
Tip 2 — join() Instead of String Concatenation
String += in a loop builds a brand-new string object on each iteration, so the total work is O(n²) in the worst case. join() computes the final length once and copies each piece a single time:
words = ["Python", "is", "fast", "when", "used", "correctly"]
# BAD — O(n²): creates a new string on every +=
result = ""
for w in words:
    result += w + " "

# GOOD — O(n): single memory allocation for the whole result
result = " ".join(words)
Tip 3 — Generators Over Lists for Large Data
# List comprehension: creates 1 million integers in memory at once
total_list = sum([x**2 for x in range(1_000_000)])
# Generator expression: calculates one value at a time — barely uses RAM
total_gen = sum(x**2 for x in range(1_000_000)) # note: no []
# Same result. Generator uses ~100x less memory for large sequences.
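sys.getsizeof makes the overhead visible; note it measures only the list object and the generator object themselves, not the integers they yield:

```python
import sys

n = 1_000_000
squares_list = [x**2 for x in range(n)]
squares_gen = (x**2 for x in range(n))

print(f"list      : {sys.getsizeof(squares_list):,} bytes")  # megabytes
print(f"generator : {sys.getsizeof(squares_gen):,} bytes")   # a few hundred bytes

# Both produce the same total
assert sum(squares_gen) == sum(squares_list)
```

The generator's size is constant no matter how large n gets, because it holds only its current state, not the produced values. The trade-off: a generator can be iterated only once and does not support indexing or len().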
Tip 4 — Cache Expensive Repeated Calls with lru_cache
from functools import lru_cache
import time
@lru_cache(maxsize=128)
def slow_fetch(user_id):
    """Simulates a slow database lookup."""
    time.sleep(0.5)  # pretend this takes 500ms
    return {"id": user_id, "name": f"User {user_id}"}
slow_fetch(42) # 500ms — real call
slow_fetch(42) # 0ms — returned from cache
slow_fetch(42) # 0ms — still from cache
slow_fetch(99) # 500ms — different argument, real call
print(slow_fetch.cache_info())
# CacheInfo(hits=2, misses=2, maxsize=128, currsize=2)
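The same decorator also tames recursive functions. The classic demonstration (not part of the example above) is Fibonacci: without the cache the naive recursion does an exponential number of calls, with it each fib(n) is computed exactly once:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache; equivalent to functools.cache in 3.9+
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Naive recursion would need on the order of 2**100 calls; cached, about 101
print(fib(100))  # 354224848179261915075
```

Caveats worth remembering: arguments must be hashable (no lists or dicts), and caching a function that returns a mutable object, as slow_fetch does, means every caller shares the same dict.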
Tip 5 — Avoid Repeated Attribute Lookups in Tight Loops
import math
# Slow: Python looks up math.sqrt on every single iteration
for i in range(1_000_000):
    math.sqrt(i)

# Fast: bind to a local variable once — local lookup is much cheaper
sqrt = math.sqrt
for i in range(1_000_000):
    sqrt(i)
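Wrapping both loops in functions makes the effect measurable with timeit. The gain is typically modest, and smaller on recent CPython versions that cache attribute lookups, so treat this as a last-resort micro-optimisation and measure on your own interpreter:

```python
import math
import timeit

def with_attribute():
    total = 0.0
    for i in range(100_000):
        total += math.sqrt(i)  # attribute lookup on every iteration
    return total

def with_local():
    sqrt = math.sqrt           # bound to a local name once
    total = 0.0
    for i in range(100_000):
        total += sqrt(i)
    return total

assert with_attribute() == with_local()  # same result either way
print(f"attribute lookup: {timeit.timeit(with_attribute, number=20):.4f}s")
print(f"local variable  : {timeit.timeit(with_local, number=20):.4f}s")
```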
"Profile before you optimise. Every experienced engineer has a story about spending days speeding up code that ran 0.001% of total time. cProfile shows you the truth in 5 seconds."
— Shurai🧠 Quiz — Q1
What is the correct order for writing performant Python code?
🧠 Quiz — Q2
Why is testing membership with a set faster than with a list?
🧠 Quiz — Q3
Why is " ".join(words) faster than concatenating with += in a loop?
🧠 Quiz — Q4
What does @lru_cache do?