Python Performance Tips
Measure bottlenecks with timeit and cProfile, then apply targeted fixes: sets, generators, join(), lru_cache, and local variable tricks.
"The #1 rule of optimisation: don't. The #2 rule: don't yet. Measure first, find the real bottleneck, then fix only that. Premature optimisation produces complex code that solves the wrong problem."
— ShuraiThe Performance Workflow
1. Make it correct
2. Make it readable
3. Measure (profiling)
4. Fix only the proven bottleneck
Measuring — timeit and cProfile
import timeit

# Run each statement 10,000 times to get a stable measurement
t_comp = timeit.timeit(
    "[x**2 for x in range(1000)]",  # list comprehension
    number=10_000,
)
t_loop = timeit.timeit(
    "r = []\nfor x in range(1000): r.append(x**2)",  # manual loop
    number=10_000,
)
print(f"List comprehension : {t_comp:.3f}s") # 0.38s
print(f"Manual loop+append : {t_loop:.3f}s") # 0.56s
print(f"Comprehension is {t_loop/t_comp:.1f}x faster")
python -m cProfile -s cumulative my_script.py
# Output shows:
# ncalls tottime cumtime filename:lineno(function)
# 1000 2.412 3.001 my_script.py:15(slow_func)
# 500 0.003 0.003 my_script.py:8(fast_func)
# Fix slow_func — it takes 3s. Ignore fast_func — it takes 0.003s.
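The same profile data is available from inside a script via cProfile.Profile and pstats, which is handy when you only want to profile one code path rather than the whole program. The slow_func below is a made-up stand-in for your own hot function:

```python
import cProfile
import io
import pstats

def slow_func():
    # Stand-in for real work: ~100k multiplications
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
slow_func()
profiler.disable()

# Print the five most expensive entries, sorted like `-s cumulative`
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

The printed table has the same ncalls/tottime/cumtime columns as the command-line output above.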
Tip 1 — Use Sets for Membership Tests
List lookup scans every element: O(n). Set lookup hashes the value: O(1) on average, regardless of size:
# Slow: checks every element in the list one by one
valid_roles_list = ["admin", "editor", "viewer", "moderator"]
if "admin" in valid_roles_list:  # O(n) — gets slower as list grows
    pass

# Fast: hash lookup — stays fast no matter how many items
valid_roles_set = {"admin", "editor", "viewer", "moderator"}
if "admin" in valid_roles_set:  # O(1) on average — same speed at any size
    pass
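A quick timeit check makes the gap concrete. The collection size and target value below are arbitrary; the target is the last element so the list scan hits its worst case:

```python
import timeit

items = list(range(10_000))
as_set = set(items)
target = 9_999  # last element: worst case for the linear scan

t_list = timeit.timeit(lambda: target in items, number=1_000)
t_set = timeit.timeit(lambda: target in as_set, number=1_000)

print(f"list membership: {t_list:.4f}s")
print(f"set membership : {t_set:.4f}s")
```

On any CPython build the set lookup should come out far ahead, and the gap widens as the collection grows.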
Tip 2 — join() Instead of String Concatenation
String += in a loop builds a brand-new string object on each iteration, so the total work is O(n²) in the worst case. join() computes the final length once and copies each piece a single time:
words = ["Python", "is", "fast", "when", "used", "correctly"]
# BAD — O(n²): creates a new string on every +=
result = ""
for w in words:
    result += w + " "

# GOOD — O(n): single memory allocation for the whole result
result = " ".join(words)
Tip 3 — Generators Over Lists for Large Data
# List comprehension: creates 1 million integers in memory at once
total_list = sum([x**2 for x in range(1_000_000)])
# Generator expression: calculates one value at a time — barely uses RAM
total_gen = sum(x**2 for x in range(1_000_000)) # note: no []
# Same result. Generator uses ~100x less memory for large sequences.
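sys.getsizeof makes the overhead visible; note it measures only the list object and the generator object themselves, not the integers they yield:

```python
import sys

n = 1_000_000
squares_list = [x**2 for x in range(n)]
squares_gen = (x**2 for x in range(n))

print(f"list      : {sys.getsizeof(squares_list):,} bytes")  # megabytes
print(f"generator : {sys.getsizeof(squares_gen):,} bytes")   # a few hundred bytes

# Both produce the same total
assert sum(squares_gen) == sum(squares_list)
```

The generator's size is constant no matter how large n gets, because it holds only its current state, not the produced values. The trade-off: a generator can be iterated only once and does not support indexing or len().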
Tip 4 — Cache Expensive Repeated Calls with lru_cache
from functools import lru_cache
import time
@lru_cache(maxsize=128)
def slow_fetch(user_id):
    """Simulates a slow database lookup."""
    time.sleep(0.5)  # pretend this takes 500ms
    return {"id": user_id, "name": f"User {user_id}"}
slow_fetch(42) # 500ms — real call
slow_fetch(42) # 0ms — returned from cache
slow_fetch(42) # 0ms — still from cache
slow_fetch(99) # 500ms — different argument, real call
print(slow_fetch.cache_info())
# CacheInfo(hits=2, misses=2, maxsize=128, currsize=2)
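The same decorator also tames recursive functions. The classic demonstration (not part of the example above) is Fibonacci: without the cache the naive recursion does an exponential number of calls, with it each fib(n) is computed exactly once:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache; equivalent to functools.cache in 3.9+
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Naive recursion would need on the order of 2**100 calls; cached, about 101
print(fib(100))  # 354224848179261915075
```

Caveats worth remembering: arguments must be hashable (no lists or dicts), and caching a function that returns a mutable object, as slow_fetch does, means every caller shares the same dict.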
Tip 5 — Avoid Repeated Attribute Lookups in Tight Loops
import math
# Slow: Python looks up math.sqrt on every single iteration
for i in range(1_000_000):
    math.sqrt(i)

# Fast: bind to a local variable once — local lookup is much cheaper
sqrt = math.sqrt
for i in range(1_000_000):
    sqrt(i)
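Wrapping both loops in functions makes the effect measurable with timeit. The gain is typically modest, and smaller on recent CPython versions that cache attribute lookups, so treat this as a last-resort micro-optimisation and measure on your own interpreter:

```python
import math
import timeit

def with_attribute():
    total = 0.0
    for i in range(100_000):
        total += math.sqrt(i)  # attribute lookup on every iteration
    return total

def with_local():
    sqrt = math.sqrt           # bound to a local name once
    total = 0.0
    for i in range(100_000):
        total += sqrt(i)
    return total

assert with_attribute() == with_local()  # same result either way
print(f"attribute lookup: {timeit.timeit(with_attribute, number=20):.4f}s")
print(f"local variable  : {timeit.timeit(with_local, number=20):.4f}s")
```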
"Profile before you optimise. Every experienced engineer has a story about spending days speeding up code that ran 0.001% of total time. cProfile shows you the truth in 5 seconds."
— Shurai🧠 Quiz — Q1
What is the correct order for writing performant Python code?
🧠 Quiz — Q2
Why is testing membership with a set faster than with a list?
🧠 Quiz — Q3
Why is " ".join(words) faster than concatenating with += in a loop?
🧠 Quiz — Q4
What does @lru_cache do?