Threading

Run multiple threads to do I/O-bound tasks simultaneously — and understand the GIL, thread safety, and when not to use threads.

"Threads are like workers sharing the same desk. They can work at the same time but must take turns using the same tools — which in Python means taking turns with the GIL."

— Shurai

What is a Thread?

A thread is a separate line of execution inside your program. All threads in a process share the same memory, so they see the same variables. Every Python program starts with one thread, the main thread.
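Here is a quick way to see both facts. This minimal sketch uses illustrative names (shared, worker, Worker-1). The main thread reports its own name. A second thread then appends to a list the main thread created, and the main thread sees the change.

```python
import threading

shared = []   # module-level list: every thread sees this same object

def worker(label):
    # Runs in the new thread but appends to the list created by the main thread
    shared.append(f"{label} ran in {threading.current_thread().name}")

print(threading.current_thread().name)                          # MainThread
print(threading.main_thread() is threading.current_thread())    # True

t = threading.Thread(target=worker, args=("worker",), name="Worker-1")
t.start()
t.join()

print(shared)   # ['worker ran in Worker-1']
```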

Without threads vs with threads:

One thread (default)

Tasks run one after another.
Download A, then B, then C.
Total time = sum of all.

Multiple threads

Tasks run side by side.
Download A, B, C together.
Total time ≈ longest one.
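The difference is easy to measure. This sketch times the same three waits twice: once run one after another, and once in three threads. The name fake_download and the durations are made up for the example, and time.sleep stands in for network delay.

```python
import threading
import time

def fake_download(seconds):
    time.sleep(seconds)  # stand-in for waiting on the network

durations = [0.3, 0.2, 0.1]

# One thread: each call waits for the previous one to finish
start = time.perf_counter()
for d in durations:
    fake_download(d)
sequential = time.perf_counter() - start

# Multiple threads: the three waits overlap
start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(d,)) for d in durations]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s (sum of waits = {sum(durations):.1f}s)")
print(f"threaded:   {threaded:.2f}s (longest wait = {max(durations):.1f}s)")
```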

Creating and Starting Threads

python

import threading
import time

def download(file_name, duration):
    print(f"Starting download: {file_name}")
    time.sleep(duration)             # simulate network delay
    print(f"Finished: {file_name}")

# Create threads — target is the function, args is a tuple
t1 = threading.Thread(target=download, args=("video.mp4",  3))
t2 = threading.Thread(target=download, args=("music.mp3",  2))
t3 = threading.Thread(target=download, args=("photo.jpg",  1))

t1.start()   # launch each thread
t2.start()
t3.start()

t1.join()    # wait for each to finish
t2.join()
t3.join()

print("All downloads complete!")

output (finishes in ~3s, not 6s)

Starting download: video.mp4
Starting download: music.mp3
Starting download: photo.jpg
Finished: photo.jpg   ← 1s
Finished: music.mp3   ← 2s
Finished: video.mp4   ← 3s
All downloads complete!
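Two details from that example are worth seeing on their own. First, join() blocks the calling thread until the target function returns. Second, args must be a tuple even when there is only one argument. A minimal sketch follows; save_report and report.pdf are hypothetical.

```python
import threading
import time

def save_report(path):
    time.sleep(0.2)                # pretend to write a file
    print(f"Saved {path}")

# A one-element tuple needs a trailing comma. ("report.pdf") is just a string,
# so Thread would call save_report("r", "e", "p", ...) and the thread would fail with TypeError.
t = threading.Thread(target=save_report, args=("report.pdf",))
t.start()

before = t.is_alive()              # normally True here: the thread is still sleeping
t.join()                           # block until save_report has returned
after = t.is_alive()               # always False once join() has returned

print("alive before join:", before)
print("alive after join:", after)
```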

The GIL — Python's Important Limitation

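CPython, the standard Python interpreter, has a Global Interpreter Lock (GIL). It allows only one thread to execute Python bytecode at a time, so threads cannot truly run Python code in parallel. A thread releases the GIL while it sleeps or waits on I/O. That is why threads still help with waiting but not with CPU-heavy work: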

⚠️ The GIL in plain English:

✓ Threads HELP with

Waiting for network / files
Sleeping (I/O-bound work)
Running C extensions that release the GIL (like NumPy)

✗ Threads DON'T help with

Heavy calculations in Python
Sorting huge lists
CPU-bound number crunching
(Use multiprocessing instead)
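You can observe the GIL roughly with this sketch, which assumes a standard (GIL-enabled) CPython build. Four threads that sleep finish in about the time of one. Four threads running a pure-Python loop take about four times as long as one, because they take turns holding the GIL. Exact timings depend on your machine. The names io_task, cpu_task and timed are illustrative.

```python
import threading
import time

def io_task():
    time.sleep(0.2)              # sleeping releases the GIL

def cpu_task():
    total = 0
    for i in range(1_000_000):   # a pure-Python loop holds the GIL while it runs
        total += i
    return total

def timed(task, n_threads):
    """Start n_threads threads running task and return the wall-clock time."""
    threads = [threading.Thread(target=task) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

io_one, io_four = timed(io_task, 1), timed(io_task, 4)
cpu_one, cpu_four = timed(cpu_task, 1), timed(cpu_task, 4)

print(f"I/O: 1 thread {io_one:.2f}s, 4 threads {io_four:.2f}s")   # about the same
print(f"CPU: 1 thread {cpu_one:.2f}s, 4 threads {cpu_four:.2f}s") # on a GIL build, roughly 4x
```

For the CPU-bound case, running the work in separate processes (multiprocessing, or concurrent.futures.ProcessPoolExecutor) gives each worker its own interpreter and its own GIL.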

ThreadPoolExecutor — The Modern Way

Instead of managing threads manually, use concurrent.futures.ThreadPoolExecutor. It handles creating, starting, and joining threads for you:

python

from concurrent.futures import ThreadPoolExecutor
import time

def fetch_price(stock):
    time.sleep(1)                      # pretend API call
    return {"INFY": 1800, "TCS": 3900, "WIPRO": 450}[stock]

stocks = ["INFY", "TCS", "WIPRO"]

# max_workers = maximum number of threads the pool will use
with ThreadPoolExecutor(max_workers=3) as executor:
    prices = list(executor.map(fetch_price, stocks))

for stock, price in zip(stocks, prices):
    print(f"{stock}: ₹{price}")
# All 3 fetched in ~1s instead of 3s
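executor.map returns results in input order. If a task raised an exception, map raises it again when you reach that task's result. To handle each result as soon as it is ready, or to handle failures one task at a time, a common alternative is executor.submit together with concurrent.futures.as_completed. The sketch below adds a made-up symbol, "UNKNOWN", to show the error handling:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch_price(stock):
    time.sleep(0.1)                                   # pretend API call
    prices = {"INFY": 1800, "TCS": 3900, "WIPRO": 450}
    return prices[stock]                              # KeyError for unknown symbols

stocks = ["INFY", "TCS", "WIPRO", "UNKNOWN"]
results, errors = {}, {}

with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() schedules one call and returns a Future right away
    futures = {executor.submit(fetch_price, s): s for s in stocks}
    # as_completed() yields each Future as soon as it finishes, in completion order
    for future in as_completed(futures):
        stock = futures[future]
        try:
            results[stock] = future.result()          # re-raises the worker's exception
        except KeyError as exc:
            errors[stock] = exc

print(results)          # key order follows completion order, so it may vary
print(sorted(errors))   # ['UNKNOWN']
```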

Thread Safety — Protecting Shared Data

When several threads modify the same data, their updates can interleave and overwrite each other. This is called a race condition. Use a Lock to ensure only one thread changes shared data at a time:

python

import threading

counter = 0
lock    = threading.Lock()

def increment():
    global counter
    for _ in range(100_000):
        with lock:          # only one thread at a time
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # 500000 on every run; without the lock, the total is not guaranteed
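Why does counter += 1 need a lock? It is really separate steps: read the value, add one, then write it back. Another thread can run between the read and the write. The sketch below forces that interleaving on purpose with threading.Barrier, so the lost update happens on every run instead of only occasionally. The barrier is a demonstration device you would not put in real code.

```python
import threading

counter = 0
lock = threading.Lock()
barrier = threading.Barrier(2)   # makes both threads read before either one writes

def unsafe_increment():
    global counter
    value = counter      # step 1: read
    barrier.wait()       # wait until the other thread has read the same old value
    counter = value + 1  # step 2: write, overwriting the other thread's update

def safe_increment():
    global counter
    with lock:           # read and write now happen as one uninterrupted step
        value = counter
        counter = value + 1

def run(target):
    """Reset the counter, run target in two threads, return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=target) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

unsafe_result = run(unsafe_increment)
safe_result = run(safe_increment)
print("without lock:", unsafe_result)   # 1: one increment was lost
print("with lock:   ", safe_result)     # 2
```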

"Threading shines for I/O-bound work: downloading files, querying APIs, reading databases. For CPU-heavy work, reach for multiprocessing. And for I/O in modern async code, use asyncio."

— Shurai
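For comparison, here is a sketch of the earlier stock-price example written with asyncio instead of threads. It assumes the API call itself can be awaited, which is simulated here with asyncio.sleep. A blocking library call would stall the event loop and would still need a thread. Also note that asyncio.run cannot be called from inside an already-running event loop, such as a Jupyter cell.

```python
import asyncio
import time

async def fetch_price(stock):
    await asyncio.sleep(1)   # pretend non-blocking API call
    return {"INFY": 1800, "TCS": 3900, "WIPRO": 450}[stock]

async def main():
    stocks = ["INFY", "TCS", "WIPRO"]
    # gather() runs the coroutines concurrently on one thread and keeps input order
    prices = await asyncio.gather(*(fetch_price(s) for s in stocks))
    return dict(zip(stocks, prices))

start = time.perf_counter()
prices = asyncio.run(main())
elapsed = time.perf_counter() - start

print(prices)                  # {'INFY': 1800, 'TCS': 3900, 'WIPRO': 450}
print(f"{elapsed:.1f}s")       # about 1s, not 3s
```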

🧠 Quiz — Q1

What does thread.join() do?

🧠 Quiz — Q2

What is the GIL?

🧠 Quiz — Q3

You have 10 API calls that each take 1 second. Roughly how long does ThreadPoolExecutor(max_workers=10) take?

🧠 Quiz — Q4

Why should you use a Lock when multiple threads modify the same variable?