Introduction to NumPy

Fast, compact arrays for numerical computing — the foundation of data science, machine learning, and scientific Python.

"NumPy is the reason Python became the language of science. A Python list of a million integers takes roughly 36 MB once the integer objects are counted. A NumPy int64 array takes 8 MB, and element-wise operations on it often run 10 to 100x faster."

— Shurai

What is NumPy and Why Use It?

NumPy (Numerical Python) is a library for fast number crunching. It stores numbers in a compact ndarray (n-dimensional array) and runs operations on entire arrays at once, in highly optimised C code:

🐍 Plain Python list

Stores Python objects
Each item has overhead
Loops are slow
No element-wise math operators

⚡ NumPy array

Stores raw numbers
Tiny memory footprint
Operations are vectorized (no loop needed)
Built-in maths, stats, linear algebra

terminal — install once

pip install numpy
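
The memory difference described above can be checked directly. The sketch below is an illustration rather than a benchmark: it assumes a 64-bit CPython build, the sample size of one million is arbitrary, and the names n, py_list, and np_array are made up for the example. Exact list figures vary with the Python version.

```python
import sys
import numpy as np

n = 1_000_000  # arbitrary sample size for illustration

py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores the raw 8-byte values back to back.
array_bytes = np_array.nbytes

print(f"list : {list_bytes / 1e6:.1f} MB")
print(f"array: {array_bytes / 1e6:.1f} MB")   # 8.0 MB for int64
```

The array figure is exact (one million values times 8 bytes); the list figure is larger because every element is a full Python object.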

Creating Arrays

python

import numpy as np    # np is the universal alias

# From a list
a = np.array([1, 2, 3, 4, 5])
print(a)         # [1 2 3 4 5]
print(a.dtype)   # int64 on most platforms; all elements share one type
print(a.shape)   # (5,)   — 5 elements, 1 dimension

# Convenience creation functions
zeros  = np.zeros(5)          # [0. 0. 0. 0. 0.]
ones   = np.ones(3)           # [1. 1. 1.]
rng    = np.arange(0, 10, 2)  # [0 2 4 6 8]  (like range(), but returns an array; stop is excluded)
linsp  = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1.]  (5 evenly spaced values, both endpoints included)
rand   = np.random.rand(4)    # 4 random floats in the interval [0, 1)
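
Because every element of an array shares one dtype, it helps to see how NumPy picks that type and how to override it. The following is a small sketch, not part of the original lesson; the variable names mixed, counts, and truncated are chosen for illustration.

```python
import numpy as np

# Mixed ints and floats are upcast to a single common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64

# The dtype can be requested explicitly at creation time.
counts = np.zeros(3, dtype=np.int32)
print(counts)             # [0 0 0]
print(counts.itemsize)    # 4  (bytes per element)

# astype() returns a converted copy; converting to int truncates toward zero.
truncated = mixed.astype(np.int64)
print(truncated)          # [1 2 3]
```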

Vectorized Operations — No Loops Needed

The big win with NumPy: operations apply to every element at once, without writing a loop:

python

scores = np.array([55, 72, 88, 91, 63])

# Apply to all elements — no for loop!
print(scores + 5)         # [60 77 93 96 68]  — add 5 to every score
print(scores * 2)         # [110 144 176 182 126]
print(scores >= 75)       # [False False True True False]
print(scores[scores >= 75]) # [88 91]  (boolean indexing keeps only the True positions)

# Math on two arrays — element-wise
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
print(a + b)    # [11 22 33]
print(a * b)    # [10 40 90]
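
To make the "no loop" point concrete, the sketch below runs the same calculation as an explicit Python loop and as a single vectorized expression, then times both. The helper names add_five_loop and add_five_vectorized are hypothetical, the array size is arbitrary, and the timings will differ from machine to machine, so no specific speed-up is claimed.

```python
import timeit
import numpy as np

values = np.arange(100_000)   # sample data; the size is arbitrary

def add_five_loop(arr):
    # Explicit Python loop: one interpreted step per element.
    return np.array([x + 5 for x in arr])

def add_five_vectorized(arr):
    # Single NumPy expression: the loop runs inside compiled code.
    return arr + 5

# Both approaches produce the same result.
assert np.array_equal(add_five_loop(values), add_five_vectorized(values))

loop_time = timeit.timeit(lambda: add_five_loop(values), number=10)
vec_time = timeit.timeit(lambda: add_five_vectorized(values), number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```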

Useful Built-in Functions

python

data = np.array([4, 7, 2, 9, 1, 5, 8])

print(np.sum(data))      # 36
print(np.mean(data))     # 5.142...
print(np.std(data))      # standard deviation
print(np.min(data))      # 1
print(np.max(data))      # 9
print(np.sort(data))     # [1 2 4 5 7 8 9]
print(np.argmax(data))   # 3  — index of the max value
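
A few details about these functions are easy to miss, so here is a short sketch using the same data array. It shows the method forms, the ddof parameter of np.std, and np.argsort as a companion to np.sort; the names pop_std, sample_std, and order are chosen for the example.

```python
import numpy as np

data = np.array([4, 7, 2, 9, 1, 5, 8])

# Most reductions exist both as functions and as array methods.
print(data.sum() == np.sum(data))   # True

# np.std divides by N by default (population standard deviation).
# Passing ddof=1 divides by N - 1 instead (sample standard deviation).
pop_std = np.std(data)
sample_std = np.std(data, ddof=1)
print(pop_std < sample_std)         # True

# argsort returns the indices that would sort the array.
order = np.argsort(data)
print(order)          # [4 2 0 5 1 6 3]
print(data[order])    # [1 2 4 5 7 8 9]
```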

2D Arrays — Matrices

python

# 3 rows, 3 columns
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(matrix.shape)           # (3, 3)
print(matrix[0])              # [1 2 3]  — first row
print(matrix[1, 2])           # 6        — row 1, col 2
print(matrix[:, 1])           # [2 5 8]  — all rows, col 1

# Reshape: change dimensions without changing data
flat = np.arange(12)          # [0 1 2 ... 11]
grid = flat.reshape(3, 4)     # 3 rows, 4 cols
print(grid)
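
The reshape step is worth a closer look. The sketch below repeats the example with its printed output, then illustrates three behaviours: letting NumPy infer a dimension with -1, the fact that reshape returns a view in this case (so the data is shared), and the error raised when the sizes do not match. The name two_rows is made up for the example.

```python
import numpy as np

flat = np.arange(12)
grid = flat.reshape(3, 4)
print(grid)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# -1 asks NumPy to infer that dimension from the total size.
two_rows = flat.reshape(2, -1)
print(two_rows.shape)   # (2, 6)

# Here reshape returns a view, so both names share the same data.
grid[0, 0] = 99
print(flat[0])          # 99

# The element count must match, otherwise NumPy raises ValueError.
try:
    flat.reshape(5, 3)
except ValueError as err:
    print("cannot reshape:", err)
```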

Real Example — Student Grade Analysis

python

import numpy as np

# Each row = one student, each column = one subject
grades = np.array([
    [85, 92, 78],   # Riya
    [70, 65, 80],   # Arjun
    [95, 88, 91],   # Sneha
    [60, 72, 55],   # Vikram
])

# axis=1 means "across columns" (per student average)
student_avgs = np.mean(grades, axis=1)
print("Student averages:", student_avgs)
# [85.  71.67  91.33  62.33]

# axis=0 means "across rows" (per subject average)
subject_avgs = np.mean(grades, axis=0)
print("Subject averages:", subject_avgs)
# [77.5  79.25  76.]

# Boolean indexing: which students passed (avg >= 75)?
names  = np.array(["Riya", "Arjun", "Sneha", "Vikram"])
print("Passed:", names[student_avgs >= 75])
# Passed: ['Riya' 'Sneha']
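
The same grades array supports a couple of further questions. The sketch below finds the top scorer per subject with np.argmax and ranks students by average with np.argsort. The subject labels are hypothetical, since the original data does not name the columns.

```python
import numpy as np

names = np.array(["Riya", "Arjun", "Sneha", "Vikram"])
subjects = np.array(["Maths", "Science", "English"])  # hypothetical labels
grades = np.array([
    [85, 92, 78],   # Riya
    [70, 65, 80],   # Arjun
    [95, 88, 91],   # Sneha
    [60, 72, 55],   # Vikram
])

# Row index of the highest grade in each column (one per subject).
top_rows = np.argmax(grades, axis=0)
for subject, name in zip(subjects, names[top_rows]):
    print(subject, "top scorer:", name)
# Maths top scorer: Sneha
# Science top scorer: Riya
# English top scorer: Sneha

# Rank students from highest to lowest average.
student_avgs = grades.mean(axis=1)
ranking = np.argsort(student_avgs)[::-1]
print(names[ranking])   # ['Sneha' 'Riya' 'Arjun' 'Vikram']
```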

"Once you get comfortable with NumPy arrays and vectorized operations, you'll never want to write a for loop over numbers again. The code is shorter, faster, and clearer."

— Shurai

🧠 Quiz — Q1

What is the main advantage of a NumPy array over a Python list for numbers?

🧠 Quiz — Q2

What does np.arange(0, 10, 2) return?

🧠 Quiz — Q3

Given scores = np.array([55, 72, 88]), what does scores + 5 return?

🧠 Quiz — Q4

In a 2D array m, what does m[:, 1] select?