Simplified Math for Deep Learning
Math you won’t fear for deep learning.
Deep learning is like driving a car: you don’t need to know how the engine works to drive it. Still, even reading a simple library API requires some basic math. This blog walks the reader through the basic math of deep learning using PyTorch.
LINEAR ALGEBRA
Scalar: Simply a number (0-dimensional).
import torch
scalar = torch.tensor(1)
scalar.ndim # outputs 0 — a scalar has zero dimensions
Vector: A vector is a one-dimensional array of numbers.
vector = torch.tensor([1, 2])
vector.ndim # outputs 1 — a vector has one dimension
Matrix: A matrix is a rectangular arrangement of numbers (2-dimensional).
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])
matrix.ndim # outputs 2 — a matrix has two dimensions
Tensor: A tensor is a generalization of vectors and matrices to any number of dimensions; in practice, the term is usually used for arrays with three or more dimensions.
tensor = torch.arange(27).reshape(3, 3, 3) # values 0 to 26, reshaped to 3×3×3
tensor.ndim # outputs 3
tensor1 = torch.arange(40).reshape(2, 2, 2, 5) # arange gives 40 values; torch.tensor(40) would be a scalar and could not be reshaped
tensor1.ndim # outputs 4
The basic linear algebra operations that are essential for deep learning are addition, subtraction, element-wise multiplication, and matrix multiplication.
- Addition and subtraction of vectors and matrices are straightforward and element-wise: to add two vectors, we add the corresponding elements of each vector, and to add two matrices, we add the corresponding elements of each matrix (a matrix sketch follows the vector example below).
import torch
torch.manual_seed(123) # fix the seed so the random values are reproducible
# vector addition
vec_1 = torch.rand(3) # tensor([0.2961, 0.5166, 0.2517])
vec_2 = torch.rand(3) # tensor([0.6886, 0.0740, 0.8665])
print(vec_1+vec_2) # tensor([0.9847, 0.5905, 1.1182])
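For completeness, here is a minimal sketch of matrix addition and subtraction; the matrices and their values are illustrative:
mat_1 = torch.tensor([[1, 2], [3, 4]])
mat_2 = torch.tensor([[10, 20], [30, 40]])
print(mat_1 + mat_2) # tensor([[11, 22], [33, 44]])
print(mat_1 - mat_2) # tensor([[ -9, -18], [-27, -36]])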
- Element-wise multiplication of vectors and matrices is just as straightforward: we multiply the corresponding elements of each vector or matrix. Division works the same way (see the sketch after the example below).
import torch
torch.manual_seed(123) # same seed, so the same random vectors as above
# element-wise vector multiplication
vec_1 = torch.rand(3) # tensor([0.2961, 0.5166, 0.2517])
vec_2 = torch.rand(3) # tensor([0.6886, 0.0740, 0.8665])
print(vec_1*vec_2) # tensor([0.2039, 0.0382, 0.2181])
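Element-wise division follows the same pattern; a minimal sketch reusing the vectors above (the expected values are computed from the rounded numbers shown, so the actual output may differ slightly in the last digits):
print(vec_1 / vec_2) # roughly tensor([0.4300, 6.9811, 0.2905])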
- Matrix multiplication is a more involved operation, but it is also one of the most important in deep learning. It is used to combine vectors and matrices in order to perform transformations such as linear regression, classification, and convolution.
Matrix multiplication combines two matrices by multiplying the elements of each row of the first matrix with the corresponding elements of each column of the second matrix and adding the products: entry (i, j) of the result is the dot product of row i of the first matrix with column j of the second. The shapes must be compatible: an m×n matrix times an n×p matrix gives an m×p matrix.
MAT1 = torch.arange(0, 9).reshape(3, 3)
MAT2 = torch.arange(9, 18).reshape(3, 3)
# in PyTorch we use @ for matrix multiplication
MAT3 = MAT1 @ MAT2
MAT3
# tensor([[ 42,  45,  48],
#         [150, 162, 174],
#         [258, 279, 300]])
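Entry (0, 0) above is exactly the dot product of row 0 of MAT1 with column 0 of MAT2, and the order of the operands matters. A quick sketch, reusing MAT1 and MAT2 from above:
# entry (0, 0): dot product of row 0 of MAT1 and column 0 of MAT2
print(torch.dot(MAT1[0, :], MAT2[:, 0])) # tensor(42)
# matrix multiplication is not commutative: reversing the order changes the result
print(MAT2 @ MAT1)
# tensor([[ 96, 126, 156],
#         [123, 162, 201],
#         [150, 198, 246]])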
CALCULUS
Differentiation is the Only Prerequisite
In calculus, the one concept we need to grasp is differentiation, which is essentially a process of local linear approximation.
δf ≈ f′ δx
The above equation may appear complex, but it isn’t. Here, δ represents a small change, so the equation reads:
“Change in the function” ≈ “derivative” × “change in the independent variable.”
Note that the derivative acts as a linear approximator, giving the best possible linear approximation of a non-linear function near a point. In higher dimensions, its counterpart, the gradient, points in the direction of steepest increase at that point.
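A minimal sketch to check this numerically with PyTorch autograd; the function f(x) = x² and the step size 0.01 are illustrative choices:
import torch

x = torch.tensor(2.0, requires_grad=True)
f = x ** 2
f.backward()  # computes df/dx at x = 2
print(x.grad) # tensor(4.), since f'(x) = 2x

# check the approximation: f(x + δx) - f(x) ≈ f'(x) * δx
dx = 0.01
actual_change = (2.0 + dx) ** 2 - 2.0 ** 2 # 0.0401
approx_change = x.grad.item() * dx         # 0.0400
print(actual_change, approx_change)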
In deep learning, we subtract the derivative (referred to as the gradient in jargon) because the derivative points toward the steepest increase; subtracting it moves us in the direction of steepest decrease, which is exactly what minimizing the loss requires, as in the sketch below.
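A minimal gradient-descent sketch on the toy loss w²; the learning rate 0.1 and the 5 steps are arbitrary illustrative choices:
w = torch.tensor(2.0, requires_grad=True)
lr = 0.1 # learning rate (illustrative)
for _ in range(5):
    loss = w ** 2        # toy loss to minimize
    loss.backward()      # fills w.grad with dloss/dw
    with torch.no_grad():
        w -= lr * w.grad # step against the gradient: steepest decrease
    w.grad.zero_()       # clear the gradient before the next step
print(w) # moves toward 0, the minimum of w**2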
In fact, the term “gradient” refers to the list of partial derivatives we get when there is more than one parameter. With the gradient ∇f, the approximation above becomes a dot product:
δf ≈ ∇f · δx
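A minimal sketch of a gradient over two parameters, using the illustrative function f(x, y) = x² + 3y:
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
f = x ** 2 + 3 * y
f.backward() # fills x.grad and y.grad with the partial derivatives
print(x.grad, y.grad) # tensor(2.) tensor(3.), since df/dx = 2x and df/dy = 3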
That’s all you need.