Scipy Stack - shivamvats/notes GitHub Wiki

References

Jake Vanderplas's book

Numpy

Broadcasting

Say you have a (2, 3) matrix

A = np.array([[1,2,3], [4,5,6]])

and you want to divide the first row by 10 and the second by 1. You might expect A/d to work where d = np.array([10, 1]), but it raises a ValueError. NumPy aligns shapes starting from the last dimension and broadcasts along the remaining dimensions. So a 1D vector is matched against the last axis of A, i.e. it must have size 3 (or size 1). Hence if d = np.array([10, 1, .1]), the operation works, but it divides each column by the corresponding entry of d rather than each row.

But that is clearly not what we want. To divide row-wise, one option is to transpose A to shape (3, 2), divide, and transpose the result back, i.e. (A.T/d).T.
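The row-wise division described above can be sketched as follows. The variable names (failed, via_transpose, via_newaxis) are made up for illustration, and the second option, reshaping d to (2, 1) with np.newaxis, is an alternative that is not in the original notes.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
d = np.array([10, 1])                  # shape (2,)

# A / d fails because the trailing dimensions (3 and 2) do not match.
try:
    A / d
    failed = False
except ValueError:
    failed = True

# Option 1: transpose, divide, transpose back.
via_transpose = (A.T / d).T

# Option 2 (assumed alternative): give d a trailing axis of length 1,
# so its shape (2, 1) broadcasts against (2, 3).
via_newaxis = A / d[:, np.newaxis]
```

Both options divide the first row by 10 and leave the second row unchanged.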

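matmul vs dot: For N-d arrays with N > 2, they are defined differently. matmul treats the inputs as stacks of matrices stored in the last two axes and broadcasts over the leading axes, whereas dot sums over the last axis of the first array and the second-to-last axis of the second, keeping all other axes of both. Check out Stack Overflow for more detail.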
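A small shape comparison may make the difference concrete. This is only a sketch; the array shapes are arbitrary choices for illustration.

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul: the leading axis is a batch of two (3, 4) x (4, 5) products.
m = np.matmul(a, b)

# dot: sums over the last axis of a and the second-to-last axis of b,
# and keeps every remaining axis of both inputs.
d = np.dot(a, b)

print(m.shape)
print(d.shape)
```

Here matmul gives a 3D result while dot gives a 4D one, because dot pairs every matrix in a with every matrix in b.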

stack: Joins a sequence of arrays along a new axis (the result has one more dimension than the inputs).

column_stack: Add a 1D vector as a column to a 2D array like np.column_stack((a, b)).

concatenate: Joins a sequence of arrays along an existing axis.
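The three joining functions above can be compared on small inputs. This is a minimal sketch; the arrays a, b and M are made-up examples.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# stack: creates a new axis, so two (3,) arrays become a 2D array.
stacked = np.stack((a, b))               # shape (2, 3)
stacked_last = np.stack((a, b), axis=1)  # shape (3, 2)

# concatenate: joins along an existing axis, so the result stays 1D.
joined = np.concatenate((a, b))          # shape (6,)

# column_stack: appends the 1D vector a as a new column of the 2D array M.
M = np.zeros((3, 2))
with_col = np.column_stack((M, a))       # shape (3, 3)
```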

newaxis: Add a new axis to an array like a[np.newaxis].

moveaxis: Move a specified axis to a new location. The other axes keep their relative order.
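The effect of newaxis and moveaxis is easiest to see on shapes. A sketch with an arbitrarily chosen (3, 4, 5) array:

```python
import numpy as np

a = np.zeros((3, 4, 5))

# newaxis inserts a length-1 axis at the position where it appears in the index.
front = a[np.newaxis]         # shape (1, 3, 4, 5)
middle = a[:, np.newaxis]     # shape (3, 1, 4, 5)
back = a[..., np.newaxis]     # shape (3, 4, 5, 1)

# moveaxis moves axis 0 to the end; the other axes keep their order.
moved = np.moveaxis(a, 0, -1)  # shape (4, 5, 3)
```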

reshape: If you want to flatten all dimensions except the last (for a 3D array, the first two), do a.reshape((-1, a.shape[-1])). NumPy allows only one -1 per shape. It tells NumPy to deduce that dimension automatically so that the total array size matches. In this case, the -1 dimension is calculated as the total size of the array divided by the specified dimension size.
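As an illustration of the -1 rule, here is a sketch on a made-up (2, 3, 4) array, where the deduced dimension is 24 / 4 = 6.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Flatten the first two axes; -1 is deduced as 24 / 4 = 6.
flat = a.reshape((-1, a.shape[-1]))  # shape (6, 4)

# Using -1 more than once is ambiguous and raises a ValueError.
try:
    a.reshape((-1, -1))
    two_unknowns_ok = True
except ValueError:
    two_unknowns_ok = False
```

Row 3 of flat corresponds to a[1, 0], since rows are laid out in the original (C) order.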

pad: Useful for padding a matrix. The most important/confusing argument is pad_width. It is supposed to be a tuple of n 2-tuples, where n is the number of axes in the array. For each axis, you specify a tuple (a, b), where a is the size of the padding at the start of that axis and b is the size of the padding at the end. The new size of that axis will be m + a + b, where m is the original size.

Eg:

A = np.array([[1,2,3], [3,4,5], [5,6,7]])
B = np.pad(A, ((1,1), (1,1)), 'constant')

This pads A with a one-element border of 0's around the 2D array, so B has shape (5, 5).
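The per-axis (a, b) pairs do not have to be symmetric. The sketch below repeats the example above and adds an asymmetric case; the array C and the fill value 9 are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7]])

# One element of zero padding on every side: 3 + 1 + 1 = 5 along each axis.
B = np.pad(A, ((1, 1), (1, 1)), 'constant')

# Two rows before axis 0 and one column after axis 1, filled with 9.
C = np.pad(A, ((2, 0), (0, 1)), 'constant', constant_values=9)

print(B.shape)
print(C.shape)
```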

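slicing: NumPy does not raise an error for out-of-bounds slices. Slice limits are clipped to the array bounds, so when the indices are completely out of bounds it just returns an empty array. Indexing with a single out-of-bounds integer, on the other hand, raises an IndexError.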
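A quick sketch of this behaviour on a made-up five-element array:

```python
import numpy as np

a = np.arange(5)

empty = a[10:20]    # completely out of bounds: empty array, no error
clipped = a[3:100]  # partially out of bounds: clipped to the valid part

# A single out-of-bounds integer index is checked and raises IndexError.
try:
    a[10]
    raised = False
except IndexError:
    raised = True
```

Because of the silent clipping, an off-by-one slice can produce an empty or short array without any warning, so it may be worth checking shapes after slicing.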

expand_dims:

a = np.random.rand(5, 2)
b = np.expand_dims(a, axis=1)

The shape of b is (5, 1, 2).

Fancy Indexing

Refer to https://jakevdp.github.io/PythonDataScienceHandbook/02.07-fancy-indexing.html

Matrix

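A special subclass of ndarray (np.matrix) that is always 2D. The primary benefit is that A*B is the matrix product rather than element-wise multiplication. The NumPy documentation no longer recommends using this class and suggests regular arrays instead.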

Note: To do matrix multiplication of two ndarrays, we need to use the dot function, np.matmul, or the equivalent @ operator.
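The difference between the two multiplication conventions can be sketched on a pair of small arrays. The values are arbitrary, and np.matrix appears here only for comparison.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B        # ndarray: element-wise product
product = A @ B            # ndarray: matrix product
product_dot = np.dot(A, B) # same as @ for 2D arrays

# With np.matrix, * already means the matrix product.
product_matrix = np.matrix(A) * np.matrix(B)
```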