Numpy - agastya2002/IECSE-ML-Winter-2020 GitHub Wiki

I hope you guys are now comfortable with coding in Python. We will now move on to one of Python's most used libraries for machine learning: NumPy.

NumPy

NumPy stands for Numerical Python and it is a core scientific computing library in Python. It provides efficient multi-dimensional array objects and various operations to work with these array objects.

Here, you will learn about:

  1. Installing NumPy
  2. Creating arrays in NumPy
  3. Basic operations on NumPy arrays

Installing NumPy

  1. Mac and Linux users can install NumPy via the pip command:
pip install numpy
  2. Windows users can install NumPy from the command prompt by executing the following command:
python -m pip install numpy

Note: If you are working with Anaconda, you do not need to install NumPy, as it is already included with Anaconda. Nevertheless, you can install any package/library in Anaconda with conda install name_of_the_package. For NumPy, that is:

conda install numpy

To use the NumPy library in your program, all you need to do is import it.

import numpy as np
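
A quick way to confirm that the installation worked is to import the library and print its version. This is only a minimal sketch; the version string you see depends on your own environment.

```python
import numpy as np

# np.__version__ is a string; its exact value depends on your installation
version = np.__version__
print("NumPy version:", version)
```

If the import fails with ModuleNotFoundError, NumPy is not installed in the Python environment you are running.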

NumPy Basics

The Basics of NumPy Arrays

Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas are built around the NumPy array. This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays.

We first start by importing the numpy library to be used by our program.

import numpy as np

The as np part means that we can use the short name np instead of writing numpy whenever we call NumPy functions, anywhere in our program.

Arrays

In Python all arrays are zero indexed, i.e., the first element of the array has index 0.

a = np.array([1, 2, 3])
print(type(a))
print(a.shape)
print(a[0], a[1], a[2])
a[0] = 5
print(a)
<class 'numpy.ndarray'>
(3,)
1 2 3
[5 2 3]

Above we have created a one-dimensional array, also known as a vector.

  • One important point about vectors is that their shape is represented as (no_of_elements,).
  • Elements are indexed by providing a single index in square brackets after the array name: array_name[index].

Note - the type() function returns the datatype of a variable, e.g.:

a = 10
print(type(a))
<class 'int'>

Here we are creating a two-dimensional array:

  • it is 2-dimensional because its shape consists of 2 dimensions (no_of_rows, no_of_columns)
  • it can also be referred to as a matrix
  • elements are accessed by providing two index values: array_name[row_index, column_index] OR array_name[row_index][column_index]
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.shape)
print(b[0, 0], b[0, 1], b[1, 0])
(2, 3) 
1 2 4

To create an array with all elements initialised to zero:

a = np.zeros((2, 2))
print(a)
[[0. 0.]
 [0. 0.]]

To create an array with all elements initialised to one:

b = np.ones((1, 2))
print(b)
[[1. 1.]]

To create an array with all elements initialised to a particular value:

c = np.full((2, 2), 7)
print(c)
[[7 7]
 [7 7]]

To create an identity matrix (a square matrix with diagonal elements 1 and the rest 0):

d = np.eye(2)  # you can substitute 2 with m, where m is the order of the matrix
print(d)
[[1. 0.]
 [0. 1.]]

To create an array of random floating numbers in the range [0,1), we use np.random.random(), where we pass a tuple consisting of the dimensions of the array.

e = np.random.random((2, 2))
print(e)
[[0.27331253 0.32892387]
 [0.91536591 0.50864343]]
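
Because the values are random, the numbers you get will differ from the output above on every run. If you need repeatable results, for example while debugging, one option is a seeded generator. The sketch below uses np.random.default_rng, which is available in NumPy 1.17 and later; the seed 42 and the names rng, e1 and e2 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)   # 42 is an arbitrary seed
e1 = rng.random((2, 2))           # 2x2 array of floats in [0, 1)

rng = np.random.default_rng(42)   # re-create the generator with the same seed
e2 = rng.random((2, 2))           # the same sequence is produced again

print(e1)
print(np.array_equal(e1, e2))     # True
```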

Array indexing

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a)
print(a[0, 1])
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
2

To access a range of array elements, we use slicing. The colon (:) is used for slicing.

  • starting_index:ending_index accesses the elements with indexes from starting_index up to ending_index - 1, i.e. the ending index itself is excluded.
  • If starting_index is left blank, it is assumed to be zero.
  • If ending_index is left blank, the slice runs through the last index.
  • Therefore, specifying just : selects all indexes, from 0 to last_index.
b = a[:2, 1:3]
print(b)
[[2 3]
 [6 7]]

Note: A slice is a view onto the same data as the original array, so changing a value in b also changes the corresponding value in a:

b[0, 0] = 77
print(a[0, 1])
77

If you want to create a copy of an array, use np.array():

c = np.array(a)
c[1,0]=80
print(a)
[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Therefore any changes in c will not be reflected in a.
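
To check whether two arrays share data, NumPy provides np.shares_memory. The following is a small sketch of the view-versus-copy behaviour described above; it uses the array method .copy() as an alternative way to copy, and the names view and copy are just illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]          # basic slicing returns a view onto a's data
copy = a[:2, 1:3].copy()   # .copy() allocates new, independent data

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

copy[0, 0] = 100           # does not touch a
print(a[0, 1])             # 2
view[0, 0] = 77            # writes through to a
print(a[0, 1])             # 77
```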

Note: By slightly modifying a slice, we can obtain either a 1-D vector or a 2-D matrix with a single row or a single column.

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
row_r1 = a[1, :]    # integer index: row 1 as a 1-D array
row_r2 = a[1:2, :]  # slice: row 1 as a 2-D array with one row
print(row_r1, row_r1.shape)
print(row_r2, row_r2.shape)
[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
col_r1 = a[:, 1]    # integer index: column 1 (the second column) as a 1-D array
col_r2 = a[:, 1:2]  # slice: column 1 as a 2-D array with one column
print(col_r1, col_r1.shape)
print(col_r2, col_r2.shape)
[ 2  6 10] (3,)
[[ 2]
 [ 6]
 [10]] (3, 1)

A few more indexing techniques: with integer array indexing, a list of row indexes and a list of column indexes pick out individual elements.

a = np.array([[1, 2], [3, 4], [5, 6]])
print(a[[0, 1, 2], [0, 1, 0]])
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))
print(a[[0, 0], [1, 1]])
print(np.array([a[0, 1], a[0, 1]]))
[1 4 5]
[1 4 5]
[2 2]
[2 2]

np.arange()

arange([start,] stop[, step,][, dtype]) : Returns an array with evenly spaced elements within the given interval. The interval is half-open, i.e. [start, stop).

Parameters :
start : [optional] start of the interval. By default start = 0.
stop : end of the interval (not included).
step : [optional] step size. By default step = 1. For any output out, this is the distance between two adjacent values, out[i+1] - out[i].
dtype : [optional] type of the output array.

np.arange(4)
array([0, 1, 2, 3])
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(a)
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
b = np.array([0, 2, 0, 1])
print(a[np.arange(4), b])
[ 1  6  7 11]
a[np.arange(4), b] += 10
print(a)
[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]
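
The examples above only use np.arange with a single stop argument. As a rough illustration of the other parameters (start, step and dtype), here is a short sketch; the variable names r1 to r4 are made up for this example.

```python
import numpy as np

r1 = np.arange(2, 10)        # start=2, stop=10, default step=1 -> 2, 3, ..., 9
r2 = np.arange(2, 10, 3)     # step=3 -> 2, 5, 8 (stop itself is never included)
r3 = np.arange(10, 0, -2)    # a negative step counts down -> 10, 8, 6, 4, 2
r4 = np.arange(3, dtype=np.float64)  # dtype forces float output -> 0.0, 1.0, 2.0

for r in (r1, r2, r3, r4):
    print(r, r.dtype)
```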

Advanced operations: Performing logical operations on an array, and using them for indexing

a = np.array([[1, 2], [3, 4], [5, 6]])
bool_idx = (a > 2)
print(bool_idx)
print(a[bool_idx])
print(a[a > 2])
[[False False]
 [ True  True]
 [ True  True]]
[3 4 5 6]
[3 4 5 6]
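
Boolean masks can also be combined and used for assignment. The sketch below assumes the same 3x2 array as above; the names mask and selected are illustrative. Note that conditions are combined with & and | (not the Python keywords and / or), and each condition needs its own parentheses.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Element-wise "and" of two conditions: greater than 2 AND even
mask = (a > 2) & (a % 2 == 0)
selected = a[mask]      # boolean indexing returns a new 1-D array
print(selected)         # [4 6]

# A mask on the left-hand side overwrites only the matching elements
a[a > 4] = 0
print(a)
```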

Datatypes

Here is a link to all the supported NumPy datatypes: https://numpy.org/devdocs/user/basics.types.html

x = np.array([1, 2])
print(x.dtype)
int64
x = np.array([1.0, 2.0])
print(x.dtype)
float64
x = np.array([1, 2], dtype=np.int64)
print(x.dtype)
int64
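
An existing array can be converted to another datatype with the astype method, which returns a new array and leaves the original unchanged. A minimal sketch, with made-up values:

```python
import numpy as np

x = np.array([1.7, 2.2, -0.5])
y = x.astype(np.int64)   # float -> int conversion truncates toward zero

print(x.dtype)           # float64
print(y, y.dtype)        # values 1, 2, 0 with dtype int64
```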

Array math

Addition: Element wise

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)
print(x + y)
print(np.add(x, y))
[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]

Subtraction: Element wise

print(x - y)
print(np.subtract(x, y))
[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]

Multiplication: Element wise

print(x * y)
print(np.multiply(x, y))
[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]

Division: Element wise

print(x / y)
print(np.divide(x, y))
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]

Square Root: Element wise

print(np.sqrt(x))
[[1.         1.41421356]
 [1.73205081 2.        ]]

Dot Multiplication:

  • np.dot() returns the dot product of two arrays.
  • For 2-D arrays, it is equivalent to matrix multiplication.
  • For 1-D arrays, it is the inner product of the vectors.
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

v = np.array([9, 10])
w = np.array([11, 12])

# dot product of 1-D arrays
print(v.dot(w))      # 9*11 + 10*12
print(np.dot(v, w))  # 9*11 + 10*12
219
219

Matrix-vector and matrix-matrix multiplication with 2-D arrays

print(x.dot(v))
print(np.dot(x, v))
[29 67]
[29 67]
print(x.dot(y))
print(np.dot(x, y))
[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
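
Python also has a dedicated matrix multiplication operator, @, which NumPy arrays support. For the 1-D and 2-D cases shown here it gives the same results as np.dot; the short sketch below re-creates the same arrays to show this.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])

xy = x @ y   # matrix-matrix product, same as x.dot(y)
xv = x @ v   # matrix-vector product, same as x.dot(v)

print(xy)
print(xv)
print(np.array_equal(xy, np.dot(x, y)))  # True
```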

np.sum()

numpy.sum(arr, axis, dtype, out) : This function returns the sum of array elements over the specified axis.

Parameters :
arr : input array.
axis : axis along which the sum is computed. If it is not given, all elements of arr are summed, as if the array were flattened. axis = 0 sums down each column (one result per column) and axis = 1 sums across each row (one result per row).
dtype : [optional] type of the returned array and of the accumulator in which the elements are summed.
out : [optional] alternative array in which to place the result. It must have the same shape as the expected output. Default is None.
initial : [scalar, optional] starting value of the sum.

Return : Sum of the array elements (a scalar value if axis is None) or an array with the sums along the specified axis.

x = np.array([[1, 2], [3, 4]])
print(np.sum(x))          # sum of all elements
print(np.sum(x, axis=0))  # sum of each column
print(np.sum(x, axis=1))  # sum of each row
10
[4 6]
[3 7]
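
With a square array it is easy to mix up the two axes, so here is a sketch on a non-square array that also prints the shapes. The axis you pass is the one that disappears from the result. The keepdims argument is an optional parameter of np.sum that keeps the summed axis with length 1; the variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

col_sums = np.sum(x, axis=0)                    # collapses the rows: one sum per column
row_sums = np.sum(x, axis=1)                    # collapses the columns: one sum per row
row_sums_2d = np.sum(x, axis=1, keepdims=True)  # keeps the summed axis with length 1

print(col_sums, col_sums.shape)   # [5 7 9] (3,)
print(row_sums, row_sums.shape)   # [ 6 15] (2,)
print(row_sums_2d.shape)          # (2, 1)
```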

Transpose of a matrix

x = np.array([[1, 2], [3, 4]])
print(x)
print(x.T)
[[1 2]
 [3 4]]
[[1 3]
 [2 4]]

Note: Transposing a 1-D array returns an unchanged view of the original array.

v = np.array([1, 2, 3])
print(v)
print(v.T)
[1 2 3]
[1 2 3]
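
If you actually need a column vector from a 1-D array, you first have to make it 2-D. Two common ways are reshape and indexing with np.newaxis; the sketch below shows both, with illustrative variable names.

```python
import numpy as np

v = np.array([1, 2, 3])      # shape (3,)

row = v.reshape(1, -1)       # 2-D row vector, shape (1, 3)
col = v.reshape(-1, 1)       # 2-D column vector, shape (3, 1)
col2 = v[:, np.newaxis]      # same result using np.newaxis

print(row.shape, col.shape, col2.shape)
print(row.T.shape)           # transposing the 2-D row gives shape (3, 1)
```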

np.reshape()

Used to give a new shape to an array without changing its data. The new shape must contain the same total number of elements as the original.

# reshapes an array consisting of the numbers from 1 to 9 into a 3 by 3 2-D array
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
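
One dimension of the new shape may be given as -1, in which case NumPy infers it from the number of elements. If the element counts do not match, reshape raises a ValueError. A small sketch with made-up sizes:

```python
import numpy as np

flat = np.arange(12)          # 12 elements: 0, 1, ..., 11

grid = flat.reshape(3, -1)    # -1 is inferred as 12 / 3 = 4 columns
print(grid.shape)             # (3, 4)

try:
    flat.reshape(5, 3)        # 5 * 3 = 15 elements, but flat has only 12
except ValueError as err:
    print("reshape failed:", err)
```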

np.concatenate()

numpy.concatenate((a1, a2, ...), axis=0, out=None) : Join a sequence of arrays along an existing axis.

Parameters :
a1, a2, … : sequence of array_like. The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
axis : int, optional. The axis along which the arrays will be joined. If axis is None, arrays are flattened before use. Default is 0.
out : ndarray, optional. If provided, the destination to place the result. The shape must be correct, matching that of what concatenate would have returned if no out argument were specified.

Returns :
res : ndarray. The concatenated array.

A = np.ones((2,3))
B = np.random.random((3,2))
print(A.T.shape, B.shape)
C = np.concatenate((A.T,B), axis=1) # joined side by side: columns are appended, shape (3, 4)
D = np.concatenate((A.T,B), axis=0) # stacked vertically: rows are appended, shape (6, 2)
print(C)
print(D)
(3, 2) (3, 2)
[[1.         1.         0.19858947 0.99155416]
 [1.         1.         0.47833765 0.82944422]
 [1.         1.         0.62190316 0.37862048]]
[[1.         1.        ]
 [1.         1.        ]
 [1.         1.        ]
 [0.19858947 0.99155416]
 [0.47833765 0.82944422]
 [0.62190316 0.37862048]]

Concatenating Vectors


a = np.ones(4)
b = np.array([2,3,4,5])
X = np.hstack([a,b]) # stack horizontally
Y = np.vstack([a,b]) # stack vertically
print(X)
print(Y)
[1. 1. 1. 1. 2. 3. 4. 5.]
[[1. 1. 1. 1.]
 [2. 3. 4. 5.]]
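
For 1-D inputs like these, hstack and vstack can be thought of as shortcuts around np.concatenate. The sketch below checks that relationship on the same vectors; it is meant as an illustration, not as a description of how NumPy implements these functions internally.

```python
import numpy as np

a = np.ones(4)
b = np.array([2, 3, 4, 5])

# hstack on 1-D arrays gives the same result as a plain concatenate
X = np.concatenate([a, b])

# vstack first turns each vector into a (1, 4) row, then joins along axis 0
Y = np.concatenate([a[np.newaxis, :], b[np.newaxis, :]], axis=0)

print(np.array_equal(X, np.hstack([a, b])))  # True
print(np.array_equal(Y, np.vstack([a, b])))  # True
```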

Additional useful NumPy functions

There are several NumPy functions for mathematical operations that can be applied to an entire NumPy array at once, e.g.:

np.exp()
np.log()
np.sin()
np.cos()

And many more...
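
As a brief sketch of how these functions behave, each one is applied to every element and returns a new array of the same shape. The input values below are chosen only for illustration.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
angles = np.array([0.0, np.pi / 2, np.pi])

exp_x = np.exp(x)          # e**0, e**1, e**2
log_exp_x = np.log(exp_x)  # natural log undoes exp, giving back 0, 1, 2
sin_a = np.sin(angles)     # approximately 0, 1, 0
cos_a = np.cos(angles)     # approximately 1, 0, -1

print(exp_x)
print(log_exp_x)
print(np.round(sin_a, 6))
print(np.round(cos_a, 6))
```

Because of floating-point rounding, values such as sin(pi) come out as very small numbers rather than exactly 0, which is why the sketch rounds them before printing.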

Additional resources

This was just a small glimpse of the functionality of the NumPy library. Although the content shown here will be sufficient for proceeding further in ML, we encourage you all to explore the library further to make full use of its powerful functions!

Here is the link to the official NumPy documentation: https://docs.scipy.org/doc/numpy/reference/

The above link is a great resource for resolving any doubts you have about NumPy. Apart from that, you can always approach us with any problems that you might encounter.