Numpy - BKJackson/BKJackson_Wiki GitHub Wiki

Initializing Numpy arrays and ndarrays

Create an array of 10 random integers under 100

import numpy as np
rand = np.random.RandomState(42)

x = rand.randint(100, size=10)
print(x)

Output: [51 92 14 71 60 20 82 86 74 74]

Binning Data

import matplotlib.pyplot as plt

np.random.seed(42)
x = np.random.randn(100)

# Compute a histogram by hand
bins = np.linspace(-5, 5, 20)
counts = np.zeros_like(bins)

# find the appropriate bin for each x
i = np.searchsorted(bins, x)

# add 1 to each of these bins
np.add.at(counts, i, 1)

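# plot the results (drawstyle='steps' replaces linestyle='steps',
# which recent matplotlib releases no longer accept)
plt.plot(bins, counts, drawstyle='steps');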

The matplotlib version:

plt.hist(x, bins, histtype='step')  
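The by-hand histogram relies on np.add.at rather than counts[i] += 1. A minimal sketch of why, using a small hypothetical index array (idx and the array sizes are illustrative, not from the data above):

```python
import numpy as np

# Hypothetical toy indices: bin 0 is hit three times, bin 2 once.
idx = np.array([0, 0, 0, 2])

# Fancy-index += computes counts[idx] + 1 once and then assigns,
# so a repeated index is only incremented a single time.
naive = np.zeros(4)
naive[idx] += 1
print(naive)   # [1. 0. 1. 0.]

# np.add.at applies the addition once per occurrence of each index.
counts = np.zeros(4)
np.add.at(counts, idx, 1)
print(counts)  # [3. 0. 1. 0.]
```

With repeated bin indices, only the np.add.at version accumulates a correct count.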

Create a 2-D xy function with color values on the z axis

z = f(x, y)

# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)[:, np.newaxis]

z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

print(x.shape, y.shape)

Output: (50,) (50, 1)

Make a grid plot in matplotlib:

%matplotlib inline
import matplotlib.pyplot as plt

plt.imshow(z, origin='lower', extent=[0, 5, 0, 5],
           cmap='viridis')
plt.colorbar();  

Indexing and sampling from arrays

Choosing 20 random indices with no repeats from array X

mean = [0, 0]
cov = [[1, 2],
       [2, 5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape  # (100, 2)
indices = np.random.choice(X.shape[0], 20, replace=False)
selection = X[indices]
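As a quick check on the sampling step, the sketch below rebuilds a comparable X and confirms that replace=False yields distinct row indices. The seed and the name rng are assumptions made for this example only.

```python
import numpy as np

# Assumed seed, chosen only to make the sketch reproducible.
rng = np.random.RandomState(0)
X = rng.multivariate_normal([0, 0], [[1, 2], [2, 5]], 100)

# Draw 20 distinct row indices out of the 100 available rows.
indices = rng.choice(X.shape[0], 20, replace=False)
selection = X[indices]

print(len(np.unique(indices)))  # 20
print(selection.shape)          # (20, 2)
```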

Numpy ufuncs

The following table lists the arithmetic operators implemented in NumPy:

Operator   Equivalent ufunc   Description
+          np.add             Addition (e.g., 1 + 1 = 2)
-          np.subtract        Subtraction (e.g., 3 - 2 = 1)
-          np.negative        Unary negation (e.g., -2)
*          np.multiply        Multiplication (e.g., 2 * 3 = 6)
/          np.divide          Division (e.g., 3 / 2 = 1.5)
//         np.floor_divide    Floor division (e.g., 3 // 2 = 1)
**         np.power           Exponentiation (e.g., 2 ** 3 = 8)
%          np.mod             Modulus/remainder (e.g., 9 % 4 = 1)
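A short sketch of the operator/ufunc equivalence in the table, using a small illustrative array a (an assumption for this example):

```python
import numpy as np

a = np.arange(4)  # array([0, 1, 2, 3])

# Each operator on an array gives the same result as the matching ufunc.
same_add = np.array_equal(a + 5, np.add(a, 5))
same_neg = np.array_equal(-a, np.negative(a))
same_floor = np.array_equal(a // 2, np.floor_divide(a, 2))
same_pow = np.array_equal(a ** 2, np.power(a, 2))
same_mod = np.array_equal(a % 3, np.mod(a, 3))

print(same_add, same_neg, same_floor, same_pow, same_mod)
print(a // 2)  # [0 0 1 1]
print(a % 3)   # [0 1 2 0]
```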

Numpy aggregation functions

For min, max, sum, and several other NumPy aggregates, a shorter syntax is to use methods of the array object itself:
print(big_array.min(), big_array.max(), big_array.sum())

Aggregation functions take an additional argument specifying the axis along which the aggregate is computed. For example, we can find the minimum value within each column by specifying axis=0:
M.min(axis=0)

For an array M with four columns (for example, one of shape (3, 4)), this returns four values, one per column.

Similarly, we can find the maximum value within each row:
M.max(axis=1)
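The axis argument is easier to follow with concrete numbers. The M below is a hypothetical 3x4 array chosen for illustration; the notes above do not define M.

```python
import numpy as np

# Hypothetical 3x4 array standing in for M.
M = np.array([[1, 8, 3, 4],
              [5, 2, 7, 0],
              [9, 6, 1, 2]])

# axis=0 collapses the rows, leaving one value per column.
print(M.min(axis=0))  # [1 2 1 0]

# axis=1 collapses the columns, leaving one value per row.
print(M.max(axis=1))  # [8 7 9]

# With no axis, the aggregate runs over the whole array.
print(M.sum())        # 48
```

The axis names the dimension that is collapsed, not the one that is returned.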

The following table provides a list of useful aggregation functions available in NumPy:

Function Name   NaN-safe Version   Description
np.sum          np.nansum          Compute sum of elements
np.prod         np.nanprod         Compute product of elements
np.mean         np.nanmean         Compute mean of elements
np.std          np.nanstd          Compute standard deviation
np.var          np.nanvar          Compute variance
np.min          np.nanmin          Find minimum value
np.max          np.nanmax          Find maximum value
np.argmin       np.nanargmin       Find index of minimum value
np.argmax       np.nanargmax       Find index of maximum value
np.median       np.nanmedian       Compute median of elements
np.percentile   np.nanpercentile   Compute rank-based statistics of elements
np.any          N/A                Evaluate whether any elements are true
np.all          N/A                Evaluate whether all elements are true
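A brief sketch of how the NaN-safe versions differ from the plain aggregates. The array values are made up for the example.

```python
import numpy as np

# Illustrative data with one missing value.
values = np.array([1.0, 2.0, np.nan, 4.0])

# The plain aggregate propagates NaN.
plain_sum = np.sum(values)
print(plain_sum)          # nan

# The NaN-safe versions ignore missing entries.
safe_sum = np.nansum(values)
safe_max_index = np.nanargmax(values)
print(safe_sum)           # 7.0
print(safe_max_index)     # 3
```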

Numpy Broadcasting

Broadcasting is a set of rules for applying binary ufuncs (addition, subtraction, multiplication, and so on) to arrays of different shapes.

Rules of Broadcasting

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

  • Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
  • Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
  • Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
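The three rules can be traced on small arrays. This is a minimal sketch; the array names and shapes are chosen only for illustration.

```python
import numpy as np

a = np.arange(3)                 # shape (3,)
b = np.arange(3)[:, np.newaxis]  # shape (3, 1)
M = np.ones((2, 3))              # shape (2, 3)

# Rule 1 pads a to (1, 3); rule 2 stretches it to (2, 3).
print((M + a).shape)  # (2, 3)

# a is padded to (1, 3); then both a and b stretch to (3, 3).
print((a + b).shape)  # (3, 3)

# Rule 3: (3, 2) against (1, 3) disagrees in the last dimension
# (2 vs 3) and neither size is 1, so NumPy raises an error.
N = np.ones((3, 2))
broadcast_failed = False
try:
    N + a
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```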

Working with Boolean Arrays in Numpy

How many values are less than 6?
np.count_nonzero(x < 6)

Are there any values less than zero?
np.any(x < 0)

Are all values in each row less than 8?
np.all(x < 8, axis=1)

Are all values equal to 6?
np.all(x == 6)
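The queries above can be run against a concrete array. The x below is a hypothetical 3x4 array, since the notes do not define x.

```python
import numpy as np

# Hypothetical 3x4 integer array standing in for x.
x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

# x < 6 is a Boolean array; counting its True entries answers "how many?".
n_small = np.count_nonzero(x < 6)
print(n_small)                # 8
print(np.sum(x < 6))          # 8, since True counts as 1

print(np.any(x < 0))          # False
print(np.all(x < 8, axis=1))  # [ True False  True]
print(np.all(x == 6))         # False
```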

Boolean bitwise logical operators

The following table summarizes the bitwise Boolean operators and their equivalent ufuncs:

Operator   Equivalent ufunc
&          np.bitwise_and
|          np.bitwise_or
^          np.bitwise_xor
~          np.bitwise_not
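A small sketch confirming that each operator matches its ufunc on Boolean arrays. The arrays a and b are illustrative.

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

# Each operator works element by element and matches its ufunc.
and_ok = np.array_equal(a & b, np.bitwise_and(a, b))
or_ok = np.array_equal(a | b, np.bitwise_or(a, b))
xor_ok = np.array_equal(a ^ b, np.bitwise_xor(a, b))
not_ok = np.array_equal(~a, np.bitwise_not(a))
print(and_ok, or_ok, xor_ok, not_ok)

# Comparisons need parentheses because & binds tighter than < and >.
v = np.arange(10)
print(v[(v > 4) & (v < 8)])  # [5 6 7]
```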

Example of constructing a Boolean mask

# construct a mask of all rainy days
rainy = (inches > 0)

# construct a mask of all summer days (June 21st is the 172nd day)
days = np.arange(365)
summer = (days > 172) & (days < 262)

print("Median precip on rainy days in 2014 (inches):   ",
      np.median(inches[rainy]))
print("Median precip on summer days in 2014 (inches):  ",
      np.median(inches[summer]))
print("Maximum precip on summer days in 2014 (inches): ",
      np.max(inches[summer]))
print("Median precip on non-summer rainy days (inches):",
      np.median(inches[rainy & ~summer]))

Output:
Median precip on rainy days in 2014 (inches): 0.19488188976377951
Median precip on summer days in 2014 (inches): 0.0
Maximum precip on summer days in 2014 (inches): 0.8503937007874016
Median precip on non-summer rainy days (inches): 0.20078740157480315

When to use and and or versus & and |

In short: "and" and "or" perform a single Boolean evaluation on an entire object, while & and | perform multiple Boolean evaluations on the content (the individual bits or bytes) of an object. For Boolean NumPy arrays, the latter is nearly always the desired operation.
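The difference shows up as soon as the keyword form is applied to arrays. A minimal sketch, with v as an illustrative array:

```python
import numpy as np

v = np.arange(10)

# & combines the two Boolean arrays element by element.
mask = (v > 4) & (v < 8)
print(v[mask])  # [5 6 7]

# "and" asks for the truth value of each whole array, which is
# ambiguous for arrays with more than one element, so NumPy raises.
keyword_failed = False
try:
    (v > 4) and (v < 8)
except ValueError as err:
    keyword_failed = True
    print("ValueError:", err)
```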

Numpy structured arrays

Create a structured numpy array with strings, integers, and floats:

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

Output: [('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)  

Output: [('Alice', 25, 55.0) ('Bob', 45, 85.5) ('Cathy', 37, 68.0) ('Doug', 19, 61.5)]

The handy thing with structured arrays is that you can now refer to values either by index or by name.

# Get all names
data['name']

# Get first row of data
data[0]  

# Get the name from the last row
data[-1]['name']  

# Get names where age is under 30
data[data['age'] < 30]['name']  

A compound type can also be specified as a list of tuples:

np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])  

Shortened string format codes

The first (optional) character is < or >, which means "little endian" or "big endian," respectively, and specifies the byte-ordering convention for multi-byte values.
The next character specifies the type of data: characters, bytes, ints, floating points, and so on (see the table below).
The last character or characters represent the size of the object in bytes (for 'U', the number of characters).

Character   Description              Example
'b'         Byte                     np.dtype('b')
'i'         Signed integer           np.dtype('i4') == np.int32
'u'         Unsigned integer         np.dtype('u1') == np.uint8
'f'         Floating point           np.dtype('f8') == np.float64
'c'         Complex floating point   np.dtype('c16') == np.complex128
'S', 'a'    String                   np.dtype('S5')
'U'         Unicode string           np.dtype('U') == np.str_
'V'         Raw data (void)          np.dtype('V') == np.void
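The format codes can be checked directly against the named NumPy types. This is a short sketch; the itemsize values are in bytes.

```python
import numpy as np

# Type character plus size in bytes maps onto the named scalar types.
print(np.dtype('i4') == np.int32)        # True
print(np.dtype('u1') == np.uint8)        # True
print(np.dtype('f8') == np.float64)      # True
print(np.dtype('c16') == np.complex128)  # True

# itemsize reports the storage per element in bytes.
print(np.dtype('i4').itemsize)   # 4
print(np.dtype('S5').itemsize)   # 5
# For 'U' the number counts characters, stored at 4 bytes each.
print(np.dtype('U10').itemsize)  # 40
```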

Numpy RecordArrays

NumPy also provides the np.recarray class, which is almost identical to the structured arrays just described, but with one additional feature: fields can be accessed as attributes rather than as dictionary keys.

data_rec = data.view(np.recarray)
data_rec.age
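A self-contained sketch of the record-array view, using a smaller two-row array (people and people_rec are names assumed for this example). It also shows that the view shares memory with the original structured array.

```python
import numpy as np

# Small structured array, rebuilt here so the sketch runs on its own.
people = np.zeros(2, dtype=[('name', 'U10'), ('age', 'i4')])
people['name'] = ['Alice', 'Bob']
people['age'] = [25, 45]

# view() reinterprets the same buffer as a recarray; nothing is copied.
people_rec = people.view(np.recarray)

# Attribute access and key access return the same field.
print(np.array_equal(people_rec.age, people['age']))  # True

# Writing through the view changes the original array too.
people_rec.age[0] = 26
print(people['age'])  # [26 45]
```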

The downside is that for record arrays, there is some extra overhead involved in accessing the fields, even when using the same syntax.

%timeit data['age']
%timeit data_rec['age']
%timeit data_rec.age

Output:
1000000 loops, best of 3: 241 ns per loop
100000 loops, best of 3: 4.61 µs per loop
100000 loops, best of 3: 7.27 µs per loop
