Numpy - BKJackson/BKJackson

Initializing Numpy arrays and ndarrays

Create an array of 10 random integers under 100

import numpy as np
rand = np.random.RandomState(42)

x = rand.randint(100, size=10)
print(x)

Output: [51 92 14 71 60 20 82 86 74 74]
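Newer NumPy code often uses the `Generator` API instead of `RandomState`. The following is a minimal sketch of the same idea, assuming `np.random.default_rng` is available (NumPy 1.17 or later); the names `rng`, `x_new`, and `x_again` are illustrative and not part of the original notes. The drawn values are not expected to match the `RandomState` output above, because the two use different underlying generators.

```python
import numpy as np

# Seeded generator: the same seed reproduces the same draws
rng = np.random.default_rng(42)
x_new = rng.integers(100, size=10)   # integers in the half-open range [0, 100)

# Re-creating the generator with the same seed repeats the draw exactly
x_again = np.random.default_rng(42).integers(100, size=10)

print(x_new)
print(np.array_equal(x_new, x_again))
```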

Binning Data

np.random.seed(42)
x = np.random.randn(100)

# Compute a histogram by hand
bins = np.linspace(-5, 5, 20)
counts = np.zeros_like(bins)

# find the appropriate bin for each x
i = np.searchsorted(bins, x)

# add 1 to each of these bins
np.add.at(counts, i, 1)

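# plot the results ('steps' is a drawstyle; recent matplotlib rejects linestyle='steps')
import matplotlib.pyplot as plt
plt.plot(bins, counts, drawstyle='steps');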

The matplotlib version:

plt.hist(x, bins, histtype='step')
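The hand-rolled histogram relies on `np.add.at` because plain fancy-index assignment does not accumulate over repeated indices. The sketch below, which is an illustration rather than part of the original notes, compares the two approaches and checks the result against `np.histogram`. The names `naive`, `hist`, and `edges` are introduced here for the comparison. It assumes every sample lies strictly inside the bin range, which holds for this seeded data.

```python
import numpy as np

np.random.seed(42)
x = np.random.randn(100)
bins = np.linspace(-5, 5, 20)

# Hand-rolled counts, as in the notes
counts = np.zeros_like(bins)
i = np.searchsorted(bins, x)
np.add.at(counts, i, 1)          # unbuffered: every repeated index is counted

# Plain fancy-index assignment counts each distinct index only once
naive = np.zeros_like(bins)
naive[i] += 1

# np.histogram returns one count per interval, so len(bins) - 1 values
hist, edges = np.histogram(x, bins)

print(counts.sum(), naive.sum())
print(np.array_equal(counts[1:], hist))
```

Here `counts[k]` holds the samples that fall between `bins[k-1]` and `bins[k]`, so `counts[1:]` lines up with the per-interval counts from `np.histogram`.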

Create a 2-D xy function with color values on the z axis

z = f(x, y)

# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)[:, np.newaxis]

z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

print(x.shape, y.shape)

Output: (50,) (50, 1)

Make a grid plot in matplotlib:

%matplotlib inline
import matplotlib.pyplot as plt

plt.imshow(z, origin='lower', extent=[0, 5, 0, 5],
           cmap='viridis')
plt.colorbar();

Indexing and sampling from arrays

Choosing 20 random indices with no repeats from array X

mean = [0, 0]
cov = [[1, 2],
       [2, 5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape
indices = np.random.choice(X.shape[0], 20, replace=False)
selection = X[indices]
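A quick way to confirm that `replace=False` really yields distinct indices is to count the unique values. This is a small sketch; it draws the indices from the seeded `rand` object rather than the global `np.random` state so that the run is reproducible, which is a variation on the snippet above.

```python
import numpy as np

rand = np.random.RandomState(42)
X = rand.multivariate_normal([0, 0], [[1, 2], [2, 5]], 100)

# Draw 20 distinct row indices from the 100 available rows
indices = rand.choice(X.shape[0], 20, replace=False)
selection = X[indices]

print(selection.shape)                  # (20, 2)
print(len(np.unique(indices)) == 20)    # True: no index appears twice
```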

Numpy ufuncs

The following table lists the arithmetic operators implemented in NumPy:

Operator	Equivalent ufunc	Description
`+`	`np.add`	Addition (e.g., `1 + 1 = 2`)
`-`	`np.subtract`	Subtraction (e.g., `3 - 2 = 1`)
`-`	`np.negative`	Unary negation (e.g., `-2`)
`*`	`np.multiply`	Multiplication (e.g., `2 * 3 = 6`)
`/`	`np.divide`	Division (e.g., `3 / 2 = 1.5`)
`//`	`np.floor_divide`	Floor division (e.g., `3 // 2 = 1`)
`**`	`np.power`	Exponentiation (e.g., `2 ** 3 = 8`)
`%`	`np.mod`	Modulus/remainder (e.g., `9 % 4 = 1`)
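As a quick check of the table, each operator can be compared against its ufunc on a small array. The array `x` and the dictionary `checks` are illustrative names, not part of the original notes.

```python
import numpy as np

x = np.arange(1, 5)   # [1 2 3 4]

# Each operator and its ufunc apply the same elementwise operation
checks = {
    "+":   np.array_equal(x + 2, np.add(x, 2)),
    "-":   np.array_equal(x - 2, np.subtract(x, 2)),
    "neg": np.array_equal(-x, np.negative(x)),
    "*":   np.array_equal(x * 2, np.multiply(x, 2)),
    "/":   np.array_equal(x / 2, np.divide(x, 2)),
    "//":  np.array_equal(x // 2, np.floor_divide(x, 2)),
    "**":  np.array_equal(x ** 2, np.power(x, 2)),
    "%":   np.array_equal(x % 2, np.mod(x, 2)),
}
print(all(checks.values()))   # True
```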

Numpy aggregation functions

For min, max, sum, and several other NumPy aggregates, a shorter syntax is to use methods of the array object itself:
print(big_array.min(), big_array.max(), big_array.sum())

Aggregation functions take an additional argument specifying the axis along which the aggregate is computed. For example, we can find the minimum value within each column by specifying axis=0:
M.min(axis=0)

For an array M with four columns, this returns four values, one per column.

Similarly, we can find the maximum value within each row:
M.max(axis=1)
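The notes do not define `M`, so the sketch below assumes a small 3×4 integer array to make the effect of `axis` visible. The `axis` argument names the dimension that is collapsed: `axis=0` collapses the rows and leaves one value per column, while `axis=1` collapses the columns and leaves one value per row.

```python
import numpy as np

# Hypothetical 3x4 array standing in for M
M = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(M.min(axis=0))   # [0 1 2 3]   one minimum per column
print(M.max(axis=1))   # [ 3  7 11]  one maximum per row
print(M.sum())         # 66          no axis: aggregate over everything
```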

The following table provides a list of useful aggregation functions available in NumPy:

Function Name	NaN-safe Version	Description
`np.sum`	`np.nansum`	Compute sum of elements
`np.prod`	`np.nanprod`	Compute product of elements
`np.mean`	`np.nanmean`	Compute mean of elements
`np.std`	`np.nanstd`	Compute standard deviation
`np.var`	`np.nanvar`	Compute variance
`np.min`	`np.nanmin`	Find minimum value
`np.max`	`np.nanmax`	Find maximum value
`np.argmin`	`np.nanargmin`	Find index of minimum value
`np.argmax`	`np.nanargmax`	Find index of maximum value
`np.median`	`np.nanmedian`	Compute median of elements
`np.percentile`	`np.nanpercentile`	Compute rank-based statistics of elements
`np.any`	N/A	Evaluate whether any elements are true
`np.all`	N/A	Evaluate whether all elements are true

Numpy Broadcasting

Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) on arrays of different sizes.

Rules of Broadcasting

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
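The three rules can be traced on small arrays. This is a sketch with illustrative array names; the comments state which rule applies at each step.

```python
import numpy as np

a = np.ones((2, 3))
b = np.arange(3)                    # shape (3,)
# Rule 1 pads b to (1, 3); Rule 2 stretches it to (2, 3)
print((a + b).shape)                # (2, 3)

col = np.arange(3).reshape(3, 1)    # shape (3, 1)
row = np.arange(3)                  # shape (3,)
# Rule 1 pads row to (1, 3); Rule 2 stretches both operands to (3, 3)
print((col + row).shape)            # (3, 3)

m = np.ones((3, 2))
v = np.arange(3)                    # padded to (1, 3)
# Trailing sizes are 2 and 3, and neither is 1, so Rule 3 applies
try:
    m + v
    raised = False
except ValueError:
    raised = True
print(raised)                       # True
```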

Working with Boolean Arrays in Numpy

How many values are less than 6?
np.count_nonzero(x < 6)

Are there any values less than zero?
np.any(x < 0)

Are all values in each row less than 8?
np.all(x < 8, axis=1)

Are all values equal to 6?
np.all(x == 6)
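The snippets above do not define `x`. Assuming a small 3×4 integer array chosen for illustration, the four questions evaluate as follows. `np.count_nonzero` works here because `True` counts as nonzero in the Boolean array `x < 6`.

```python
import numpy as np

# Hypothetical 3x4 array standing in for x
x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

print(np.count_nonzero(x < 6))   # 8
print(np.any(x < 0))             # False
print(np.all(x < 8, axis=1))     # [ True False  True]
print(np.all(x == 6))            # False
```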

Boolean bitwise logical operators

The following table summarizes the bitwise Boolean operators and their equivalent ufuncs:

Operator	Equivalent ufunc	Operator	Equivalent ufunc
`&`	`np.bitwise_and`	`|`	`np.bitwise_or`
`^`	`np.bitwise_xor`	`~`	`np.bitwise_not`
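On Boolean arrays these operators act element by element. The small example below uses two illustrative arrays, `a` and `b`, that cover all four truth-table combinations.

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

print(a & b)   # [ True False False False]
print(a | b)   # [ True  True  True False]
print(a ^ b)   # [False  True  True False]
print(~a)      # [False False  True  True]

# The operator and the ufunc give the same result
print(np.array_equal(a & b, np.bitwise_and(a, b)))   # True
```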

Example of constructing a Boolean mask

# construct a mask of all rainy days
rainy = (inches > 0)

# construct a mask of all summer days (June 21st is the 172nd day)
days = np.arange(365)
summer = (days > 172) & (days < 262)

print("Median precip on rainy days in 2014 (inches):   ",
      np.median(inches[rainy]))
print("Median precip on summer days in 2014 (inches):  ",
      np.median(inches[summer]))
print("Maximum precip on summer days in 2014 (inches): ",
      np.max(inches[summer]))
print("Median precip on non-summer rainy days (inches):",
      np.median(inches[rainy & ~summer]))

Output:
Median precip on rainy days in 2014 (inches): 0.19488188976377951
Median precip on summer days in 2014 (inches): 0.0
Maximum precip on summer days in 2014 (inches): 0.8503937007874016
Median precip on non-summer rainy days (inches): 0.20078740157480315

When to use `and` and `or` versus `&` and `|`

`and` and `or` perform a single Boolean evaluation on an entire object, while `&` and `|` perform multiple Boolean evaluations on the content (the individual bits or bytes) of an object. For Boolean NumPy arrays, the latter is nearly always the desired operation.
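A short sketch of the difference, using an illustrative array `x`: the elementwise form builds a mask, while the `and` form asks for the truth value of a whole array and raises a `ValueError`. The parentheses around each comparison matter, because `&` binds more tightly than `>` and `<`.

```python
import numpy as np

x = np.arange(10)

# Elementwise: one Boolean result per element
mask = (x > 4) & (x < 8)
print(x[mask])            # [5 6 7]

# Whole-object: the truth value of a multi-element array is ambiguous
try:
    (x > 4) and (x < 8)
    message = None
except ValueError as err:
    message = str(err)
print(message is not None)   # True
```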

Numpy structured arrays

Create a structured numpy array with strings, integers, and floats:

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

Output: [('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

Output: [('Alice', 25, 55.0) ('Bob', 45, 85.5) ('Cathy', 37, 68.0) ('Doug', 19, 61.5)]

The handy thing with structured arrays is that you can now refer to values either by index or by name.

# Get all names
data['name']

# Get first row of data
data[0]  

# Get the name from the last row
data[-1]['name']  

# Get names where age is under 30
data[data['age'] < 30]['name']

A compound type can also be specified as a list of tuples:

np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])
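The dictionary form and the list-of-tuples form describe the same compound type when the field names and formats agree. The sketch below uses `'U10'` in both forms so they can be compared directly (the line above uses `'S10'`, a byte string, which is a different type). The names `dt_dict`, `dt_list`, and `people` are illustrative.

```python
import numpy as np

dt_dict = np.dtype({'names': ('name', 'age', 'weight'),
                    'formats': ('U10', 'i4', 'f8')})
dt_list = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
print(dt_dict == dt_list)   # True

# A structured array can also be built directly from a list of tuples
people = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5),
                   ('Cathy', 37, 68.0), ('Doug', 19, 61.5)],
                  dtype=dt_list)
print(people[people['age'] < 30]['name'])   # ['Alice' 'Doug']
```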

Shortened string format codes

The first (optional) character is `<` or `>`, which means "little endian" or "big endian," respectively, and specifies the ordering convention for significant bytes.
The next character specifies the type of data: characters, bytes, ints, floating points, and so on (see the table below).
The last character or characters represent the size of the object in bytes.

Character	Description	Example
`'b'`	Byte	`np.dtype('b')`
`'i'`	Signed integer	`np.dtype('i4') == np.int32`
`'u'`	Unsigned integer	`np.dtype('u1') == np.uint8`
`'f'`	Floating point	`np.dtype('f8') == np.float64`
`'c'`	Complex floating point	`np.dtype('c16') == np.complex128`
`'S'`, `'a'`	String	`np.dtype('S5')`
`'U'`	Unicode string	`np.dtype('U') == np.str_`
`'V'`	Raw data (void)	`np.dtype('V') == np.void`
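The format codes can be inspected directly. This small sketch checks a few entries from the table and shows how the byte-order prefix changes the stored bytes of the integer 1; the variable names `little` and `big` are illustrative.

```python
import numpy as np

print(np.dtype('i4') == np.int32)      # True
print(np.dtype('f8') == np.float64)    # True
print(np.dtype('S10').itemsize)        # 10: one byte per character
print(np.dtype('U10').itemsize)        # 40: four bytes per character

# Same value, opposite byte order
little = np.array([1], dtype='<i4').tobytes()
big = np.array([1], dtype='>i4').tobytes()
print(little)   # b'\x01\x00\x00\x00'
print(big)      # b'\x00\x00\x00\x01'
```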

Numpy RecordArrays

NumPy also provides the np.recarray class, which is almost identical to the structured arrays just described, but with one additional feature: fields can be accessed as attributes rather than as dictionary keys.

data_rec = data.view(np.recarray)
data_rec.age

The downside is that for record arrays, there is some extra overhead involved in accessing the fields, even when using the same syntax.

%timeit data['age']
%timeit data_rec['age']
%timeit data_rec.age

Output:
1000000 loops, best of 3: 241 ns per loop
100000 loops, best of 3: 4.61 µs per loop
100000 loops, best of 3: 7.27 µs per loop
