Benchmarking experiments for rotation compositions - adbugger/scipy GitHub Wiki

Benchmarking for rotation composition approaches

This table contains the benchmarks for the rotation composition process. There are 2 cases which we will support:

N rotations composed with N rotations. This case is straightforward as converting to dcm will only incur extra computation cost. The already implemented compose_quat function will suffice.
N rotations composed with a single rotation. For this we have 2 options:
- Broadcast the single quaternion to N x 4 and use compose_quat
- Prevent unnecessary copying by converting the single quaternion to appropriate 4 x 4 matrix and use np.dot() for a matrix vector multiplication.

Replies

Updated quaternion benchmarks without literate broadcast
Updated matrix approach to handle general case of N by N rotations
Will try np.einsum and np.matmul for composition of N by N but using np.matmul for the final implementation is probably not a good idea since it is not supported by numpy v1.8, and is still a preliminary function. (EDIT: given the current implementation where the ith row is being multiplied with the ith slice, I don't think np.matmul can be used)

Here is the updated quaternion approach:

def _compose_quat(p, q):
    product = np.empty_like(p)
    # Scalar part of result
    product[:, 3] = p[:, 3] * q[:, 3] - np.sum(p[:, :3] * q[:, :3], axis=1)
    # Vector part of result
    product[:, :3] = (p[:, None, 3] * q[:, :3] + q[:, None, 3] * p[:, :3] +
                      np.cross(p[:, :3], q[:, :3]))
    return product

def quat_composition(P, Q):
    return R.from_quaternion(_compose_quat(P._quat, Q._quat), normalized=True)

And here is the updated matrix approach to handle the general case of N by N compositions.

def _rep_mat(q):
    if q._single:
        num_quat = 1
        q = q[None, :]
    else:
        num_quat = q.shape[0]
    """
    (x, y, z, w) = Q.as_quaternion()
    q_mat = np.array([
        [w, -z, y, -x],
        [z, w, -x, -y],
        [-y, x, w, -z],
        [x,  y, z,  w],
    ])
    """
    q_mat = np.empty((num_quat, 4, 4))
    q_mat[:, 0, 0] = q_mat[:, 1, 1] = q_mat[:, 2, 2] = q_mat[:, 3, 3] = q[:, 3]  # w
    
    q_mat[:, 3, 0] = q_mat[:, 2, 1] = q[:, 0]  # x
    q_mat[:, 1, 2] = q_mat[:, 0, 3] = -q[:, 0]  # -x
    
    q_mat[:, 0, 2] = q_mat[:, 3, 1] = q[:, 1]  # y
    q_mat[:, 2, 0] = q_mat[:, 1, 3] = -q[:, 1]  # -y
    
    q_mat[:, 1, 0] = q_mat[:, 3, 2] = q[:, 2]  # z
    q_mat[:, 0, 1] = q_mat[:, 2, 3] = -q[:, 2]  # -z
    
    return q_mat

def mat_composition(P, Q, dot=False):  # dot = True
    p = P.as_quaternion()
    q_mat = _rep_mat(Q.as_quaternion())
    if q_mat.shape[0] != 1:
        # Multiply ith row of p with ith slice of q_mat
        result = np.einsum('...ij,...ijk->...ik', p, q_mat)
    else:
        if dot:
            result = np.dot(p, q_mat[0])
        else:
            result = np.einsum('...ij,...jk->...ik', p, q_mat[0])
    return R.from_quaternion(result, normalized=True)

Here are the benchmarks for the n by 1 case. They were generated with the following code:

q = R.from_quaternion(np.random.normal(size=(4,)))
n = 2
p = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)

n_p	n_q	Quaternion	Matrix (`np.einsum`)	Matrix (`np.dot`)
2	1	`79.8 µs ± 2.87 µs`	`17.8 µs ± 150 ns`	`15 µs ± 328 ns`
6	1	`81.9 µs ± 1.82 µs`	`18.4 µs ± 229 ns`	`15.2 µs ± 616 ns`
10	1	`79.6 µs ± 1.53 µs`	`19 µs ± 631 ns`	`15.6 µs ± 521 ns`
50	1	`88.6 µs ± 1.24 µs`	`20.9 µs ± 486 ns`	`16.5 µs ± 770 ns`
100	1	`87.7 µs ± 778 ns`	`22.2 µs ± 828 ns`	`17.2 µs ± 685 ns`
1000	1	`165 µs ± 2.05 µs`	`53.8 µs ± 1.6 µs`	`27.3 µs ± 938 ns`
10000	1	`882 µs ± 21 µs`	`336 µs ± 12 µs`	`113 µs ± 1.36 µs`
100000	1	`15.1 ms ± 649 µs`	`3.39 ms ± 37.8 µs`	`1.25 ms ± 171 µs`
1000000	1	`150 ms ± 6.89 ms`	`39.9 ms ± 2.08 ms`	`19.8 ms ± 1.55 ms`

RESULT: By removing the literate broadcast, we do shave off a few microseconds but overall np.dot() approach is still better.

Here are the timings for the multiple compositions case

n = 2
p = R.from_quaternion(np.random.normal(size=(n, 4)))
q = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)

n_p	n_q	Quaternion	Matrix (`np.einsum`)
2	2	`82.1 µs ± 5.4 µs`	`17.8 µs ± 425 ns`
6	6	`83.7 µs ± 3.37 µs`	`18.7 µs ± 569 ns`
10	10	`80.1 µs ± 3.03 µs`	`19.1 µs ± 603 ns`
50	50	`86.2 µs ± 2.25 µs`	`21 µs ± 501 ns`
100	100	`92.5 µs ± 5.75 µs`	`25.4 µs ± 1.41 µs`
1000	1000	`179 µs ± 6.81 µs`	`97.9 µs ± 769 ns`
10000	10000	`958 µs ± 5.25 µs`	`1.01 ms ± 18.3 µs`
100000	100000	`16.5 ms ± 893 µs`	`31.4 ms ± 2.26 ms`
1000000	1000000	`177 ms ± 2.85 ms`	`351 ms ± 3.15 ms`

[OLD]

Comments from a mentor

If it's not too much trouble I suggest you to try conversion and np.einsum (compare with np.matmul as well) when we compose N and N rotations.
I think doing literate broadcast of 1 quaternion is not a great idea. Try implementation without it, I guess that np.cross can do smart broadcasting inside.

Here I have benchmarked the second case only. Here is the quaternion approach

def quat_composition(P, Q):
    # p contains N rotations
    # q contains a single rotation: q._single == True
    # (Handle 2D single quaternion later)
    p = P.as_quaternion()
    q = np.broadcast_to(Q.as_quaternion(), p.shape)
    return R.from_quaternion(compose_quat(p, q), normalized=True)

def compose_quat(p, q):
    # p and q should have same shape (N, 4)
    product = np.empty_like(p)
    # Scalar part of result
    product[:, 3] = p[:, 3] * q[:, 3] - np.sum(p[:, :3] * q[:, :3], axis=1)
    # Vector part of result
    product[:, :3] = (p[:, None, 3] * q[:, :3] + q[:, None, 3] * p[:, :3] +
                      np.cross(p[:, :3], q[:, :3]))
    return product

and here is the matrix composition approach

def mat_composition(P, Q):
    # p contains N rotations
    # q contains a single rotation: q._single == True
    # (Handle 2D single quaternion later)
    p = P.as_quaternion()
    (x, y, z, w) = Q.as_quaternion()
    q_mat = np.array([
        [w, -z, y, -x],
        [z, w, -x, -y],
        [-y, x, w, -z],
        [x, y, z, w],
    ])
    return R.from_quaternion(np.dot(p, q_mat), normalized=True)

The functions were times using the %timeit command in a jupyter notebook:

q = R.from_quaternion(np.random.normal(size=(4,)))
n = 2  # Variable for testing
p = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)

Here are the times:

n	Quaternion	Matrix
2	`95.8 µs ± 3.7 µs`	`8.67 µs ± 149 ns`
6	`90.9 µs ± 3.1 µs`	`8.75 µs ± 178 ns`
10	`89.1 µs ± 2.32 µs`	`9.02 µs ± 236 ns`
50	`97.6 µs ± 1.56 µs`	`10 µs ± 161 ns`
100	`99.1 µs ± 4.72 µs`	`10.3 µs ± 106 ns`
1000	`176 µs ± 2.35 µs`	`19.9 µs ± 138 ns`
10000	`932 µs ± 37.6 µs`	`108 µs ± 408 ns`

At least for multiple rotations composed with a single one, the matrix approach appears to be better.

Comments from a mentor

From the looks of it the Matrix approach is consistently and considerably better. Although you should eliminate the literate broadcasting of quaternions and check again.