Benchmarking experiments for rotation compositions - adbugger/scipy GitHub Wiki

Benchmarking for rotation composition approaches

This table contains the benchmarks for the rotation composition process. There are 2 cases which we will support:

  • N rotations composed with N rotations. This case is straightforward as converting to dcm will only incur extra computation cost. The already implemented compose_quat function will suffice.
  • N rotations composed with a single rotation. For this we have 2 options:
    • Broadcast the single quaternion to N x 4 and use compose_quat
    • Prevent unnecessary copying by converting the single quaternion to appropriate 4 x 4 matrix and use np.dot() for a matrix vector multiplication.

Replies

  • Updated quaternion benchmarks without literate broadcast
  • Updated matrix approach to handle general case of N by N rotations
  • Will try np.einsum and np.matmul for composition of N by N but using np.matmul for the final implementation is probably not a good idea since it is not supported by numpy v1.8, and is still a preliminary function. (EDIT: given the current implementation where the ith row is being multiplied with the ith slice, I don't think np.matmul can be used)

Here is the updated quaternion approach:

def _compose_quat(p, q):
    product = np.empty_like(p)
    # Scalar part of result
    product[:, 3] = p[:, 3] * q[:, 3] - np.sum(p[:, :3] * q[:, :3], axis=1)
    # Vector part of result
    product[:, :3] = (p[:, None, 3] * q[:, :3] + q[:, None, 3] * p[:, :3] +
                      np.cross(p[:, :3], q[:, :3]))
    return product

def quat_composition(P, Q):
    return R.from_quaternion(_compose_quat(P._quat, Q._quat), normalized=True)

And here is the updated matrix approach to handle the general case of N by N compositions.

def _rep_mat(q):
    if q._single:
        num_quat = 1
        q = q[None, :]
    else:
        num_quat = q.shape[0]
    """
    (x, y, z, w) = Q.as_quaternion()
    q_mat = np.array([
        [w, -z, y, -x],
        [z, w, -x, -y],
        [-y, x, w, -z],
        [x,  y, z,  w],
    ])
    """
    q_mat = np.empty((num_quat, 4, 4))
    q_mat[:, 0, 0] = q_mat[:, 1, 1] = q_mat[:, 2, 2] = q_mat[:, 3, 3] = q[:, 3]  # w
    
    q_mat[:, 3, 0] = q_mat[:, 2, 1] = q[:, 0]  # x
    q_mat[:, 1, 2] = q_mat[:, 0, 3] = -q[:, 0]  # -x
    
    q_mat[:, 0, 2] = q_mat[:, 3, 1] = q[:, 1]  # y
    q_mat[:, 2, 0] = q_mat[:, 1, 3] = -q[:, 1]  # -y
    
    q_mat[:, 1, 0] = q_mat[:, 3, 2] = q[:, 2]  # z
    q_mat[:, 0, 1] = q_mat[:, 2, 3] = -q[:, 2]  # -z
    
    return q_mat

def mat_composition(P, Q, dot=False):  # dot = True
    p = P.as_quaternion()
    q_mat = _rep_mat(Q.as_quaternion())
    if q_mat.shape[0] != 1:
        # Multiply ith row of p with ith slice of q_mat
        result = np.einsum('...ij,...ijk->...ik', p, q_mat)
    else:
        if dot:
            result = np.dot(p, q_mat[0])
        else:
            result = np.einsum('...ij,...jk->...ik', p, q_mat[0])
    return R.from_quaternion(result, normalized=True)

Here are the benchmarks for the n by 1 case. They were generated with the following code:

q = R.from_quaternion(np.random.normal(size=(4,)))
n = 2
p = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)
n_p n_q Quaternion Matrix (np.einsum) Matrix (np.dot)
2 1 79.8 µs ± 2.87 µs 17.8 µs ± 150 ns 15 µs ± 328 ns
6 1 81.9 µs ± 1.82 µs 18.4 µs ± 229 ns 15.2 µs ± 616 ns
10 1 79.6 µs ± 1.53 µs 19 µs ± 631 ns 15.6 µs ± 521 ns
50 1 88.6 µs ± 1.24 µs 20.9 µs ± 486 ns 16.5 µs ± 770 ns
100 1 87.7 µs ± 778 ns 22.2 µs ± 828 ns 17.2 µs ± 685 ns
1000 1 165 µs ± 2.05 µs 53.8 µs ± 1.6 µs 27.3 µs ± 938 ns
10000 1 882 µs ± 21 µs 336 µs ± 12 µs 113 µs ± 1.36 µs
100000 1 15.1 ms ± 649 µs 3.39 ms ± 37.8 µs 1.25 ms ± 171 µs
1000000 1 150 ms ± 6.89 ms 39.9 ms ± 2.08 ms 19.8 ms ± 1.55 ms

RESULT: By removing the literate broadcast, we do shave off a few microseconds but overall np.dot() approach is still better.

Here are the timings for the multiple compositions case

n = 2
p = R.from_quaternion(np.random.normal(size=(n, 4)))
q = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)
n_p n_q Quaternion Matrix (np.einsum)
2 2 82.1 µs ± 5.4 µs 17.8 µs ± 425 ns
6 6 83.7 µs ± 3.37 µs 18.7 µs ± 569 ns
10 10 80.1 µs ± 3.03 µs 19.1 µs ± 603 ns
50 50 86.2 µs ± 2.25 µs 21 µs ± 501 ns
100 100 92.5 µs ± 5.75 µs 25.4 µs ± 1.41 µs
1000 1000 179 µs ± 6.81 µs 97.9 µs ± 769 ns
10000 10000 958 µs ± 5.25 µs 1.01 ms ± 18.3 µs
100000 100000 16.5 ms ± 893 µs 31.4 ms ± 2.26 ms
1000000 1000000 177 ms ± 2.85 ms 351 ms ± 3.15 ms

[OLD]

Comments from a mentor

  • If it's not too much trouble I suggest you to try conversion and np.einsum (compare with np.matmul as well) when we compose N and N rotations.
  • I think doing literate broadcast of 1 quaternion is not a great idea. Try implementation without it, I guess that np.cross can do smart broadcasting inside.

Here I have benchmarked the second case only. Here is the quaternion approach

def quat_composition(P, Q):
    # p contains N rotations
    # q contains a single rotation: q._single == True
    # (Handle 2D single quaternion later)
    p = P.as_quaternion()
    q = np.broadcast_to(Q.as_quaternion(), p.shape)
    return R.from_quaternion(compose_quat(p, q), normalized=True)

def compose_quat(p, q):
    # p and q should have same shape (N, 4)
    product = np.empty_like(p)
    # Scalar part of result
    product[:, 3] = p[:, 3] * q[:, 3] - np.sum(p[:, :3] * q[:, :3], axis=1)
    # Vector part of result
    product[:, :3] = (p[:, None, 3] * q[:, :3] + q[:, None, 3] * p[:, :3] +
                      np.cross(p[:, :3], q[:, :3]))
    return product

and here is the matrix composition approach

def mat_composition(P, Q):
    # p contains N rotations
    # q contains a single rotation: q._single == True
    # (Handle 2D single quaternion later)
    p = P.as_quaternion()
    (x, y, z, w) = Q.as_quaternion()
    q_mat = np.array([
        [w, -z, y, -x],
        [z, w, -x, -y],
        [-y, x, w, -z],
        [x, y, z, w],
    ])
    return R.from_quaternion(np.dot(p, q_mat), normalized=True)

The functions were times using the %timeit command in a jupyter notebook:

q = R.from_quaternion(np.random.normal(size=(4,)))
n = 2  # Variable for testing
p = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)

Here are the times:

n Quaternion Matrix
2 95.8 µs ± 3.7 µs 8.67 µs ± 149 ns
6 90.9 µs ± 3.1 µs 8.75 µs ± 178 ns
10 89.1 µs ± 2.32 µs 9.02 µs ± 236 ns
50 97.6 µs ± 1.56 µs 10 µs ± 161 ns
100 99.1 µs ± 4.72 µs 10.3 µs ± 106 ns
1000 176 µs ± 2.35 µs 19.9 µs ± 138 ns
10000 932 µs ± 37.6 µs 108 µs ± 408 ns

At least for multiple rotations composed with a single one, the matrix approach appears to be better.

Comments from a mentor

From the looks of it the Matrix approach is consistently and considerably better. Although you should eliminate the literate broadcasting of quaternions and check again.