Benchmarking experiments for rotation compositions - adbugger/scipy GitHub Wiki
Benchmarking for rotation composition approaches
This table contains the benchmarks for the rotation composition process. There are 2 cases which we will support:
N
rotations composed withN
rotations. This case is straightforward as converting to dcm will only incur extra computation cost. The already implementedcompose_quat
function will suffice.N
rotations composed with a single rotation. For this we have 2 options:- Broadcast the single quaternion to
N x 4
and usecompose_quat
- Prevent unnecessary copying by converting the single quaternion to appropriate
4 x 4
matrix and usenp.dot()
for a matrix vector multiplication.
- Broadcast the single quaternion to
Replies
- Updated quaternion benchmarks without literate broadcast
- Updated matrix approach to handle general case of
N
byN
rotations - Will try
np.einsum
andnp.matmul
for composition ofN
byN
but usingnp.matmul
for the final implementation is probably not a good idea since it is not supported by numpyv1.8
, and is still a preliminary function. (EDIT: given the current implementation where theith
row is being multiplied with theith
slice, I don't thinknp.matmul
can be used)
Here is the updated quaternion approach:
def _compose_quat(p, q):
product = np.empty_like(p)
# Scalar part of result
product[:, 3] = p[:, 3] * q[:, 3] - np.sum(p[:, :3] * q[:, :3], axis=1)
# Vector part of result
product[:, :3] = (p[:, None, 3] * q[:, :3] + q[:, None, 3] * p[:, :3] +
np.cross(p[:, :3], q[:, :3]))
return product
def quat_composition(P, Q):
return R.from_quaternion(_compose_quat(P._quat, Q._quat), normalized=True)
And here is the updated matrix approach to handle the general case of N by N
compositions.
def _rep_mat(q):
if q._single:
num_quat = 1
q = q[None, :]
else:
num_quat = q.shape[0]
"""
(x, y, z, w) = Q.as_quaternion()
q_mat = np.array([
[w, -z, y, -x],
[z, w, -x, -y],
[-y, x, w, -z],
[x, y, z, w],
])
"""
q_mat = np.empty((num_quat, 4, 4))
q_mat[:, 0, 0] = q_mat[:, 1, 1] = q_mat[:, 2, 2] = q_mat[:, 3, 3] = q[:, 3] # w
q_mat[:, 3, 0] = q_mat[:, 2, 1] = q[:, 0] # x
q_mat[:, 1, 2] = q_mat[:, 0, 3] = -q[:, 0] # -x
q_mat[:, 0, 2] = q_mat[:, 3, 1] = q[:, 1] # y
q_mat[:, 2, 0] = q_mat[:, 1, 3] = -q[:, 1] # -y
q_mat[:, 1, 0] = q_mat[:, 3, 2] = q[:, 2] # z
q_mat[:, 0, 1] = q_mat[:, 2, 3] = -q[:, 2] # -z
return q_mat
def mat_composition(P, Q, dot=False): # dot = True
p = P.as_quaternion()
q_mat = _rep_mat(Q.as_quaternion())
if q_mat.shape[0] != 1:
# Multiply ith row of p with ith slice of q_mat
result = np.einsum('...ij,...ijk->...ik', p, q_mat)
else:
if dot:
result = np.dot(p, q_mat[0])
else:
result = np.einsum('...ij,...jk->...ik', p, q_mat[0])
return R.from_quaternion(result, normalized=True)
Here are the benchmarks for the n by 1
case. They were generated with the following code:
q = R.from_quaternion(np.random.normal(size=(4,)))
n = 2
p = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)
n_p | n_q | Quaternion | Matrix (np.einsum ) |
Matrix (np.dot ) |
---|---|---|---|---|
2 | 1 | 79.8 µs ± 2.87 µs |
17.8 µs ± 150 ns |
15 µs ± 328 ns |
6 | 1 | 81.9 µs ± 1.82 µs |
18.4 µs ± 229 ns |
15.2 µs ± 616 ns |
10 | 1 | 79.6 µs ± 1.53 µs |
19 µs ± 631 ns |
15.6 µs ± 521 ns |
50 | 1 | 88.6 µs ± 1.24 µs |
20.9 µs ± 486 ns |
16.5 µs ± 770 ns |
100 | 1 | 87.7 µs ± 778 ns |
22.2 µs ± 828 ns |
17.2 µs ± 685 ns |
1000 | 1 | 165 µs ± 2.05 µs |
53.8 µs ± 1.6 µs |
27.3 µs ± 938 ns |
10000 | 1 | 882 µs ± 21 µs |
336 µs ± 12 µs |
113 µs ± 1.36 µs |
100000 | 1 | 15.1 ms ± 649 µs |
3.39 ms ± 37.8 µs |
1.25 ms ± 171 µs |
1000000 | 1 | 150 ms ± 6.89 ms |
39.9 ms ± 2.08 ms |
19.8 ms ± 1.55 ms |
RESULT: By removing the literate broadcast, we do shave off a few microseconds but overall np.dot()
approach is still better.
Here are the timings for the multiple compositions case
n = 2
p = R.from_quaternion(np.random.normal(size=(n, 4)))
q = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)
n_p | n_q | Quaternion | Matrix (np.einsum ) |
---|---|---|---|
2 | 2 | 82.1 µs ± 5.4 µs |
17.8 µs ± 425 ns |
6 | 6 | 83.7 µs ± 3.37 µs |
18.7 µs ± 569 ns |
10 | 10 | 80.1 µs ± 3.03 µs |
19.1 µs ± 603 ns |
50 | 50 | 86.2 µs ± 2.25 µs |
21 µs ± 501 ns |
100 | 100 | 92.5 µs ± 5.75 µs |
25.4 µs ± 1.41 µs |
1000 | 1000 | 179 µs ± 6.81 µs |
97.9 µs ± 769 ns |
10000 | 10000 | 958 µs ± 5.25 µs |
1.01 ms ± 18.3 µs |
100000 | 100000 | 16.5 ms ± 893 µs |
31.4 ms ± 2.26 ms |
1000000 | 1000000 | 177 ms ± 2.85 ms |
351 ms ± 3.15 ms |
[OLD]
Comments from a mentor
- If it's not too much trouble I suggest you to try conversion and
np.einsum
(compare withnp.matmul
as well) when we composeN
andN
rotations. - I think doing literate broadcast of 1 quaternion is not a great idea. Try implementation without it, I guess that
np.cross
can do smart broadcasting inside.
Here I have benchmarked the second case only. Here is the quaternion
approach
def quat_composition(P, Q):
# p contains N rotations
# q contains a single rotation: q._single == True
# (Handle 2D single quaternion later)
p = P.as_quaternion()
q = np.broadcast_to(Q.as_quaternion(), p.shape)
return R.from_quaternion(compose_quat(p, q), normalized=True)
def compose_quat(p, q):
# p and q should have same shape (N, 4)
product = np.empty_like(p)
# Scalar part of result
product[:, 3] = p[:, 3] * q[:, 3] - np.sum(p[:, :3] * q[:, :3], axis=1)
# Vector part of result
product[:, :3] = (p[:, None, 3] * q[:, :3] + q[:, None, 3] * p[:, :3] +
np.cross(p[:, :3], q[:, :3]))
return product
and here is the matrix composition approach
def mat_composition(P, Q):
# p contains N rotations
# q contains a single rotation: q._single == True
# (Handle 2D single quaternion later)
p = P.as_quaternion()
(x, y, z, w) = Q.as_quaternion()
q_mat = np.array([
[w, -z, y, -x],
[z, w, -x, -y],
[-y, x, w, -z],
[x, y, z, w],
])
return R.from_quaternion(np.dot(p, q_mat), normalized=True)
The functions were times using the %timeit
command in a jupyter notebook
:
q = R.from_quaternion(np.random.normal(size=(4,)))
n = 2 # Variable for testing
p = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)
Here are the times:
n | Quaternion | Matrix |
---|---|---|
2 | 95.8 µs ± 3.7 µs |
8.67 µs ± 149 ns |
6 | 90.9 µs ± 3.1 µs |
8.75 µs ± 178 ns |
10 | 89.1 µs ± 2.32 µs |
9.02 µs ± 236 ns |
50 | 97.6 µs ± 1.56 µs |
10 µs ± 161 ns |
100 | 99.1 µs ± 4.72 µs |
10.3 µs ± 106 ns |
1000 | 176 µs ± 2.35 µs |
19.9 µs ± 138 ns |
10000 | 932 µs ± 37.6 µs |
108 µs ± 408 ns |
At least for multiple rotations composed with a single one, the matrix approach appears to be better.
Comments from a mentor
From the looks of it the Matrix approach is consistently and considerably better. Although you should eliminate the literate broadcasting of quaternions and check again.