Prob 0011 ‐ Largest Product in a Grid - maccergit/Euler GitHub Wiki

This is an exercise in moving around an array. One of the first things to figure out is how to store the data, as this can have a significant impact on performance.

01

A direct approach in Python is to store the data as a list of lists. Since lists can be accessed via indices, and we can also use slices to access pieces of lists, this is an easy approach. Also, since lists are commonly used in Python, list access is highly optimized. As a result, we see good performance with this direct approach :

The curve is interesting. Apparently, short lists are fast enough that the other logic is dominated by the fast list processing. As the lists get larger, performance degrades as we travers the lists. Then, as the lists get even larger, there are fewer and fewer of them to process (because we don't go beyond the edges of the array), and this overwhelms the extra processing of the larger lists.

02

Since this the data is a large numeric array, it would seem to be a good candidate for NumPy, which provides special support for array processing. However, experience has shown that without taking advantage of the vectorization provided by NumPy, simply replacing Python lists with NumPy arrays will actually decrease performance - any performance improvement is offset by the overhead of using NumPy. This is shown here, where the list of lists is replaced with a NumPy array, but only the minimum changes to the code to use NumPy have been made. Note the use of the same "list of lists" indexing into the NumPy array, as if it were still a list of lists :

03

As first step to improving the NumPy version is to replace the Python "list of lists" indexing with NumPy's array indexing - it's a small syntax change, but it appears to impact performance. The change is to turn :

[index1][index2]

to :

[index1, index2]

04

NumPy has built in support for finding diagonals of an array. Replacing the code that finds the indices for the diagonals with NumPy's built in support results in an interesting change in the performance characteristics. Note that since we don't want the full diagonal of the array, we use NumPy's ability to return a smaller array by using slices along each axis before finding the diagonal :

05

The solutions to this point have been iterating over the data element by element, finding the elements along the associated axes starting with that element (down, right, diagonal right-up, and diagonal right-down), with the logic for finding the data along each axis filtering out elements too close to the edge for that axis. However, iterating over the elements in this fashion doesn't make much use of the vectorization available in NumPy (we can use the slices to provide views into the data, and the -1 stride for the case where we need the data in reverse order). The product for each axis is also computed for each element, and we build a list of all the products and find the maximum in the list - again bypassing vectorization.

This solution is a first pass at vectorizing the NumPy implementation, extracting the data elements along each axis into a NumPy array, and then applying the NumPy prod() and max() methods across the array of results. However, we still are building a list using a list comprehension, and then converting the list to an array to vectorize the prod() and max() calls :

Note that we use a trick here to avoid the need to determine the edges of the data that "overflow" off the edge - we add a border of zeroes to allow us to treat the edge elements just like the interior elements. This does result in more processing, and a possible optimization would be to change the calculation bounds to filter out the edges to avoid overflowing the bounds.

06

Let's do some experimentation in the Python interactive shell to see if we can vectorize some more.

First, set up the test data :

>>> import numpy as np
>>> 
>>> testData = [
... [1, 2, 3],
... [4, 9, 6],
... [7, 8, 5]
... ]
>>> 
>>> testDataNP = np.array(testData)
>>> limit = 2
>>> dt = np.pad(testDataNP, limit - 1, 'constant', constant_values = 0)
>>> dt
array([[0, 0, 0, 0, 0],
       [0, 1, 2, 3, 0],
       [0, 4, 9, 6, 0],
       [0, 7, 8, 5, 0],
       [0, 0, 0, 0, 0]])

Let's see what moving through the two indices gives us when we return entire rows/columns :

>>> dt[0, :]
array([0, 0, 0, 0, 0])
>>> dt[1, :]
array([0, 1, 2, 3, 0])
>>> dt[0]
array([0, 0, 0, 0, 0])
>>> dt[1]
array([0, 1, 2, 3, 0])

Using a wildcard on the second index, and changing the value for the first index, result in returning rows - in this case, the first row and the second row. Note that omitting the second index is the same as using ':' as a wildcard. I prefer to use the ':', as it is more obvious what we want. It's also more consistent when dealing with the other index to extract columns (throws a syntax error if you try to omit the first index) :

>>> dt[:, 0]
array([0, 0, 0, 0, 0])
>>> dt[:, 1]
array([0, 1, 4, 7, 0])
>>> dt[, 0]
  File "<stdin>", line 1
    dt[, 0]
       ^
SyntaxError: invalid syntax

So far, we are just treating this the same way as a list of lists. However, we can extract multiple rows/columns at the same time in NumPy :

>>> dt[0 : 2, :]
array([[0, 0, 0, 0, 0],
       [0, 1, 2, 3, 0]])
>>> dt[1 : 3, :]
array([[0, 1, 2, 3, 0],
       [0, 4, 9, 6, 0]])
>>> dt[0 : 3, :]
array([[0, 0, 0, 0, 0],
       [0, 1, 2, 3, 0],
       [0, 4, 9, 6, 0]])

Note that if we look at the columns of the results, we have the items that for the product for each element in the first row, second row, etc... - let's see if we can iterate over the array to extract all the pairs of rows :

>>> [dt[i : i + 2, :] for i in range(len(dt))]
[array([[0, 0, 0, 0, 0],
       [0, 1, 2, 3, 0]]), array([[0, 1, 2, 3, 0],
       [0, 4, 9, 6, 0]]), array([[0, 4, 9, 6, 0],
       [0, 7, 8, 5, 0]]), array([[0, 7, 8, 5, 0],
       [0, 0, 0, 0, 0]]), array([[0, 0, 0, 0, 0]])]

Hmm, that doesn't look right - or does it? It's a list, composed of arrays containing two rows each - it just prints weird. However, it's a Python list, and we want a NumPy array. Also, the last element is not a pair of rows - looks like we went too far on our range :

>>> np.array([dt[i : i + 2, :] for i in range(len(dt) - 1)])
array([[[0, 0, 0, 0, 0],
        [0, 1, 2, 3, 0]],

       [[0, 1, 2, 3, 0],
        [0, 4, 9, 6, 0]],

       [[0, 4, 9, 6, 0],
        [0, 7, 8, 5, 0]],

       [[0, 7, 8, 5, 0],
        [0, 0, 0, 0, 0]]])

Now let's see if we can get out products. Obviously, a plain prod() that multiplies all the elements will return 0. Let's experiment with specifying the axis :

>>> np.array([dt[i : i + 2, :] for i in range(len(dt) - 1)]).prod()
0
>>> np.array([dt[i : i + 2, :] for i in range(len(dt) - 1)]).prod(axis = 0)
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
>>> np.array([dt[i : i + 2, :] for i in range(len(dt) - 1)]).prod(axis = 1)
array([[ 0,  0,  0,  0,  0],
       [ 0,  4, 18, 18,  0],
       [ 0, 28, 72, 30,  0],
       [ 0,  0,  0,  0,  0]])
>>> np.array([dt[i : i + 2, :] for i in range(len(dt) - 1)]).prod(axis = 2)
array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0]])

"axis = 0" takes the elements of the first row of each pair and multiplies corresponding elements to get the first row of the result, and again with the second row of each pair to get the second row of the result (if we think of the pairs of rows as stacked on top of each other to produce a 3-D array, this would be taking products by moving into the page). Likewise, "axis = 2" finds the product of each row in each pair, and returns pairs of results (finding products by moving across the rows). The one we want is "axis = 1", which finds the products by multiplying the elements of the pairs together for each pair. Note that this is NOT matrix multiplication, which is a completely different animal - here, we are simply multiplying corresponding elements. NumPy supports the "cross-product" and the "dot-product", but we don't need those here.

Now that we have a 2-D array of products, we can simply get the max value in the array :

>>> np.array([dt[i : i + 2, :] for i in range(len(dt) - 1)]).prod(axis = 1).max()
72

Of course, this needs to be generalized to handle any size limit, and adjusted to handle columns instead of rows to handle the vertical products. Unfortunately, I've not yet been able to figure out how to reduce the nested list comprehension for the diagonals, nor have I been able to eliminate building an intermediate list - so while this is faster, it still has a lot of overhead :

07

Another approach is to build a large array of the items to be multiplied, and then apply the prod() and max() on the entire thing - rather than calling them for each axis. Unfortunately, this still uses intermediate lists, and concatenates the lists together before converting the lists to an array. Even so, we still get a decent performance improvement - just not better than we have seen with the earlier solutions :

Final Observations

From all of this, it appears a good general approach in Python is to try the obvious solution first - it may turn out to be good enough. Simply replacing lists of lists with NumPy arrays will probably hurt performance, and reworking the solution to use a vectorized approach can be difficult. Unless the solution is readily rephrased in a vector form, the effort to use NumPy for the solution may not be worth the possible performance gain, and may turn out to result in a performance loss.

This is not to say they you should not use NumPy - it has many matrix functions that can be painful to implement by hand, and the ability to to do operations on an entire array, row, column, or even sets of row/columns can be powerful. Just remember that if you are thinking of moving to NumPy purely to improve performance, you will need to be able to take advantage of vectorization against large arrays for this to be a good approach.

⚠️ **GitHub.com Fallback** ⚠️