2020 10 12 Archive data structures - syntaxmonkey/Thesis GitHub Wiki
A lot of time is spent calling BFF to transform our regions into flattened regions. We can save the region mesh and flattened mesh for later use, thereby reducing future execution time.
Python Pickle
Python has a built-in module, pickle, that can save objects to a file. We will use this module: https://stackoverflow.com/questions/4529815/saving-an-object-data-persistence.
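A minimal sketch of saving and reloading an object with pickle. The name save_object matches the helper that appears in a traceback later on this page, but its body here is an assumption; load_object and the sample data are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, filename):
    """Serialize obj to filename using the highest available pickle protocol."""
    with open(filename, "wb") as output:
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)


def load_object(filename):
    """Load and return the object previously written by save_object."""
    with open(filename, "rb") as source:
        return pickle.load(source)


# Hypothetical stand-in data: a list of (x, y) vertices.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Write to a temporary directory so the sketch leaves the project untouched.
path = os.path.join(tempfile.mkdtemp(), "vertices.pkl")
save_object(vertices, path)
restored = load_object(path)
print(restored == vertices)
```

Files must be opened in binary mode ("wb" and "rb"), since pickle writes bytes rather than text.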
- Use Pickle to save an Object.
- Can save a List to file.
- Load the Object from file.
- Can load the List object from file.
- Canonicalize filenames.
- SLIC0
- Compactness
- Number of regions
- Semantic Segmentation percentage
- The solution accounts for segment count, compactness, attract threshold, semantic segmentation type, and semantic ratio.
- Integrate with existing code. What is the best location for saving objects to file and loading them back in?
- As complete as possible? This would save the supporting objects as well as the lines.
- This has the advantage of reloading a specific run. It also means that we do not have to regenerate many of the data structures. However, it means that we have to discard the saved data whenever we modify the algorithm.
- Save only the Mesh objects? The BFF flattening takes the longest amount of time. If we save this information and reload it, the rest of the processing should take minimal time. We are also still modifying the algorithm during experimentation, so this would allow maximum flexibility.
- Going to implement approach #2. It provides the most flexibility. Aside from BFF flattening, the rest of the algorithm runs relatively quickly.
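One way to canonicalize the filenames is to build them from the parameters listed above, so that the same settings always map to the same file. This is only a sketch: the function name, the argument names, the image_name prefix, and the naming scheme are all hypothetical.

```python
def canonical_mesh_filename(image_name, segment_count, compactness,
                            attract_threshold, semantic_type, semantic_ratio,
                            extension="pkl"):
    """Build a deterministic filename from the run parameters (hypothetical scheme)."""
    parts = [
        image_name,
        "seg{}".format(segment_count),
        # The 'g' format drops trailing zeros, so 0.1 and 0.10 give the same name.
        "comp{:g}".format(compactness),
        "att{:g}".format(attract_threshold),
        "sem{}".format(semantic_type),
        "ratio{:g}".format(semantic_ratio),
    ]
    return "_".join(parts) + "." + extension


name = canonical_mesh_filename("leaf", 200, 0.1, 5, "mask", 0.5)
print(name)
```

A saved mesh can then be looked up by name before BFF is called, and BFF only runs when no matching file exists.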
Results
Attempting to pickle the meshObj and flattenedMesh failed. The error indicates we cannot pickle the Triangulation object.
TypeError: cannot pickle 'matplotlib._tri.Triangulation' object
- In the MeshObj class, the self.triangulation is the 'matplotlib._tri.Triangulation'. We may need to save the component pieces of the Triangulation instead of the object.
- We may also need to write a custom pickler.
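A custom pickling hook could look roughly like the following sketch. The class is a hypothetical stand-in, not the project's MeshObj, and a threading.Lock is used only because it is a standard-library object that pickle rejects, much like the Triangulation object in the error above.

```python
import pickle
import threading


class MeshSketch:
    """Hypothetical stand-in for a mesh class; not the project's MeshObj."""

    def __init__(self, points, triangles):
        self.points = points
        self.triangles = triangles
        # A lock stands in for a native object that pickle cannot handle.
        self.native_helper = threading.Lock()

    def __getstate__(self):
        # Copy the instance dictionary and drop the unpicklable member.
        state = self.__dict__.copy()
        state["native_helper"] = None
        return state

    def __setstate__(self, state):
        # Restore the plain data, then rebuild the dropped member.
        self.__dict__.update(state)
        self.native_helper = threading.Lock()


mesh = MeshSketch([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [(0, 1, 2)])
restored = pickle.loads(pickle.dumps(mesh))
print(restored.points == mesh.points)
```

Only the plain data (points and triangles) goes into the file; anything that can be recomputed from it is rebuilt on load.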
Python Marshal module
Potentially use the Marshal module: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch08s02.html
Results
Received an error stating that the object cannot be marshalled.
File "/Users/hengsun/Documents/Thesis/PycharmProjects/geodesic/Bridson_Common.py", line 198, in save_object
marshal.dump(obj, output)
ValueError: unmarshallable object
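The error is expected: marshal only supports a fixed set of built-in types, so instances of user-defined classes are rejected. The sketch below reproduces the same ValueError with a hypothetical Region class and shows that a plain dictionary of built-in values does round-trip.

```python
import marshal


class Region:
    """Hypothetical user-defined class; marshal cannot serialize its instances."""

    def __init__(self, label, points):
        self.label = label
        self.points = points


region = Region(3, [(0.0, 0.0), (1.0, 0.0)])

try:
    marshal.dumps(region)
    marshal_error = None
except ValueError as error:
    marshal_error = str(error)

print(marshal_error)

# Built-in containers and numbers are supported by marshal.
plain = {"label": region.label, "points": region.points}
round_trip = marshal.loads(marshal.dumps(plain))
print(round_trip == plain)
```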
Try using Dill
https://github.com/uqfoundation/dill
Testing saving a Triangulation and reloading it seems to work. Dill extends Pickle, so it is a drop-in solution. We will now test integrating it into our code.
Result
Still failed to save. The problem still appears to be related to matplotlib._tri.Triangulation.
Try to narrow down the problem
Test pickling Triangulation
We need to validate that saving a Triangulation object is possible. Create a simple Triangulation and save.
Result
Successful. We were able to pickle the Triangulation and reload the object for display.
Does something happen to the Triangulation object in our code?
The call stack has several layers.
Bridson_Delaunay.generateDelaunay
It turns out the problem is caused by a couple of references in Bridson_Delaunay.generateDelaunay. Referencing the edges and neighbors seems to cause problems in the Triangulation data structure. If we comment out these two statements, the Triangulation saves fine at the end of Bridson_Delaunay.generateDelaunay().
# Bridson_Common.logDebug(__name__, "Edges:", triangulation.edges) # For some reason, this call makes the pickle fail.
# Bridson_Common.logDebug(__name__, "Neighbors:", triangulation.neighbors) # For some reason, this call makes pickle fail.
Bridson_MeshObj.GenMeshFromMask
Creating the trifinder causes a problem for the pickler. Commenting out the following line of code resolves part of the problem. We can defer the generation of the trifinder to a later point.
# self.trifinder = self.triangulation.get_trifinder()
Bridson_MeshObj.generateDualGraph
This method makes reference to triangulation.edges and triangulation.neighbors. Commenting out this method allows pickling to work.
Can we copy the meshObj and flatMeshObj?
Python has a copy module that can perform a deepcopy. Using deepcopy does indeed allow us to create a copy of the meshObj and flatMeshObj before calling the get_trifinder and generateDualGraph methods.
This is a good strategy for saving the version of the mesh objects for pickling.
We will need to introduce a deferredInitialization to call get_trifinder and generateDualGraph.
Result
Yes, we can save the meshObj and flatMeshObj before we call genTrifinder and genDualGraph.
When we reload the meshObj and flatMeshObj, we have to call genTrifinder and genDualGraph.
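The workflow can be sketched as follows. The class and method names are hypothetical stand-ins for the project's mesh objects and for the deferred initialization step, and a threading.Lock again plays the role of the unpicklable trifinder.

```python
import copy
import pickle
import threading


class DeferredMesh:
    """Hypothetical stand-in for the mesh objects described above."""

    def __init__(self, points, triangles):
        self.points = points
        self.triangles = triangles
        self.trifinder = None   # built later
        self.dual_graph = None  # built later

    def deferred_initialization(self):
        # Stand-ins for the trifinder and dual graph; the lock mimics an
        # object that pickle cannot serialize.
        self.trifinder = threading.Lock()
        self.dual_graph = {index: [] for index in range(len(self.triangles))}


mesh = DeferredMesh([(0, 0), (1, 0), (0, 1), (1, 1)], [(0, 1, 2), (1, 3, 2)])

# Snapshot the mesh while it still holds only picklable data.
snapshot = copy.deepcopy(mesh)
mesh.deferred_initialization()

try:
    pickle.dumps(mesh)
    initialized_is_picklable = True
except TypeError:
    initialized_is_picklable = False

# The snapshot pickles fine; derived structures are rebuilt after loading.
reloaded = pickle.loads(pickle.dumps(snapshot))
reloaded.deferred_initialization()
print(initialized_is_picklable, reloaded.points == mesh.points)
```

The key point is ordering: the copy is taken before the derived structures exist, and every load is followed by the deferred initialization call.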
Speeding up the code
The results of pickling the mesh objects are reasonable. Execution time was reduced from 80 minutes to 45 minutes, roughly 56% of the original execution time. However, the code is rather complex, with many loops and data structures throughout. There are methods for optimizing execution speed:
- https://wiki.python.org/moin/PythonSpeed/PerformanceTips
- https://www.monitis.com/blog/7-ways-to-improve-your-python-performance/
We will try the low-hanging fruit first.
Profile code
We can profile the code using Python's built-in profile module, or potentially cProfile. Profiling will give us a baseline of execution speed.
import profile
profile.run('main()')
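A sketch of the cProfile variant, which has lower overhead than profile and can sort its report by cumulative time. The functions main and slow_sum are hypothetical placeholders for the real entry point and its hot loops.

```python
import cProfile
import io
import pstats


def slow_sum(count):
    """Placeholder workload: an explicit Python loop."""
    total = 0
    for value in range(count):
        total += value * value
    return total


def main():
    """Placeholder entry point standing in for the real batch run."""
    return [slow_sum(20000) for _ in range(20)]


def profile_call(func, top=5):
    """Run func under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(main)
print(report)
```

Sorting by cumulative time puts the functions that dominate the run at the top, which is a reasonable place to start looking for optimizations.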
Utilizing maps instead of loops
According to the first link above, we can use the map function instead of explicit loops. We can potentially map several of our existing methods.
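As an illustration, an explicit loop and its map equivalent are shown below. The function and data are hypothetical; the gain from map is usually modest when the mapped function is itself written in Python, so it is worth measuring with the profiler before and after.

```python
import math

# Hypothetical sample data: (x, y) points.
points = [(3.0, 4.0), (6.0, 8.0), (5.0, 12.0)]


def distance_from_origin(point):
    """Euclidean distance of a 2D point from the origin."""
    return math.hypot(point[0], point[1])


# Explicit loop version.
loop_result = []
for point in points:
    loop_result.append(distance_from_origin(point))

# Equivalent version using map; list() forces evaluation of the lazy iterator.
map_result = list(map(distance_from_origin, points))

print(map_result)  # [5.0, 10.0, 13.0]
```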
1. Numba
Implement jit_module as much as we can.
Result
First attempts at using @njit and @jit had issues. Numba complains that it does not recognize many of the object types. We may need to refactor some of the code so that the compiler works properly.
We were able to implement jit_module for several of the classes. It further dropped the batch execution time from 45 minutes to 30 minutes.
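The kind of refactoring that tends to help is moving numeric loops into standalone functions that take only scalars (or arrays) rather than custom objects. The sketch below uses @njit on one such function instead of jit_module; the function is hypothetical, and the fallback decorator is an assumption added so the example still runs as plain Python when Numba is not installed.

```python
import math

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        """Fallback when Numba is missing: leave the function uncompiled."""
        return func


@njit
def regular_polygon_area(sides, radius):
    """Area of a regular polygon via the shoelace formula, using scalars only."""
    area = 0.0
    for index in range(sides):
        angle0 = 2.0 * math.pi * index / sides
        angle1 = 2.0 * math.pi * (index + 1) / sides
        x0 = radius * math.cos(angle0)
        y0 = radius * math.sin(angle0)
        x1 = radius * math.cos(angle1)
        y1 = radius * math.sin(angle1)
        area += x0 * y1 - x1 * y0
    return 0.5 * area


# A square inscribed in a unit circle has area 2.
square_area = regular_polygon_area(4, 1.0)
print(square_area)
```

Because the function only sees integers and floats, the compiler has no unfamiliar object types to infer.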