Profiling the memory usage of a function - dmwm/WMCore GitHub Wiki

I'm writing these notes here before they get lost and I have to gather all this information again.

There are only a few Python libraries for memory profiling. Of these, memory_profiler seems to offer the widest functionality: it can watch the memory usage of a running executable, profile code line-by-line through decorators, profile a whole method/function, etc. Another interesting library is objgraph, which reports which object types are taking up memory and also helps debug memory leaks. Both are third-party libraries that are not yet in our CMS software stack, so we have to install them manually. Alternatively, we can use a Docker container as a virtual environment and install these packages inside the container only:

pip install memory_profiler     # or ... easy_install memory_profiler
pip install objgraph            # or ... easy_install objgraph

More documentation can be found on the Memory Profiler and Objgraph project pages.
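As a quick illustration of the kind of statistics objgraph gives, here is a minimal sketch that lists the most common live object types. `objgraph.most_common_types` is part of objgraph; the `top_types` helper and the `gc`-based fallback are assumptions made for this example only, so that it still runs where objgraph is not installed.

```python
import gc
from collections import Counter

try:
    import objgraph
except ImportError:
    # objgraph is not in the CMS stack; see the install commands above.
    objgraph = None


def top_types(limit=5):
    """Return [(type_name, count), ...] for the most common live objects.

    Hypothetical helper for this sketch, not part of WMCore.
    """
    if objgraph is not None:
        return objgraph.most_common_types(limit=limit)
    # Assumption: rough stand-in that counts the objects tracked by the
    # garbage collector, grouped by type name.
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)


if __name__ == "__main__":
    for name, count in top_types():
        print("%-20s %d" % (name, count))
```

Calling `top_types()` before and after a suspicious piece of code and comparing the counts is a cheap first check for a leak.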

I took the decorator approach so that we can see at which line the memory usage blows up. Unfortunately, this means we have to modify the source code, adding the decorators, to make it work.

I wanted to profile JobSubmitterPoller, so we need to first import memory_profiler in the source code:

+from memory_profiler import profile

and then add the profile decorator to each function for which we want a memory report. By default, the report is printed to sys.stdout. However, when running unit tests it is better to pass a file stream so that we can actually see the output (sys.stdout is only shown for failed unit tests). Thus, we need to apply more changes to JobSubmitterPoller.py, as follows:

...
+    refreshFp = open('refreshCache_stats.log', 'w+')
+    @profile(stream=refreshFp)
     def refreshCache(self):
...

and to the algorithm method that we want to profile as well:

...
+    algorithmFp = open('algorithm_stats.log', 'w+')
     @timeFunction
+    @profile(stream=algorithmFp)
     def algorithm(self, parameters=None):
...
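To try the same pattern outside WMCore, here is a minimal self-contained sketch. `ToyPoller` is a hypothetical stand-in for JobSubmitterPoller, the report goes to an in-memory stream instead of a log file, and the no-op fallback for `profile` is an assumption added only so the example runs where memory_profiler is not installed (in that case no report is produced).

```python
import io

try:
    from memory_profiler import profile
except ImportError:
    # Assumption for this sketch only: a no-op replacement that accepts the
    # same call styles (@profile and @profile(stream=...)) but reports nothing.
    def profile(func=None, stream=None, **kwargs):
        if func is not None:
            return func
        return lambda f: f

# In-memory stream; the changes above use a real file opened with open().
report = io.StringIO()


class ToyPoller(object):
    """Hypothetical stand-in for JobSubmitterPoller."""

    @profile(stream=report)
    def refreshCache(self):
        # Allocate something noticeable so the per-line report shows a jump.
        cache = [str(i) * 10 for i in range(100000)]
        return len(cache)


if __name__ == "__main__":
    poller = ToyPoller()
    print(poller.refreshCache())
    # Line-by-line memory report (empty if the fallback was used).
    print(report.getvalue())
```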

Now we just need to run any unit test that calls one of these functions, and we'll get the memory logs back in the component directory, for example: test/python/WMComponent_t/JobSubmitter_t/algorithm_stats.log

The heaviest JobSubmitter unit test is:

nosetests JobSubmitter_t.py:JobSubmitterTest.testMemoryProfile

A full example can be seen in this branch: memory-profile-test

NOTE: if your test crashes, those file descriptors are not closed cleanly and stay open until the process exits :(
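One possible way around the open file descriptors is to open the log file only for the duration of each profiled call. The sketch below is an assumption, not what the branch does: `profile_to_file` is a hypothetical helper, and the no-op fallback for `profile` exists only so the example runs without memory_profiler installed.

```python
import functools

try:
    from memory_profiler import profile
except ImportError:
    # Assumption for this sketch only: no-op replacement, reports nothing.
    def profile(func=None, stream=None, **kwargs):
        if func is not None:
            return func
        return lambda f: f


def profile_to_file(path):
    """Hypothetical helper: profile a function, appending reports to path."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The file is open only while the call runs; the with-block
            # closes it even if the profiled function raises.
            with open(path, "a") as fp:
                return profile(func, stream=fp)(*args, **kwargs)
        return wrapper
    return decorator


@profile_to_file("refreshCache_stats.log")
def refreshCache():
    # Toy body standing in for the real method.
    return len([str(i) for i in range(10000)])


if __name__ == "__main__":
    print(refreshCache())
```

The file is opened in append mode, so reports from repeated calls accumulate instead of overwriting each other. If the profiled function raises, no report is written for that call, but the file is still closed.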