python_multiprocessing - KU-BIG/KUBIG_Wiki GitHub Wiki
Multiprocessing with Progress Bar
References
- Parallel Processing in Python
- Progress Bars in Python
- Multiprocessing : use tqdm to display a progress bar
Install
$ sudo apt-get install htop
$ pip3 install tqdm
When only one of your cores is doing all the work... just like you on your team
Run the command below to check how many of your cores are working:
$ htop
The work must be distributed across all cores
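Before distributing work, it helps to know how many cores are available instead of hard-coding a number. A minimal sketch using only the standard library; leaving one core free is an assumption made for illustration, not a rule:

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it cannot be determined.
logical_cores = os.cpu_count() or 1

# Assumption for illustration: leave one core free for the rest of the system.
num_of_cores = max(1, logical_cores - 1)

print(f"{logical_cores} logical cores detected, using {num_of_cores} workers")
```

The resulting `num_of_cores` can then be passed to the pool in place of a fixed value.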
Example
- I'd like to draw a scatter plot for every possible combination of columns in the dataframe
- Without parallel processing, a single core draws every scatter plot, one after another, from the first combination to the last
- What I want instead is for each core to draw the scatter plot of a different combination of columns, all at the same time
- During this process, I'd also like to see the progress and how much time is remaining
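To get a feel for how many plots "every possible combination" means, the pairs can be enumerated with `itertools`. This is only a sketch: the column names are hypothetical placeholders, and whether you want ordered pairs or each distinct pair once depends on your use case.

```python
from itertools import combinations, product

columns = ["height", "weight", "age"]  # hypothetical column names

# Every ordered pair, including a column paired with itself: n * n plots.
all_pairs = list(product(columns, repeat=2))

# Each unordered pair of distinct columns once: n * (n - 1) / 2 plots.
unique_pairs = list(combinations(columns, 2))

print(len(all_pairs), len(unique_pairs))  # prints: 9 3
```

With many columns the difference matters: the second form needs fewer than half as many plots.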
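import os
from multiprocessing.pool import ThreadPool

import matplotlib.pyplot as plt
from tqdm import tqdm

# `df` is assumed to be an existing pandas DataFrame

# Set combinations of arguments desired to be parallel-processed
args = []
for col1 in df.columns:
    for col2 in df.columns:
        args.append((col1, col2))

# Make sure the output directory exists before the workers write to it
os.makedirs('img/scatter_matrix', exist_ok=True)

# Define function to process
def save_scatter_matrix(col1, col2):
    # Draw on explicit fig/ax objects: pyplot's implicit "current figure"
    # is shared between threads and can get mixed up
    fig, ax = plt.subplots(figsize=(20, 20))
    ax.scatter(df[col1], df[col2])
    ax.set_xlabel(col1)
    ax.set_ylabel(col2)
    ax.tick_params(axis='x', labelrotation=90)
    fig.savefig('img/scatter_matrix/%s_%s.png' % (col1, col2))
    plt.close(fig)  # release the figure so memory does not grow with each plot
    pbar.update()   # advance the shared progress bar

# Number of worker threads (set this to match your core count)
num_of_cores = 16

# Assign works to cores
pool = ThreadPool(num_of_cores)
with tqdm(total=len(args)) as pbar:
    for i in range(len(args)):
        pool.apply_async(save_scatter_matrix, args[i])
    # Wait inside the `with` block so the bar stays open until all tasks finish
    pool.close()
    pool.join()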
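One thing to watch with `apply_async`: an exception raised inside a worker is not printed anywhere unless you ask for the result, so a failing task (for example, a missing output directory) can leave the progress bar stuck with no error message. A minimal sketch of how failures could be surfaced; `risky_task` is a hypothetical stand-in for the real function:

```python
from multiprocessing.pool import ThreadPool


def risky_task(n):
    """Hypothetical task that fails for one particular input."""
    if n == 3:
        raise ValueError(f"task {n} failed")
    return n * 10


pool = ThreadPool(4)
# Keep the AsyncResult handles so failures can be inspected afterwards.
handles = [pool.apply_async(risky_task, (n,)) for n in range(5)]
pool.close()
pool.join()

results, errors = [], []
for handle in handles:
    try:
        results.append(handle.get())  # get() re-raises the worker's exception
    except ValueError as exc:
        errors.append(str(exc))

print(results)  # prints: [0, 10, 20, 40]
print(errors)   # prints: ['task 3 failed']
```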
Now everyone is unhappy. Just like you :)
With Progress Bar
- With tqdm wired into the code correctly, you can see the progress and the estimated time remaining until the job finishes
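An alternative to calling `pbar.update()` inside the worker is to let the main thread advance the bar as results come back. The sketch below assumes `imap_unordered` is acceptable, meaning results may arrive in any order; `slow_square` and `run_with_progress` are hypothetical names used only for illustration.

```python
from multiprocessing.pool import ThreadPool

try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs where tqdm is not installed:
    # iterate without drawing a bar.
    def tqdm(iterable, **kwargs):
        return iterable


def slow_square(x):
    """Hypothetical stand-in for a real task such as saving one plot."""
    return x * x


def run_with_progress(func, items, num_workers=4):
    """Run func over items in a pool; the main thread advances the bar."""
    results = []
    with ThreadPool(num_workers) as pool:
        # imap_unordered yields each result as soon as it is ready,
        # so the bar moves once per finished task.
        for result in tqdm(pool.imap_unordered(func, items), total=len(items)):
            results.append(result)
    return results


squares = run_with_progress(slow_square, list(range(20)))
```

`multiprocessing.Pool` exposes the same `imap_unordered` method, so it could be swapped in to run the tasks in separate processes instead of threads. In that case the worker must be a top-level function and the pool should be created under `if __name__ == "__main__":`.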