Recording time, cpu, and memory usage while running a program - Green-Biome-Institute/AWS GitHub Wiki
Using bash functions, we can write a script that runs a program while recording information about the system (our EC2 instance) in parallel. This matters for our work because many bioinformatics programs need a lot of memory and can be multithreaded across many CPU cores. Since all of the work we do in the cloud must be paid for, we want to optimize performance so that we pay as little as possible! To do that, we need to know what is happening on our system while these programs are running.
For example, we do lots of de novo genome assembly. How long an assembly takes and how much memory it needs depend on many factors. These range from biological factors, including genome size, sequencing coverage, and repetitive content, to non-biological factors such as the software used (ABySS vs SOAPdenovo2, for example) and how many CPU cores are available. While we cannot measure the biological factors exactly, we can estimate them. Then, if we run a handful of assemblies while measuring CPU and memory usage in the background, we can correlate those measurements with the biological factors and make general predictions about how much memory or time our next assemblies will need. The aim is to minimize the EC2 instance size required and therefore the total cost of the analysis.
With that context, let's jump in.
I am using a program called dstat to output the current CPU and memory usage along with the current time. This software is no longer supported, so it requires an extra step during setup. First, install it:
sudo apt install dstat
Then edit /usr/bin/dstat and change lines 547 and 552 (I found this fix online).
You can do this by using (be careful with sudo):
sudo vim /usr/bin/dstat
Change line 547 to the following (in vim, type "547G", or ":547" followed by Enter, to jump straight to line 547):
if isinstance(self.val[name], (tuple, list)):
Change line 552 to:
elif isinstance(self.val[name], str):
Done correctly the function "showcsv" will look like:
    def showcsv(self):
        def printcsv(var):
            if var != round(var):
                return '%.3f' % var
            return '%d' % int(round(var))

        line = ''
        for i, name in enumerate(self.vars):
            if isinstance(self.val[name], (tuple, list)):
            #if isinstance(self.val[name], types.ListType) or isinstance(self.val[name], types.TupleType):
                for j, val in enumerate(self.val[name]):
                    line = line + printcsv(val)
                    if j + 1 != len(self.val[name]):
                        line = line + char['sep']
            elif isinstance(self.val[name], str):
            #elif isinstance(self.val[name], types.StringType):
                line = line + self.val[name]
            else:
                line = line + printcsv(self.val[name])
            if i + 1 != len(self.vars):
                line = line + char['sep']
        return line
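If you would rather not edit by line number (line numbers can shift between package versions), the same two changes can be made by matching the old expressions. This is only a sketch: `patch_dstat` is a hypothetical helper, it assumes GNU sed (as on Ubuntu), and it assumes the installed file contains exactly the two Python 2 expressions shown in the commented-out lines above. Unlike the manual edit, it replaces the old lines rather than keeping them as comments. It is demonstrated here on a scratch file that only mimics those two lines.

```shell
# patch_dstat FILE: rewrite the two Python 2 type checks in place,
# keeping a backup copy as FILE.bak. Hypothetical helper, not part of dstat.
patch_dstat () {
    target="$1"
    cp "$target" "$target.bak" || return 1
    sed -i \
        -e 's/isinstance(self\.val\[name], types\.ListType) or isinstance(self\.val\[name], types\.TupleType)/isinstance(self.val[name], (tuple, list))/' \
        -e 's/isinstance(self\.val\[name], types\.StringType)/isinstance(self.val[name], str)/' \
        "$target"
}

# Try it on a scratch file first; these two lines stand in for the real ones.
scratch=$(mktemp)
printf '%s\n' \
    '            if isinstance(self.val[name], types.ListType) or isinstance(self.val[name], types.TupleType):' \
    '            elif isinstance(self.val[name], types.StringType):' > "$scratch"
patch_dstat "$scratch"
cat "$scratch"
```

If the scratch output shows the two new `isinstance` lines, the same two sed expressions can be run with sudo against /usr/bin/dstat. Check the result by eye afterwards, since a different dstat version may not contain these exact strings.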
To check that it worked, enter
dstat
And you will get an output something like:
(base) ubuntu@ip-172-31-60-231:~/APACI$ dstat
You did not select any stats, using -cdngy by default.
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw
  1   1  96   2   0|1125k   73M|   0     0 |   0     0 |  12k 5455
  0   0 100   0   0|   0     0 | 104B  790B|   0     0 |  58    82
  0   0 100   0   0|   0     0 | 228B  486B|   0     0 |  29    38
  0   0 100   0   0|   0     0 |  52B  342B|   0     0 |  30    38
  0   0 100   0   0|   0     0 |  52B  342B|   0     0 |  25    37
  0   0 100   0   0|   0     0 |  52B  342B|   0     0 |  59    81
  0   0 100   0   0|   0     0 |  52B  342B|   0     0 |  22    34
That will keep going until you force the program to exit (CONTROL + C).
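The script further down calls dstat with a few options instead of running it bare. As a rough sketch of what those options do, the snippet below takes a fixed number of samples and saves them as CSV. `sample_dstat` is a hypothetical wrapper and the /tmp path is only an example. The options themselves (-c, -m, -t, --out, and the delay and count arguments) are the ones the script uses.

```shell
# sample_dstat OUT [COUNT]: take COUNT one-second samples and save them as CSV.
# Hypothetical wrapper; it skips quietly if dstat is not installed.
sample_dstat () {
    out="$1"
    count="${2:-5}"
    if ! command -v dstat >/dev/null 2>&1; then
        echo "dstat is not installed; skipping" >&2
        return 0
    fi
    # -c CPU usage, -m memory usage, -t timestamp
    # --out FILE writes the samples to FILE as CSV
    # "1" is the delay between samples in seconds, "$count" the number of samples
    dstat -c -m -t --out "$out" 1 "$count" > /dev/null
}

sample_dstat /tmp/dstat_sample.csv 3
```

Unlike the bare command, this exits on its own after the requested number of samples, which is what makes it usable inside a script.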
To record these statistics while a program runs, we need a bash script with two functions: one for the software you want to run and one for the monitoring. You can also find the script below in the code section of this GitHub repository.
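Before the full script, here is the two-function pattern in miniature. This is a sketch with stand-ins, not the real thing: `sleep 1` stands in for the program, a timestamp stands in for a dstat sample, and the names `worker` and `sampler` and the temporary directory are made up for illustration. The two functions coordinate through a small flag file, which is the same idea the full script uses.

```shell
# Miniature of the two-function pattern; names and paths are illustrative only.
workdir=$(mktemp -d)
flag="$workdir/flag.txt"
log="$workdir/samples.log"
echo true > "$flag"

worker () {
    sleep 1                 # stand-in for the real program
    echo false > "$flag"    # tell the sampler to stop
}

sampler () {
    # Keep sampling for as long as the flag file still says "true".
    while [ "$(cat "$flag")" = true ]; do
        date +%T >> "$log"  # stand-in for one dstat sample
        sleep 0.5
    done
}

sampler &    # monitoring runs in the background
worker       # the program runs in the foreground
wait         # let the sampler notice the flag and exit
echo "collected $(wc -l < "$log") samples in $log"
```

The full script that follows has the same shape, with the assembly command in place of `sleep` and dstat in place of `date`.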
#!/bin/bash
# This script runs one or more bash commands
# and then stops or terminates the instance you are running.
# In order to do this, please:
# 1. Make sure the AWS CLI on this EC2 instance is configured
#    (use `aws configure` and then fill out the information prompted).
# 2. Find the EC2 instance ID (you can find this from the EC2 dashboard
#    or by entering `ec2metadata --instance-id` into your command prompt)
#    and write it in below next to `EC2ID`:
EC2ID="i-00b4c52f923d384a2"
#ex. EC2ID="i-00974d2cc6d562b14"
# 3. If you want to terminate the EC2 instance after your command,
# make `EC2STATE` below equal to 0, if you want it to just stop,
# make it equal to 1.
EC2STATE=1
#ex for terminate:
#EC2STATE=0
#ex for stop:
#EC2STATE=1
# 4. Enter the commands you want to run between the two lines below:
# what should the name of your logs directory be? This will contain the stop and start information.
LOGDIR="logs_aimbr_K133_abyss_t64_m256_"$(date +%d%m%Y)
# code start
echo "Starting your code"
if [ ! -d $LOGDIR ]; then mkdir $LOGDIR; fi
echo $(date) >> $LOGDIR/timestamps.txt
id="_aimbr_k127_abyss_"$(date +%F)
echo true > "flag_"$id.txt
count () {
    # your code here:
    #############################
    abyss-pe name=aimbr-k133-010423 j=62 v=-v k=127 in="GBI28-Aimbr_S83_L001_R2_001_trimmed.fq.gz GBI28-Aimbr_S83_L001_R1_001_trimmed.fq.gz GBI28-Aimbr_S154_L002_R2_001_trimmed.fq.gz GBI28-Aimbr_S154_L002_R1_001_trimmed.fq.gz GBI28-Aimbr_R2_trimmed.fq.gz GBI28-Aimbr_R1_trimmed.fq.gz" | tee -a aimbr-k133-stdout.log
    #############################
    # Signal the monitor that the program has finished.
    echo false > "flag_"$id.txt
}
monitor () {
    monitorfile=monitor$id.csv
    > $monitorfile
    dstfile=dstat$id.csv
    t=true
    # First sample: copy the whole CSV, including dstat's header rows.
    dstat -cmt --out $dstfile 1 1 > buff
    cat $dstfile >> $monitorfile
    while [[ "$t" == true ]]; do
        # Take one more sample and keep only its newest row.
        dstat -cmt --out $dstfile 1 1 > buff
        # cat buff
        # cat $dstfile
        tail -n 1 $dstfile >> $monitorfile
        sleep 1
        # Re-read the flag; count() writes "false" when the program ends.
        t=$(cat "flag_"$id.txt)
        if [[ "$t" == false ]]; then
            break
        fi
    done
    rm buff $dstfile "flag_"$id.txt
}
monitor &
count
# Let the monitor write its last sample and clean up before moving on.
wait
echo $(date) >> $LOGDIR/timestamps.txt
echo "Finished with your code"
# code end
# These lines terminate or stop the EC2 instance.
if [ $EC2STATE = 0 ]; then
    echo "EC2 instance terminating"
    aws ec2 terminate-instances --instance-ids $EC2ID
elif [ $EC2STATE = 1 ]; then
    echo "EC2 instance stopping"
    aws ec2 stop-instances --instance-ids $EC2ID
fi
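Once a run has finished, the monitor CSV can be boiled down to a couple of numbers for comparing assemblies. The sketch below is one way to do that with awk. `summarize_monitor` is a hypothetical helper, and it assumes the file still contains dstat's quoted header row with columns named "used" and "idl". It looks those columns up by name because the column order may differ between dstat versions. As far as I can tell, dstat writes memory values to CSV as raw bytes, but check this against your own output. The demo rows are made up.

```shell
# summarize_monitor FILE: print sample count, peak "used" memory, and mean CPU busy %.
# Hypothetical helper; expects dstat's quoted header row to be present in FILE.
summarize_monitor () {
    awk -F',' '
        # The header row names the columns; find "used" and "idl" by name.
        !found && /"used"/ && /"idl"/ {
            for (i = 1; i <= NF; i++) {
                gsub(/"/, "", $i)
                if ($i == "used") used = i
                if ($i == "idl") idl = i
            }
            found = 1
            next
        }
        # Data rows start with a number (the first CPU column).
        found && $1 ~ /^[0-9.]+$/ {
            if ($used + 0 > peak) peak = $used + 0
            busy += 100 - $idl
            n++
        }
        END {
            if (n == 0) { print "no samples found"; exit 1 }
            printf "samples=%d peak_used=%.0f mean_cpu_busy=%.1f\n", n, peak, busy / n
        }
    ' "$1"
}

# Demo on a tiny made-up file in the same layout as the monitor CSV.
demo=$(mktemp)
cat > "$demo" <<'EOF'
"usr","sys","idl","wai","stl","used","free","buff","cach","time"
1.0,1.0,98.0,0,0,1000000.0,5000000.0,0,0,01-04 10:00:00
50.0,0,50.0,0,0,3000000.0,3000000.0,0,0,01-04 10:00:01
EOF
summarize_monitor "$demo"
```

On a real run, pass the monitor file the script wrote (its name starts with "monitor" followed by the id set near the top of the script). Peak memory tells you how small an instance the assembly could have fit on, and mean CPU busy shows whether the cores you paid for were actually used.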