python3 - animeshtrivedi/notes GitHub Wiki

Bookmarks

Clean up a directory

import os, shutil
folder = '/path/to/folder'
for filename in os.listdir(folder):
    file_path = os.path.join(folder, filename)
    try:
        if os.path.isfile(file_path) or os.path.islink(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            shutil.rmtree(file_path)
    except Exception as e:
        print('Failed to delete %s. Reason: %s' % (file_path, e))

https://stackoverflow.com/questions/185936/how-to-delete-the-contents-of-a-folder

Get the thread-id and the number of active threads

import threading
print('[atr: tid: {}  active_threads: {}]'.format(threading.get_ident(), threading.active_count()))

https://docs.python.org/3/library/threading.html#threading.current_thread

Returning multiple values

I had a problem defining the exact signature, but let's leave that for now: https://www.geeksforgeeks.org/g-fact-41-multiple-return-values-in-python/

# Return multiple values from a function using a tuple

def fun():
    name = "geeksforgeeks"  # avoid calling it str, which shadows the built-in
    x = 20
    return name, x  # returns a tuple; could also write (name, x)

# Unpack the returned tuple
name, x = fun()
print(name)
print(x)

Time in nanoseconds

https://www.geeksforgeeks.org/python-time-time_ns-method/

import time  
time.time_ns()
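
A minimal sketch of timing a piece of code in nanoseconds. The helper name elapsed_ns and the workload are made up for illustration. For measuring intervals, time.perf_counter_ns() is usually the better clock because it is monotonic, while time.time_ns() gives wall-clock time since the epoch.

```python
import time

def elapsed_ns(fn, *args):
    """Run fn(*args) and return (result, elapsed nanoseconds)."""
    start = time.perf_counter_ns()   # monotonic clock, meant for intervals
    result = fn(*args)
    end = time.perf_counter_ns()
    return result, end - start

total, ns = elapsed_ns(sum, range(1000))
now_ns = time.time_ns()              # wall-clock nanoseconds since the epoch
print('sum={} took {} ns, now={} ns'.format(total, ns, now_ns))
```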

Implement a ThreadPool in Python

import concurrent.futures
#Define a function that will be executed in the thread pool
def my_function(arg):
    #Perform some long-running operation here
    result = arg * 2
    return result
#Create a thread pool object with 5 worker threads
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    #Submit tasks to the thread pool using the submit() method
    future1 = executor.submit(my_function, 1)
    future2 = executor.submit(my_function, 2)
    future3 = executor.submit(my_function, 3)

    #Use the map() method to apply a function to a list of arguments
    queue = [4, 5, 6]
    results = executor.map(my_function, queue)

    #Wait for all tasks to complete using the shutdown() method
    executor.shutdown()

    #Get the results of each task using the result() method
    result1 = future1.result()
    result2 = future2.result()
    result3 = future3.result()
    #Print the results
    print(result1, result2, result3)
    print(list(results))

Note: executor.map() takes one iterable per function parameter, like the built-in map(). If the arguments are packed as a list of tuples, [(arg1, arg2, arg3), (arg1, arg2, arg3), ...], each tuple arrives as a single argument and the function has to unpack it itself.
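
A small sketch of the two ways to feed a multi-argument function through executor.map(). The functions add3 and add3_packed are hypothetical examples, not part of the snippet above.

```python
import concurrent.futures

def add3(a, b, c):
    return a + b + c

def add3_packed(args):
    # args is a single tuple; unpack it ourselves
    return add3(*args)

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    # Option 1: one iterable per parameter, like the built-in map()
    by_columns = list(executor.map(add3, [1, 2], [10, 20], [100, 200]))
    # Option 2: a list of tuples, unpacked by a wrapper function
    by_tuples = list(executor.map(add3_packed, [(1, 10, 100), (2, 20, 200)]))

print(by_columns)  # [111, 222]
print(by_tuples)   # [111, 222]
```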

But threads do not help for CPU-bound work, because Python has the Global Interpreter Lock (GIL). The GIL is optional as of Python 3.13, but that requires a free-threaded build and the GIL has to be disabled explicitly. The standard builds on the systems I am using do not support it.

https://py-free-threading.github.io/running-gil-disabled/

$ python -VV
Python 3.13.1 experimental free-threading build (main, Dec 10 2024, 14:07:41) [Clang 16.0.0 (clang-1600.0.26.4)]
$ PYTHON_GIL=0 python
# or 
$ python -Xgil=0

Inside the code:

import sys
sys._is_gil_enabled()
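
sys._is_gil_enabled() only exists from Python 3.13 on, so calling it directly raises AttributeError on older interpreters. A defensive sketch (the helper name gil_enabled is made up here):

```python
import sys

def gil_enabled():
    """Return True if the GIL is active in this interpreter."""
    check = getattr(sys, '_is_gil_enabled', None)
    if check is None:
        # Older than 3.13: the function does not exist and the GIL is always on
        return True
    return check()

print('GIL enabled:', gil_enabled())
```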

Make list of elements between a range

In Python 3, range is a lazy sequence object, not a list. To convert it to a list:

>>> list(range(11, 17))

https://stackoverflow.com/questions/18265935/how-do-i-create-a-list-with-numbers-between-two-values

Partial Functions in Python

https://www.geeksforgeeks.org/partial-functions-python/

from functools import partial

# A normal function
def f(a, b, c, x):
    return 1000*a + 100*b + 10*c + x

# A partial function that calls f with
# a as 3, b as 1 and c as 4.
g = partial(f, 3, 1, 4)

# Calling g()
print(g(5))

Global variables and access in a local context

https://stackoverflow.com/questions/10814452/how-can-i-access-global-variable-inside-class-in-python

g_c = 0

class TestClass():
    def run(self):
        global g_c
        for i in range(10):
            g_c = 1
            print(g_c)

Print function name

import inspect

def foo():
    # name of the currently executing function
    print(inspect.currentframe().f_code.co_name)
    # or: stack()[0] is the current frame, and field [3] (.function) is its name
    print(inspect.stack()[0][3])

# or, from outside the function (a bare __name__ is the module name, not the function)
print(foo.__name__)

https://stackoverflow.com/questions/251464/how-to-get-a-function-name-as-a-string

Print the Python call stack

import sys
import traceback

traceback.print_stack()                  # prints to stderr by default
traceback.print_stack(file=sys.stdout)

def f():
    g()

def g():
    for line in traceback.format_stack():
        print(line.strip())

f()

# Prints:
# File "so-stack.py", line 10, in <module>
#     f()
# File "so-stack.py", line 4, in f
#     g()
# File "so-stack.py", line 7, in g
#     for line in traceback.format_stack():

https://stackoverflow.com/questions/1156023/print-current-call-stack-from-a-method-in-code

When /usr/bin/env: python: No such file or directory

/usr/bin/env: python: No such file or directory

sudo apt install python-is-python3
# or 
sudo ln -s /usr/bin/python3 /usr/bin/python

Get caller info

import inspect 
>>> print(inspect.stack()) 
[FrameInfo(frame=<frame at 0x7f8462b852d0, file '<stdin>', line 1, code <module>>, filename='<stdin>', lineno=1, function='<module>', code_context=None, index=None, positions=Positions(lineno=1, end_lineno=1, col_offset=6, end_col_offset=21))]

https://docs.python.org/3/library/inspect.html#inspect.stack

Python RegEx

https://docs.python.org/3/howto/regex.html

The table of matching methods (match(), search(), findall(), finditer()) is at: https://docs.python.org/3/howto/regex.html#performing-matches

Typically use the search() method: it finds a match anywhere in the string, whereas match() only matches at the beginning.

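import re  # standard library; the third-party regex module is not needed here

line = "blk request 42"  # example input line
# a line containing numbers
re.search("[0-9]+", line)
# a line containing a keyword anywhere
re.search("blk", line)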

Further reference: https://www.cyberciti.biz/faq/grep-regular-expressions/
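
A short sketch of why search() is usually the right choice over match(). The sample line is made up for illustration.

```python
import re

line = "nvme0n1: blk request 42 done"   # made-up sample line

found_anywhere = re.search("blk", line) is not None   # keyword is in the middle
found_at_start = re.match("blk", line) is not None    # match() only tries position 0
m = re.search("[0-9]+", line)

print(found_anywhere)   # True
print(found_at_start)   # False
print(m.group())        # '0', the first run of digits (from "nvme0n1")
```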

Set difference methods

https://www.w3schools.com/python/ref_set_difference.asp

unique = set1.keys() - set2.keys()  # keys present in set1 but not in set2 (set1/set2 are dicts here)
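
A small worked example of set operations on dictionary key views. The dictionaries old and new are made up for illustration.

```python
old = {'a': 1, 'b': 2, 'c': 3}
new = {'b': 20, 'c': 30, 'd': 40}

only_old = old.keys() - new.keys()   # keys that disappeared
only_new = new.keys() - old.keys()   # keys that were added
common = old.keys() & new.keys()     # keys present in both

print(only_old)        # {'a'}
print(only_new)        # {'d'}
print(sorted(common))  # ['b', 'c']
```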

Check an empty set

return len(c) == 0

Make a dictionary from tuple list

>>> tuple_list = [('a', 1), ('b', 2), ('c', 3)] 
>>> d = dict(tuple_list)
>>> print(d) 
{'a': 1, 'b': 2, 'c': 3}
# or 
>>> h = {k:v for k,v in tuple_list}
>>> print(h) 
{'a': 1, 'b': 2, 'c': 3}
>>> 

Return a default value if key is not present in the dictionary

https://stackoverflow.com/questions/6130768/return-a-default-value-if-a-dictionary-key-is-not-available

# Returns None, if does not exist 
value = d.get(key)
# can provide a default value 
value = d.get(key, "empty")

list get first/last elements

https://sparkbyexamples.com/python/python-get-the-last-n-elements-of-a-list/ https://www.geeksforgeeks.org/how-to-get-first-n-items-from-a-list-in-python/

>>> nums = [1, 2, 3, 4, 5, 6, 8, 9, 10]  # avoid the name list, it shadows the built-in
# first 3 
>>> nums[:3]
[1, 2, 3]
# except the last three 
>>> nums[:-3]
[1, 2, 3, 4, 5, 6]
# the last three 
>>> nums[-3:]
[8, 9, 10]
>>> 

read file

f = open("demofile.txt", "r")
print(f.read())

or

def read_file_into_buffer(file_path):
   with open(file_path, 'r') as file:
      file_contents = file.read()
   return file_contents

Check kernel version dependency

    import os
    import platform

    print('Uname:', platform.uname())
    print()    
    print('Machine :', platform.machine())
    print('Node :', platform.node())
    print('Processor :', platform.processor())
    print('Release :', platform.release())
    print('System :', platform.system())
    print('Version :', platform.version())
    print('Platform :', platform.platform())
    major, minor = platform.release().strip().split('.')[:2]  # e.g. '6.9.0-rc1' -> '6', '9'
    if int(major) != 6 or int(minor) != 9:
        term_size = os.get_terminal_size()
        print('#' * int(term_size.columns/2))
        print("WARNING: This code was written for the 6.9 kernel version, so please check the kprobe signatures in case you get bogus values.")
        print('#' * int(term_size.columns/2))

print column-wide pattern

import os 
term_size = os.get_terminal_size()
print('_' * int(term_size.columns/2))

parsing example to match simple pattern

from parse import parse as p
dev_name = p("/dev/{}", ("/dev/loop0").strip(), case_sensitive=True)
# dev_name is a parse.Result (or None if nothing matched); dev_name[0] is 'loop0'

You may get: ModuleNotFoundError: No module named 'parse'

pip install parse

zip'ed enumerated index on a list

L = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
for (idx, e) in zip(range(0, len(L)), L):
    print(idx, e)
# equivalent and more idiomatic: for idx, e in enumerate(L): ...

Generate a list of all zeros

https://stackoverflow.com/questions/6076270/lambda-function-in-list-comprehensions https://stackoverflow.com/questions/10712002/create-an-empty-list-with-certain-size-in-python

The list comprehension below walks a 10-element list and applies a lambda to each element, so every entry becomes zero (or any constant we want). For a plain constant fill, [0] * n is simpler.

# Make a null list 
xs = [None] * 10
# set all to zero 
[(lambda x: 0)(x) for x in [None] * 10]
# or an int list then square them 
last_values = [(lambda x: x*x)(x) for x in range(10)]
# or just directly create a list of all zeros ;) like here with 256
prev = [0] * 256

Formatting strings

https://docs.python.org/3/library/string.html#formatspec

'{:<30}'.format('left aligned')
'left aligned                  '
'{:>30}'.format('right aligned')
'                 right aligned'
'{:^30}'.format('centered')
'           centered           '
'{:*^30}'.format('centered')  # use '*' as a fill char
'***********centered***********'

{1} refers to the second positional argument of format(), i.e. positional argument access:

'{0}, {1}, {2}'.format('a', 'b', 'c')
'a, b, c'
'{}, {}, {}'.format('a', 'b', 'c')  # 3.1+ only
'a, b, c'
'{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'
'{2}, {1}, {0}'.format(*'abc')      # unpacking argument sequence
'c, b, a'
'{0}{1}{0}'.format('abra', 'cad')   # arguments' indices can be repeated
'abracadabra'

What does __name__ == "__main__" do?

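A guard for direct invocation: the block under it runs only when the file is executed as a script, not when it is imported as a module.

What does if __name__ == "__main__": do?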
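
A minimal sketch of the guard. The function main is a made-up example.

```python
def main():
    return "running as a script"

if __name__ == "__main__":
    # True only when the file is executed directly (python3 thisfile.py),
    # not when it is imported from another module.
    print(main())
```

When the same file is imported with import thisfile, __name__ is set to "thisfile", so main() is defined but not called.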

Good explanation of subprocess interface of python

https://stackoverflow.com/questions/4256107/running-bash-commands-in-python

import subprocess

cmd = '''while read -r x;
   do ping -c 3 "$x" | grep 'min/avg/max'
   done <hosts.txt'''

# Trivial but horrible
results = subprocess.run(
    cmd, shell=True, universal_newlines=True, check=True,
    stdout=subprocess.PIPE)  # capture output, otherwise results.stdout is None
print(results.stdout)

# Reimplement with shell=False
with open('hosts.txt') as hosts:
    for host in hosts:
        host = host.rstrip('\n')  # drop newline
        ping = subprocess.run(
             ['ping', '-c', '3', host],
             text=True,
             stdout=subprocess.PIPE,
             check=True)
        for line in ping.stdout.split('\n'):
             if 'min/avg/max' in line:
                 print('{}: {}'.format(host, line))

What is drgn

Used in the /tools/workqueue scripts from https://www.kernel.org/doc/html/next/core-api/workqueue.html#examining-configuration and https://github.com/iovisor/bcc/blob/master/tools/wqlat_example.txt

sudo apt-get install python3-drgn 

https://drgn.readthedocs.io/en/latest/ https://github.com/osandov/drgn

drgn was developed at Meta for debugging the Linux kernel (as an alternative to the crash utility), but it can also debug userspace programs written in C. C++ support is in progress.

dict vs. list

See the difference between {} and []. The annotation is not checked at runtime, so the second assignment silently produces a list:

>>> result:dict[str, int] = {}
>>> type(result) 
<class 'dict'>
>>> result:dict[str, int] = []
>>> type(result) 
<class 'list'>
>>> 

C-like structure

https://stackoverflow.com/questions/35988/c-like-structures-in-python

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0

p = Point(1.5, 2.5)
print(p)  # Point(x=1.5, y=2.5, z=0.0)

immutable:

from typing import NamedTuple

class User(NamedTuple):
    name: str

class MyStruct(NamedTuple):
    foo: str
    bar: int
    baz: list
    qux: User

my_item = MyStruct('foo', 0, ['baz'], User('peter'))
print(my_item) # MyStruct(foo='foo', bar=0, baz=['baz'], qux=User(name='peter'))
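
A short sketch of what "immutable" means for a NamedTuple. The Pair type is a hypothetical example.

```python
from typing import NamedTuple

class Pair(NamedTuple):   # hypothetical example type
    key: str
    value: int

p = Pair('a', 1)
err = None
try:
    p.value = 2            # fields cannot be reassigned
except AttributeError as e:
    err = e
    print('immutable:', e)

p2 = p._replace(value=2)   # returns a new tuple instead of modifying p
print(p, p2)
```

Note that only the tuple itself is frozen: a mutable field such as a list (like baz above) can still be modified in place.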

Zip returns an iterator, not a list

https://stackoverflow.com/questions/17777219/zip-variable-empty-after-first-use

hence after the first use the iterator is empty.
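
A small sketch of the behavior, with made-up input lists:

```python
pairs = zip([1, 2, 3], ['a', 'b', 'c'])
first = list(pairs)    # consumes the iterator
second = list(pairs)   # nothing left to iterate over
print(first)   # [(1, 'a'), (2, 'b'), (3, 'c')]
print(second)  # []

# Materialize once if the pairs are needed more than once
reusable = list(zip([1, 2, 3], ['a', 'b', 'c']))
```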

How to generate timestamps

https://strftime.org/

>>> from datetime import datetime
>>> datetime.today().strftime('%Y-%m-%d')
'2021-01-26'

>>> datetime.today().strftime('%Y-%m-%d %H:%M:%S')
'2021-01-26 16:50:03'

Python string formatting

July 19th

Following the instructions from https://github.com/stonet-research/icpe24_io_scheduler_study_artifact/blob/main/fig-1-samsung-baseline/qd_iops_vary_bs.py, I ran into the externally-managed-environment error:

error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
python3-xyz, where xyz is the package you are trying to
install.

If you wish to install a non-Debian-packaged Python package,
create a virtual environment using python3 -m venv path/to/venv.
Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
sure you have python3-full installed.

If you wish to install a non-Debian packaged Python application,
it may be easiest to use pipx install xyz, which will manage a
virtual environment for you. Make sure you have pipx installed.

See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your 
Python installation or OS distribution provider. You can override 
this, at the risk of breaking your Python installation or OS, 
by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

The solution is to set up a venv and use that. Sequence of commands:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt

sudo issue: sudo does not pick up the venv's python3, so pass the full path to the venv interpreter (https://discuss.python.org/t/using-sudo-inside-venv/29148/2):

(.venv) $ which python3 
/home/atr/src/.venv/bin/python3
(.venv) $ sudo which python3 
/usr/bin/python3
(.venv) $ sudo .venv/bin/python3 test.py -r --fio ~/src/fio/fio --dev /dev/nvme0n1

python-fio workflow

Regarding how to use Python for fio and plotting, I have a couple of general pointers.

Packaging/dependencies

I make use of venv, https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/. This ensures pip3 only installs packages locally. For each Python project I first set up a venv in the top level of the repository:

python3 -m venv .venv
echo .venv >> .gitignore # Should be ignored to prevent pushing the venv to git

Whenever you then want to work on the project, first type (can be automated with direnv):

source .venv/bin/activate

You can move your locally installed packages to a dependency file with:

pip3 freeze > requirements.txt

And install them with:

pip3 install -r requirements.txt

All other "suggested" alternatives like poetry and pipenv are slow, made for production software and break every single month.

Mandatory packages required for a good experience

pip3 install matplotlib numpy notebook

Matplotlib is the standard for plotting (https://matplotlib.org/; a good addition is Seaborn). NumPy is the standard numerics/data structures library used in data science (https://numpy.org/; a friendlier interface on top is https://pandas.pydata.org/). Notebook installs Jupyter (https://docs.jupyter.org/en/latest/).

How to read a json file

import json

with open('fio_data.json') as f:
    d = json.load(f)
    print(d)

How to use matplotlib

The default guide is alright: https://matplotlib.org/stable/users/explain/quick_start.html

How to use notebooks

If notebook or JupyterLab is installed, you can type in the terminal:

jupyter notebook

It should automatically open your browser.
