Python Tutorial - WheatonCS/Lexos GitHub Wiki

This tutorial introduces some of the more advanced features of Python.

Type Hinting

Type systems are one of the most valuable inventions in the history of programming languages. A type system is a set of rules that assigns a type to the various constructs of a computer program (e.g., variables, functions, modules).

Unfortunately, Python checks types only at run time, so many type errors go unnoticed until the offending line actually executes. Python and JavaScript are both dynamically typed, while languages such as Java and C/C++ are statically typed: their compilers check types before the program runs.

Python's developers recognized that the lack of static type checking was hindering the development of large-scale projects (like Lexos), so Python 3.5 added support for type hints. Therefore, we use type hinting.

Here is how you would normally write your functions:

def add(a, b):
    return a + b

And this is how you would write your function using type hinting:

def add(a: int, b: int) -> int:
    return a + b

In this example, the first two ints are the types of the function's parameters and the last int is the return type of the function.

So why do we need type hinting?

  • Type hinting helps greatly with code refactors (restructuring existing code).
  • Type hinting can help us prevent bugs that are otherwise hard to discover.
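One point worth illustrating: Python itself does not enforce type hints at run time. The sketch below reuses the add function from above and shows that the hints are stored as metadata while a call with the "wrong" types still runs. Catching such a call requires a separate static checker (mypy is one such tool, named here as an assumption; the original does not say which checker Lexos uses).

```python
def add(a: int, b: int) -> int:
    return a + b


# The hints are stored on the function object as plain metadata.
hints = add.__annotations__

# Python does not check the hints when the function is called:
# this call passes strings, yet it runs and concatenates them.
joined = add("Lex", "os")

print(hints)
print(joined)
```

A static checker reads the same annotations without running the code, which is why hints help most during refactoring.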

Keyword Parameters

Normally, we pass parameters to functions like this:

def add(a, b):
    return a + b

add(1, 2)

This is an example of using positional parameters. Positional parameters make sense here because the function takes only a few of them.

But, consider the following function:

def person(
      age: int, height_in: int, is_male: bool, is_working: bool,
      is_handsome: bool, blood_type: str, name: str):
    ...  # body omitted

When you see code like this:

person(21, 68, True, False, True, "O", "Cheng")

From the call alone, you can't tell which parameter each of these integers, strings, and Booleans refers to!

This is a perfect example of when you should use keyword parameters:

person(
      age=21, height_in=68, is_male=True, is_working=False,
      is_handsome=True, blood_type="O", name="Cheng")

This looks much better, right? In most cases in our program, functions take enough parameters that the second form (using keyword parameters) is necessary.

Therefore, try to use keyword parameters as much as possible so the code is easy to read!
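If you want Python to enforce keyword parameters rather than rely on convention, a bare * in the parameter list makes every parameter after it keyword-only. The sketch below uses a shortened, hypothetical version of the person function (the return value is invented for illustration).

```python
def person(*, age: int, height_in: int, name: str) -> str:
    # Everything after the bare * must be passed by keyword.
    return f"{name} is {age} years old and {height_in} inches tall"


description = person(age=21, height_in=68, name="Cheng")

# A positional call is now rejected with a TypeError.
try:
    person(21, 68, "Cheng")
except TypeError:
    positional_call_failed = True
else:
    positional_call_failed = False

print(description)
print(positional_call_failed)
```

This is a design choice: it trades a little flexibility for call sites that are always self-describing.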

Private Functions

This is a simple concept: just as C++ has private functions, Python has a convention for them too.

In Python, you mark a function as private by starting its name with a single underscore:

def _example():
    ...

Note: This is not really a private function, but an agreed-upon sign to indicate to other coders that this function should not be used unless it is really necessary. Names that both start and end with two underscores (such as __init__) are reserved for Python's special methods, so avoid inventing new ones.

Using Named Tuples Instead of Tuples

A tuple is an immutable sequence of Python objects. It can hold values of different data types (like a struct), and its length is fixed once it is created. We can use tuples instead of classes when we don't need methods.

Below we have a list of tuples:

name_age_list = [("Cheng", 21), ("Rick", 30)]

To get the second tuple in this list, we index the list and assign the result to a variable:

rick_info = name_age_list[1]

Instead of using just a tuple though it is best to use a namedtuple to make the code more readable:

from collections import namedtuple
Human = namedtuple("Human", ("name", "age"))
cheng = Human(name="Cheng", age=21)

We can then access a field of the named tuple by name and assign it to a variable like this:

name = cheng.name

Note: If a function needs to return more than 3 values, the best approach is to pack them into a named tuple!
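As a sketch of the note above, the standard library's typing.NamedTuple lets you declare a named tuple with type hints and use it as a function's return value. The TextStats class and describe function are hypothetical names invented for this illustration; they are not part of Lexos.

```python
from typing import NamedTuple


class TextStats(NamedTuple):
    word_count: int
    unique_words: int
    longest_word: str
    average_length: float


def describe(text: str) -> TextStats:
    # Pack four related results into one self-describing return value.
    words = text.split()
    return TextStats(
        word_count=len(words),
        unique_words=len(set(words)),
        longest_word=max(words, key=len),
        average_length=sum(len(word) for word in words) / len(words),
    )


stats = describe("the cat saw the other cat")
print(stats.word_count, stats.unique_words, stats.longest_word)
```

The caller reads stats.longest_word instead of remembering that the longest word sits at index 2, and the value still behaves like an ordinary tuple when needed.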

Using Objects Instead of Dictionaries

Along with using named tuples as opposed to tuples, it is also best to use objects (make classes) instead of using dicts:

class Human:
    def __init__(self, age: int, name: str):
        # A leading underscore marks these attributes as private by convention.
        self._age = age
        self._human_name = name

    @property
    def name(self) -> str:
        return self._human_name

# Create an object
cheng = Human(age=12, name="Cheng")

# Access object member
name = cheng.name
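For classes that mostly hold data, the standard library's dataclasses module (Python 3.7 and later) can generate the boilerplate. This is an alternative sketch, not the pattern the original prescribes; the frozen=True option is an assumption chosen here to keep the fields read-only, similar to the property above.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Human:
    name: str
    age: int


cheng = Human(name="Cheng", age=21)

# frozen=True makes the fields read-only after construction.
age_is_read_only = False
try:
    cheng.age = 22
except FrozenInstanceError:
    age_is_read_only = True

print(cheng)
print(age_is_read_only)
```

Unlike a dict, a misspelled field name fails immediately, and type hints document what each field holds.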

Unit Testing

Unit tests help ensure that the functions we write work correctly for all kinds of data that a user could input. They add a layer of safety by checking that functions return the correct results, for example when files are uploaded to Lexos.

For instance, let's say we have this function in our code:

def add(a: int, b: int) -> int:
    if a > 1:
        return a + b
    else:
        return a

Now in order to test such a function we can create a new file and do some unit tests such as this one:

def test_add():
    assert add(0, 0) == 0  # a is not greater than 1, so add returns a
    assert add(2, 1) == 3  # a is greater than 1, so add returns a + b

Here we use assertions to test whether each input (in the parentheses) gives the correct output (to the right of the '==' sign).
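The same checks can also be written with the standard library's unittest module. The sketch below is self-contained, so it repeats the add function; the class and method names are hypothetical. It runs the tests programmatically so the result can be inspected.

```python
import unittest


def add(a: int, b: int) -> int:
    if a > 1:
        return a + b
    else:
        return a


class TestAdd(unittest.TestCase):
    def test_returns_a_when_a_is_small(self):
        # Covers the else branch.
        self.assertEqual(add(0, 0), 0)

    def test_returns_sum_when_a_is_large(self):
        # Covers the branch where a > 1.
        self.assertEqual(add(2, 1), 3)


# Load and run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAdd)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Writing one test per branch makes it obvious which path failed when a test breaks.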

How to Run Your Unit Tests Using PyCharm

In order to run these tests, go to the top right of PyCharm and click where it says Lexos. Then click on Edit Configurations.... In the top left corner of the new window, click on the green plus sign, hover over Python Tests, and then click on Unittest. In the window, make sure to enter a Name, for example: testing. Then hit the Apply button in the bottom right corner and then the OK button.

Now if you press the green play triangle called Run at the top right of PyCharm, next to where it says testing (or whatever name you gave this configuration), it will run your unit tests.

Numpy

NumPy is a popular Python library for numerical and scientific computing. Using NumPy can also speed up your code significantly, since it provides a high-performance multidimensional array object and tools for working with such arrays.

Below you will see how to import numpy and how to make a random array:

import numpy as np
example_matrix = np.random.rand(3, 4)

You can then call NumPy methods on the array:

# Sums the numbers in each COLUMN (returns a new array)
example_matrix.sum(axis=0)
# Sorts each ROW independently of the others (in place)
example_matrix.sort(axis=1)

NumPy arrays can be sliced with the same syntax as lists, and they provide methods such as min() and max().

You can also do things such as:

# Adds 1 to every number in the matrix (returns a new array)
example_matrix + 1
# Specifies the element type of the array
np.array([True, False, False], dtype=bool)
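To make the effect of the axis argument concrete, here is a small sketch with a fixed array instead of random values, so the results are reproducible. The array contents are invented for illustration.

```python
import numpy as np

matrix = np.array([[3, 1, 2],
                   [6, 5, 4]])

column_sums = matrix.sum(axis=0)        # one sum per column
row_sums = matrix.sum(axis=1)           # one sum per row
sorted_rows = np.sort(matrix, axis=1)   # sorted copy; matrix is unchanged
first_column = matrix[:, 0]             # slicing works along each axis
plus_one = matrix + 1                   # arithmetic applies to every element
smallest, largest = matrix.min(), matrix.max()

print(column_sums, row_sums)
print(sorted_rows)
```

Note that np.sort returns a sorted copy, while the sort method on an array sorts it in place.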

You can also use

for element in np_array.flat:
    print(element)

instead of

for row in python_list:
    for element in row:
        print(element)

For more on Numpy click here

String Manipulation

Play with the join and split functions before you deal with strings. Small changes in the use of these functions can make a significant difference in runtime efficiency.

For example use:

text = ''.join(items)

Instead of:

text = ''
for element in items:
    text += element

To create comma-separated (CSV) or tab-separated (TSV) text from a matrix of strings:

csv_text = '\n'.join(','.join(row) for row in matrix)
tsv_text = '\n'.join('\t'.join(row) for row in matrix)
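Joining by hand produces broken CSV when a cell itself contains a comma or a quote. A minimal sketch using the standard library's csv module, which quotes such cells automatically, is shown below. The to_delimited helper and the sample matrix are hypothetical names and data for illustration.

```python
import csv
import io

matrix = [["word", "count"], ["hello, world", "2"], ["tab", "1"]]


def to_delimited(rows, delimiter=","):
    # Write the rows into an in-memory text buffer instead of a file.
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_delimited(matrix)
tsv_text = to_delimited(matrix, delimiter="\t")

print(csv_text)
print(tsv_text)
```

In the CSV output the cell "hello, world" is wrapped in quotes, so a reader can still split the line into two columns.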

Note that in Lexos the DataTables library can produce CSV and TSV files entirely on the client side, but this should only be done when the entire table is held in the DOM (i.e. without server-side processing).

List Comprehensions

A list comprehension is a concise syntax for building a list from an existing iterable.

Say we have two lists:

a = [1,2,3]
b = []

Without a comprehension, we might fill b like so:

for ele in a:
    if ele > 2:
        b.append(ele * 2)

But now, using list comprehension syntax, our code will look like this:

b = [ele * 2 for ele in a if ele > 2]

There are different types of comprehensions too:

list_comp = [ele * 2 for ele in a if ele > 2]
set_comp = {ele * 2 for ele in a if ele > 2}
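# enumerate(a) yields (index, value) pairs to serve as keys and values
dict_comp = {key: value * 2 for (key, value) in enumerate(a) if value > 2}
# A generator expression produces its values lazily
gen_comp = (ele * 2 for ele in a if ele > 2)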

Manipulating Lists

Consider using list comprehensions when dealing with lists.

For example use:

items = [item[:50] for item in items]

Instead of:

for i in range(len(items)):
    items[i] = items[i][:50]

Error Catching

A good practice is to catch errors with try/except in place of if/else checks. For example use:

try:
    counts[i] += 1
except KeyError:
    counts[i] = 1

Instead of:

if i in counts:
    counts[i] += 1
else:
    counts[i] = 1

Use:

try:
    os.makedirs(path)
except FileExistsError:
    pass

Instead of:

if os.path.isdir(path):
    pass
else:
    os.makedirs(path)
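For the directory case specifically, os.makedirs accepts an exist_ok flag (Python 3.2 and later) that covers the "already exists" situation without any error handling. The sketch below works inside a temporary directory; the folder names are hypothetical.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves no clutter behind.
base = tempfile.mkdtemp()
path = os.path.join(base, "results", "cache")

os.makedirs(path, exist_ok=True)  # creates results/ and results/cache/
os.makedirs(path, exist_ok=True)  # second call does nothing and raises no error

created = os.path.isdir(path)
print(created)
```

With exist_ok=True the call still raises an error if the path exists but is a regular file, so real problems are not hidden.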

When using except to do complicated jobs, as a general rule, you should specify the error type (KeyError, ValueError, etc.) explicitly. A real example of error catching used in unit testing in Lexos is seen below:

try:
    _ = a_word_word(split_list=[], keyword="test",
                     window_size=1)
    raise AssertionError("did not throw error")
except AssertionError as error:
    assert str(error) == WINDOW_SIZE_LARGE_MESSAGE

A shorter way to ignore an expected error is contextlib.suppress (Python 3.4 and later). For example use:

from contextlib import suppress

with suppress(OSError):
    os.remove('i_probably_do_not_exist.txt')

Instead of:

try:
    os.remove('i_probably_do_not_exist.txt')
except OSError:
    pass

Temporary functions

It can be convenient to use lambda to create a temporary function. For instance:

sortedList = sorted(ListofTuples, key=lambda tup: tup[n])

instead of

def sortby(somelist, n):
  nlist = [(x[n], x) for x in somelist]
  nlist.sort()
  return [val for (key, val) in nlist]
sortedList = sortby(ListofTuples, n)
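A runnable sketch of the lambda approach follows, together with operator.itemgetter from the standard library, which builds the same kind of key function without a lambda. The sample data extends the earlier name and age list with one invented entry.

```python
from operator import itemgetter

name_age_list = [("Cheng", 21), ("Rick", 30), ("Ada", 19)]

# Sort by the second item of each tuple using a temporary function.
by_age = sorted(name_age_list, key=lambda tup: tup[1])

# itemgetter(1) returns a function equivalent to lambda tup: tup[1].
oldest_first = sorted(name_age_list, key=itemgetter(1), reverse=True)

print(by_age)
print(oldest_first)
```

Both forms leave the original list untouched, because sorted returns a new list.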

Performance Optimization

Read this for tips on how to optimize performance.

To time your code in PyCharm, use the profiler: the clock icon with the green arrow in the top right corner of the screen.
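Outside PyCharm, the standard library's timeit module offers a quick way to time small pieces of code. The sketch below times the two string-building approaches from the String Manipulation section; the function names, list size, and repetition count are arbitrary choices for illustration, and the measured times will vary by machine.

```python
import timeit

words = ["lexos"] * 1000


def concat_with_loop():
    text = ""
    for word in words:
        text += word
    return text


def concat_with_join():
    return "".join(words)


# Run each function 200 times and report the total elapsed seconds.
loop_seconds = timeit.timeit(concat_with_loop, number=200)
join_seconds = timeit.timeit(concat_with_join, number=200)

print(f"loop: {loop_seconds:.4f}s  join: {join_seconds:.4f}s")
```

Repeating the call many times smooths out noise from a single run, which matters when the code under test finishes in microseconds.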

Detecting Character Encoding

Currently, Lexos uses the chardet package to detect character encoding. If you run into problems with character encoding, another Python package that may prove useful is ftfy, which can also be used for correcting improperly encoded Unicode.
