Python Tutorial - WheatonCS/Lexos GitHub Wiki
This tutorial introduces some of the more advanced features of Python.
- Type Hinting
- Keyword Parameters
- Private Functions
- Using Named Tuples Instead of Tuples
- Using Objects Instead of Dictionaries
- Unit Testing
- Numpy
- String Manipulation
- List Comprehensions
- Manipulating Lists
- Error Catching
- Temporary functions
- Performance Optimization
- Detecting Character Encoding
Type Hinting
Type systems are one of the best inventions in the history of programming languages. A type system is a set of rules that assigns a type to the various constructs of a computer program (e.g., variables, functions, and modules).
Unfortunately, Python's type system offers fewer safety guarantees than those of most popular languages (JavaScript's offers even fewer). Both Python and JavaScript are dynamically typed, meaning types are checked only while the program runs, whereas languages such as Java and C/C++ are statically typed, so type errors are caught when the code is compiled.
Python's developers eventually recognized that this weakness hinders the development of large-scale projects (like Lexos), and Python 3.5 added type hints through PEP 484. That is why we use type hinting.
Here is how you would normally write your functions:
def add(a, b):
    return a + b
And this is how you would write your function using type hinting:
def add(a: int, b: int) -> int:
    return a + b
In this example, the first two ints give the types of the function's parameters, and the last int gives the function's return type.
So why do we need type hinting?
- Type hinting helps greatly with refactoring (restructuring existing code).
- Type hinting can help us prevent bugs that are otherwise hard to discover.
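A minimal sketch of the second point: Python records the hints on the function but does not check them when the code runs, so a call with the wrong types can slip through silently. A static checker such as mypy (an assumption here; this page does not say which checker Lexos uses) would report the string call before the program runs.

```python
def add(a: int, b: int) -> int:
    return a + b


# The hints are stored on the function object...
print(add.__annotations__)

# ...but they are not enforced at runtime: this call "works" and
# concatenates two strings instead of adding two integers.
print(add("Lex", "os"))  # prints Lexos
```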
Keyword Parameters
Normally, we pass parameters to functions like this:
def add(a, b):
    return a + b

add(1, 2)
This is an example of passing positional parameters. In this case positional parameters make sense, since there are only a few parameters to deal with.
But consider the following function:
def person(age: int, height_in: int, is_male: bool, is_working: bool,
           is_handsome: bool, blood_type: str, name: str):
    ...  # function body omitted
When you see code like this:
person(21, 68, True, False, True, "O", "Cheng")
you can't tell which parameter each integer, Boolean, and string refers to!
This is a perfect example of when you should use keyword parameters:
person(
    age=21, height_in=68, is_male=True, is_working=False,
    is_handsome=True, blood_type="O", name="Cheng")
This looks much better, right? In most of our program, functions take enough parameters that the second form (keyword parameters) is necessary.
Therefore, try to use keyword parameters as much as possible so the code is easy to read!
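Python can also make keyword parameters mandatory: every parameter listed after a bare * must be passed by name. The sketch below uses a hypothetical, shortened person() that returns a description string, so the behavior is easy to see.

```python
def person(*, age: int, height_in: int, name: str) -> str:
    """Hypothetical, shortened person(); all parameters are keyword-only."""
    return f"{name}: {age} years old, {height_in} inches tall"


# Keyword call: accepted.
print(person(age=21, height_in=68, name="Cheng"))

# Positional call: rejected with a TypeError because of the bare *.
try:
    person(21, 68, "Cheng")
except TypeError as error:
    print("Rejected:", error)
```

This turns "use keyword parameters" from a style rule into one the interpreter enforces.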
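Private Functions
This is a simple concept: just as C++ has private functions, Python needs them too.
In Python, you mark a function as private by starting its name with a single underscore:
def _example():
    pass
Note: This is not really a private function, since Python does not enforce privacy. The leading underscore is an agreed-upon sign telling other coders that this function should not be used unless it is really necessary. Avoid names that both start and end with two underscores (such as __example__): PEP 8 reserves that form for Python's special methods, such as __init__.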
Using Named Tuples Instead of Tuples
A tuple is an immutable, fixed-length sequence of Python objects that can hold different data types (like a struct). We can use tuples instead of classes when we don't need methods.
Below we have a list of tuples:
name_age_list = [("Cheng", 21), ("Rick", 30)]
To get the second tuple in this list and assign it to a variable, we index the list:
rick_info = name_age_list[1]
Instead of a plain tuple, though, it is best to use a namedtuple to make the code more readable:
from collections import namedtuple
Human = namedtuple("Human", ("name", "age"))
cheng = Human(name="Cheng", age=21)
We can then access a field by name and assign it to a variable:
name = cheng.name
Note: If a function needs to return more than three values, the best way to go about it is to pack them into a tuple, preferably a named tuple!
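A sketch of that advice, using typing.NamedTuple so the fields also carry type hints. The word_stats function and its fields are hypothetical examples, not part of Lexos.

```python
from typing import List, NamedTuple


class WordStats(NamedTuple):
    """Summary statistics for a list of words (hypothetical example)."""
    count: int
    longest: str
    average_length: float


def word_stats(words: List[str]) -> WordStats:
    """Pack several return values into one named tuple."""
    longest = max(words, key=len)
    average_length = sum(len(word) for word in words) / len(words)
    return WordStats(count=len(words), longest=longest,
                     average_length=average_length)


stats = word_stats(["to", "be", "or", "not"])
print(stats.count, stats.longest, stats.average_length)  # 4 not 2.25

# It is still a tuple, so callers can also unpack it.
count, longest, average_length = stats
```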
Using Objects Instead of Dictionaries
Along with using named tuples instead of plain tuples, it is also best to use objects (make classes) instead of dicts:
class Human:
    def __init__(self, age: int, name: str):
        self._age = age
        self._human_name = name

    @property
    def name(self) -> str:
        return self._human_name

# Create an object
cheng = Human(age=12, name="Cheng")
# Access object member
name = cheng.name
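One reason to prefer a class, sketched below with the same Human class: a misspelled dict key silently creates a new entry, while a read-only property refuses an assignment outright. The "nmae" typo is deliberate.

```python
class Human:
    def __init__(self, age: int, name: str):
        self._age = age
        self._human_name = name

    @property
    def name(self) -> str:
        return self._human_name


# With a dict, a misspelled key silently adds a new entry.
human_dict = {"age": 12, "name": "Cheng"}
human_dict["nmae"] = "Rick"  # deliberate typo: no error is raised
print(human_dict)

# With a class, the property has no setter, so assignment fails loudly.
cheng = Human(age=12, name="Cheng")
try:
    cheng.name = "Rick"
except AttributeError as error:
    print("Rejected:", error)
```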
Unit Testing
Unit testing helps ensure that the functions we write work correctly for all the kinds of data a user could input. Unit tests add a layer of safety by checking that functions output the correct results, so problems are caught before they show up when uploading files to or testing Lexos.
For instance, let's say we have this function in our code:
def add(a: int, b: int) -> int:
    if a > 1:
        return a + b
    else:
        return a
To test such a function, we can create a new file and write unit tests such as this one:
def test_add():
    assert add(0, 0) == 0
    assert add(1, 2) == 3
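Here we used assertions to test whether the input (seen in the parentheses) gives us the correct output (seen on the right of the '==' sign). With the add function above, the second assertion fails, because add(1, 2) returns 1 instead of 3: the test has caught a bug.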
To run these tests, go to the top right of PyCharm and click where it says Lexos. Then choose Edit Configurations... from the menu. In the top left corner of the new window, click the green plus sign, hover over Python Tests, and then click Unittest. In that window, make sure to enter a Name, for example: testing. Then click the Apply button in the bottom right corner, and then the OK button.
Now if you press the green play triangle called Run in the top right of PyCharm, next to where it says testing (or whatever name you gave this unit test), it will run your unit test.
Numpy
Numpy is a popular Python library for numerical and scientific computing. It can also speed up your code significantly, since it provides a high-performance multidimensional array object and tools for working with such arrays!
Below you will see how to import numpy and how to make a random array:
import numpy as np
example_matrix = np.random.rand(3,4)
You can then call numpy methods on the array:
# Sums the numbers in each COLUMN
example_matrix.sum(axis=0)
# Sorts the values within each ROW, independently of the other rows (in place)
example_matrix.sort(axis=1)
You can still slice numpy arrays with the same syntax as Python lists (one index or slice per dimension, e.g. example_matrix[0, 1:3]), and call methods such as min() and max().
You can also do things such as:
# Adds 1 to every number in the matrix
example_matrix + 1
# Can specify the type in the matrix
np.array([True, False, False], dtype= bool)
You can also use
# .flat is an attribute (an iterator over every element), not a method
for element in np_array.flat:
    print(element)
instead of
for row in python_list:
    for element in row:
        print(element)
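The calls above are easier to follow with a fixed array instead of a random one. This sketch uses a small hand-written matrix; note that np.sort returns a sorted copy, while the example_matrix.sort method shown earlier sorts in place.

```python
import numpy as np

matrix = np.array([[3, 1, 2],
                   [9, 7, 8]])

column_sums = matrix.sum(axis=0)       # one sum per COLUMN: [12, 8, 10]
row_sums = matrix.sum(axis=1)          # one sum per ROW: [6, 24]
sorted_rows = np.sort(matrix, axis=1)  # sorted copy: [[1, 2, 3], [7, 8, 9]]
plus_one = matrix + 1                  # element-wise: [[4, 2, 3], [10, 8, 9]]
middle_slice = matrix[0, 1:]           # row 0, columns 1 onward: [1, 2]
smallest, largest = matrix.min(), matrix.max()  # 1 and 9

# .flat visits every element in row-major order, no nested loop needed.
flattened = [int(value) for value in matrix.flat]  # [3, 1, 2, 9, 7, 8]
print(column_sums, row_sums, flattened)
```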
For more on Numpy click here
String Manipulation
Get familiar with the join and split string methods before you work with strings. Small changes in how you use these methods can make a significant difference in runtime efficiency.
For example, use:
text = ''.join(words)
instead of:
text = ''
for word in words:
    text += word
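A small sketch for experimenting with both methods. The timing comparison uses the standard timeit module; the exact numbers depend on your machine and Python version (CPython can sometimes optimize +=), so only the results, not the speeds, are guaranteed to match. The split calls show how a small change in arguments changes the output.

```python
import timeit

words = ["word"] * 10_000


def concat_with_plus(pieces):
    text = ""
    for piece in pieces:
        text += piece  # may build a new string on every iteration
    return text


def concat_with_join(pieces):
    return "".join(pieces)  # builds the result in a single pass


# Both approaches produce the same string...
assert concat_with_plus(words) == concat_with_join(words)

# ...but their running times can differ.
plus_seconds = timeit.timeit(lambda: concat_with_plus(words), number=100)
join_seconds = timeit.timeit(lambda: concat_with_join(words), number=100)
print(f"+=: {plus_seconds:.4f}s   join: {join_seconds:.4f}s")

# split() with no argument splits on runs of whitespace...
print("the cat  sat".split())     # ['the', 'cat', 'sat']
# ...while split(" ") splits on every single space, keeping empty strings.
print("the cat  sat".split(" "))  # ['the', 'cat', '', 'sat']
```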
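To build the text of a comma-separated values (CSV) or tab-separated values (TSV) file from a matrix whose cells are all strings:
csv_rows = [','.join(row) for row in matrix]
csv_text = '\n'.join(csv_rows)
tsv_rows = ['\t'.join(row) for row in matrix]
tsv_text = '\n'.join(tsv_rows)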
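The join approach assumes no cell contains a comma, tab, quote, or newline. When cells might (for example, a token like "a, an"), Python's standard csv module quotes them for you. A sketch comparing the two, with a hypothetical three-row matrix:

```python
import csv
import io

matrix = [["word", "count"], ["the", "12"], ["a, an", "3"]]

# Plain join: the comma inside "a, an" makes the last row look like 3 columns.
naive_csv = "\n".join(",".join(row) for row in matrix)
print(naive_csv.split("\n")[2].split(","))  # ['a', ' an', '3']

# csv.writer quotes any cell that contains the delimiter.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(matrix)
quoted_csv = buffer.getvalue()
print(quoted_csv)

# The same module writes TSV by changing the delimiter.
tsv_buffer = io.StringIO()
csv.writer(tsv_buffer, delimiter="\t", lineterminator="\n").writerows(matrix)
```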
Note that in Lexos the DataTables library can produce CSV and TSV files entirely on the client side, but this should only be done when the entire table is held in the DOM (i.e. without server-side processing).
List Comprehensions
List comprehensions are a concise syntax for building a new list from an existing iterable.
Say we have two lists:
a = [1,2,3]
b = []
Without a comprehension, we might fill b like this:
for ele in a:
    if ele > 2:
        b.append(ele * 2)
But now, using list comprehension syntax, our code will look like this:
b = [ele * 2 for ele in a if ele > 2]
There are other kinds of comprehensions too:
counts = {"x": 1, "y": 3}  # example dictionary for the dict comprehension
list_comp = [ele * 2 for ele in a if ele > 2]
set_comp = {ele * 2 for ele in a if ele > 2}
dict_comp = {key: value * 2 for key, value in counts.items() if value > 2}
gen_comp = (ele * 2 for ele in a if ele > 2)  # generator expression, evaluated lazily
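A sketch of all four kinds on a small, made-up word list (the sort of data Lexos handles), with an explicit loop for comparison:

```python
words = ["the", "cat", "sat", "on", "the", "mat"]

# List comprehension: keeps order and duplicates.
long_upper = [word.upper() for word in words if len(word) > 2]
# Set comprehension: duplicates removed, order not guaranteed.
unique_words = {word for word in words}
# Dict comprehension: maps each distinct word to its count
# (fine for small lists; collections.Counter is faster for large ones).
word_counts = {word: words.count(word) for word in unique_words}
# Generator expression: values are produced lazily, here consumed by sum().
total_letters = sum(len(word) for word in words)

# The same list built with an explicit loop, for comparison.
long_upper_loop = []
for word in words:
    if len(word) > 2:
        long_upper_loop.append(word.upper())

print(long_upper)  # ['THE', 'CAT', 'SAT', 'THE', 'MAT']
print(word_counts["the"], total_letters)  # 2 17
```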
Manipulating Lists
Consider using list comprehensions when dealing with lists.
For example, use:
strings = [element[:50] for element in strings]
instead of:
for i in range(len(strings)):
    strings[i] = strings[i][:50]
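One difference between the two versions is worth knowing: the loop changes the existing list in place, while the comprehension builds a new list and rebinds the name, so any other variable that refers to the old list does not see the change. Assigning to a full slice (strings[:] = ...) combines the comprehension syntax with in-place behavior. A small sketch, truncating to 3 characters instead of 50 so the effect is visible:

```python
original = ["Lexos", "Wheaton"]
alias = original  # a second name for the SAME list

original = [element[:3] for element in original]  # new list; alias untouched
print(alias)  # ['Lexos', 'Wheaton']

strings = ["Lexos", "Wheaton"]
same_list = strings
strings[:] = [element[:3] for element in strings]  # replaces contents in place
print(same_list)  # ['Lex', 'Whe']
```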
Error Catching
A good practice is to catch errors with try/except clauses in place of if/else checks. For example, use:
try:
    counts[i] += 1
except KeyError:
    counts[i] = 1
instead of:
if i in counts:
    counts[i] += 1
else:
    counts[i] = 1
Use:
import os

try:
    os.mkdir(path)
except FileExistsError:
    pass
instead of:
if os.path.isdir(path):
    pass
else:
    os.mkdir(path)
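Both try/except patterns above, made runnable in a sketch: a word counter using KeyError, and a directory that is created twice inside a temporary folder (the tempfile module and the "uploads" name are used here only so the example is self-contained and cleans up after itself). For the directory case, the standard library also offers os.makedirs(path, exist_ok=True), which does the same job in one call.

```python
import os
import tempfile

# Counting with try/except KeyError.
counts = {}
for word in ["the", "cat", "the"]:
    try:
        counts[word] += 1
    except KeyError:  # first time we see this word
        counts[word] = 1
print(counts)  # {'the': 2, 'cat': 1}

# Creating a directory that may already exist.
with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, "uploads")  # hypothetical directory name
    for _ in range(2):  # the second attempt raises FileExistsError
        try:
            os.mkdir(path)
        except FileExistsError:
            pass
    created = os.path.isdir(path)
    os.makedirs(path, exist_ok=True)  # one-call alternative; no error if it exists
print(created)  # True
```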
When using except to do complicated jobs, as a general rule, you should specify the error type (KeyError, ValueError, etc.) explicitly. A real example of error catching used in unit testing in Lexos is seen below:
try:
    _ = a_word_word(split_list=[], keyword="test",
                    window_size=1)
    raise AssertionError("did not throw error")
except AssertionError as error:
    assert str(error) == WINDOW_SIZE_LARGE_MESSAGE
A shorter and more readable way to ignore a specific error is contextlib.suppress. For example, use:
import os
from contextlib import suppress

with suppress(OSError):
    os.remove('i_probably_do_not_exist.txt')
instead of:
try:
    os.remove('i_probably_do_not_exist.txt')
except OSError:
    pass
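A runnable sketch of contextlib.suppress, assuming a file name that does not exist in a fresh temporary directory: the OSError raised by os.remove (here a FileNotFoundError, which is a subclass of OSError) is swallowed, the rest of the with block is skipped, and execution continues after it. Errors of other types still propagate.

```python
import os
import tempfile
from contextlib import suppress

with tempfile.TemporaryDirectory() as base:
    missing = os.path.join(base, "i_probably_do_not_exist.txt")

    with suppress(OSError):
        os.remove(missing)    # raises FileNotFoundError, which suppress() ignores
        print("not reached")  # skipped: the block stops at the exception

    reached_end = True  # execution continues after the with block
print(reached_end)
```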
Temporary functions
It can be more concise to use lambda to create temporary functions. For instance, use:
sorted_list = sorted(list_of_tuples, key=lambda tup: tup[n])
instead of:
def sort_by(some_list, n):
    n_list = [(x[n], x) for x in some_list]
    n_list.sort()
    return [val for (key, val) in n_list]

sorted_list = sort_by(list_of_tuples, n)
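A sketch with made-up (word, count) tuples. key=lambda tup: tup[1] sorts by the second element; the standard library's operator.itemgetter(1) is an equivalent that avoids writing the lambda. Because Python's sort is stable (even with reverse=True), tuples with equal counts keep their original order.

```python
from operator import itemgetter

word_counts = [("the", 12), ("cat", 3), ("sat", 3), ("mat", 7)]

by_count = sorted(word_counts, key=lambda tup: tup[1])
by_count_itemgetter = sorted(word_counts, key=itemgetter(1))
most_common_first = sorted(word_counts, key=lambda tup: tup[1], reverse=True)

print(by_count)           # [('cat', 3), ('sat', 3), ('mat', 7), ('the', 12)]
print(most_common_first)  # [('the', 12), ('mat', 7), ('cat', 3), ('sat', 3)]
```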
Performance Optimization
Read this for tips on how to optimize performance.
To time your code in PyCharm, use the profiler, which is the clock icon with the green arrow over it in the top right corner of the screen.
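Outside PyCharm, the standard library offers similar tools: timeit for timing a small snippet and cProfile for a per-function breakdown. A minimal sketch; build_vocabulary is a hypothetical function standing in for your own code, and the timings printed will vary from machine to machine.

```python
import cProfile
import pstats
import timeit


def build_vocabulary(words):
    """Hypothetical function to measure: the sorted distinct words."""
    return sorted(set(words))


words = ["the", "cat", "sat", "on", "the", "mat"] * 1_000

# timeit: total seconds for 100 calls.
seconds = timeit.timeit(lambda: build_vocabulary(words), number=100)
print(f"100 calls took {seconds:.4f}s")

# cProfile: record every function call made while the code runs.
profiler = cProfile.Profile()
profiler.enable()
vocabulary = build_vocabulary(words)
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```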
Detecting Character Encoding
Currently, Lexos uses the chardet package to detect character encoding. If you run into problems with character encoding, another Python package that may prove useful is ftfy, which can also be used to fix improperly encoded Unicode.