Session 5: Object Oriented Programming 1 - myTeachingURJC/2018-19-PNE GitHub Wiki

Session 5. Week 3: Object Oriented Programming 1

  • Goals:
    • Practicing with PyCharm: creating folders, adding files, running, debugging. We already know how to do it, but we need more practice
    • Practicing with GitHub: adding new files to the project, committing and pushing the changes to our GitHub account
    • Learn some concepts from the object oriented paradigm by doing: Classes, objects, methods, inheritance...
    • Learn how to separate programs into different files
  • Duration: 2h
  • Date: Week 3: Tuesday, Feb-5th-2019

Introduction

All the Python programs that you have developed so far use the classical paradigm of procedural programming. The main idea is to use functions that can be called at any point during the program's execution

The problem you have to solve is divided into smaller parts, implemented using functions, data structures and variables

In contrast, the object oriented paradigm models a problem by defining objects that interact with each other. Each object has attributes and methods. The main advantages are encapsulation and code reusability

Working with sequences

To learn the main concepts we will focus on examples that work with nucleotide sequences. Some exercises on this topic were proposed in practice 0

First, we will study some examples using procedural programming. Then we will implement them using the object oriented paradigm

To model a sequence we use strings with the characters A, C, T and G, corresponding to the four bases. Then we define functions that do the job

Example 1: counting bases

Imagine we want to count the A bases in a sequence, and get the percentage of them over the total number of bases. We model the sequence as a string, we create a function for counting As and calculate the percentage:

def count_a(seq):
    """Counting the number of As in the string"""

    result = 0
    for b in seq:
        if b == 'A':
            result += 1

    return result


# Main program

s = "AGTACACTGGT"
na = count_a(s)
print("There are {} As in the sequence".format(na))

# Calculate the total length
tl = len(s)

print("This sequence is {} bases in length".format(tl))
print("The percentages of As is {}%".format(round(100.0 * na/tl, 1)))

Activity 1: Running example 1

In your 2018-19-PNE PyCharm project, create a new folder called session-5. Write the example 1 program into the file ex-1.py and execute it. You should see this result in the console:

There are 3 As in the sequence
This sequence is 11 bases in length
The percentages of As is 27.3%
  • Add this file to your GitHub project. Commit it and push it
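
As a quick cross-check of example 1, Python strings already provide a count() method. The sketch below compares it with the hand-written loop; the function name count_a_builtin is only illustrative and is not part of the course material:

```python
def count_a(seq):
    """Count the As with an explicit loop (as in example 1)"""
    result = 0
    for b in seq:
        if b == 'A':
            result += 1
    return result


def count_a_builtin(seq):
    """Illustrative alternative: let the str.count() method do the counting"""
    return seq.count('A')


s = "AGTACACTGGT"

# Both approaches should report the same number of As
print(count_a(s), count_a_builtin(s))
```

Writing the loop by hand is still worth doing here, because it is the code we will debug step by step in the next activities.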

Activity 2: Debugging example 1

Debugging is the word we use in computer science to mean: "make sure the program works correctly by executing it step by step". When we find errors in the code, we call them bugs. So, debugging is the art of finding bugs in your code and fixing them

  • NOTE: Writing programs is not so difficult. Finding bugs (debugging) can be very, very hard! It is where you will spend most of your time. For that reason, it is very important to write programs clearly, structure the data well, add comments, and execute the code step by step from the very beginning. This will save you a lot of time and problems

  • Place a breakpoint (red dot) in the first line of the main program

  • Start the debugging process by clicking on the bug icon at the top right

  • The line to be executed will be highlighted (When it is highlighted it means that it will be executed in the next step)

  • Press the Step over icon (or press the F8 key) for executing only the current line

  • If a new variable has been created in that line, it will appear in the Variables window, inside the Debugger tab. Have a look at it: it will show you the name of the variable, its type and its current value. The values of the already created variables will also be shown in your code

  • When you want to see what has been printed on the console, just click on the Console Tab. You can change from the console to the Debugger tab whenever you want

  • Execute the example 1 step by step to the end, analyzing all the variables created

Activity 3: Debugging the count_a() function

Notice that during the previous step by step execution of example 1, we did not step into the count_a() function to see how it works. When we use the step over option, the debugger does not enter the functions, but rather executes them and shows the results

  • Use the same breakpoint as in activity 2
  • Start debugging the code. Use the Step over option to execute step by step (like before)
  • When the line "na = count_a(s)" is highlighted, stop. We want to step into it to see how it works. Click on the Step into icon (or press the F7 key)
  • The debugger will enter the count_a() function and its first line will be highlighted
  • Check how a new variable appears: result. It will count the number of As detected. Initially it is set to 0
  • Use the Step over icon to execute step by step
  • The b variable contains the current base of the sequence, the one to be analyzed
  • Continue debugging. Before stepping over, try to determine what the program will do, based on the current values of the variables. Then check it by stepping over
  • Once the return instruction is executed in the count_a() function, the program will continue its execution on the same line where the function was invoked
  • Continue the step by step execution until the program is finished

The program seems to work ok. Do you think it is bug-free?

Activity 4: Sequence entered by the user

  • Modify the previous program so that the sequence is entered manually by the user. Save this program as ex-1-user.py
  • Run the program for different sequences (For example: AAAAAA, TGTCTC, TTAA...)
  • This is an example of the execution:
Enter the sequence: AA
There are 2 As in the sequence
This sequence is 2 bases in length
The percentages of As is 100.0%
  • Add this file to your GitHub project. Commit it and push it. Then continue with the activity

  • What happens when you enter the null string? (Just press enter when prompted for the sequence). Try it
  • Execute the code step by step until you find the place where the error occurs
  • Solve the problem. When the user just presses enter, the result should be the same as if there were no As in the sequence:
Enter the sequence: 
There are 0 As in the sequence
This sequence is 0 bases in length
The percentages of As is 0%
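
One possible way of solving the problem is sketched below, assuming the error comes from dividing by a total length of 0. The helper names percentage() and report() are hypothetical, and the sequences are fixed in the code instead of being read with input() so that the sketch runs on its own:

```python
def count_a(seq):
    """Counting the number of As in the string"""
    result = 0
    for b in seq:
        if b == 'A':
            result += 1
    return result


def percentage(counter, total):
    """Hypothetical helper: percentage of counter over total.
    It returns 0 for an empty sequence instead of dividing by zero"""
    if total == 0:
        return 0
    return round(100.0 * counter / total, 1)


def report(seq):
    """Hypothetical helper: build the three output lines for a sequence"""
    na = count_a(seq)
    tl = len(seq)
    return [
        "There are {} As in the sequence".format(na),
        "This sequence is {} bases in length".format(tl),
        "The percentages of As is {}%".format(percentage(na, tl)),
    ]


# Fixed sequences; in ex-1-user.py the sequence would come from input()
for seq in ["AA", ""]:
    for line in report(seq):
        print(line)
```

Checking the length before dividing is only one option. Catching the ZeroDivisionError exception would be another.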

Exercise 1: counting all the bases

  • Modify the program ex-1-user.py so that it counts the 4 bases in the sequence: As, Cs, Ts and Gs
  • Save this program as exercise-1.py, inside the session 5 folder in your project
  • The program must have the function count_bases(seq) that counts all the bases in the sequence seq and returns a dictionary with 4 entries. The keys of the entries are the base names: 'A', 'T', 'C' and 'G', and the values are the number of times that base appears in the sequence (the counter)
  • The main program should ask the user for the sequence, print the total length, the number of bases and their percentage. This is an example of the output:
Enter the sequence: ACTGG
This sequence is 5 bases in length
Base A
  Counter: 1
  Percentage: 20.0
Base T
  Counter: 1
  Percentage: 20.0
Base C
  Counter: 1
  Percentage: 20.0
Base G
  Counter: 2
  Percentage: 40.0
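
A minimal sketch of what count_bases() could look like is shown below. It is only one of many valid solutions. It assumes that characters other than the four bases are simply ignored, and it uses a fixed sequence instead of asking the user:

```python
def count_bases(seq):
    """Count the four bases in seq. Return a dictionary base -> counter"""
    counters = {'A': 0, 'T': 0, 'C': 0, 'G': 0}
    for b in seq:
        # Assumption: characters that are not bases are silently ignored
        if b in counters:
            counters[b] += 1
    return counters


seq = "ACTGG"   # Fixed for the sketch; the exercise asks the user for it
total = len(seq)
counters = count_bases(seq)

print("This sequence is {} bases in length".format(total))
for base in counters:
    counter = counters[base]
    # Guard against the empty sequence, as in activity 4
    perc = round(100.0 * counter / total, 1) if total > 0 else 0
    print("Base {}".format(base))
    print("  Counter: {}".format(counter))
    print("  Percentage: {}".format(perc))
```

Creating the dictionary with the four keys in the order A, T, C, G makes the bases print in the same order as in the example output.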

Exercise 2: Working with two sequences

  • Create the program exercise-2.py based on exercise 1
  • It will ask the user to enter 2 sequences
  • It should print the information for the two sequences: the length, the number of every base and the percentage of every base
  • Implement it by creating a list with the two sequences and printing the results for every sequence in the list. No new functions are needed. Do it within the main program
  • Example of the output:
Enter the sequence 1: ACTG
Enter the sequence 2: AACCTTGG

Sequence 1 is 4 bases in length
Base A
  Counter: 1
  Percentage: 25.0
Base T
  Counter: 1
  Percentage: 25.0
Base C
  Counter: 1
  Percentage: 25.0
Base G
  Counter: 1
  Percentage: 25.0

Sequence 2 is 8 bases in length
Base A
  Counter: 2
  Percentage: 25.0
Base T
  Counter: 2
  Percentage: 25.0
Base C
  Counter: 2
  Percentage: 25.0
Base G
  Counter: 2
  Percentage: 25.0
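
The idea of storing the sequences in a list and looping over it can be sketched as follows. The count_bases() function is repeated so that the sketch is self-contained, the two sequences are fixed instead of being read from the user, and enumerate() is just one convenient way of numbering them:

```python
def count_bases(seq):
    """Count the four bases in seq. Return a dictionary base -> counter"""
    counters = {'A': 0, 'T': 0, 'C': 0, 'G': 0}
    for b in seq:
        if b in counters:
            counters[b] += 1
    return counters


# Main program
# Fixed sequences; in exercise-2.py they would be entered by the user
seqs = ["ACTG", "AACCTTGG"]

# enumerate() gives the position (starting at 1) and the sequence itself
for i, seq in enumerate(seqs, start=1):
    total = len(seq)
    print()
    print("Sequence {} is {} bases in length".format(i, total))
    for base, counter in count_bases(seq).items():
        perc = round(100.0 * counter / total, 1) if total > 0 else 0
        print("Base {}".format(base))
        print("  Counter: {}".format(counter))
        print("  Percentage: {}".format(perc))
```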

Modelling the sequences with object oriented programming

Let's model the previous sequence problem in a different way: using the object oriented paradigm. It is a way of better organizing our programs. Instead of working with functions and data separately, we will combine them into an object: the sequence object

It is better to think of a sequence as an object with some attributes (like the sequence name, the type of sequence or whatever). Objects also have methods, which are actions that they can perform, like counting the number of a base, or calculating its complement sequence

Examples of use

Imagine that we have already defined a kind of object called Seq (we still do not know how to do it) that represents a sequence of bases. Here are some examples of how these objects can be used:

# Create a new sequence and store it in the s1 object
s1 = Seq("AGTACACTGGT")

# Create another sequence and store it in the s2 object
s2 = Seq("CGTAAC")

Now we can ask the objects directly for some information, like their lengths:

# Get the sequence length (by asking the object)
l1 = s1.len()   # Hey object 1! give me your length!
l2 = s2.len()   # Hey object 2! give me your length!

# Print the results
print("Total length of sequence 1: {}".format(l1))
print("Total length of sequence 2: {}".format(l2))

We can also print the object itself. In this case the sequence of bases is printed:

print("The sequence 1 is: {}".format(s1))
print("The sequence 2 is: {}".format(s2))

Or we can create new sequences from them:

# Calculate the complement sequence of s1
s3 = s1.complement()
print("Sequence 1: {}".format(s1))
print("  Complement sequence: {}".format(s3))

With the object oriented approach, some problems are much easier to model than with the procedural approach. Let's learn how we can create and define our own objects

The Classes

A class is the template we use for creating objects. Inside the class we define all the methods of the objects of that class, and we program their behaviour. Let's create a minimum class for working with sequences. We start by defining an empty class:

class Seq:
    """A class for representing sequences"""
    pass

When the class is defined, we can create objects of this class as shown here:

# Main program
# Create an object of the class Seq
s1 = Seq()

# Create another object of the Class Seq
s2 = Seq()

print("Test")
  • Execute this program step by step. Check that two variables, s1 and s2, appear in the Variables window of the Debugger tab
s1 = {Seq} <__main__.Seq object at 0x7f5a5077cc18>
s2 = {Seq} <__main__.Seq object at 0x7f5a5075be48>

The init method

The methods are the actions that the objects can perform. The first method we are implementing is the initialization method. It is a special method that is called every time a new object is created. All the methods have the special parameter self as the first parameter

class Seq:
    """A class for representing sequences"""
    def __init__(self):
        print("New sequence created!")


# Main program
# Create an object of the class Seq
s1 = Seq()
s2 = Seq()
print("Testing the sequence objects")

  • Execute this program

When the s1 object is created, the string "New sequence created!" is printed. The same happens when the s2 object is also created. The output of the program is:

New sequence created!
New sequence created!
Testing the sequence objects
  • Execute it step by step, using the step over command

  • Execute it step by step using the step into command every time. Check that the debugger enters the class and executes the __init__ method

Adding data: attribute strbases

To represent a sequence we will use a string that is stored in every object. We will call this string strbases. The data stored in objects are referred to as attributes

class Seq:
    """A class for representing sequences"""
    def __init__(self, strbases):
        print("New sequence created!")

        # Initialize the sequence with the value 
        # passed as argument when creating the object
        self.strbases = strbases


# Main program
# Create objects of the class Seq
s1 = Seq("AGTACACTGGT")
s2 = Seq("CGTAAC")

# Access the attribute strbases for printing the sequence
str1 = s1.strbases
str2 = s2.strbases

print("Sequence 1: {}".format(str1))
print("Sequence 2: {}".format(str2))
print("Testing the sequence objects")
  • Execute the program. The output is:
New sequence created!
New sequence created!
Sequence 1: AGTACACTGGT
Sequence 2: CGTAAC
Testing the sequence objects
  • Execute the program step by step and examine the objects s1 and s2. Check that they both have the attribute strbases

Adding methods: len()

Let's add a new method, that we will name len(), for calculating the length of the sequence. As it is a method, the first parameter must be self. For calculating the length, we will use the built-in len() function (because in this example the type we are using for storing the bases is a string)

class Seq:
    """A class for representing sequences"""
    def __init__(self, strbases):
        print("New sequence created!")

        # Initialize the sequence with the value passed
        # as argument when creating the object
        self.strbases = strbases

    def len(self):
        return len(self.strbases)


# Main program
# Create objects of the class Seq
s1 = Seq("AGTACACTGGT")
s2 = Seq("CGTAAC")

# Access the attribute strbases for printing the sequence
str1 = s1.strbases
str2 = s2.strbases

# Invoking the len methods for calculating the sequence length
# Notice the parentheses: method calls always have parentheses
# but attributes do not!!

l1 = s1.len()
l2 = s2.len()

print("Sequence 1: {}".format(str1))
print("  Length: {}".format(l1))
print("Sequence 2: {}".format(str2))
print("  Length: {}".format(l2))

  • Execute this program. The output is:
New sequence created!
New sequence created!
Sequence 1: AGTACACTGGT
  Length: 11
Sequence 2: CGTAAC
  Length: 6
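
The usage examples at the beginning of this section also printed the objects directly and called a complement() method. A minimal sketch of how these two features might be added is shown below. It assumes that the special method __str__ is used for printing and that the complement is calculated base by base (A with T, C with G) without reversing the sequence. The course may implement them differently:

```python
class Seq:
    """A class for representing sequences"""
    def __init__(self, strbases):
        self.strbases = strbases

    def __str__(self):
        # Called by print() and format() to get the text of the object
        return self.strbases

    def len(self):
        return len(self.strbases)

    def complement(self):
        """Return a new Seq object with the complement of every base"""
        # Assumption: the sequence only contains A, C, G and T.
        # Any other character raises a KeyError
        pairs = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
        comp = ""
        for b in self.strbases:
            comp += pairs[b]
        return Seq(comp)


# Main program
s1 = Seq("AGTACACTGGT")
s3 = s1.complement()
print("Sequence 1: {}".format(s1))
print("  Complement sequence: {}".format(s3))
```

Notice that complement() returns a new Seq object instead of a string, so the result also has the len() method.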

Inheritance

New classes can be derived from others, reusing their methods and adding new ones. This is called inheritance. Just to present the concept, let's create the Gene class derived from the Seq class. It will not add anything

class Seq:
    """A class for representing sequences"""
    def __init__(self, strbases):
        print("New sequence created!")

        # Initialize the sequence with the value passed
        # as argument when creating the object
        self.strbases = strbases

    def len(self):
        # Return the length of the string of bases
        return len(self.strbases)


class Gene(Seq):
    """This class is derived from the Seq Class
       All the objects of class Gene will inherit
       the methods from the Seq class
    """
    pass


# Main program
# Create a Gene sequence
s1 = Gene("AGTACACTGGT")

# Create a sequence
s2 = Seq("CGTAAC")

# Access the attribute strbases for printing the sequence
str1 = s1.strbases
str2 = s2.strbases

# Invoking the len methods for calculating the sequence length
# Notice the parentheses: method calls always have parentheses
# but attributes do not!!

l1 = s1.len()
l2 = s2.len()

print("Sequence 1: {}".format(str1))
print("  Length: {}".format(l1))
print("Sequence 2: {}".format(str2))
print("  Length: {}".format(l2))

  • Execute the program. The output is:
New sequence created!
New sequence created!
Sequence 1: AGTACACTGGT
  Length: 11
Sequence 2: CGTAAC
  Length: 6

Notice that even though we have not defined any method in the class Gene, we can create a Gene in the same way we create a Seq. This is due to inheritance. The Gene objects inherit the methods from the Seq class (which is called their base class), including __init__, which is the method that sets the strbases attribute
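
To see what a derived class could add, the sketch below gives Gene one extra method. The method describe() is hypothetical and only for illustration, and the print in __init__ has been left out to keep the output short:

```python
class Seq:
    """A class for representing sequences"""
    def __init__(self, strbases):
        self.strbases = strbases

    def len(self):
        return len(self.strbases)


class Gene(Seq):
    """Derived class with one extra, hypothetical method"""
    def describe(self):
        # len() is not defined in Gene: it is inherited from Seq
        return "Gene with {} bases".format(self.len())


# Main program
g = Gene("AGTACACTGGT")
s = Seq("CGTAAC")

print(g.describe())             # The new method uses the inherited one
print(isinstance(g, Seq))       # A Gene object is also a Seq object
print(hasattr(s, "describe"))   # Seq objects do not get the new method
```

The program prints "Gene with 11 bases", then True, then False: the derived class gets everything from its base class, but not the other way around.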

Working in separate files: import

Usually we separate the project into different files. Let's practice with the hello world program:

  • Create a new file called Hello.py with a "hello world function"
def hello():
    print("Hello from the Hello module!!.......")

  • Create the file hello_main.py with the main program that invokes the hello() function
from Hello import hello

print("hello main")
hello()
print("End")
  • PyCharm will complain about the Hello module: it cannot find it. Right click on the session-5 folder and open the menu option Mark Directory as. Select Sources Root

  • Execute the hello_main.py program and check that everything is working fine

The same can be done with the classes. Usually the classes are located in different files
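
As a rough illustration of a class living in its own file, the sketch below writes a small module to a temporary folder and then imports the class from it. The module name seq_module is hypothetical. In a real project you would simply create the two files by hand, as with Hello.py and hello_main.py. Adding the folder to sys.path is assumed here to play a role similar to marking the folder as Sources Root in PyCharm:

```python
import importlib
import os
import sys
import tempfile

# Contents of the hypothetical file seq_module.py
MODULE_SOURCE = '''
class Seq:
    """A class for representing sequences"""
    def __init__(self, strbases):
        self.strbases = strbases

    def len(self):
        return len(self.strbases)
'''

with tempfile.TemporaryDirectory() as folder:
    # Create the module file inside the temporary folder
    with open(os.path.join(folder, "seq_module.py"), "w") as f:
        f.write(MODULE_SOURCE)

    # Tell Python where to look for modules
    sys.path.insert(0, folder)
    try:
        importlib.invalidate_caches()

        # This is the line the main program would contain
        from seq_module import Seq

        s1 = Seq("AGTACACTGGT")
        length = s1.len()
    finally:
        sys.path.remove(folder)

print("Length: {}".format(length))
```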

Exercise 3: Sequence examples in separate files

  • Move the count_bases() function developed in exercises 1 and 2 into a separate file: Bases.py
  • Move the exercise 1 main program into the file exercise-1-2.py. It should call the count_bases() function from the Bases module
  • Do the same with the exercise 2 main program: file exercise-2-2.py

Credits

  • Alvaro del Castillo. He designed and created the original content of this subject. Thanks a lot :-)
