Introduction to Python - acroucher/PyTOUGH GitHub Wiki

This page contains a brief introduction to the Python language. If you already know Python, you can skip it and go straight to the tutorial pages on PyTOUGH itself. If you aren't familiar with Python (even if you have used another programming language before), you should read it before learning how to use PyTOUGH itself.

What is Python?

Python is a general-purpose programming language. It is free and open-source, and runs on many different computer operating systems (Linux, Windows, Mac OS X and others). Python is easy to learn and very versatile, so it is used for all kinds of different things. This means that if you make a bit of effort to learn it, you can use it not only to make use of PyTOUGH, but to do lots of other useful tasks as well.

How to run Python

The interactive Python environment

The simplest way to run Python interactively is by typing python at the command line (e.g. the Command Prompt on Windows or a bash shell on Linux; on some systems the command is python3). The command line then becomes an interactive Python environment in which you can type Python commands directly.

Python scripts

The real power of Python, however, lies in using it to write scripts to automate complex or repetitive tasks. To do this, you can just type Python commands into a text file (using a plain text editor) and save the file with a .py file extension. To run the script, you can just type python followed by the name of the file. For example if your script file is called create_grid.py, you can run it by typing python create_grid.py.

Data types in Python

Basic data types

If you have used other programming languages you will know that they all generally provide ways of representing basic things like numbers of integer or floating point kinds, strings of characters, etc. Python also has these basic data types. For example you can create two integer variables a and b, give them values, add them together and display the result like this:

a = 5
b = 13
print(a + b)

In many other languages, you would first have to declare these variables as being integers before it would let you use them. Python doesn't require this. It just figures out what type a variable is according to the value it is assigned. So in the example above, it knows a and b are integers because we have assigned integer values to them. If we had typed a = 6.5 it would have decided that a was a floating point number instead.
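As a small illustration of this, the built-in type function reports the type Python has inferred for a variable. This is only a sketch, written in Python 3 syntax (where print is a function); the variable names are arbitrary.

```python
# Python infers each variable's type from the value assigned to it.
a = 5        # an integer
b = 13       # another integer
c = 6.5      # a floating point number

print(type(a).__name__)   # int
print(type(c).__name__)   # float

# Mixing an integer and a float in arithmetic gives a float.
total = a + c
print(total)              # 11.5

# A variable can later be given a value of a different type.
a = 'five'
print(type(a).__name__)   # str
```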

In addition Python provides a number of other data types that are very useful and that you may not be familiar with. Some of these are described below.

Lists and dictionaries

A list is a useful data type for storing ordered collections of items. This is a bit like the 'array' data types in other languages, but more flexible. For example, the items in a list don't all have to be of the same type. It is also easy to add or remove items from lists, and Python will not worry that the dimension (i.e. length) of the list has changed. Lists of items can be created by putting square brackets around them, e.g.:

things = [1, 2.0, 'three']

This creates a list variable called things with three items, in this case an integer, a float and a string.

The individual items of a list can be accessed by their index. As with most languages (other than Fortran), list indices start at zero, so the first item of the list above is things[0], the second is things[1], etc. In Python you can also use negative indices to access items from the end of the list. So the last element of the list above could be accessed by things[-1], the second to last by things[-2], and so on.

Lists can be 'added' together to form longer lists, using the + operator. For example, [1, 2.0, 'three'] + ['six', 4, 3.0] gives the result [1, 2.0, 'three', 'six', 4, 3.0]. You can also use the * operator to repeat a list a given number of times, so [1, 2] * 3 gives [1, 2, 1, 2, 1, 2].
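The list operations described above can be tried out with a short sketch like the following (Python 3 syntax). The append and remove methods used here are standard list methods for adding and removing items.

```python
things = [1, 2.0, 'three']

first = things[0]     # first item: 1
last = things[-1]     # last item: 'three'

# Lists can grow and shrink after they are created.
things.append('four')     # add an item to the end
things.remove(2.0)        # remove the first item equal to 2.0
print(things)             # [1, 'three', 'four']
print(len(things))        # 3

# + joins two lists and * repeats a list; both build new lists.
joined = [1, 2.0, 'three'] + ['six', 4, 3.0]
repeated = [1, 2] * 3
print(joined)
print(repeated)
```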

A dictionary is a data type used for storing collections of items that we wish to refer to by name rather than index. Items are looked up by their name (or 'key'), not by their position. Recent versions of Python (3.7 and later) do remember the order in which items were added, but dictionary items still cannot be accessed by a numerical index. Like lists, dictionaries can contain items of different types. A dictionary can be created using curly brackets {} instead of square ones. For example, we can create a simple phone-book dictionary as follows:

phone = {'Eric': 8155, 'Fred': 2350, 'Wilma': 4667}

which stores phone numbers for three people. We can then access individual items in the dictionary using square brackets. For example, phone['Fred'] would return the value 2350. We could also subsequently add a new item for someone else like this:

phone['Gabriel'] = 4100

You can delete items from a dictionary using the del command:

del phone['Eric']

Dictionaries are used frequently in PyTOUGH to store collections of items like TOUGH2 grid blocks, connections or generators, which we would like to be able to access by name.
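Putting the dictionary operations above together gives something like the sketch below (Python 3 syntax). The in test and the get method are standard dictionary features not mentioned above; the name 'Barney' is just an example of a missing entry.

```python
phone = {'Eric': 8155, 'Fred': 2350, 'Wilma': 4667}

phone['Gabriel'] = 4100   # add a new item
del phone['Eric']         # delete an existing item

# Test whether a name is present before looking it up.
has_eric = 'Eric' in phone            # False, as Eric was deleted

# get returns a fallback value instead of raising an error
# when the name is missing.
number = phone.get('Barney', None)    # None

# Looping over a dictionary gives its names (keys).
for name in phone:
    print(name, phone[name])

names = sorted(phone.keys())          # names in alphabetical order
print(names)
```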

Other data types

There are two other data types for storing collections of items that you may come across in PyTOUGH.

The tuple is very similar to a list. The main difference is that once items have been added to a tuple, they cannot be changed. Tuples are created using round brackets instead of square ones. Their items can be accessed by index using square brackets, exactly as for a list. So for example, stuff = ('watch', 2, 3.5) creates a tuple with three items in it. We can access for example its second item via stuff[1], but we cannot change its items or add or delete them. One useful thing that tuples can be used for is to name items in a dictionary using two names instead of one. For example, in PyTOUGH this is used to access a dictionary of connections between pairs of TOUGH2 grid blocks, using two-element tuples of block names.
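The sketch below illustrates both points: tuples cannot be modified, and they can be used as names (keys) in a dictionary. The dictionary areas and its block names and values are made up for illustration and are not part of PyTOUGH.

```python
stuff = ('watch', 2, 3.5)
second = stuff[1]          # 2

# Trying to change an item of a tuple raises a TypeError.
try:
    stuff[1] = 3
    modified = True
except TypeError:
    modified = False
print(modified)            # False

# A dictionary whose items are named by pairs of block names
# (hypothetical values, for illustration only).
areas = {('AR214', 'AR314'): 250.0,
         ('AR314', 'AR414'): 180.0}

area = areas[('AR214', 'AR314')]
# The round brackets may be left out inside the square brackets.
same_area = areas['AR214', 'AR314']
print(area, same_area)
```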

Python also has a set data type for representing mathematical sets: simple unordered collections of items. One of their useful properties is that sets cannot contain duplicate items. So, for example, we can remove duplicate items from a list x simply by converting it to a set and then back to a list: x = list(set(x)). Note that, because sets are unordered, the items in the resulting list will not necessarily be in their original order.
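A brief sketch of removing duplicates with a set, along with two standard set operations (union and intersection), is given below. The variable names are arbitrary.

```python
x = [3, 1, 3, 2, 1]

# Converting to a set drops the duplicates, but the order of the
# resulting list is not guaranteed.
unique = list(set(x))

# sorted gives the unique items in a predictable order.
unique_sorted = sorted(set(x))
print(unique_sorted)      # [1, 2, 3]

# Sets can also be written directly using curly brackets.
a = {1, 2, 3}
b = {3, 4}
union = a | b             # items in either set
common = a & b            # items in both sets
print(union, common)
```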

Customized data types: classes and objects

You can do a lot using just the data types described above. In the old days, people had to write all their programs using only simple data types, because that's all there was. TOUGH2 itself was originally written mostly using only integers, floating point numbers, strings and arrays of these.

However, more recently people realized that writing complex programs would be simplified if we could effectively define our own data types, customized for the task at hand. For example, for manipulating TOUGH2 grids it would be useful to have a special data type for storing all the information about a TOUGH2 grid block, including its name, volume, rock type etc., in a single variable.

Most modern programming languages like Python allow you to define your own classes, which can be thought of effectively as customized data types. Python takes what is called an 'object-oriented' approach, which means that all variables in Python are thought of as objects which can be of simple standard data types like integers or strings, or instances of your own special classes.

The properties of an object

Defining a class allows us to encapsulate all kinds of different pieces of information about something in one object. These pieces of information are known as the properties of an object. In Python, the properties of any object are accessed using a dot after the object's name, followed by the name of the property or method we want to use.

For example, a TOUGH2 grid contains collections of rock types, grid blocks, connections and generators (and some other information like run-time parameters). We can define a class to represent individual grid blocks, another for rock types, and so on. If we have a variable called blk which is an instance of our TOUGH2 grid block class, we can access its volume using blk.volume, its name using blk.name, etc. This is because our class definition specified that grid blocks have the properties name and volume.

Our object's properties can themselves be instances of other classes. For example, our block class could have a rocktype property, which was an instance of our rock type class. If the rock type class had a permeability property, we could then access the permeability in the block simply via blk.rocktype.permeability. If you see strings of names with dots in between them in your scripts, that is because there are classes nested inside each other in this way.
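A minimal sketch of how such classes might be defined is given below. The class names RockType and Block, and the values used, are hypothetical and are not PyTOUGH's actual class definitions. The special __init__ method is run when an object is created, and self refers to the object itself.

```python
class RockType(object):
    """Hypothetical rock type class (illustration only)."""
    def __init__(self, name, permeability):
        self.name = name
        self.permeability = permeability


class Block(object):
    """Hypothetical grid block class (illustration only)."""
    def __init__(self, name, volume, rocktype):
        self.name = name
        self.volume = volume
        self.rocktype = rocktype    # an instance of RockType


rock = RockType('rock1', 1.0e-15)
blk = Block('AR214', 1000.0, rock)

print(blk.name, blk.volume)
# Properties of nested objects are reached by chaining dots.
print(blk.rocktype.permeability)
```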

The methods of an object

The object-oriented approach not only allows us to package together all the different properties of an object, but takes things one stage further, and allows our classes to encapsulate the object's behaviour as well. So, our class can also define the things our objects can do, which are known as its methods.

Continuing with the example above, we can also define a class to represent a whole TOUGH2 grid, containing dictionaries of blocks, connections and so on. Besides these properties, our class might also define some methods; for example a method called check telling the grid how to check itself for errors. If we then had a grid called grd we could get it to check itself via grd.check().

As you can see, an object's methods are accessed using a dot after the object's name, just like its properties. In addition, the method has brackets after it. Methods can have parameters passed to them inside the brackets. In the example above, there aren't any parameters inside the brackets, so the check method might either not expect any parameters, or it might have optional parameters that have been omitted. Python method parameters can be either required, in which case they must be included when you call the method, or optional, in which case they can be left out. Any that are left out are given default values, specified in the class definition.
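As a rough sketch of methods with required and optional parameters, consider the hypothetical Grid class below. It is not PyTOUGH's grid class; the method names add_block and check, and the parameter min_blocks, are invented for this example.

```python
class Grid(object):
    """Hypothetical grid class (illustration only)."""
    def __init__(self):
        self.block = {}    # dictionary of block volumes, by name

    def add_block(self, name, volume):
        """Both parameters are required."""
        self.block[name] = volume

    def check(self, min_blocks=1):
        """Return True if the grid has at least min_blocks blocks.
        min_blocks is optional and defaults to 1."""
        return len(self.block) >= min_blocks


grd = Grid()
grd.add_block('AR214', 1000.0)
grd.add_block('AR314', 500.0)

ok_default = grd.check()              # uses min_blocks = 1
ok_five = grd.check(5)                # optional parameter given
ok_two = grd.check(min_blocks=2)      # or given by name
print(ok_default, ok_five, ok_two)
```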

Loops

Most languages provide means for looping over collections of items (e.g. elements of an array) using an integer index. Python generalizes this by allowing loops over the items of any list, whether it contains integers or other objects.

A sequence of consecutive integers can be created in Python using the range command. For example, range(10) represents the ten integers from 0 to 9. (In Python 2 this is a list; in Python 3 it is a range object, which can be converted to a list using list(range(10)).) So we could print these numbers using:

for i in range(10):
    print(i)

However, we are usually not that interested in lists of integers. Loops of this sort are often used (in other languages) to iterate over collections of other types of objects. But Python allows us to loop over them directly, without using an integer index. For example if we have an arbitrary list things with any number of items in it, we can print them using:

for thing in things:
    print(thing)

If we are in fact interested in the index of the list item as well as the item itself, we can use the enumerate command, which produces pairs (tuples) of indices and items from the original list. Hence, for example, we can use:

for i, thing in enumerate(things):
    print(i, thing)

to print a list of indices with their corresponding list items next to them.

Whitespace

In the examples of loops in the previous section, you may have noticed that the loops don't have any statement at the end to indicate where the loop ends. Other languages usually have a special statement (e.g. 'end do' in Fortran) to do this, or they use braces to indicate the start and end of loops.

Python does this differently. Loops are just indicated by the indenting of the code from the left margin. So the end of the loop is indicated by the point at which the code isn't indented anymore. (Even in other languages that use special statements to indicate the end of a loop, it's a good idea to indent the code in the loop, to make it easier to read.)
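The effect of indentation can be seen in a short sketch like this one (Python 3 syntax), in which only the indented lines are repeated:

```python
total = 0
for i in range(3):
    total = total + i       # indented: part of the loop
    print('adding', i)      # indented: also part of the loop
print('total =', total)     # not indented: runs once, after the loop
```

Here the last line is executed only once, printing the final total of 3.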

Conditionals

Conditionals (or 'if statements') are handled in Python in much the same way as in other languages. The if keyword is just followed by the condition, which is followed by a colon at the end of the line, for example:

if block.name == 'AR214':
    block.rocktype = rock1
    n += 1
elif block.name == 'AR314':
    block.rocktype = rock2
else:
    block.rocktype = rock3
print('n =', n)

Note that the statements in the if block are indented, just as they are in a loop. The final print line is not part of the conditional, as it isn't indented.

Also note that to compare two objects to see if they are equal, you need to use a double equals sign (==). A single equals sign means you are making one object equal to another one (as in the second line of the code above). On the third line, you can see another operator, +=. Writing n += 1 is just a shorthand way of saying n = n + 1. You can do similar things with the -=, *= and /= operators.

The code above also illustrates how to use elif (short for 'else if') and else tests in your conditionals.

List comprehensions

Some of the things that we would need a loop to do in other languages can be done very neatly in Python without a loop, but using a one-liner called a list comprehension instead. This is just a special way of constructing a list.

For example, suppose that (for some reason) we wanted to know the squares of the even numbers from 0 to 9. We could write a loop that iterated from 0 to 9, tested if the number was even, and if so, printed its square or added it to a list. In Python, however, we can create a list with the required result using a single statement:

[i*i for i in range(10) if i%2 == 0]

You can see this 'list comprehension' has a for part which iterates from 0 to 9 (10 numbers in all), just as in a loop. After that, it also contains a conditional which tests if the number is even (the % is the 'modulo' operator, or remainder after integer division, so if i%2 == 0 tests if i is divisible by 2, i.e. even). Finally, the first part of the comprehension says what to return from each number in the loop, in this case its square. So the result is:

[0, 4, 16, 36, 64]
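For comparison, the equivalent loop described above might look like the following sketch, which builds the same list step by step:

```python
# The loop version: test each number and append its square.
squares = []
for i in range(10):
    if i % 2 == 0:
        squares.append(i * i)

# The list comprehension version gives the same result.
same = [i*i for i in range(10) if i%2 == 0]

print(squares)    # [0, 4, 16, 36, 64]
print(same)       # [0, 4, 16, 36, 64]
```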

Functions

We can define functions or subroutines (for pieces of code that we want to call repeatedly) using the def statement, followed by the name of the function, and in brackets after it, any arguments that we want to pass into the function. For example:

def f(x, a = 0.0):
    dx = x - a
    return dx*dx + 2 * dx - 1

defines a function f of the argument x, which returns a value. Besides the required argument x, there is an optional one a which is given the default value of zero if not specified. In other words, b = f(x) and b = f(x, 0.0) would both return the same value.

Code inside functions is again indented from the left margin, just as in loops and conditionals.

The example function above returned a value, but this need not be so. You can define functions that don't return any value, just by not including a return statement in them. Then they act like a 'subroutine' in Fortran.
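The sketch below calls the function f in several ways, and also defines a function with no return statement. The function name report is made up for this example. A function without a return statement actually gives back the special value None.

```python
def f(x, a=0.0):
    dx = x - a
    return dx*dx + 2 * dx - 1

b1 = f(2.0)           # a takes its default value of 0.0
b2 = f(2.0, 0.0)      # same result as b1
b3 = f(2.0, a=1.0)    # optional argument given by name
print(b1, b2, b3)     # 7.0 7.0 2.0

def report(x):
    # No return statement: this acts like a subroutine.
    print('f(x) =', f(x))

result = report(2.0)
print(result)         # None
```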

Comments

In Python you can add comments to your code by using the hash # symbol. Anything on that line after the hash will be treated as a comment.

Python libraries

Importing libraries

As well as basic commands like the ones we have introduced above, Python contains a range of libraries of more specialized commands (and class definitions). For example, to get access to mathematical functions (e.g. trigonometric functions) you can import the math library.

The simplest way to import a library in Python is by typing import followed by the name of the library. You can do this anywhere in your code, as long as it comes before any statements that use commands in that library. So for example you can import the math library simply using

import math

After that, for example, you can access the sine function using math.sin. If you import a library in this way, you need to put the name of the library before any of the commands in it. That may look a bit cumbersome, but it has the advantage that if you have a particular command present in two different libraries, they can't get mixed up because they will have different prefixes.

If you don't like the name of the library, you can rename it as something else as you import it (for example, if it's a long name and you can't be bothered typing it in all the time). For example, if you imported the math library using

import math as m

you could then type m.sin to access the sine function.

You can also import only the commands in the library that you need, rather than all of them. So for example if we only wanted the sin and cos functions from the math library, we can import just them by typing

from math import sin, cos

We could then use these functions directly without having to prefix them with the name of the math library.

Finally, if we do want to import everything from a library, but don't want to have to put the name of the library in front of every command in it, we can again use the from command with the wildcard *. For example:

from math import *

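would import everything from the math library (like import math) but not require us to prefix math in front of any mathematical functions we wanted to use. The drawback is that names imported this way can silently replace anything of the same name already defined in your script.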
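The different import styles can be compared in a short sketch like the one below. The cmath library (complex-number mathematics) is part of standard Python and is used here only to show two libraries that both provide a command called sqrt.

```python
import math
import math as m
from math import sin, cos

y1 = math.sin(0.0)          # full library name as prefix
y2 = m.cos(0.0)             # shorter name chosen on import
y3 = sin(0.0) + cos(0.0)    # no prefix needed
print(y1, y2, y3)           # 0.0 1.0 1.0

# Prefixes keep commands with the same name apart.
import cmath
r1 = math.sqrt(4.0)         # real square root: 2.0
r2 = cmath.sqrt(-4.0)       # complex square root: 2j
print(r1, r2)
```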

Additional libraries

Python is used for a huge range of different programming tasks, and people have written a lot of their own additional Python libraries for special purposes. Many of these are freely available (like Python itself). If you have these additional libraries installed on your computer, you can import them and then use them just like any other library.

PyTOUGH is one of these additional Python libraries. It contains a number of different sub-libraries for handling different aspects of TOUGH2 simulations, and these can be imported into your scripts as you need them.

PyTOUGH makes use of several other additional libraries. The most important of these is the Numerical Python library (see below).

The Numerical Python library

PyTOUGH makes a lot of use of the Numerical Python library ('numpy' for short). This is a library for carrying out efficient numerical computation in Python.

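If you import everything from one of the PyTOUGH libraries using the wildcard form of the from command, Numerical Python will generally have been imported already, so you don't need to import it yourself in order to use it. PyTOUGH always imports it using the command import numpy as np, so you have access to anything in Numerical Python by prefixing it with np. If you import a PyTOUGH library in some other way, you can simply use import numpy as np in your own script.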

Numerical Python introduces a special new data type for arrays, the np.array. As with arrays in languages such as Fortran, the length of an array must be defined before you can use it, and the elements of an array are (usually) all of the same type. This means that arrays are less flexible than lists, but more efficient for numerical computations.

You can create an array a from a list of numbers b using a = np.array(b). It's also possible to read arrays directly from text files. Once you have your data in arrays, you have access to a large range of numerical machinery in Numerical Python, e.g. linear algebra routines, Fourier transforms and statistical calculations. These can be useful particularly for analysis of results from your simulations.
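A minimal sketch of creating and using an array is shown below, assuming Numerical Python is installed. It also shows one difference from lists: multiplying an array by a number scales each element, whereas multiplying a list by an integer repeats it.

```python
import numpy as np

b = [1.0, 2.0, 3.0, 4.0]
a = np.array(b)             # create an array from a list

doubled = 2.0 * a           # arithmetic acts on every element
total = a.sum()             # sum of the elements: 10.0
mean = a.mean()             # average of the elements: 2.5
print(doubled, total, mean)

repeated = b * 2            # for a list, * repeats the items instead
print(len(repeated))        # 8
```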

Further information

Here we have covered only the basics of Python, enough to get you started with PyTOUGH.

More in-depth information about Python can be found on the Python website. In particular, the Python language reference describes how the language works, and the standard library details all the commands that standard Python includes.

There is also a Wikibook on Python programming.

A web search will turn up loads more information on Python if you need it. If you have a specific problem to solve in Python, searching for 'python' and some keywords relevant to the problem is often a good way to find out how other people have solved it.

Now go and do a Python tutorial

Before you try to do anything complicated with PyTOUGH, it is highly recommended that you do one of the many Python tutorials available online.

It is tempting to skip this sort of thing, but if you try to jump in and start doing complex things without a good grounding in the Python language, at some point you will probably get frustrated because your script doesn't do what you thought it would do. A bit of time invested in learning Python now might save you a lot of time later.
