Python - JasonLocklin/jasonlocklin.github.com GitHub Wiki

Intro to Programming

Python Resources

  • Getting started in Python: the Getting started page.

  • Official Documentation: Python's standard documentation is substantial. See also the complete list of documentation by Python version. If you're not sure where to go, try the help page.

  • Dive into Python: a book available for free online that teaches the "Python" way of tackling typical programming tasks.

  • Intro Video: a fast-paced introductory lecture on Python. It assumes that viewers have some previous programming experience and know at least a little about loops, lists, if statements, functions, and file I/O.

  • Mathesaurus: thesaurus-like references for those moving between R, Python (NumPy), MATLAB/Octave, and Scilab.

  • Python for Data Analysis. The main documentation is here.

Python in the Literature

Useful Packages

Python itself is rather minimal. It is the packages that extend the language and make it useful for so many things. Keeping track of the important packages, where to find them, and what they are good for can be tricky. Here are some of the most useful ones:

Data Analysis

  • IPython: extends the Python shell into a more powerful interactive analysis tool. It now includes a "Notebook" interface, a convenient way to write a self-documenting analysis right in your web browser, which can replace a separate text editor or IDE and console and streamline the whole workflow.
  • Numpy: adds basic numerical programming to Python. Arrays, matrices, that sort of thing.
  • Scipy: extends Numpy to do things like linear algebra, statistics, and other higher-level maths.
  • Pandas: adds a new data structure (the dataframe) that is convenient for holding data, along with methods for working with it, including some stats and summary functions.
  • Matplotlib: adds plotting functionality. If using IPython, importing pylab brings in the matplotlib tools automatically.
  • Seaborn: turns the basic (Matlab-style, rather ugly) plots produced by matplotlib into aesthetically pleasing figures for publication.
  • Rpy: a basic interface for calling R commands or scripts.
  • Spyder: a Python "integrated development environment" designed explicitly for use with the packages above.
  • NeuroImaging in Python (a group of packages for brain imaging research)
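
As a rough illustration of how Numpy and Scipy fit together, here is a minimal sketch. The matrix, vector, and the two groups of numbers are invented for the example and are not taken from any real analysis.

```python
import numpy as np
from scipy import linalg, stats

# Numpy: arrays and matrices.
a = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Scipy: linear algebra on top of Numpy arrays (solve a @ x = b).
x = linalg.solve(a, b)

# Scipy: statistics. Both groups are made-up numbers for illustration.
group_a = np.array([0.41, 0.52, 0.38, 0.47, 0.45])
group_b = np.array([0.55, 0.61, 0.49, 0.58, 0.60])
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print("solution:", x)
print("t statistic:", t_stat, "p value:", p_value)
```

Numpy supplies the array type, and the Scipy functions accept and return those arrays, so the two are usually imported side by side.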

Also see my R Python Cheatsheet

More info about Pandas:

Pandas is very powerful, but suffers from "bleeding edge syndrome." The documentation is difficult to follow, and there are often various ways of doing things that don't seem to follow a consistent design.

  • Normally imported with import pandas as pd
  • Debian testing has up-to-date versions of it.
  • read_csv can directly read compressed CSV files (neat). I use bz2.
  • Data frame variables can be accessed with df.variable or df['variable']. As in R, the shorthand form should never be used for writing to data frames. I avoid the shorthand altogether outside of the interactive console.
  • Unlike R, Pandas does things in an object-oriented way, e.g., data.describe() rather than summary(data).
  • Be careful when writing to objects: some functions work on copies of the data while others do not, and it's far from obvious which is which. The documentation isn't perfectly clear, so always test that your commands do what you intend.
  • Hierarchical indexing is very cool, but takes some practice.
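
The points above can be tried out with a short sketch like the following. The column names (subject, block, rt), the values, and the file name are hypothetical and exist only for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data, made up for this example.
df = pd.DataFrame({
    "subject": [1, 1, 2, 2],
    "block": [1, 2, 1, 2],
    "rt": [0.41, 0.52, 0.38, 0.47],
})

# Write a bz2-compressed CSV, then read it straight back.
# read_csv infers the compression from the ".bz2" extension.
path = os.path.join(tempfile.mkdtemp(), "example.csv.bz2")
df.to_csv(path, index=False, compression="bz2")
loaded = pd.read_csv(path)

# Reading: the shorthand and the bracket form return the same column.
same = loaded.rt.equals(loaded["rt"])

# Writing: always use the bracket form.
loaded["rt_ms"] = loaded["rt"] * 1000

# Object-oriented summary, instead of R's summary(data).
summary = loaded.describe()

# Hierarchical indexing: index the rows by subject, then block.
indexed = loaded.set_index(["subject", "block"])
subject_1 = indexed.loc[1]             # all blocks for subject 1
one_cell = indexed.loc[(2, 1), "rt"]   # subject 2, block 1

print(summary)
print(subject_1)
```

Checking the result after each write (for example with print(loaded.head())) is a cheap way to confirm that a command changed the data frame you meant it to change.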

Psychology Experiments

  • Psychopy (Environment for creating psychology experiments)
  • VisionEgg (Python Library for psychophysics experiments)
  • Pylink (Eyelink II Python Interface)

Tips and Tricks

Memory leaks with fast loops.

I have seen several people run into problems with fast-looping psychology experiments. Two things generally cause this.

One is assignment in the loop. If you make an assignment, say a=1, Python allocates memory, gives it the value 1, and creates a pointer called a that points to it. If you make the same assignment a second time, Python allocates memory again, gives it the value 1, and changes a to point there. Since nothing points to the old memory location any more, it is marked for "garbage collection" and Python de-allocates it when it has some spare time. In a fast loop these allocations can add up, especially if the loop never leaves Python time to do its garbage collection. If you must do calculations that involve assignments inside a loop, you may need to add a wait of a few milliseconds to free up some CPU time for garbage collection.

The other is loading or creating stimuli inside the loop. If you are loading stimuli from files, or creating them, do so outside the loop, and simply turn them on or off inside it.
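
The advice about keeping stimulus creation out of the loop can be sketched as below, using only the standard library. The load_stimulus function, the stimulus names, and the pause length are hypothetical stand-ins, not part of Psychopy or any other package. The explicit gc.collect() call between trials is an added assumption about one way to give the collector a chance to run, and whether it helps will depend on the experiment.

```python
import gc
import time


def load_stimulus(name):
    """Hypothetical stand-in for an expensive load from disk."""
    return {"name": name, "pixels": bytearray(64 * 64)}


# Build every stimulus once, before the fast loop starts.
stimuli = {name: load_stimulus(name)
           for name in ("fixation", "target", "mask")}


def run_trial(n_frames, pause_s=0.002):
    """Show alternating stimuli for n_frames, then pause briefly."""
    shown = []
    for frame in range(n_frames):
        # Inside the loop, only look up objects that already exist.
        name = "target" if frame % 2 else "fixation"
        stimulus = stimuli[name]
        shown.append(stimulus["name"])
    # Between trials: a short wait plus an explicit collection request.
    time.sleep(pause_s)
    gc.collect()
    return shown


print(run_trial(4))
```

The key design choice is that the loop body only selects among pre-built objects, so the expensive work happens once, outside the timing-critical section.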