Data format - jackspaceBerkeley/pupil GitHub Wiki
Every time you click record in Pupil's capture software, a new recording is started and your data is saved into a recording folder. It contains:
- world.mkv: video stream of the world view
- world_timestamps.npy: 1d array of timestamps, one for each world video frame
- pupil_positions.npy: array of pupil position data
- gaze_positions.npy: array of gaze position data
- info.csv: a file with metadata
- pupil_positions: Python-pickled pupil position data. This is used by Pupil Player.
- additionally, a few more files may be saved. More on this later.
These files are stored in a newly created folder your_pupil_recordings_dir/your_recording_name/XXX, where XXX is an incrementing number. Previous recordings are never overwritten!
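Because each new recording gets the next number, the most recent session is the numerically largest subfolder. Here is a minimal sketch for locating it. It assumes the recording folders are named with digits only (for example 000, 001). The helper name `latest_recording` is hypothetical and is not part of Pupil:

```python
import os


def latest_recording(session_dir):
    """Return the path of the highest-numbered recording folder in session_dir.

    Assumes recording folders are named with digits only (e.g. "000", "001").
    Returns None if no such folder exists.
    """
    numbered = [
        name for name in os.listdir(session_dir)
        if name.isdigit() and os.path.isdir(os.path.join(session_dir, name))
    ]
    if not numbered:
        return None
    # compare numerically, so "10" counts as newer than "9"
    newest = max(numbered, key=int)
    return os.path.join(session_dir, newest)
```

Usage: `latest_recording("your_pupil_recordings_dir/your_recording_name")`. The function compares names as integers rather than strings, so it still works if the numbers are not zero-padded.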
Open a recording in Pupil Player:
Just drag the XXX folder onto the Pupil Player app icon to open your recording in Pupil Player.
To do the same thing in a terminal:
```shell
pupil_player 'path_to_the_XXX_dir'
```
If you are running from source, it looks like this:
```shell
cd pupil_src/player
python main.py 'path_to_the_XXX_dir'
```
The Pupil Recording Data Format
The data format for Pupil recordings is 100% open:
World Video Stream
An MPEG-4 compressed video stream of the world view in a .mkv container. The video is compressed using ffmpeg's default settings, which give a good balance between image quality and file size and play on most platforms. The frame rate of this file is set to your capture frame rate.
OpenCV has a capture module that can be used to extract still frames from the video:
```python
import cv2

capture = cv2.VideoCapture("absolute_path_to_video/world.mkv")
status, img1 = capture.read()  # extract the first frame; status is False if no frame was read
status, img2 = capture.read()  # second frame...
capture.release()
```
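Frame i of world.mkv corresponds to entry i of world_timestamps.npy. This means the timestamps also tell you the frame rate that was actually achieved, which may differ slightly from the nominal capture rate. Here is a minimal sketch for computing it. The helper name `effective_frame_rate` is hypothetical:

```python
import numpy as np


def effective_frame_rate(timestamps):
    """Average frames per second implied by a 1d array of frame timestamps (seconds)."""
    timestamps = np.asarray(timestamps, dtype=float)
    if timestamps.size < 2:
        raise ValueError("need at least two timestamps")
    # n frames span n - 1 intervals
    return (timestamps.size - 1) / (timestamps[-1] - timestamps[0])
```

Usage: `effective_frame_rate(np.load("path_to_data_folder/world_timestamps.npy"))`.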
Coordinate Systems
We use a normalized coordinate system with the origin 0,0 at the bottom left and 1,1 at the top right.
- Normalized Space
Origin 0,0 at the bottom left and 1,1 at the top right. This is the OpenGL convention and what we find to be an intuitive representation. This is the coordinate system we use most in Pupil. Vectors in this coordinate system are specified by a norm prefix or suffix in their variable name.
- Image Coordinate System
In some rare cases we use the image coordinate system, mainly for pixel access of the image arrays. Here one unit is one pixel, the origin is at the top left, and the maximum x,y is at the bottom right.
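To go from Normalized Space to pixel coordinates, scale by the image size and flip the y axis, because the two systems put their origin at opposite vertical edges. Here is a minimal sketch of both directions. The function names are hypothetical, and the 1280x720 frame size in the example is only for illustration:

```python
def norm_to_pixel(norm_x, norm_y, width, height):
    """Convert a point from Normalized Space to image (pixel) coordinates.

    Normalized Space: origin (0, 0) at the bottom left, (1, 1) at the top right.
    Image coordinates: origin at the top left, y grows downwards.
    """
    px = norm_x * width
    py = (1.0 - norm_y) * height  # flip the y axis
    return px, py


def pixel_to_norm(px, py, width, height):
    """Inverse of norm_to_pixel."""
    return px / width, 1.0 - py / height


print(norm_to_pixel(0.5, 0.25, 1280, 720))  # (640.0, 540.0)
```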
Pupil Positions
pupil_positions.npy is stored in NumPy's .npy file format. The coordinates of the pupil center in the eye video are called the pupil position. Its x,y coordinates are normalized as described in the coordinate system section above.
We store the pupil positions along with some additional information in a numpy array.
| timestamp | confidence | id | pos_x | pos_y | diameter | additional fields |
|---|---|---|---|---|---|---|
timestamp is the time when the eye camera image is received by Pupil Capture. It is derived from CLOCK_MONOTONIC on Linux and macOS. This number is the time in seconds (floating point) since the epoch. Using this information together with the world video timestamps, we can correlate both data streams. The epoch can be set in Pupil Capture.
confidence is the pupil detector's assessment of how sure we can be about this measurement. A value of 0 indicates no confidence, and 1 indicates perfect confidence. In our experience, useful data carries a confidence value greater than ~0.6. A confidence of exactly 0 means that we know nothing, so you should ignore the position data.
id denotes the data source for this pupil position. In monocular setups it is always 0. For binocular eye trackers, this value is 0 or 1 depending on which eye.
pos_x and pos_y are floating point numbers following the 'Normalized Space' coordinate system convention defined above. Pupil positions indicate the detected position of the pupil in the space of the eye video feed! We call this pupil_position.
diameter is the pupil diameter in eye video pixels.
additional fields can be added to the recording data format in the future.
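Putting the fields together, you can discard unreliable samples and separate the two eyes of a binocular recording. Here is a minimal sketch. It uses the ~0.6 rule of thumb from above as the default threshold, and takes the column order from the table. The helper name `split_pupil_positions` and the column constants are hypothetical:

```python
import numpy as np

# column indices, following the pupil positions table above
TIMESTAMP, CONFIDENCE, EYE_ID, POS_X, POS_Y, DIAMETER = range(6)


def split_pupil_positions(positions, min_confidence=0.6):
    """Drop low-confidence rows and split the remaining rows by eye id.

    Returns a dict mapping each eye id (int) to an array of its rows.
    """
    reliable = positions[positions[:, CONFIDENCE] > min_confidence]
    return {
        int(eye): reliable[reliable[:, EYE_ID] == eye]
        for eye in np.unique(reliable[:, EYE_ID])
    }
```

Usage: `by_eye = split_pupil_positions(np.load("path_to_data_folder/pupil_positions.npy"))`. A monocular recording yields a single key, 0. Any additional fields stay in the returned rows because the function only indexes the first columns.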
Gaze Positions
gaze_positions.npy is stored in NumPy's .npy file format. The pupil position gets mapped into the world space and thus becomes the gaze position. This is the current center of the subject's visual attention, or what you're looking at in the world. We store the gaze positions along with some additional information in a numpy array.
| timestamp | confidence | pos_x | pos_y |
|---|---|---|---|
timestamp is the time when the eye camera image is received by Pupil Capture. It is derived from CLOCK_MONOTONIC on Linux and macOS. This number is the time in seconds (floating point) since the epoch. Using this information together with the world video timestamps, we can correlate both data streams. The epoch can be set in Pupil Capture.
confidence is a measure from 0 to 1 of how confident we can be about the accuracy of the gaze points. It takes into account the confidence of all data sources and processes that lead up to this measurement. (For a monocular setup, this is currently only the pupil confidence, but we will build on this soon.)
pos_x and pos_y are floating point numbers following the 'Normalized Space' coordinate system convention defined above. These coordinates are mapped into the world space, so they show you what you (or your subject) were looking at. We call this the gaze position.
Looking at the data
Pupil Player
Head over to Pupil Player to play Pupil recordings and export them in various formats.
Raw data with Python
You can read and inspect pupil_positions.npy with a couple of lines of Python code:
```python
import numpy as np

positions = np.load("path_to_data_folder/pupil_positions.npy")
number_of_data_points = positions.shape[0]
```
NumPy arrays are row-major. In our case, there is one row per data point and one column for each field in the pupil positions table above. To examine a single moment in time, you can look at a single row:
```python
print(positions[10])
# since we are using numpy, you can also ask for rows 0 and 1
print(positions[0:2])
```
As an example, here is what printing one row of data looks like:
```
[6.13488305e+03, 1.00000000e+00, 0.00000000e+00, 6.16402578e-01, 4.78729661e-01, 6.01729317e+01]
```
NumPy also makes it easy to examine your data using slices. If you only want to examine specific data points, you can slice a column of data out of the array.
```python
# separate arrays of the pupil x and y coordinates (columns 3 and 4)
pos_x = positions[:, 3]
pos_y = positions[:, 4]
# or pupil_pos as a separate array of x,y pairs
pupil_pos = positions[:, 3:5]
```
We find NumPy to be extremely robust and convenient. But if you want to get data out of the numpy file while preserving its structure, you can use NumPy's savetxt function. There are lots of options for how numbers are formatted. Here is a simple example:
```python
import numpy as np

# dump the entire gaze positions array into a comma-delimited csv file
positions = np.load("path_to_data_folder/gaze_positions.npy")
# make sure to specify the full path where you want to save the file
np.savetxt("/Full/Path/To/Directory/positions.csv", positions, delimiter=",")
# you could also save just the gaze x,y columns (columns 2 and 3 of gaze_positions)
gaze_pos = positions[:, 2:4]
np.savetxt("/Full/Path/To/Directory/gaze_pos.csv", gaze_pos, delimiter=",")
```
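Spreadsheet tools are easier to use when the CSV has a header row. np.savetxt can write one through its `header` argument. Here is a sketch that takes the column names from the gaze positions table. The helper name `export_gaze_csv` and the `%.6f` number format are our own choices, not a Pupil convention:

```python
import numpy as np


def export_gaze_csv(gaze_positions, path):
    """Write gaze rows (timestamp, confidence, pos_x, pos_y) to a CSV file with a header row."""
    np.savetxt(
        path,
        gaze_positions[:, :4],
        delimiter=",",
        header="timestamp,confidence,pos_x,pos_y",
        comments="",  # keep the header free of numpy's default "# " prefix
        fmt="%.6f",
    )
```

Usage: `export_gaze_csv(np.load("path_to_data_folder/gaze_positions.npy"), "/Full/Path/To/Directory/gaze.csv")`. To read the file back into NumPy, skip the header with `np.loadtxt(path, delimiter=",", skiprows=1)`.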
Simple Data visualization
Calculate and display the first derivative of the pupil y position using matplotlib:
```python
import matplotlib.pyplot as plt
import numpy as np

pupil_positions = np.load("path_to_data_folder/pupil_positions.npy")
y = pupil_positions[:, 4]
# difference between consecutive samples; the first sample has no predecessor, so it is 0
dy = np.zeros_like(y)
dy[1:] = y[1:] - y[:-1]
dy_line, = plt.plot(dy)
y_line, = plt.plot(y)
plt.legend([dy_line, y_line], ["1st derivative pupil y", "pupil y position"])
plt.show()
```
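The plot above differentiates per sample, so its values depend on how far apart the samples are in time. To get a rate in normalized units per second, divide by the time between samples instead. np.gradient does this when you pass the timestamp column as the sample coordinates (non-uniform spacing needs NumPy 1.13 or newer). The sketch below assumes timestamps are strictly increasing. In a binocular recording, rows from both eyes may be interleaved, so select a single id first. The helper name `pupil_y_velocity` is hypothetical:

```python
import numpy as np


def pupil_y_velocity(positions):
    """Rate of change of pupil y in normalized units per second.

    positions: pupil positions array; column 0 is the timestamp, column 4 is pos_y.
    """
    t = positions[:, 0]
    y = positions[:, 4]
    # central differences in the interior, one-sided differences at the ends;
    # passing t accounts for uneven spacing between samples
    return np.gradient(y, t)
```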
Synchronisation
The Pupil capture software runs multiple processes. The world video feed and the eye video feed run and record at the frame rates set by their capture devices (cameras). This allows us to be more flexible. Instead of locking everything into one frame rate, we can capture every feed at specifically set rates. But this also means that we sometimes record world video frames with multiple gaze positions (higher eye-frame rate) or without any (no pupil detected or lower eye frame rate).
In player_methods.py you can find a function that takes world and eye timestamped data and correlates the two. The input data for this function is not in the same format as you will find in the Data dir, you will have to convert first!
```python
import logging

logger = logging.getLogger(__name__)


def correlate_gaze(gaze_list, timestamps):
    '''
    gaze_list: timestamp | confidence | gaze x | gaze y |
    timestamps: timestamps to correlate gaze data to
    This takes a gaze positions list and a timestamps list and makes a new list
    with the length of the number of recorded frames.
    Each slot contains a list that will have 0, 1 or more associated gaze positions.
    '''
    gaze_list = list(gaze_list)
    timestamps = list(timestamps)
    positions_by_frame = [[] for _ in timestamps]
    frame_idx = 0
    try:
        data_point = gaze_list.pop(0)
    except IndexError:
        logger.warning("No gaze positions in this recording.")
        return positions_by_frame
    gaze_timestamp = data_point[0]
    while True:
        # the current gaze point belongs to the current frame if it is before the mean
        # of the current world frame timestamp and the next world frame timestamp
        try:
            t_between_frames = (timestamps[frame_idx] + timestamps[frame_idx + 1]) / 2.
        except IndexError:
            break
        if gaze_timestamp <= t_between_frames:
            ts, confidence, x, y = data_point
            positions_by_frame[frame_idx].append({'norm_gaze': (x, y), 'confidence': confidence, 'timestamp': ts})
            if not gaze_list:
                break  # the last gaze point has been assigned
            data_point = gaze_list.pop(0)
            gaze_timestamp = data_point[0]
        else:
            frame_idx += 1
    return positions_by_frame
```
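The same midpoint rule can also be expressed with NumPy, without popping from a list. This is an alternative sketch of our own, not the code in player_methods.py. It uses np.searchsorted on the midpoints between consecutive world timestamps. One difference from the loop above: gaze samples after the last midpoint are assigned to the last frame rather than dropped. The function names are hypothetical:

```python
import numpy as np


def frame_index_for_gaze(gaze_timestamps, world_timestamps):
    """For each gaze timestamp, return the index of the world frame it belongs to.

    A gaze sample belongs to frame i if its timestamp lies in
    (mid(i-1, i), mid(i, i+1)], where mid is the midpoint between consecutive
    world timestamps. Samples after the last midpoint map to the last frame.
    """
    world_timestamps = np.asarray(world_timestamps, dtype=float)
    midpoints = (world_timestamps[:-1] + world_timestamps[1:]) / 2.0
    # side="left": a timestamp equal to a midpoint stays with the earlier frame,
    # matching the "<=" comparison in correlate_gaze
    return np.searchsorted(midpoints, gaze_timestamps, side="left")


def group_by_frame(gaze_positions, world_timestamps):
    """Return a list with one array of gaze rows per world frame."""
    idx = frame_index_for_gaze(gaze_positions[:, 0], world_timestamps)
    return [gaze_positions[idx == i] for i in range(len(world_timestamps))]
```

Usage: `group_by_frame(np.load("path_to_data_folder/gaze_positions.npy"), np.load("path_to_data_folder/world_timestamps.npy"))`.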