DataVault - syue99/Lab_control GitHub Wiki

Introduction

The DataVault server stores the experiment data and provides a convenient way to add and modify datasets in the LabRAD environment. It handles most of our experiment data, including PMT data (usually 30x2) and processed camera data (30x120 arrays). Note that raw camera data is saved locally without going through DataVault, which reduces the load on the system.

Our DataVault should be similar to those of other groups, but we spent some time getting the array data pipeline (its real name) running. We also added some h5 compatibility to the data pipeline, BUT we ended up not using it, so it is still NOT TESTED.

Running the server

To run the data-vault server, execute python PATH-TO\servers\data_vault\data_vault_tables.py

Basic operations

Basic operations such as creating a dataset, adding data points, and editing existing datasets are shown in the snippet below.

import labrad
#initiate a client program
cxn = labrad.connect()
cxn.data_vault.new('Half-Life', [('x', 'in')], [('y','','in')]) ## creates a new, empty dataset named like '0000n - Half-Life' with independent variable x and dependent variable y
cxn.data_vault.add([[1,3],[2,4]]) ## add data points
cxn.data_vault.open_appendable("00001 - Half-Life") ## open file in append mode
cxn.data_vault.variables() ## list of variables with their units in the form of [(xVariableName,'unit'),('yVariableName','unit')]

You can also specify parameters, units, and other features. The data created can be viewed in the Real Simple Grapher. See the playground page for more details.

DataVault Class

We briefly describe the structure of the DataVault class so that users can understand how data flows through the DataVault server. For general information about LabRAD servers, see the pulser server wiki and the page on how to write a server.

The DataVault server class inherits from the LabradServer class, just like all the other servers. It implements a set of functions that each perform a data-related task: changing the current directory (cd), making a new directory (mkdir), creating a new dataset (new), creating a new matrix dataset (newMatrix), and so on. Each function comes with a short explanation that is shown when you try to call it from a client program. Any new operations should likewise be implemented in the DataVault class.
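To make the idea of "one function per task" concrete, here is a minimal sketch that does not use LabRAD at all. ToyDataVault and everything inside it are hypothetical names made up for illustration; the real server class is more involved and registers these functions as LabRAD settings. The numbered dataset name simply mirrors the "00001 - Half-Life" name used in the snippet above.

```python
import os
import tempfile


class ToyDataVault:
    """Hypothetical stand-in for the DataVault server class (not the real code)."""

    def __init__(self, root):
        self.root = root      # top-level data directory
        self.path = []        # current directory, relative to root
        self.counter = {}     # number of datasets created per directory

    def _abs(self, path):
        """Turn a relative path (list of names) into an absolute directory."""
        return os.path.join(self.root, *path)

    def mkdir(self, name):
        """Create a sub-directory of the current directory."""
        os.makedirs(self._abs(self.path + [name]), exist_ok=True)
        return self.path + [name]

    def cd(self, name):
        """Change the current directory; '..' moves up one level."""
        new_path = self.path[:-1] if name == '..' else self.path + [name]
        if not os.path.isdir(self._abs(new_path)):
            raise FileNotFoundError(name)
        self.path = new_path
        return self.path

    def new(self, name):
        """Register a new dataset and return its numbered name."""
        key = tuple(self.path)
        self.counter[key] = self.counter.get(key, 0) + 1
        return '%05d - %s' % (self.counter[key], name)


vault = ToyDataVault(tempfile.mkdtemp())
vault.mkdir('2024')
vault.cd('2024')
print(vault.new('Half-Life'))  # 00001 - Half-Life
```

In the real server each of these functions would hand the work to a Session object rather than touching the file system directly.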

Sub-Classes

What sets DataVault apart from other servers is the fairly complicated set of subclasses implemented in the same file: the Session, Image, and Dataset classes. These classes contain many functions that you will not find unless you dig into the Python files, and they are essential to understanding the DataVault data pipeline. Here we give a brief introduction to all three. For the details of each function you still have to dig into the data_vault_tables.py file.

Session Class

The Session class inherits from Python's base object class, so it has nothing to do with the LabRAD server machinery. One Session object is created for each data directory that a client accesses through DataVault. The Session object manages reading from and writing to the config file, and manages the datasets in its directory by calling methods of the Dataset class. You can think of a Session as a file explorer with a built-in data editor that communicates with the client programs.

Fields

When initialized, a Session object contains path, dirc (the absolute path), infofile (the session.ini file present in every data directory), parent_session (for the parent directory), dataset, and some tags and listeners used to push updates to other servers and clients (e.g. when data is added). DataVault keeps a list of active sessions and reuses an existing session instead of creating a new one (much like switching to an already open file explorer window rather than opening another).

Methods

A Session object has quite a lot of methods. I will cover those that take part in the data pipeline: load and save (both operate on session.ini), newDataset, newMatrixDataset (for small array data), openDataset (works for all datasets and returns the dataset), and updateTags (which seems to notify the listeners so that added or updated data is streamed rather than reloaded).
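The sketch below models the two ideas described for Session: one object per data directory that is reused while active, and a config file that is loaded and saved alongside the data. It is only an illustration under stated assumptions. ToySession, its get method, and the 'File System' / 'Counter' layout of the .ini file are hypothetical choices for this example, not taken from the real class.

```python
import configparser
import os
import tempfile


class ToySession:
    """Hypothetical, simplified model of a Session (not the real class)."""

    _sessions = {}  # registry of active sessions

    @classmethod
    def get(cls, root, path):
        """Return the active session for this directory, creating it if needed."""
        key = (root,) + tuple(path)
        if key not in cls._sessions:
            cls._sessions[key] = cls(root, path)
        return cls._sessions[key]

    def __init__(self, root, path):
        self.path = list(path)
        self.dir = os.path.join(root, *path)                # absolute directory
        self.infofile = os.path.join(self.dir, 'session.ini')
        self.datasets = {}                                  # name -> dataset
        self.listeners = set()                              # clients to notify
        self.counter = 1                                    # next dataset number
        os.makedirs(self.dir, exist_ok=True)
        if os.path.exists(self.infofile):
            self.load()
        else:
            self.save()

    def load(self):
        """Read the dataset counter back from the config file."""
        parser = configparser.ConfigParser()
        parser.read(self.infofile)
        self.counter = parser.getint('File System', 'Counter')

    def save(self):
        """Write the dataset counter to the config file."""
        parser = configparser.ConfigParser()
        parser['File System'] = {'Counter': str(self.counter)}
        with open(self.infofile, 'w') as f:
            parser.write(f)

    def new_dataset(self, title):
        """Create a numbered dataset entry and persist the new counter."""
        name = '%05d - %s' % (self.counter, title)
        self.datasets[name] = []   # placeholder for a Dataset object
        self.counter += 1
        self.save()
        return name


root = tempfile.mkdtemp()
first = ToySession.get(root, ['2024', 'pmt'])
second = ToySession.get(root, ['2024', 'pmt'])
print(first is second)                  # True: the active session is reused
print(first.new_dataset('Half-Life'))   # 00001 - Half-Life
```

Keeping the counter in the config file is what lets a session pick up the numbering where it left off after the server restarts.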

Dataset Class

The Dataset class also inherits from Python's base object class. It is the core of DataVault, since it is the data structure that DataVault handles. In practice we always use NumpyDataset instead of Dataset, because NumpyDataset uses NumPy to handle the data, which is both faster and offers more features.

Fields

Class fields include the dataset name, session (initiates a new Session object), datafile (the actual data file, ending in .csv), infofile (the file containing the parameters, ending in .ini), and listeners (used to push updates when new data is received). For the data itself, a new dataset is created with empty lists of independents, dependents, parameters, comments, matrixrows, and matrixcolumns. Note that a dataset can store either a list of independent variables with a corresponding dependent variable (e.g. scanning the laser frequency and recording PMT data) or an array whose number of rows and columns is specified beforehand (e.g. camera data). (I forgot how the independent variable is saved in the array case and will come back to it.) A single dataset cannot be both.

Methods

The methods of the Dataset class cover loading, saving, adding data, and creating new datasets. They are usually called by methods of the Session class, which are in turn called by methods of the DataVault class. Check the code for more details and for the explanations in its documentation.
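The "table or matrix, but not both" rule and the listener updates can be sketched in a few lines. This is a simplified illustration, not the real Dataset or NumpyDataset: ToyDataset and its add method are hypothetical, it keeps data in plain Python lists instead of NumPy arrays, and it writes nothing to disk. Only the field names (independents, dependents, matrixrows, matrixcolumns, parameters, comments, listeners) follow the description above.

```python
class ToyDataset:
    """Hypothetical, simplified model of a Dataset (not the real class)."""

    def __init__(self, name, independents=None, dependents=None,
                 matrixrows=None, matrixcolumns=None):
        is_table = bool(independents or dependents)
        is_matrix = matrixrows is not None or matrixcolumns is not None
        if is_table and is_matrix:
            raise ValueError('a dataset is either a table or a matrix, not both')
        if is_matrix and (matrixrows is None or matrixcolumns is None):
            raise ValueError('a matrix dataset needs both rows and columns')
        self.name = name
        self.independents = list(independents or [])
        self.dependents = list(dependents or [])
        self.matrixrows = matrixrows
        self.matrixcolumns = matrixcolumns
        self.parameters = {}
        self.comments = []
        self.data = []
        self.listeners = []   # callbacks notified when new data arrives

    def add(self, rows):
        """Append rows after checking their width, then notify listeners."""
        if self.matrixcolumns is not None:
            width = self.matrixcolumns
        else:
            width = len(self.independents) + len(self.dependents)
        for row in rows:
            if len(row) != width:
                raise ValueError('expected %d values per row, got %d'
                                 % (width, len(row)))
        if (self.matrixrows is not None
                and len(self.data) + len(rows) > self.matrixrows):
            raise ValueError('matrix dataset is full')
        self.data.extend(list(row) for row in rows)
        for notify in self.listeners:
            notify(self.name, len(rows))


# A scan: one independent and one dependent variable per row.
scan = ToyDataset('Half-Life', independents=['x'], dependents=['y'])
scan.add([[1, 3], [2, 4]])

# Processed camera data: a 30x120 array filled one row at a time.
camera = ToyDataset('camera', matrixrows=30, matrixcolumns=120)
camera.add([[0] * 120])
```

Notifying listeners with only the number of new rows is one way to let clients fetch just the new data instead of reloading the whole dataset.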

Image Class

The Image class was left blank in the original DataVault server. We think it was meant to provide a faster way to handle large image arrays (here we mean arrays with thousands of pixels) that the Dataset class would be slow to handle. Since it was blank, we had to implement it ourselves. This work was led by Umang Mishra; you can contact Jiehang or Fred to reach Umang. Umang added the basic features to initialize, create, and update images, as well as support for storing them as .h5 and .npy files. However, this was never really tested, because we decided to save the raw image data locally without passing it through DataVault. The streaming function, which should be the key to handling huge image data quickly, was also never completed. Lastly, the h5 directory layout can be confusing, so you will probably need to dig into the code if you want to develop this further and let DataVault handle image data in real time. See labrad-summer.pdf for our plans for the h5 directory system and for the user-facing functions related to the Image class (that is, the functions implemented in the DataVault server class).