ESP Dev - Python
- Bring up the dev environment and get a shell in the server container:
docker-compose up
docker-compose exec server bash
- Ansible playbook: roles/container.yml
- Run
docker compose build
to pull down the latest image (you must be logged in to Docker Hub).
- Install the ESP client locally:
pip install --extra-index-url http://localhost:11702/pypi/simple espclient
- You'll need to set esp.options in order to access resources successfully:
esp.options(username="admin@localhost", password="*****")
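Once the client is installed, configuring a session looks roughly like the sketch below. Only the username/password options come from the notes above; the sample name used in the search is a hypothetical placeholder.

```python
# Minimal sketch: configure the ESP client before touching any models.
import esp
from esp.models import Sample

esp.options(username="admin@localhost", password="*****")

# once authenticated, model lookups go against the running ESP instance
results = Sample.search(names=["ESP000001"])  # hypothetical sample name
for s in results:
    print(s.name, s.uuid)
```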
- Start Docker container outside of VS Code as normal
- Log in to the container and run
make install
- Activate the ESP Python environment (if not done automatically):
source ~/data/extensions/client/bin/activate
- Open VS Code
- From the blue icon in the bottom left or from the command palette, run 'open in container'
- If VS Code can't find the workspace, choose
/opt/l7esp/data/project
- (Re-)install the Python VS Code extension inside the container
- Run the debugger
You might have to manually browse to find the client Python interpreter for ESP in VS Code. Something like:
/opt/l7esp/data/extensions/client/bin/python
or run a script directly with that interpreter:
(client) /opt/l7esp/data/extensions/client/bin/python /opt/l7esp/data/content/pipelines/scripts/cle_assign_seq_pool.py
- https://code.visualstudio.com/docs/remote/containers
- https://code.visualstudio.com/docs/editor/debugging#_launch-configurations
- Pipelines, workflows, entities, etc. (things built in the Builder UI) need to be exported as YAML and added to the repo, as well as custom configurations (JSON)!
- Use
esp export --shallow content-type "Content Name"
- After exporting a new YAML file, you need to add a new entry to content.yaml that references the new file (see the example entry after this list).
- If you are exporting on the Prod server, the server has a different password for your
admin@localhost
account, so you need to specify it like so:
esp -p MySuperSecretPassword export YourModelType YourModelName
- When you are able to reuse a protocol,
esp --port <port> export --shallow
comes in handy for workflows. (Protocols are referenced by name only in the workflow, and each protocol is defined in its own YML. You will need to remember to add a seed for the referenced protocol, though, otherwise the reference will throw an error if it doesn't exist in your database.)
- esp dump (see esp dump --help) can create a file structure for you based on a config file and produces a seed file. This is good when starting out on a new ESP build. For example,
esp dump --model protocol
dumps out all protocols.
- Export only with the --shallow option, except for pipelines, which need a full export to attach the corresponding tasks.
- Be mindful of spacing and indentation when copying and pasting YAML content!
- The export only provides one level of indentation (one tab / two spaces), but two levels (two tabs / four spaces) are required when using a '-' to start the name. If you just export the item, copy it into a new file, and then import that file, it will work as-is.
- If you add a dash before the name so that it matches the other items in a .yml file, you will need to add the extra spaces.
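An entry in content.yaml typically just points a model type at the exported file. The snippet below is only an illustrative sketch (the workflow path is hypothetical); mirror the keys, nesting, and paths of the entries already present in your repo's content.yaml.

```yaml
# Illustrative only -- match the structure of the existing entries in your content.yaml.
protocol:
  - content/protocols/seq_receive_library_pools_protocol.yml
workflow:
  - content/workflows/seq_receive_library_pools_workflow.yml   # hypothetical path
```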
Model types that can be exported/imported include:
- config
- container_type
- container
- item_type
- item
- vendor
- sample_type
- sample
- task
- pipeline
- protocol
- workflow
- workflow_chain
- applet
- report
- workflowable_class
- user
- role
- workgroup
esp import protocol content/protocols/seq_receive_library_pools_protocol.yml
You can export individual instances of items like so:
esp export Sample PL000003
During development, it's useful to have real-time updates of changes you make to content config files. This is possible with the watch entrypoint. To use the watch entrypoint, include watch before your seed or import command:
~$ # with seed
~$ python -m esp watch seed /path/to/content.yml
~$ # with import
~$ python -m esp watch import workflow /path/to/workflow.yml
This will monitor the directory containing that seed file for changes and will run the command any time a change occurs. Notably, this helps shorten the develop/test cycle for content in the SDK, where make import isn't necessary to re-seed an instance with content. With this, the watch entrypoint will watch for changes made by developers and update their ESP instance in real time.
You can also include additional directories in the search path for the watch entrypoint. For example, to watch a content directory in the SDK and run seed each time a change is made in that directory, you can use:
~$ python -m esp watch --include=./content seed ./roles/content.yml
This makes the process of iterating on multi-part configuration for an instance more manageable (especially since you can comment out items in the seed file to cut down on imports during development/testing).
Python development can generally be broken into two use cases:
- ESP client (esp module) for running pipeline scripts: manipulating data directly with the esp module or via REST APIs.
- Lab7 module: creating custom API endpoints for JS or third parties to interact with. Custom API endpoints only use POST and can only return JSON.
- Add secrets to the Configuration app to use with third-party APIs.
- New objects can be created/configured with JSON or imported YAML files.
- Updating objects with the esp module makes use of obj.push(), obj.drop(), and obj.create().
- Worksheet objects are based on pandas DataFrames.
- watching logs:
l7esp@20db3ab3fe40:~/data/log$ tail -f -n 40 *-l7-esp.log
Pipeline scripts are only updated in ESP when you run make install, which is a hassle. It's easier just to copy the script to where it needs to go. You can edit a pipeline script directly in the container, or, if you update it outside of the container, copy it to the appropriate location like so:
cp ~/data/project/content/pipelines/scripts/script1.py ~/data/content/pipelines/scripts/script1.py
- For bringing up the ESP client env (if not brought up by default):
source ~/data/extensions/client/bin/activate
- Install bpython in the container to use it within the ESP model context:
pip install bpython
- Run bpython:
python -m bpython
- bpython and the ESP client outside of the container:
/usr/local/bin/python3.9 -m bpython
- Pass values to pipeline script arguments from a worksheet like so:
{{column_name}}
(see the sketch after this list)
- Tasks themselves do not know about LIMS expressions; you must pass values via column names.
- By default it sits on the same server as ESP and talks directly to it.
- Uses pandas to read CSV files and parse data.
- The Python client abstracts a lot of the complexity of interacting with the ESP backend, but there are REST-based endpoints available for interacting with the backend directly as well (via the requests library).
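As a concrete illustration of the {{column_name}} pattern, a pipeline script receives worksheet values as plain command-line arguments and then uses the ESP client to read or write the sheet. The sketch below is hypothetical: the argument names, protocol name, and column name are assumptions, but the model calls mirror the SampleSheet examples later in these notes.

```python
# Hypothetical pipeline script sketch. In the task definition you might pass
# worksheet values as arguments, e.g.: script.py --worksheet {{worksheet_uuid}} --pool "{{Pool Name}}"
# (argument and column names are assumptions for illustration).
import argparse

from esp.models import SampleSheet


def main():
    parser = argparse.ArgumentParser(description="Example ESP pipeline script")
    parser.add_argument("--worksheet", required=True, help="sample sheet UUID passed from the task")
    parser.add_argument("--pool", required=True, help="value taken from a worksheet column")
    args = parser.parse_args()

    # queries are performed lazily when a property is accessed
    samplesheet = SampleSheet(args.worksheet)
    protocol = samplesheet.protocol("Protocol One")  # hypothetical protocol name

    # work with the worksheet as a pandas DataFrame, then save changes back
    for i, _row in protocol.df.iterrows():
        protocol.df.at[i, "Pool Name"] = args.pool   # hypothetical column name
    protocol.save()


if __name__ == "__main__":
    main()
```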
When the pipeline runs from the UI, it creates a new folder in:
~/data/pipeline/pipeline-name-<name+datetime>
- https://www.makeuseof.com/sort-python-dataframes/
- https://datagy.io/pandas-iterate-over-rows/
- https://datagy.io/pandas-dataframe-to-list/
- https://datagy.io/pandas-replace-nan-with-zeroes/
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', None)
- Sample sheet: an instance of a workflow; it contains all the protocols. Most development with the ESP module will use the Experiment and SampleSheet models.
- Worksheet: an instance of a protocol; one sheet in a collection of sheets in a workflow.
- step_instances is a Protocol Instance, i.e. the instance of the protocol definition. This indicates which protocol the samples/experiment are using. We would expect it to be the same for all samples. You generally won't ever use this model.
- sample_sheets is a list of the sample sheets that the samples in the experiment are located in. For example, if you had an experiment with 3 samples in it, you could send each sample into its own sample sheet, all into the same one, or some combination. sample_sheets are the recommended way of working with LIMS.
- step_instance_sample is how we represent a sample inside of a sample_sheet.
from esp.models import Project
p = Project('39e2b85d-1d06-4d30-9b87-7eb991c3523c')
# search on lists of items
p_group = Project.search(name=None, tags=None, method="or", uuids=None, names=None)
p.data
p.samples
p.experiments
p.summary()
import esp
from esp.models import SampleSheet
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
samplesheet = SampleSheet(args.worksheet)
samplesheet.summary()
# queries are performed lazily when a property is accessed
samplesheet.uuid
samplesheet.name
samplesheet.created_at
samplesheet.samples
samplesheet.project
samplesheet.experiments
samplesheet.workflow
samplesheet.workflow.protocols
samplesheet.tags
# get the protocols
p1 = samplesheet.protocol('Protocol One')
p2 = samplesheet.protocol('Protocol Two')
# access all the values in the worksheet as a df
print(p1.df)
print(p1.samples) # returns a list
print(p1.samples[0].variables) # returns a dict of entity props
# iterate through the df and alter it (here using p1 from above)
for i, r in p1.df.iterrows():
    p1.df.at[i, col_name] = "new value"  # col_name is the header of the column to update
# optional save params: evaluate=False, refresh=False
p1.save()
p1.samples
p1.samples[1].variables
import esp
from esp.models import Sample
s = Sample(sample_uuid)
# queries are performed lazily when a property is accessed
s.uuid
s.name
s.variables
s.experiments
s.parents
s.children
s.locations
s.projects
s.tags
s.entity_type.name
s.summary()
s.variables['Pool Name'] = 'Custom Pool Name'
s.push() # saves sample, but is slow
s.push(dry=True) # returns the obj as dict, but does not save
# save sample, but faster: bulk PUT through the client session (payload is a list of sample dicts)
esp.base.SESSION.put('/api/samples', json={'samples': payload})
sample_specs = []
for sample in samples_by_uuid.values():
sample_spec = sample.push(dry=True)
sample_spec['uuid'] = sample.uuid
sample_specs.append(sample_spec)
esp.base.SESSION.put('/api/samples', json={'samples': sample_specs})
# search on lists of items
Sample.search(name=None, tags=None, method="or", uuids=None, names=None)
sample_group = Sample.search(uuids=['e7168c50-9830-4681-aeea-19084bb3e631', '237e54db-864f-4502-a962-433a89ead05c', '2384649d-b3f9-46c7-baa1-b7a7f7015079'])
from esp import utils
utils.eval_expression("ds('instruments')")
utils.eval_expression("current_locations('aed3080d-f682-4428-8824-6a7f49000540')")
utils.eval_expression("entity_generation('closestup:Illumina Library', ['ddbef7e5-389a-4e8e-a2b0-91b106776593', 'f902a4fe-8158-4f09-8c8d-c87eb8fb306b'])")
ESP Expressions are evaluated when the entire workflow loads, not on each sheet/page load.
- Start working in the L7 console / bpython.
- Then move on to custom APIs for easy debugging and development.
- Use an experiment YAML to load dummy data quickly.
- You can see files generated by pipeline scripts in ESP under the 'Data' app.
- Each pipeline run gets its own folder in ESP, which you can inspect like so:
l7esp@4c75dc255d2f:~/data/pipeline$ cd generate-illumina-runsheet-20230604184929376/
l7esp@4c75dc255d2f:~/data/pipeline/generate-illumina-runsheet-20230604184929376$ ls -la
total 20
drwxr-xr-x 2 l7esp l7esp 4096 Jun 4 18:49 .
drwxr-xr-x 4 l7esp l7esp 4096 Jun 4 18:49 ..
-rwxr--r-- 1 l7esp l7esp 1209 Jun 4 18:49 generate-illumina-runsheet-1.sh
-rw-r--r-- 1 l7esp l7esp 1767 Jun 4 18:49 generate-illumina-runsheet-1.sh.err
-rw-r--r-- 1 l7esp l7esp 0 Jun 4 18:49 generate-illumina-runsheet-1.sh.out
-rw-r--r-- 1 l7esp l7esp 37 Jun 4 18:49 lab7-pi-uuid.txt
A Workflow Instance is the parent of a Protocol Instance. Both WorkflowInstance and ProtocolInstance have what are called "StepInstanceSamples", otherwise known as rows.
A SampleSheet is an alternative grouping of 'StepInstanceSamples'. Furthermore, a WorkflowInstance is known in the code at times as an Experiment, and a SampleSheet as a Batch. We also refer to WorkflowChainInstances as Experiments sometimes, but the Chain Instance is really a tree of Workflow Instances.
WorkflowInstances are equivalent to experiment submissions: a combination of a locked-in version of a Workflow definition with Samples to process with that Workflow definition.
Workflow definitions determine which Protocols are in a Workflow, how data flows between the Protocols, and which Sample Types are allowed to enter a Workflow.
Querying Workflows is similar to querying Workflow Definitions, but Workflows point to the latest definitions.
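To make the terminology concrete, here is a small hedged sketch using the Python client. The experiment name is a hypothetical placeholder; sample_sheets is the attribute described above, while samples and summary() are assumptions based on the other models in these notes (confirm with help()).

```python
# Hedged sketch: a WorkflowInstance surfaces in the client as an Experiment.
from esp.models import Experiment

exp = Experiment('EXP000123')  # hypothetical experiment name (or uuid)
exp.sample_sheets              # sample sheets (batches) the submitted samples were routed into
exp.samples                    # assumption: samples submitted with the experiment
exp.summary()                  # assumption: quick human-readable overview, as on other models
```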
from esp.models import File, SampleSheet
from esp.utils import format_attachment_value
fi = File.create({ 'uri': '/opt/l7esp/data/project/test_file.txt', 'name': 'myfile', 'upload': 'true'})
ss = SampleSheet('your_uuid')
prot = ss.protocol('your_protocol_name')
# index the row you want (an int), then set the attachment column
prot[int_for_row_desired]['your_attachment_column_name'] = format_attachment_value(fi)
prot.save()
- /api/workflow_instances == model.Experiment
- /api/workflow_chain_instances == model.ExperimentChain
- /api/sample_sheets == model.SampleSheet (use model.SampleSheet.protocols to get worksheets)
- /api/protocol_instance == model.Worksheet
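When you need to hit one of these endpoints directly rather than going through a model, the client's authenticated session can be reused, as in the bulk sample update earlier in these notes. Only the PUT to /api/samples appears elsewhere here; the GET call and response handling below are assumptions to verify against your instance.

```python
# Hedged sketch: browsing a REST resource through the client's authenticated session.
import esp

resp = esp.base.SESSION.get('/api/sample_sheets')  # assumption: GET lists sample sheets
print(resp.status_code)
print(resp.json()[:1])  # peek at the first record, if any
```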
After updating invokables.py, expressions.py, etc., you need to run:
make serverext
To update both server and client-side code, you can run:
make ext
- Use
l7 console
to launch ipython with the l7 module imported, in order to debug l7 module scripts from within the container.
- Common lab7 modules for sample interaction:
import lab7.main.api as mapi
import lab7.sample.api as sapi
import lab7.lims.api as lapi
lab7.sample.api.lookup_generation
from lab7.utils import send_notification
- l7 console
brings up ipython with the lab7 module enabled; the esp module is not available, however. You must pip install it from within the console!
- If testing in the L7 console, set agent=admin:
s = sapi.query_samples(filters={"name.like": "LIB000001-DIL02-DIL01-DIL01"}, session=session, agent=admin, return_dict=False)
- return_dict=False is a recommended option. It returns the result as an object rather than a dictionary. This has some efficiency considerations on the SQLAlchemy side of things, so it's almost always better to use an object.
- Querying other resources should work in a similar manner (import lab7.projects.api as papi, etc.)
from lab7.sample import api as sapi
import lab7.lims.api as lapi
sapi.query_samples(filters={"name.like": "ESP0000"}, session=session, agent=agent, return_dict=False)
sample = sapi.query_samples(filters={"name": "CDNA001"}, session=context.session, agent=context.agent, return_dict=False)
samples = sapi.query_samples({"names": sample_list}, session=session, agent=agent, return_dict=False)
library_st = sapi.query_sample_types({'name': ILLUMINA_LIBRARY}, return_dict=False, agent=agent, session=session)
worksheet = lapi.get_sample_sheet(sample_sheet_uuid, session=session, agent=agent, return_dict=False)
worksheet = lapi.create_sample_sheet_with_samples(
wf_def_uuid,
samples,
return_dict=False,
session=session,
agent=agent,
name=f'{wf_name} - {current_datetime}'
)
from lab7.sample import api as sapi
sample = sapi.query_samples(filters={"name": "LIB005852"}, session=session, agent=admin, return_dict=False)
sample = sample[0]
pool = sapi.lookup_generation([sample.uuid], "closestdown:Illumina Library Pool", session=session, agent=admin)
lib_list = ["LIB005852", "LIB006586"]
libs = sapi.query_samples({"names": lib_list}, session=session, agent=agent, return_dict=False)
libs_uuids = [lib.uuid for lib in libs]
pools = sapi.lookup_generation(libs_uuids, "closestdown:Illumina Library Pool", session=session, agent=admin)
- For making API calls with the lab7 module, use the built-in admin and session variables:
gen = sapi.lookup_generation(["d01b770f-1299-4a52-8bc8-63d18babd904"], -1, session=session, agent=admin)
sample = sapi.get_sample('4e26cf0e-1f59-42c4-806e-dac10cee40cf', session=session, agent=admin, return_dict=False)
sapi.hid_for_uuid("4e26cf0e-1f59-42c4-806e-dac10cee40cf", session=session, agent=admin, sequence="gtac_seq")
- Use return_dict=False for performance reasons; it may also return additional data not found in the regular dict!
- Run help(esp.obj) in the Python console to access docstrings.
- A sample's resource_vals is a list of the entity properties.
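For example, a quick way to explore models and a sample's entity properties from bpython or the L7 console (the resource_vals access path shown is an assumption; confirm with help()/dir()):

```python
# Hedged sketch: exploring client models and entity properties interactively.
import esp.models
from esp.models import Sample

help(esp.models.Sample)  # docstrings for the Sample model

s = Sample('4e26cf0e-1f59-42c4-806e-dac10cee40cf')  # example uuid from the notes above
s.variables                                         # entity properties as a dict
s.data.get('resource_vals')                         # assumption: raw list of entity properties
```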
import logging
import os
import sys

filename = '/opt/l7esp/data/log/tim.log'
# start with a fresh log file each run (remove it before the handler opens it)
if os.path.exists(filename):
    os.remove(filename)
file_handler = logging.FileHandler(filename)
file_handler.setLevel(logging.DEBUG)
stream_handler = logging.StreamHandler(sys.stdout)
stream_handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
file_handler.setFormatter(formatter)
stream_handler.setFormatter(formatter)
logger = logging.getLogger('esp')
logger.setLevel(logging.DEBUG)
logger.addHandler(file_handler)
logger.addHandler(stream_handler)
logger.debug('---')
import logging
import sys
import esp
import esp.models
# Create a file handler with the desired log level
file_handler = logging.FileHandler('/opt/l7esp/data/log/esp.log')
file_handler.setLevel(logging.DEBUG)
# Create a stream handler to output to the console (optional)
stream_handler = logging.StreamHandler(sys.stdout)
stream_handler.setLevel(logging.DEBUG)
# Create a formatter and add it to the handlers
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
file_handler.setFormatter(formatter)
stream_handler.setFormatter(formatter)
# Create a logger object
logger = logging.getLogger('esp')
logger.setLevel(logging.DEBUG)
# Add the handlers to the logger
logger.addHandler(file_handler)
logger.addHandler(stream_handler) # (optional)
# Example usage of the logger
logger.debug('This is a debug message')
logger.info('This is an info message')
logger.warning('This is a warning message')
logger.error('This is an error message')
logger.critical('This is a critical message')
and then:
l7esp@20db3ab3fe40:~/data/log$ tail -f -n 20 'esp.log'
In invokables.py:
@invokable(name='addnums')
def custom_api(agent, a, b):
    return {'result': int(a) + int(b)}
Call it via a POST to http://localhost:11702/api/invoke/addnums with a = 2 and b = 2.
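For a quick test outside the UI, a hedged sketch using the requests library is below. Only the endpoint URL, POST-only behavior, and JSON response come from the notes above; the payload shape and basic-auth credentials are assumptions, so match whatever your Postman setup uses.

```python
# Hedged sketch: exercising the custom invokable directly.
# Payload shape and auth mechanism are assumptions -- adjust to your instance.
import requests

resp = requests.post(
    "http://localhost:11702/api/invoke/addnums",
    json={"a": 2, "b": 2},
    auth=("admin@localhost", "*****"),  # assumption: basic auth with the dev credentials
)
print(resp.status_code, resp.json())  # expect something like {'result': 4}
```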
- Use
l7 console
to test back-end APIs.
- Check the Extensions applet to make sure extensions are loaded properly.
- invokables.py: test via Postman or another API utility.
- expressions.py: use the REST API
/api/extensions/eval_expression
or esp.utils.eval_expression
To debug invokables.py, you need two files placed in a folder labeled .vscode in /opt/l7esp, inside a VS Code instance that is attached to a running container.
tasks.json
{
"version": "2.0.0",
"tasks": [
{
"label": "stop celery",
"type": "shell",
"command": "l7 stop l7-esp.celery"
},
{
"label": "start celery",
"type": "shell",
"command": "l7 start l7-esp.celery"
},
{
"label": "stop l7 webworkers",
"command": "l7 stop l7-esp.http:* || echo 0",
"type": "shell",
},
{
"label": "start l7 webworkers",
"command": "l7 start l7-esp.http:* || echo 0",
"type": "shell",
}
]
}
launch.json
{
"version": "0.2.0",
"configurations": [
{
"name": "celery",
"type": "python",
"request": "launch",
"stopOnEntry": false,
"python": "/usr/bin/python",
"program": "/usr/bin/celery",
"console": "integratedTerminal",
"args": [
"-A",
"L7Celery",
"worker",
"-l",
"info",
],
"cwd": "/opt/l7esp/server/services",
"env": {
"LAB7DATA": "/opt/l7esp/data",
"CELERY_TASK_ALWAYS_EAGER": "true",
"TASK_ALWAYS_EAGER": "true"
},
"preLaunchTask": "stop celery",
"postDebugTask": "start celery",
},
{
"name": "l7-esp.http",
"type": "python",
"request": "launch",
"program": "/opt/l7esp/server/services/l7-esp.http.py",
"console": "integratedTerminal",
"justMyCode": false,
"python": "/usr/bin/python",
"args": [
"-n",
"api"
],
"preLaunchTask": "stop l7 webworkers",
"postDebugTask": "start l7 webworkers"
}
]
}
From there, place your breakpoint where appropriate in invokables.py, then from the 'Run and Debug' dropdown in the Debug panel, choose celery and hit the green run button.
Now you can run whatever you need to from within the ESP UI. When the task you are running reaches the breakpoint, VS Code will surface over the browser and you can begin your debugging session.
The data folder is part of a volume that persists between container stops/starts. The content folder here represents the "installed/runtime" files. The data/project folder is a bind mount from your host system project that represents the source files.
By default, only make install will copy the source files over to the runtime files. An alternative is to copy individual files, e.g.:
cp ~/data/project/content/pipelines/scripts/script1.py ~/data/content/pipelines/scripts/script1.py
A more extensible solution is to use a custom Ansible task ("link content", or sometimes called "copy content") to symlink from ~/data/project/content inside the container to ~/data/content, run via make copy_content.
Production releases will always get make install'ed to reseed YAMLs, copy client extensions into the static files directory, and copy the Python extensions into the l7ext folder. Because these folders are part of a volume, you can blow away the container after the make install and still have the static/extension files that aren't stored in the database.
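For reference, the "link content" idea can be expressed as an Ansible task along these lines. This is only a hedged sketch; the task name and exact paths are assumptions to be adapted to the role layout in this repo.

```yaml
# Hedged sketch of a "link content" task (name and paths are assumptions).
- name: Link project content into the runtime content directory
  ansible.builtin.file:
    src: /opt/l7esp/data/project/content
    dest: /opt/l7esp/data/content
    state: link
    force: true
```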
The Data/conf directory contains the configuration files for the L7 Informatics System Services. The Data/conf/lab7.conf file contains configuration information for the L7 Informatics web server and system services. The Data/conf/postgresql.conf file contains configuration information used by the PostgreSQL database server.