DIRAC and GridPP: an example with CERN@school - gridpp/dirac-getting-started GitHub Wiki

This guide presents a fully worked example of using DIRAC to upload, process and analyse a dataset with the metadata functionality. The code and example dataset are taken from the CERN@school programme and are provided with the repository for convenience. The aim is that you can work through the example yourself before trying to use DIRAC with your own data.

##Prerequisites

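In addition to DIRAC, make sure the numpy, json and argparse Python modules are available on your local system (json and argparse are part of the standard library from Python 2.7 onwards, so numpy is usually the only one you need to install). You'll need these to run the scripts mentioned below.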
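A quick way to confirm the modules are importable is sketched below. The helper name `missing_modules` is made up for this illustration and is not part of the repository.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(["numpy", "json", "argparse"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are available.")
```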

##Uploading the test dataset - frames

First, we need to upload some data. We have provided a test dataset in testdata/ASCIIxyC. This is the direct output from a measurement of the radiation produced by one of William Crookes' notebooks, made using one of the CERN@school detectors. Each data file (.txt) records the pixel (x, y) positions and the counts measured in a one-second exposure (frame). Each frame also has an accompanying detector settings file (.txt.dsc). You don't have to worry too much about what's going on here: the CERN@school code supplies Python wrapper classes and helpers for getting the data ready for the grid.
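To give a feel for the frame data, here is a minimal sketch of reading one frame into a dictionary. It assumes each line of a .txt file holds three whitespace-separated integers (x, y, counts); that layout is inferred from the description above rather than checked against the files. The function `parse_xyc` is illustrative and is not one of the CERN@school wrapper classes.

```python
def parse_xyc(text):
    """Parse 'x y C' lines into a {(x, y): counts} dictionary.

    Assumes three whitespace-separated integers per line.
    """
    pixels = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # Skip blank or malformed lines.
        x, y, counts = (int(field) for field in fields)
        pixels[(x, y)] = counts
    return pixels


# Made-up values for illustration only (not taken from the test dataset).
example = "10\t20\t5\n11\t20\t3\n200\t13\t1\n"
frame = parse_xyc(example)
print("Hit pixels: {0}".format(len(frame)))
print("Total counts: {0}".format(sum(frame.values())))
```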

Once you have initialised your grid proxy with dirac-proxy-init, you can run the upload script. We will set up a few environment variables for convenience:

$ export MY_LOCAL_OUTPUT_DIR=tmp/       # A directory for local output.
$ mkdir $MY_LOCAL_OUTPUT_DIR            # <- Create it.
$ export SITE=LCG.Glasgow.uk            # Your favourite grid site.
$ export SE=GLASGOW-disk                # Your favourite Storage Element.
$ export DFC_OUTPUT_DIR=dirac-tutorial  # The DFC path for the uploaded data.
$ python upload_frames.py testdata/ASCIIxyC/ $MY_LOCAL_OUTPUT_DIR 001 $SITE $SE $DFC_OUTPUT_DIR/frames

(Note: you may need to uncomment the dirac.submit(j) line in upload_frames.py - though it doesn't hurt to practise running the script without actually submitting the job first.)

You should see some information about what the script did and the result of the job submission via DIRAC. If all went well, you should also be able to monitor the job's progress on the DIRAC web portal. Once the job has finished running, you can check that the data has been uploaded and registered in the DFC using the File Catalog Client:

$ dirac-dms-filecatalog-cli
Starting FileCatalog client

File Catalog Client $Revision 1.17 $Date:

FC:/>cd /cernatschool.org/user/t/t.whyntie/dirac-tutorial/frames
FC:/cernatschool.org/user/t/t.whyntie/dirac-tutorial/frames>ls
B06-W0212_1371575424-293207.txt
B06-W0212_1371575425-337648.txt
B06-W0212_1371575426-414549.txt
B06-W0212_1371575427-489662.txt
B06-W0212_1371575428-551945.txt

If you can see the five frames, congratulations! You've uploaded data to the grid using the DIRAC Python API. Now let's add some metadata to them.

##Adding the metadata - frames

DIRAC can attach the data describing the data (the metadata) to catalogue entries via the FileCatalogClient in the Python API. The previous script created a JSON file containing the frame metadata in the folder you specified with the $MY_LOCAL_OUTPUT_DIR environment variable. Metadata is assigned directly via the DIRAC API, so no grid job is required. However, we do need to supply your DIRAC DFC home directory, which depends on your VO name and DIRAC username:

$ export DFC_HOME=/cernatschool.org/user/t/t.whyntie/
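The home directory in the example follows the pattern /<VO name>/user/<first letter of username>/<username>/. The sketch below builds the path on that assumption; the helper `dfc_home` is hypothetical, and the layout should be checked against your own DIRAC instance.

```python
def dfc_home(vo_name, username):
    """Build '/<VO>/user/<first letter of username>/<username>/'.

    Assumes the directory layout shown in the example above.
    """
    if not vo_name or not username:
        raise ValueError("Both the VO name and the username are required.")
    return "/{0}/user/{1}/{2}/".format(vo_name, username[0], username)


print(dfc_home("cernatschool.org", "t.whyntie"))
```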

Then you can run the following script:

$ python add_frame_metadata.py $MY_LOCAL_OUTPUT_DIR/B06-W0212_1371575424-293207.json $MY_LOCAL_OUTPUT_DIR $DFC_HOME$DFC_OUTPUT_DIR/frames
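For orientation, here is a rough sketch of what assigning metadata through the API can look like. It is not the contents of add_frame_metadata.py. It assumes the catalogue client offers a `setMetadata(lfn, metadata_dict)` call that returns a DIRAC-style result dictionary with an `OK` key, as DIRAC's FileCatalogClient does. It also assumes the metadata is a list of dictionaries, each with a hypothetical `file_name` key. A stand-in client is used so the sketch runs without DIRAC or a grid proxy.

```python
def add_metadata(catalog, lfn_dir, frames):
    """Attach each frame's metadata to its LFN; return the LFNs that failed.

    `catalog` is any object with a setMetadata(lfn, dict) method that
    returns a dictionary containing an 'OK' key.
    """
    failed = []
    for frame in frames:
        meta = dict(frame)  # Copy, so the caller's data is left untouched.
        # 'file_name' is a hypothetical key, assumed for this sketch.
        lfn = lfn_dir.rstrip("/") + "/" + meta.pop("file_name")
        result = catalog.setMetadata(lfn, meta)
        if not result.get("OK"):
            failed.append(lfn)
    return failed


class StandInCatalog(object):
    """Stand-in for the real client so the sketch runs without DIRAC."""

    def __init__(self):
        self.stored = {}

    def setMetadata(self, lfn, meta):
        self.stored[lfn] = meta
        return {"OK": True, "Value": {}}


catalog = StandInCatalog()
frames = [{"file_name": "B06-W0212_1371575424-293207.txt",
           "n_pixel": 735,
           "start_time": 1371575424}]
failed = add_metadata(
    catalog, "/cernatschool.org/user/t/t.whyntie/dirac-tutorial/frames", frames)
print("Failed LFNs: {0}".format(failed))
```

With DIRAC installed and a valid proxy, the stand-in would be replaced by an instance of `FileCatalogClient` from `DIRAC.Resources.Catalog.FileCatalogClient`, created after DIRAC's usual script initialisation.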

To check that this has worked, you can use the File Catalog Client command line tool to inspect the metadata properties of each frame:

FC:/>cd /cernatschool.org/user/t/t.whyntie/dirac-tutorial/frames
FC:/cernatschool.org/user/t/t.whyntie/dirac-tutorial/frames>meta get B06-W0212_1371575424-293207.txt
             n_pixel : 735
...
          start_time : 1371575424

(Note: the metadata indices for the CERN@school data have already been added to DIRAC. When you're ready to add your own data and metadata indices, you'll need to follow the instructions here.)

##Metadata queries - frames

$ python perform_frame_query.py all_frames_query.json $MY_LOCAL_OUTPUT_DIR $DFC_HOME$DFC_OUTPUT_DIR/frames
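As a rough illustration of a metadata query through the API, the sketch below wraps a `findFilesByMetadata(query_dict, path=...)` call. DIRAC's FileCatalogClient provides this call and returns a result dictionary holding the matching LFNs under `Value`. The query format of all_frames_query.json is not reproduced here. The stand-in client, the file names and the second `n_pixel` value are made up so that the example runs without DIRAC.

```python
def find_frames(catalog, query, dfc_path):
    """Return the sorted LFNs under `dfc_path` whose metadata match `query`."""
    result = catalog.findFilesByMetadata(query, path=dfc_path)
    if not result.get("OK"):
        raise RuntimeError(result.get("Message", "Metadata query failed"))
    return sorted(result["Value"])


class StandInCatalog(object):
    """Stand-in client: supports exact-match queries only."""

    def __init__(self, entries):
        self.entries = entries  # {lfn: metadata dictionary}

    def findFilesByMetadata(self, query, path="/"):
        matches = [lfn for lfn, meta in self.entries.items()
                   if lfn.startswith(path)
                   and all(meta.get(key) == value
                           for key, value in query.items())]
        return {"OK": True, "Value": matches}


# Made-up catalogue contents for illustration only.
catalog = StandInCatalog({
    "/vo/frames/frame_a.txt": {"n_pixel": 735},
    "/vo/frames/frame_b.txt": {"n_pixel": 12},
})
for lfn in find_frames(catalog, {"n_pixel": 735}, "/vo/frames"):
    print(lfn)
```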

##Processing on the grid - clusters from frames

Clusters are groups of adjacent pixels found in a given frame that (generally) correspond to ionising radiation measured by the detector. It is therefore useful to extract the clusters found in the frames, calculate their properties (e.g. size, linearity) and store them separately.
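The idea of grouping adjacent pixels can be sketched as a flood fill over the hit pixels. This is an illustrative implementation, not the CERN@school cluster finder. It assumes that diagonal neighbours count as adjacent (8-connectivity) and that a frame is a {(x, y): counts} dictionary. The pixel values are made up.

```python
def find_clusters(pixels):
    """Group hit pixels into clusters of 8-connected neighbours.

    `pixels` maps (x, y) to counts. Returns a sorted list of clusters,
    each a sorted list of (x, y) tuples.
    """
    unvisited = set(pixels)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = [seed]
        stack = [seed]
        while stack:
            x, y = stack.pop()
            # Visit the eight surrounding pixels (the pixel itself has
            # already been removed from `unvisited`).
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (x + dx, y + dy)
                    if neighbour in unvisited:
                        unvisited.remove(neighbour)
                        cluster.append(neighbour)
                        stack.append(neighbour)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Made-up frame for illustration only.
frame = {(10, 20): 5, (11, 20): 3, (11, 21): 2, (200, 13): 1}
for cluster in find_clusters(frame):
    total = sum(frame[pixel] for pixel in cluster)
    print("size={0} total_counts={1}".format(len(cluster), total))
```

Cluster size and total counts are the simplest properties to compute this way. Shape measures such as linearity would need the pixel coordinates of each cluster, which is why the sketch returns them.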

##Adding the metadata - clusters

  • Retrieve the job output
      • Get the klusters.json
  • Run the metadata script

##A simple analysis

  • Get the clusters of interest.