DB structure - PanDAWMS/panda-harvester GitHub Wiki

Local tables in SQLite

DB ER diagram

SQLite DB tables are generated in the harvestercore/db_proxy.py. Individual table structure is defined in each *_spec.py file. The initial definitions below, but will evolve over time, so please consult the relevant spec file for the up to date structure.

alt text

job_table

'PandaID:integer primary key',
'taskID:integer',
'attemptNr:integer',
'status:text',
'subStatus:text',
'currentPriority:integer',
'computingSite:text',
'creationTime:timestamp',
'modificationTime:timestamp',
'stateChangeTime:timestamp',
'startTime:timestamp',
'endTime:timestamp',
'nCore:integer',
'jobParams:blob',
'jobAttributes:blob',
'hasOutFile:integer',
'metaData:blob',
'outputFilesToReport:blob',
'lockedBy:text',
'propagatorLock:text',
'propagatorTime:timestamp',
'preparatorTime:timestamp',
'submitterTime:timestamp',
'stagerLock:text',
'stagerTime:timestamp',
'zipPerMB:integer'             

work_table

'workerID:integer',
'batchID:text',
'mapType:text',
'queueName:text',
'status:text',
'hasJob:integer',
'workParams:blob',
'workAttributes:blob',
'eventsRequestParams:blob',
'eventsRequest:integer',
'computingSite:text',
'creationTime:timestamp',
'submitTime:timestamp',
'startTime:timestamp',
'endTime:timestamp',
'nCore:integer',
'walltime:timestamp',
'accessPoint:text',
'modificationTime:timestamp',
'stateChangeTime:timestamp',
'eventFeedTime:timestamp',
'lockedBy:text',
'postProcessed:integer'

jw_table (job worker)

'PandaID:integer',
'workerID:integer',
'relationType:text'

file_table

'fileID:integer primary key',
'PandaID:integer',
'taskID:integer',
'lfn:text',
'status:text',
'fsize:integer',
'chksum:text',
'path:text',
'fileType:text',
'eventRangeID:text',
'modificationTime:timestamp',
'fileAttributes:blob',
'isZip:integer',
'zipFileID:integer',
'objstoreID:integer'

cache_table

'mainKey:text',
'subKey:text',
'data:blob',
'lastUpdate:timestamp'

event_table

'eventRangeID:text',
'PandaID:integer',
'eventStatus:text',
'coreCount:integer',
'cpuConsumptionTime:integer',
'subStatus:text',
'fileID:integer'

seq_table

'numberName:text',
'curVal:integer'

pq_table (panda queue)

'queueName:text',
'nQueueLimitJob:integer',
'nQueueLimitWorker:integer',
'maxWorkers:integer',
'jobFetchTime:timestamp',
'submitTime:timestamp'

command_table

'command_id:integer primary key',
'command:text',
'receiver:text',
'params:blob',
'ack_requested:integer',
'processed:integer'

Central tables in Oracle

Harvester_Instances

The table to register harvester instances.

Name Type Description
HARVESTER_ID NOT NULL VARCHAR2(50) Identifier of the harvester instance
DESCRIPTION VARCHAR2(200) Description for the instance
STARTTIME DATE Start time
OWNER VARCHAR2(100) Owner of the instance
HOSTNAME VARCHAR2(100) Hostname where the instance is running
LASTUPDATE DATE Set when the record is updated
SW_VERSION VARCHAR2(50) Software version
COMMIT_STAMP VARCHAR2(100) Commit stamp in the repository

Harvester_Workers

The table for workers submitted by harvesters.

Name Type Description
HARVESTERID NOT NULL VARCHAR2(50) Identifier of the harvester instance
WORKERID NOT NULL NUMBER(11) Identifier of the worker
BATCHID VARCHAR2(80) Unique ID in the batch system
NODEID VARCHAR2(80) Identifier of the node, such as hostname, IP, etc
QUEUENAME VARCHAR2(80) Name of the batch queue
STATUS NOT NULL VARCHAR2(80) Worker status
COMPUTINGSITE VARCHAR2(128) Panda Queue name
SUBMITTIME DATE Set when the worker is submitted
STARTTIME DATE Set when the worker gets CPUs
ENDTIME DATE Set when the worker is terminated
NCORE NUMBER(6) The number of cores the worker use
ERRORCODE NUMBER(7) Error code in any
LASTUPDATE NOT NULL DATE Set when the record is updated
STDOUT VARCHAR2(250) URL for stdout
STDERR VARCHAR2(250) URL for stderr
BATCHLOG VARCHAR2(250) URL for batch log
RESOURCETYPE VARCHAR2(56) Resource type
NATIVEEXITCODE NUMBER(7) Exit code in the underlying system
NATIVESTATUS VARCHAR2(80) Status in the underlying system
DIAGMESSAGE VARCHAR2(500) Error diagnostics
COMPUTINGELEMENT VARCHAR2(128) Gateway of the batch system
NJOBS NUMBER(6) The number of associated jobs
SUBMISSIONHOST VARCHAR2(128) The host name of submission node

Harvester_Rel_Jobs_Workers

The table for relationship between jobs and workers.

Name Type Description
HARVESTERID NOT NULL VARCHAR2(50) Identifier of the harvester instance
WORKERID NOT NULL NUMBER(11) Identifier of the worker
PANDAID NOT NULL NUMBER(11) Job ID in PanDA
LASTUPDATE NOT NULL DATE Set when the record is updated

Harvester_Worker_Stats

The table for realtime statistics of harvester workers.

Name Type Description
HARVESTER_ID NOT NULL VARCHAR2(50) Identifier of the harvester instance
COMPUTINGSITE NOT NULL VARCHAR2(128) Panda Queue name
RESOURCETYPE NOT NULL VARCHAR2(56) Resource type
STATUS NOT NULL VARCHAR2(80) Worker status
N_WORKERS NUMBER(7) Number of workers
LASTUPDATE NOT NULL DATE Last update time

Harvester_Commands

Command queue for harvester.

Name Type Description
COMMAND_ID NOT NULL NUMBER(10) Command ID for primary key
COMMAND VARCHAR2(200) Command string
HARVESTER_ID VARCHAR2(50) Identifier of the target harvester instance
ACK_REQUESTED NUMBER(1) 0/1, depending on whether panda server expects and acknowledgement
CREATION_DATE DATE Timestamp when the command was generated
STATUS VARCHAR2(32) Status of the call: new, retrieved, acknowledged...
STATUS_DATE DATE Timestamp when the status last changed
PARAMS CLOB Parameters for the call

Harvester_Command_Lock

Exclusive locks to send commands to harvester.

Name Type Description
HARVESTER_ID NOT NULL VARCHAR2(50) Identifier of the target harvester instance
COMPUTINGSITE NOT NULL VARCHAR2(128) Panda Queue name
RESOUCETYPE NOT NULL VARCHAR2(56) Resource type
COMMAND NOT NULL VARCHAR2(200) Command string
LOCKEDTIME DATE Timestamp when the command is locked
LOCKEDBY VARCHAR2(40) Process name which locks the command

Harvester_Dialogs

Dialog messages from harvester instances.

Name Type Description
HARVESTER_ID NOT NULL VARCHAR2(50) Identifier of the target harvester instance
DIAGID NOT NULL NUMBER(11) Serial number of the message
MODULENAME VARCHAR2(100) The module name which sent the message
IDENTIFIER VARCHAR2(100) Identifier of the message if any
CREATIONTIME DATE Timestamp when the message is sent
MESSAGELEVEL VARCHAR2(10) Message level
DIAGMESSAGE VARCHAR2(500) Dialog message

Harvester_Slots

The table defines the number of available slots at Panda Queues mainly for workload provisioning.

Name Type Description
PANDAQUEUENAME NOT NULL VARCHAR2(128) Panda Queue name
GSHARE VARCHAR2(32) Global share
RESOURCETYPE VARCHAR2(56) Resource type
NUMSLOTS NOT NULL NUMBER(11) The number of slots
MODIFICATIONTIME NOT NULL DATE When the record was updated
EXPIRATIONTIME DATE When the record expires