PadmeMonitor guide - PADME-Experiment/padme-fw GitHub Wiki
Shifter's guide to PadmeMonitor
The PadmeMonitor monitoring system is composed of several agents running on the l0padme3 server. Most agents are controlled by simple "watch-dog" scripts, which check that the required processes exist and restart them in case of problems, so the need for intervention from the shifter should be minimal.
In case of malfunctions (some or all plots are empty or not being updated), the shifter should check that the related processes exist and restart them if needed. In addition, all the processes must be restarted by hand after a general shutdown of the DAQ system.
Below is the list of all required agents with the commands to check if they are correctly running and to restart them in case of crashes.
- PadmeMonitor
- RecoMonitor
- Trends
- OnlineRecoMonitor
- OnlineMonitor
- ChamberMonitor
- DAQStatusServer
- ServerMonitor
- DCS
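The individual checks described in the sections below can be combined into a single pass. The following is a minimal sketch, not an official tool: the account names and search strings are copied from the `ps` commands in this guide, while the names `AGENTS`, `listed` and `check_agents` are hypothetical. DCS is left out because it runs on l0padme1 and has its own commands.

```shell
#!/bin/bash
# Sketch: report the status of the l0padme3 agents in one pass.
# Each entry is "account:string", copied from the ps checks in this guide.
AGENTS=(
    "monitor:node padmemonitor.js"
    "daq:RecoMonitor.sh"
    "monitor:run_t_2022.sh"
    "monitor:OnlineRecoMonitor_wd.sh"
    "monitor:OnlineMonitor_wd.sh"
    "monitor:ChamberMonitor_wd.sh"
    "monitor:DAQStatusServer_wd.sh"
    "monitor:ServerMonitor_wd.sh"
)

# Return 0 if the ps listing read from stdin contains the string $1.
# As in the manual checks, lines belonging to grep itself are ignored.
listed() {
    grep -F -- "$1" | grep -v grep > /dev/null
}

# Print one OK/MISSING line per agent.
check_agents() {
    local entry user pattern
    for entry in "${AGENTS[@]}"; do
        user="${entry%%:*}"      # text before the first colon
        pattern="${entry#*:}"    # text after the first colon
        if ps -fu "$user" 2>/dev/null | listed "$pattern"; then
            echo "OK       $user  $pattern"
        else
            echo "MISSING  $user  $pattern"
        fi
    done
}

check_agents
```

An agent reported as MISSING can then be restarted with the commands given in its own section.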
PadmeMonitor
PadmeMonitor is the central server which receives the data from all the other agents and serves them as web pages. It is based on Node.js and the Plotly.js library.
The PadmeMonitor web service is accessible at URL http://l0padme3:9090 from any node within the LNF LAN/VPN.
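A quick way to tell whether the web service itself is answering is to request the page from the command line. This is a sketch that assumes `curl` is installed on the node you are logged in to; the function name `web_ok` is hypothetical. A positive answer only shows that the server is up, not that the plots are being updated.

```shell
#!/bin/bash
# Return 0 if the URL $1 answers with a successful HTTP status within 5 s.
web_ok() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}

if web_ok http://l0padme3:9090; then
    echo "PadmeMonitor web service is answering"
else
    echo "no answer from http://l0padme3:9090: check the node process"
fi
```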
The PadmeMonitor service runs under the monitor account on l0padme3.
To check if the process is running:
> ps -fu monitor | grep node | grep -v grep
monitor 6390 6002 3 09:34 pts/0 00:12:41 node padmemonitor.js
To restart it:
> cd /home/monitor/PadmeMonitor
> nohup node padmemonitor.js >> padmemonitor.log 2>&1 </dev/zero &
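The restart command has the same shape for every agent in this guide: `nohup` keeps the process alive after you log out, `>>` appends its output to the log file, `2>&1` sends error messages to the same log, `</dev/zero` detaches standard input from the terminal, and `&` puts the process in the background. The sketch below wraps this pattern in a helper and starts PadmeMonitor only if no instance is found, to avoid launching a second copy. The function name `start_detached` is hypothetical, and the script is meant to be run from the monitor account.

```shell
#!/bin/bash
# Start the command "$3 ..." from directory $1, detached from the terminal,
# appending stdout and stderr to the log file $2 (relative to that directory).
# Returns 1 without starting anything if the directory cannot be entered.
start_detached() {
    local dir="$1" log="$2"
    shift 2
    (
        cd "$dir" || exit 1
        nohup "$@" >> "$log" 2>&1 < /dev/zero &
    )
}

# Same check as in the guide, then restart only when nothing is found.
if ps -fu monitor 2>/dev/null | grep "node padmemonitor.js" | grep -v grep > /dev/null; then
    echo "PadmeMonitor is already running"
elif start_detached /home/monitor/PadmeMonitor padmemonitor.log node padmemonitor.js; then
    echo "PadmeMonitor restarted"
else
    echo "could not enter /home/monitor/PadmeMonitor"
fi
```

The same helper can be reused for the other agents by changing the directory, the log file and the script name to those given in their sections.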
RecoMonitor
RecoMonitor uses the standard PadmeReco program to create several data quality histograms.
The service runs under the daq account (WARNING: NOT the monitor account!) on l0padme3.
To check if the process is running:
> ps -fu daq | grep RecoMonitor.sh | grep -v grep
daq 7897 7412 0 09:57 pts/1 00:00:03 /bin/bash ./RecoMonitor.sh
To restart it:
> cd /home/daq/RecoMonitor
> nohup ./RecoMonitor.sh >> RecoMonitor.log 2>&1 </dev/zero &
Trends
Trends uses the reconstructed files produced by RecoMonitor to create trend plots.
The service runs under the monitor account on l0padme3.
To check if the process is running:
> ps -fu monitor | grep run_t_2022.sh | grep -v grep
monitor 13833 8045 0 10:26 pts/2 00:00:01 /bin/bash ./run_t_2022.sh
To restart it:
> cd /home/monitor/DigiDaq/Trends
> nohup ./run_t_2022.sh >> run_t_2022.log 2>&1 </dev/zero &
OnlineRecoMonitor
OnlineRecoMonitor converts the ROOT histogram files produced by RecoMonitor and Trends to the format used by PadmeMonitor.
The service runs under the monitor account on l0padme3.
To check if the process is running:
> ps -fu monitor | grep OnlineRecoMonitor_wd.sh | grep -v grep
monitor 8764 8045 0 10:01 pts/2 00:00:02 /bin/bash ./OnlineRecoMonitor_wd.sh
To restart it:
> cd /home/monitor/OnlineMonitor
> nohup ./OnlineRecoMonitor_wd.sh >>OnlineRecoMonitor_wd.log 2>&1 </dev/zero &
OnlineMonitor
OnlineMonitor is the main producer of data quality plots. It runs in real time analyzing the rawdata files while they are written and writing its output directly in the format used by PadmeMonitor.
The service runs under the monitor account on l0padme3.
To check if the process is running:
> ps -fu monitor | grep OnlineMonitor_wd.sh | grep -v grep
monitor 8726 8045 0 10:01 pts/2 00:00:02 /bin/bash ./OnlineMonitor_wd.sh
To restart it:
> cd /home/monitor/OnlineMonitor
> nohup ./OnlineMonitor_wd.sh >>OnlineMonitor_wd.log 2>&1 </dev/zero &
ChamberMonitor
ChamberMonitor produces plots related to the new MicroMegas chamber. It runs in real time, analyzing the rawdata files while they are being written and writing its output directly in the format used by PadmeMonitor.
The service runs under the monitor account on l0padme3.
To check if the process is running:
> ps -fu monitor | grep ChamberMonitor_wd.sh | grep -v grep
monitor 8726 8045 0 10:01 pts/2 00:00:02 /bin/bash ./ChamberMonitor_wd.sh
To restart it:
> cd /home/monitor/OnlineMonitor
> nohup ./ChamberMonitor_wd.sh 1>>ChamberMonitor_wd.log 2>&1 </dev/zero &
DAQStatusServer
DAQStatusServer reports the current status of the DAQ system, including information on the run configuration, the acquired events, the trigger rates, and so on. It also issues warnings and alarms if problems are present.
The service runs under the monitor account on l0padme3.
To check if the process is running:
> ps -fu monitor | grep DAQStatusServer_wd.sh | grep -v grep
monitor 8390 8045 0 09:59 pts/2 00:00:01 /bin/bash ./DAQStatusServer_wd.sh
To restart it:
> cd /home/monitor/DAQStatusServer
> nohup ./DAQStatusServer_wd.sh >> DAQStatusServer_wd.log 2>&1 </dev/zero &
ServerMonitor
ServerMonitor checks the temperature status of the on-line DAQ servers.
The service runs under the monitor account on l0padme3.
To check if the process is running:
> ps -fu monitor | grep ServerMonitor_wd.sh | grep -v grep
monitor 11058 8045 0 10:14 pts/2 00:00:01 /bin/bash ./ServerMonitor_wd.sh
To restart it:
> cd /home/monitor/ServerMonitor
> nohup ./ServerMonitor_wd.sh >> ServerMonitor_wd.log 2>&1 </dev/zero &
DCS
DCS starts the main Detector Control System programs used to monitor the experiment status.
The DCS service runs under the dcs account on l0padme1.
To start the standard DCS, log in as dcs@l0padme1 (ask F. Ferrarotto or E. Leonardi for the password if you do not know it yet)
and check whether it is already running by issuing:
show_threads_dcs_kernel
If it shows a number of threads, the DCS is already running and everything is OK. Otherwise, start it by issuing:
start_dcs
- this will start the standard DCS main program
- then start the "ancillary" programs, which mainly produce the time plots, by issuing:
start_ancillary_procs
- do not worry: the procedure checks which of these programs are already running and starts only the missing ones
To check that all these programs are running, issue:
show_ancillary_prog
- this will show the list of "ancillary" programs running on the machine
To kill them all, issue:
kill_ancillary_progs
and then restart them as described above.
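The DCS start-up sequence can be sketched as a short script. Two assumptions are made here that this guide does not confirm: that `show_threads_dcs_kernel` prints nothing at all when the DCS is not running, and that the four DCS commands are executables or shell functions visible to a script (if they are aliases of the dcs account, type them by hand instead). The function name `ensure_dcs` is hypothetical.

```shell
#!/bin/bash
# Sketch of the DCS start-up sequence, to be run as dcs on l0padme1.
# ASSUMPTION: show_threads_dcs_kernel prints nothing when the DCS is down.
ensure_dcs() {
    if [ -n "$(show_threads_dcs_kernel 2>/dev/null)" ]; then
        echo "DCS already running"
    else
        start_dcs              # standard DCS main program
        start_ancillary_procs  # ancillary programs (time plots)
    fi
    show_ancillary_prog        # list the ancillary programs now running
}

# Run only where the DCS commands are actually available.
if command -v show_threads_dcs_kernel > /dev/null 2>&1; then
    ensure_dcs
else
    echo "DCS commands not found: log in as dcs@l0padme1 first"
fi
```

If the first command prints a header or a message even when the DCS is down, the test in the `if` line has to be adapted to its real output.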