Running SMS on the NCI/BOM supercomputers
Author: Wenming LU, E-mail: [email protected], Tel: 03 96694528
Updates
lwenming 2012 OCT 02 Initial creation of the page
lwenming 2013 JUL 18 SMS available on raijin
lwenming 2013 OCT 08 SMS available on ngamai
SMS Background Information
SMS is a scheduling system that lets users run a series of jobs according to predefined timing, dependencies and so on. For detailed information on SMS, please refer to
http://www.ecmwf.int/publications/manuals/sms/
In this wiki, we assume that readers already have some knowledge of SMS, and focus on how to run an SMS server and test an experimental suite on NCI/BOM:
- sms module
- Starting and stopping the SMS server on the NCI/BOM supercomputers (`raijin` and `ngamai`; `raijin` and `ngamai` are supercomputers at NCI and BOM, respectively)
- Using the GUI client interface `xcdp`
- Playing a test suite into SMS
- Managing the test suite through `xcdp`
sms module
A module, `sms`, is available on `raijin`; it provides a user-friendly environment for running the SMS server and suites.
To use the `sms` module on `raijin`, run the following commands:
module use ~access/modules
module load sms
On `ngamai`, because BOM's operational centre NMOC runs a different version of SMS there, the module is renamed to `smsre`, standing for research SMS:
module use ~access/modules
module load smsre
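If the same login script is shared between both machines, the choice of module name could be sketched as below. This is only an illustration: the function name `sms_module_for_host` is hypothetical, and the `raijin*`/`ngamai*` host-name patterns are assumptions based on the host names used on this page.

```shell
# Hypothetical helper: choose the module name from the host name.
# Assumption: raijin nodes are named raijin*, ngamai nodes are named ngamai*.
sms_module_for_host() {
    case "$1" in
        raijin*) echo "sms" ;;      # NCI: module is called sms
        ngamai*) echo "smsre" ;;    # BOM: research SMS module
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Example use (commented out; it needs the module command):
# module use ~access/modules
# module load "$(sms_module_for_host "$(uname -n)")"
```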
Running SMS server
After loading the sms module, type `sms_start.sh` to start your own SMS server. You should be able to run the SMS server on any `raijin`/`ngamai` main node.
Note: Make sure that the main node on which the SMS server is running is the same node named by the variable `SMSHOST`/`SMS_HOST` in your suite definition file and SMS scripts.
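A quick way to catch a mismatch before starting the server might look like the sketch below. The function name `sms_host_matches` is hypothetical, and stripping the domain part of the host name before comparing is an assumption about how the node names are written.

```shell
# Hypothetical check: does the SMSHOST value name the node we are on?
# Assumption: only the short host name (before the first dot) matters.
sms_host_matches() {
    expected=${1%%.*}
    current=${2%%.*}
    [ -n "$expected" ] && [ "$expected" = "$current" ]
}

# Warn if SMSHOST is unset or names a different node than the current one.
if ! sms_host_matches "${SMSHOST:-}" "$(uname -n)"; then
    echo "warning: SMSHOST='${SMSHOST:-}' does not match $(uname -n)" >&2
fi
```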
To stop the SMS server, type
sms_stop.sh
Note that it is very important to use the script `sms_stop.sh` to terminate the SMS server. `sms_stop.sh` properly releases the port numbers used by the SMS server back to the system and then terminates the server. If you stop the SMS server by any other method, such as `kill`, you will encounter problems restarting it, because the port numbers have not been released and `sms_start.sh` cannot attach the SMS server to the default port number (900000+$UID).
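To see which default number applies to your account, the arithmetic described above can be reproduced as follows. This is a minimal sketch: the function name `default_sms_prog` is hypothetical, and it only mirrors the 900000+$UID rule stated on this page rather than reading anything from `sms_start.sh`.

```shell
# Default SMS number as described on this page: 900000 + numeric user ID.
default_sms_prog() {
    echo $((900000 + $1))
}

# Print the number for the current user.
default_sms_prog "$(id -u)"
```

For a user ID of 6674, this gives 906674.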
SMS client interface `xcdp`
Please note that all pictures were taken on the decommissioned NCI supercomputer `vayu`; they should look the same on any SMS host machine.
`xcdp` is an X Window based GUI tool and is very easy to work with. Type
xcdp
to start `xcdp`. Once it has started, you will see a window as below. Go to the menu `Edit->Preferences...`.
In the pop-up window, select the tab `Servers` and edit your SMS server details there.
The items are set up as follows:
Item | Value | Comment |
---|---|---|
Name | `ngamai02_lwenming` or `raijin2_wml548` | Anything, but preferably meaningful; my convention is `$hostname_$USER` |
Host | `ngamai02` or `raijin2` | `SMSHOST` |
Number | `906674` | `SMS_PROG` (900000+$UID, specified in `sms_start.sh`); in my case, e.g. 906674 |
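The three values can also be printed from the shell before typing them into the dialog. The sketch below follows the naming convention and the 900000+$UID rule from the table; the function name `xcdp_server_entry` is hypothetical, and `uname -n` may return a longer name than the short host name used in the examples above.

```shell
# Print the three xcdp server fields for a given host, user and numeric UID.
xcdp_server_entry() {
    host=$1
    user=$2
    uid=$3
    printf 'Name: %s_%s\n' "$host" "$user"      # convention: $hostname_$USER
    printf 'Host: %s\n' "$host"                 # SMSHOST
    printf 'Number: %s\n' "$((900000 + uid))"   # SMS_PROG
}

# Values for the current node and user.
xcdp_server_entry "$(uname -n)" "$(id -un)" "$(id -u)"
```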
Close the pop-up window and go back to the main window menu `Servers`, where you can see all the SMS servers defined in the previous step. Click on the server you specified, and it will appear in the main body of `xcdp`.
First SMS suite: `mytest`
Make sure the sms module has been loaded, then type:
sms_setup.sh
This command does the following:
- Creates a folder `sms` in `$HOME`
- Copies include files to `$HOME/sms`: `access_sms_include` for tasks on `$SMSHOST` (`raijin`/`ngamai` here); `access_nci_include` for jobs on the NCI supercomputer (`raijin`/`ngamai` here as well)
- Copies the test suite `mytest` to `$HOME/sms/suite`; the definition file and SMS scripts are tailored to your own environment, such as $USER, $HOST, $PROJECT etc.
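After running the setup script, the resulting layout can be checked with a small helper such as the one below. It is only a sketch based on the list above: the function name `missing_sms_setup_paths` is hypothetical, and the location `sms/suite/mytest` is an assumption, since the page only says that the suite is copied to `$HOME/sms/suite`.

```shell
# List the paths that are expected after setup but are missing.
# Assumption: the suite ends up in sms/suite/mytest under the base directory.
missing_sms_setup_paths() {
    base=$1
    status=0
    for path in sms sms/access_sms_include sms/access_nci_include sms/suite/mytest; do
        if [ ! -e "$base/$path" ]; then
            echo "$base/$path"
            status=1
        fi
    done
    return $status
}

# Report anything missing under the home directory.
missing_sms_setup_paths "$HOME" || echo "some expected paths are missing" >&2
```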
We are now ready to send the test suite `mytest` to the SMS server:
play_suite.sh mytest
Right-click the server box in `xcdp`, choose `suites` in the pop-up window, and select the suite `mytest`; finally, click the green button to refresh the server status. You should be able to see the suite `mytest` within the SMS server. Type,
begin_suite.sh mytest
`mytest` will then be ready to send tasks to the supercomputers (the colour of `mytest` should change from dark gray to blue). If you need to replay the suite, run
cancel_suite.sh mytest
followed by `play_suite.sh mytest` and `begin_suite.sh mytest`.
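The cancel, play and begin sequence can be wrapped in one function so that a later step is skipped if an earlier one fails. This is a sketch rather than part of the sms module: the function name `replay_suite` and the `SMS_RUN` variable are hypothetical, and setting `SMS_RUN=echo` only prints the commands instead of running them.

```shell
# Replay a suite: cancel it, play it again, then begin it.
# SMS_RUN is a hypothetical hook; SMS_RUN=echo gives a dry run.
replay_suite() {
    suite=$1
    ${SMS_RUN:-} cancel_suite.sh "$suite" &&
    ${SMS_RUN:-} play_suite.sh "$suite" &&
    ${SMS_RUN:-} begin_suite.sh "$suite"
}

# Example (commented out; it needs the sms module scripts on PATH):
# replay_suite mytest
```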
Managing suite `mytest` on `xcdp`
The suite `mytest` has the following structure:
- mytest #SUITE
  - test1 #FAMILY
    - test_local #local TASK; running on `SMSHOST`
    - test_nci #remote TASK; running on `PBSHOST/SGEHOST`, i.e., `raijin`/`ngamai`
  - admin #FAMILY
    - clean #local TASK
In our SMS structure, there are two machines:
- `SMSHOST`: machine running the SMS server
- `PBSHOST/SGEHOST`: machine running jobs in the `PBS/SGE` queuing systems
In practice, `SMSHOST` could be the same as `PBSHOST/SGEHOST`. However, a remote run will still be executed as if `SMSHOST` and `PBSHOST/SGEHOST` were two different machines. In the NCI environment, you may choose either `accessdev`, `accessprod` or even `raijin`/`ngamai` as `SMSHOST`, but `raijin`/`ngamai` is always `PBSHOST/SGEHOST`.
All tasks in `mytest` have been tested and run successfully, provided SMS is set up properly:
- test_local: local run on `SMSHOST`; touches a file `from_local_to_local.$$` in `$HOME/smsout/mytest`
- test_nci: remote run on `PBS/SGE`; touches a file `from_nci_to_local.$$` in `$HOME/smsout/mytest`
- clean: local run on `SMSHOST`; cleans up outputs in `$HOME/smsout/mytest`
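Whether the two test tasks have produced their files can be checked from the shell, for example with the sketch below. The function name `count_outputs` is hypothetical; the directory and file-name prefixes are the ones listed above, and the suffix after the dot is taken to be arbitrary (the tasks append `$$`).

```shell
# Count files in a directory whose names are "<prefix>.<anything>".
count_outputs() {
    dir=$1
    prefix=$2
    count=0
    for f in "$dir/$prefix".*; do
        # An unmatched pattern stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

outdir="$HOME/smsout/mytest"
echo "test_local outputs: $(count_outputs "$outdir" from_local_to_local)"
echo "test_nci outputs:   $(count_outputs "$outdir" from_nci_to_local)"
```

After the clean task has run, both counts should drop back to zero.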
THE END OF THIS WIKI PAGE