access_AccessPS34Test - ACCESS-NRI/accessdev-Trac-archive GitHub Wiki
#!html
<h3 style="text-align: center; color: green"> Status of local implementation of UM versions</h3>
<h3 style="text-align: center; color: blue"> Testing UKMO PS34 Global N768L70 ENDGame Build and Run jobs </h3>
<h3 style="text-align: center; color: red"> UNDER CONSTRUCTION - Trying out PS34 and writing this documentation are both ongoing</h3>
- Documentation on PS34 is published on the collab wiki at this URL: !http://collab.metoffice.gov.uk/twiki/bin/view/Support/ParallelSuite34
 - Note: A login to the collab wiki ( !http://collab.metoffice.gov.uk ) is required to access the PS34 page.
 - Files associated with "Global N768L70 (ENDGame)" have been downloaded and are available on raijin and ngamai in:
   - ~access/downloads/
 
- Apply patches from vn8.5_PS34_Global_and_EG_Configuration_patch.tgz
 - Create a branch from trunk at vn8.5 and apply the patch.
   - Branch URL=!https://access-svn.nci.org.au/svn/um/branches/dev/axs599/um8.5_ps34
 - In finalising my local ps34 branch, the following also requires consideration:
 - List of changes in ...vn8.5/local_changes compared to [email protected]
 
Modified files (14):
fcm-make/meto-x86-ifort/inc/um-atmos.cfg
fcm-make/meto-x86-ifort/inc/x86-ifort-mpich.cfg
src/script/control/qsatmos
src/script/control/make_parexe.pl
src/script/control/qsresubmit
src/script/control/qsoasissetup
src/atmosphere/dynamics_advection/set_halos.F90
src/atmosphere/convection/shallow_conv-shconv5a.F90
src/atmosphere/convection/deep_conv-dpconv5a.F90
src/configs/machines/linux-ifort-nci/ext_libs/gcom_mpp.cfg
src/configs/machines/linux-ifort-nci/ext_libs/netcdf.cfg    
src/configs/machines/linux-ifort-nci/ext_libs/gcom_serial.cfg
src/configs/machines/linux-ifort-nci/ext_libs/drhook.cfg    
src/configs/machines/linux-ifort-nci/machine.cfg
New files (11):
fcm-make/linux-ifort-nci/inc/um-scm.cfg
fcm-make/linux-ifort-nci/inc/um-atmos.cfg
fcm-make/linux-ifort-nci/inc/ifort-nci.cfg
fcm-make/linux-ifort-nci/inc/um-utils.cfg
fcm-make/linux-ifort-nci/um-scm-debug.cfg
fcm-make/linux-ifort-nci/um-atmos-debug.cfg
fcm-make/linux-ifort-nci/um-utils-safe.cfg
fcm-make/linux-ifort-nci/um-scm-safe.cfg
fcm-make/linux-ifort-nci/um-atmos-safe.cfg
fcm-make/linux-ifort-nci/um-scm-high.cfg
fcm-make/linux-ifort-nci/um-atmos-high.cfg
- Merge the local changes into my working copy of !https://access-svn.nci.org.au/svn/um/branches/dev/axs599/um8.5_ps34 and commit.
 
 cd /g/sc/data/azs/ps34/um8.5_ps34
 svn merge  https://access-svn.nci.org.au/svn/um/branches/dev/vn8.5/local_changes
 svn commit
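
Before committing a merge like the one above, it can be useful to preview what the merge would change. The following is a minimal sketch, not part of the original procedure: `preview_merge` is a hypothetical helper name, and the `SVN` variable is an assumption added so the command can be swapped for `echo` to print what would be run.

```shell
# Preview an svn merge before applying it (sketch).
# SVN may be overridden (e.g. SVN=echo) to print the command instead of running it.
SVN="${SVN:-svn}"

# preview_merge WORKING_COPY SOURCE_URL
# Runs "svn merge --dry-run" inside the working copy; nothing is modified.
preview_merge() {
    wc_dir="$1"
    source_url="$2"
    ( cd "$wc_dir" && "$SVN" merge --dry-run "$source_url" )
}

# Example (not executed here):
#   preview_merge /g/sc/data/azs/ps34/um8.5_ps34 \
#       https://access-svn.nci.org.au/svn/um/branches/dev/vn8.5/local_changes
```

If the dry run reports conflicts, they can be examined before the real merge and commit are attempted.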
- Inspect !https://access-svn.nci.org.au/svn/um/branches/dev/vn8.5/metoffice_patches
 - This branch contains 2 modified sources:
   - src/script/control/qsoasissetup
   - src/atmosphere/dynamics_advection/set_halos.F90
 - Both are already in "local_changes".
 
 
- Apply patch from JULES_um8.5_PS34_Global_Configuration_patch.tgz
 - Create branch from trunk (tagged at vn8.5) and apply patch. URL=!https://access-svn.nci.org.au/svn/jules/branches/dev/axs599/jules8.5b_ps34
 
- Upload UKMO basis_dljub into vajda in accessdev's UMUI
 - Apply local customisations:
 
----------------------------------------------------------------------------------------------------------------
Job 1: Accessdev-vajd.a		 "ps34_Build_and_forecast_job (from basis_dljub)"
Job 2: Accessdev-vajd.x		 "ps34_Build_and_forecast_job (from basis_dljub) Orig"
Date: 20150216			 LONG COMPARISON
----------------------------------------------------------------------------------------------------------------
   
00007:		Entry box: Mail-id for notification of end-of-run
00008:		   Job vajd.a: Entry is set to '[email protected]'
00009:		   Job vajd.x: Entry is set to 'nomail'
00010:		
00012:		Entry box: Specify alternative name
00013:		   Job vajd.a: Entry is set to 'vajd'
00014:		   Job vajd.x: Entry is set to 'umgl'
00015:	
00017:		Entry box: Target Machine user-id:
00018:		   Job vajd.a: Entry is set to '$USER'
00019:		   Job vajd.x: Entry is set to 'frpe'
       
00026:	
00027:		Check box: Change machine config file ($UM_MACHINE)
00028:		   Job vajd.a: Entry is set to 'ON'
00029:		   Job vajd.x: Entry is set to 'OFF'
00030:	
00031:	
00032:		Check box: Change target machine name ($TARGET_MC)
00033:		   Job vajd.a: Entry is set to 'ON'
00034:		   Job vajd.x: Entry is set to 'OFF'
00035:	
00036:	
00037:		Entry box: Repository directory containing FCM machine.cfg file
00038:		   Job vajd.a: Entry is set to 'linux-ifort-nci'
00039:		   Job vajd.x: Entry is inactive
00040:	
00041:	
00042:		Entry box: Host name
00043:		   Job vajd.a: Entry is set to 'raijin.nci.org.au'
00044:		   Job vajd.x: Entry is set to 'hpc2e'
00045:	
00046:	
00047:		Radio button: Define submission method
00048:		   Job vajd.a: Entry is set to 'PBS Pro (Raijin)'
00049:		   Job vajd.x: Entry is set to 'LoadLeveler'
00050:	
00051:	
00052:		Entry box: Target machine name
00053:		   Job vajd.a: Entry is set to 'linux'
00054:		   Job vajd.x: Entry is inactive
00062:		Entry box: DATAM            : Define the directory for written output with time-stamped names
00063:		   Job vajd.a: Entry is set to '/short/$PROJECT/$USER/85/$RUNID'
00064:		   Job vajd.x: Entry is set to '$DATADIR/$RUNID'
00065:	
00066:	
00067:		Entry box: DATAW            : Define the directory for other output file
00068:		   Job vajd.a: Entry is set to '/short/$PROJECT/$USER/85/$RUNID'
00069:		   Job vajd.x: Entry is set to '$DATADIR/$RUNID'
00077:		Differences in Table Hand edits
00078:	 	1,10c1,10
00079:		<  /g/data1/dp9/axs599/ps34/hand_edits/GL_HANDEDITS_8.5_stashc_DUSTPS32 Y
00080:		<  /g/data1/dp9/axs599/ps34/hand_edits/GL_HANDEDITS_8.5_foamblk Y
00081:		<  /g/data1/dp9/axs599/ps34/hand_edits/GL_HANDEDITS_8.5_SMNSout_7p5minTS Y
00082:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_p2t_weight_fix.pl Y
00083:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_eta_s_0.5.pl Y
00084:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_sc_1361.pl Y
00085:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_filter_cloud_tau0.01 Y
00086:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_srf_agg.ed Y
00087:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_emis_ssi_full.pl Y
00088:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_EG_package_hack.ed Y
00089:		---
00090:		>  ~gmdd/um/handedits/vn8.5/GL_HANDEDITS_8.5_stashc_DUSTPS32 Y
00091:		>  ~gmdd/um/handedits/vn8.5/GL_HANDEDITS_8.5_foamblk Y
00092:		>  ~gmdd/um/handedits/vn8.5/GL_HANDEDITS_8.5_SMNSout_7p5minTS Y
00093:		>  ~gmdd/um/handedits/vn8.5/vn8.5_p2t_weight_fix.pl Y
00094:		>  ~gmdd/um/handedits/vn8.5/vn8.5_eta_s_0.5.pl Y
00095:		>  ~gmdd/um/handedits/vn8.5/vn8.5_sc_1361.pl Y
00096:		>  ~gmdd/um/handedits/vn8.5/vn8.5_filter_cloud_tau0.01 Y
00097:		>  ~gmdd/um/handedits/vn8.5/vn8.5_srf_agg.ed Y
00098:		>  ~gmdd/um/handedits/vn8.5/vn8.5_emis_ssi_full.pl Y
00099:		>  ~gmdd/um/handedits/vn8.5/vn8.5_EG_package_hack.ed Y
00108:		Entry box: Local machine root extract directory (UM_OUTDIR)
00109:		   Job vajd.a: Entry is set to '$HOME/UM_OUTDIR'
00110:		   Job vajd.x: Entry is set to '$HOME/um_extracts'
00111:	
00112:	
00113:		Entry box: Target machine root extract directory (UM_ROUTDIR)
00114:		   Job vajd.a: Entry is set to '/short/$PROJECT/$USER/UM_ROUTDIR'
00115:		   Job vajd.x: Entry is set to '/data/nwp/nm'
00123:		Entry box: Specify revision number or keyword of code base to use
00124:		   Job vajd.a: Entry is set to 'HEAD'
00125:		   Job vajd.x: Entry is inactive
00126:	
00127:	
00128:		Check box: Use precompiled build
00129:		   Job vajd.a: Entry is set to 'OFF'
00130:		   Job vajd.x: Entry is set to 'ON'
00131:	
00132:	
00133:		Check box: Include modifications from branches
00134:		   Job vajd.a: Entry is set to 'OFF'
00135:		   Job vajd.x: Entry is set to 'ON'
00136:	
00137:	
00138:		Check box: Use different version of the UM code base from the default for this UMUI version
00139:		   Job vajd.a: Entry is set to 'ON'
00140:		   Job vajd.x: Entry is set to 'OFF'
00141:	
00142:	
00143:		Entry box: The Subversion URL (UM_SVN_URL)
00144:		   Job vajd.a: Entry is set to 'https://access-svn.nci.org.au/svn/um/branches/dev/axs599/um8.5_ps34'
00145:		   Job vajd.x: Entry is set to 'fcm:um-tr'
00153:		Entry box: Specify revision number or keyword of JULES code base
00154:		   Job vajd.a: Entry is set to 'HEAD'
00155:		   Job vajd.x: Entry is set to 'um8.5'
00156:	
00157:	
00158:		Entry box: The Subversion URL (JULES_SVN_URL)
00159:		   Job vajd.a: Entry is set to 'https://access-svn.nci.org.au/svn/jules/branches/dev/axs599/jules8.5b_ps34'
00160:		   Job vajd.x: Entry is set to 'fcm:jules-tr'
00161:	
00162:	
00163:		Check box: Include modifications from branches
00164:		   Job vajd.a: Entry is set to 'OFF'
00165:		   Job vajd.x: Entry is set to 'ON'
00173:		Entry box: Filename for the Model executable
00174:		   Job vajd.a: Entry is set to '${RUNID}_um-atmos.exe'
00175:		   Job vajd.x: Entry is set to 'um-atmos.exe'
00176:	
00177:	
00178:		Entry box: Filename for the Reconfiguration executable
00179:		   Job vajd.a: Entry is set to '${RUNID}_um-recon.exe'
00180:		   Job vajd.x: Entry is set to 'um-recon.exe'
00188:		Check box: Including the following list of user file overrides
00189:		   Job vajd.a: Entry is set to 'OFF'
00190:		   Job vajd.x: Entry is set to 'ON'
00199:		Differences in Table Specify the STASHmaster files
00200:	 	1,4c1,4
00201:		<  /g/data1/dp9/axs599/ps34/user_stashmaster/st_0_246
00202:		<  /g/data1/dp9/axs599/ps34/user_stashmaster/tca_up_to_6km
00203:		<  /g/data1/dp9/axs599/ps34/user_stashmaster/STASHmaster_thermal
00204:		<  /g/data1/dp9/axs599/ps34/user_stashmaster/eg_test_stmaster
00205:		---
00206:		>  ~gmdd/um/userstash/vn8.5/st_0_246
00207:		>  ~gmdd/um/userstash/vn8.5/tca_up_to_6km
00208:		>  ~gmdd/um/userstash/vn8.5/STASHmaster_thermal
00209:		>  ~gmdd/um/userstash/vn8.5/eg_test_stmaster
00210:		
- On "Submit", the qsub command was not found:
 
Submitting umui_runs/vajda-047163645/stage_1_submit via 'qsub' on raijin.nci.org.au
/bin/bash: qsub: command not found
MAIN_SCR: Submit failed
- Try adding "module load pbs" in .profile
 - For now, work around this by running qsub manually on raijin
 - Investigate whether the UMUIX setup on accessdev can be updated
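
The "module load pbs" suggestion could be written defensively so that the profile does not fail on hosts without the module system. This is only a sketch: it assumes an Environment Modules setup that provides a module named `pbs`, and `ensure_qsub` is a hypothetical function name.

```shell
# Sketch for ~/.profile: make qsub available to non-interactive logins.
# Assumes a "pbs" module exists on the target machine.
ensure_qsub() {
    # Nothing to do if qsub is already on PATH.
    if command -v qsub >/dev/null 2>&1; then
        return 0
    fi
    # Only call "module" if that command exists in this shell.
    if command -v module >/dev/null 2>&1; then
        module load pbs
    else
        echo "ensure_qsub: neither qsub nor module is available" >&2
        return 1
    fi
}

# In ~/.profile one would then call:
#   ensure_qsub
```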
- With manual qsub, the job failed after exceeding its walltime limit:
 
axs599@raijin4 5056>   tail -18  /home/599/axs599/output/vajda000.vajda.d15047.t163647.comp.leave
mpif90 -o ni_conv_ctl.o -I/short/dp9/axs599/UM_ROUTDIR/axs599/vajda/umatmos/inc -I/short/dp9/axs599/UM_ROUTDIR/axs599/vajda/baserepos/JULES/inc -I/short/dp9/axs599/UM_ROUTDIR/axs599/vajda/baserepos/JULES/inc -I/short/dp9/axs599/UM_ROUTDIR/axs599/vajda/baserepos/UMATMOS/inc -O3 -xHost -fp-model precise -g -traceback -mcmodel=medium -g -i8 -8e3262e565652ac69b4b02b09b064c4f88b8c8e2      -openmp -c /short/dp9/axs599/UM_ROUTDIR/axs599/vajda/umatmos/ppsrc/UM/atmosphere/convection/ni_conv_ctl.f90
ifort: command line warning #10212: -fp-model precise evaluates in source precision with Fortran.
ifort: command line remark #10010: option '-pthread' is deprecated and will be removed in a future release. See '-help deprecated'
=>> PBS: job killed: walltime 3647 exceeded limit 3600
make: *** [ni_conv_ctl.o] Terminated
======================================================================================
			Resource Usage on 2015-02-17 15:00:51.891711:
	JobId:  9268436.r-man2  
	Project: dp9 
	Exit Status: 271 (Linux Signal 15)
	Service Units: 6.08
	NCPUs Requested: 6				NCPUs Used: 6
							CPU Time Used: 01:00:14
	Memory Requested: 9000mb 			Memory Used: 664mb
							Vmem Used: 818mb
	Walltime requested: 01:00:00 			Walltime Used: 01:00:49
	jobfs request: 100mb				jobfs used: 1mb
======================================================================================
axs599@raijin4 5057>  
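
When choosing a new walltime request, it helps to pull the used and limit values out of the PBS "job killed" line in the log above. The helpers below are an illustrative sketch; the function names are hypothetical and the choice of new limit is left to the user.

```shell
# walltime_overrun: read PBS output on stdin and report how far the job
# overran its walltime limit, based on the "job killed: walltime" line.
walltime_overrun() {
    awk '/PBS: job killed: walltime/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "walltime") used = $(i + 1)
            if ($i == "limit")    limit = $(i + 1)
        }
        printf "used=%d limit=%d overrun=%d\n", used, limit, used - limit
    }'
}

# seconds_to_hms SECONDS: format seconds as HH:MM:SS, the form used by
# a PBS walltime resource request.
seconds_to_hms() {
    awk -v s="$1" 'BEGIN { printf "%02d:%02d:%02d\n", s / 3600, (s % 3600) / 60, s % 60 }'
}

# Applied to the log line shown above:
echo '=>> PBS: job killed: walltime 3647 exceeded limit 3600' | walltime_overrun
# prints: used=3647 limit=3600 overrun=47
seconds_to_hms 7200
# prints: 02:00:00
```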
- With the walltime increased significantly, the build job finally succeeded in building the um-atmos executable, but failed to build the qxreconf executable:
 
/short/dp9/axs599/UM_ROUTDIR/axs599/vajda/umrecon/ppsrc/UM/control/misc/ukmo_grib_mod.f90(108): error #6404: This name does not have a type, and must have an explicit type.   [ZHOOK_OUT]
IF (lhook) CALL dr_hook('DECODE',zhook_out,zhook_handle)
---------------------------------^
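
The compiler message above says that zhook_out is used without an explicit type in the preprocessed file. A rough way to check whether a given source file declares a name at all is to search for it on a declaration line. This is a heuristic sketch only: `has_declaration` is a hypothetical helper, it only looks for the name after `::` on a single line, and it will miss declarations split across continuation lines.

```shell
# has_declaration NAME FILE
# Succeeds if FILE has a line with "::" followed (anywhere later on the
# same line) by NAME as a whole word, ignoring case.
has_declaration() {
    name="$1"
    file="$2"
    grep -E -i -q "::(.*[^A-Za-z0-9_])?${name}([^A-Za-z0-9_]|\$)" "$file"
}

# Example (not executed here; path taken from the error message above):
#   has_declaration zhook_out \
#       /short/dp9/axs599/UM_ROUTDIR/axs599/vajda/umrecon/ppsrc/UM/control/misc/ukmo_grib_mod.f90 \
#       || echo "zhook_out is not declared in this file"
```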
- Seek advice from Scott Wales and Martin Dix
 - Try out a standard um8.5 build job
 - This job (vajdy) built and ran successfully.
 - Use vajdy to build the qxreconf executable using the ps34 source (from my branch).
 - This also built successfully.
 
Reconfiguration job to add ancil fields stripped from the daily-downloaded UKMO initial conditions files qwqg00.reduced.YYYYMMDD400.T+3.gz
- Set up job vajdf from UKMO's dljtc
 - Ran into a problem due to unrecognised namelists:
 
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
? Error in routine: check_iostat
? Error Code:    19
? Error Message:  Error reading namelist temp_fixes. Please check input list against code.
? Error generated from processor:     0
? This run generated   0 warnings
????????????????????????????????????????????????????????????????????????????????
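
When diagnosing a namelist read error like the one above, it can help to list which namelist groups an input file actually contains and compare that with what the executable expects. The helper below is a sketch under the assumption that groups start with `&name` at the beginning of a line; `list_namelists` is a hypothetical name and the file to inspect is whichever namelist file the job generated.

```shell
# list_namelists FILE
# Print the name of each Fortran namelist group (lines starting with
# "&name") found in FILE, lower-cased, sorted, one per line.
# "&END" terminators are not reported.
list_namelists() {
    sed -n 's/^[[:space:]]*&\([A-Za-z_][A-Za-z0-9_]*\).*/\1/p' "$1" \
        | tr 'A-Z' 'a-z' \
        | grep -v '^end$' \
        | sort -u
}

# Example (not executed here; NAMELIST_FILE is a placeholder):
#   list_namelists "$NAMELIST_FILE" | grep -x temp_fixes
```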
- The above problem and similar namelist issues were solved by turning off all hand-edits in vajdf
 - The job then complained about the vertical levels file:
 
Vertical Levels file: /projects/access/umdir/vn8.5/ctldata/vert/vertlevs_L70_50t_20s_80km                                                                                                                                                                                           
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
? Error in routine: Rcf_Read_Namelists
? Error Code:    80
? Error Message: Vertical Levels Namelist file does not exist!
? Error generated from processor:     0
? This run generated   1 warnings
????????????????????????????????????????????????????????????????????????????????
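
When a control file such as the vertical levels file is reported missing, listing what is actually present in the directory shows which alternatives are installed. A minimal sketch, with `list_candidates` as a hypothetical helper name:

```shell
# list_candidates DIR PREFIX
# Print the base names of files in DIR whose names start with PREFIX.
# Prints nothing if there is no match.
list_candidates() {
    dir="$1"
    prefix="$2"
    for f in "$dir"/"$prefix"*; do
        # An unmatched glob stays literal, so check the path exists.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}

# Example (not executed here; directory taken from the log above):
#   list_candidates /projects/access/umdir/vn8.5/ctldata/vert vertlevs_L70
```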
- Replace the reference to vertlevs_L70_50t_20s_80km with vertlevs_L70_80km
 - After that, the reconf job went on to produce an astart file with 142 field types, but eventually aborted, complaining about the absence of Field 418 Sec 0:
 
 
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
? Error in routine: Rcf_Set_Data_Source
? Error Code:    30
? Error Message: Section   0 Item   418 : Required field is not in input dump!
? Error generated from processor:     0
? This run generated   1 warnings
????????????????????????????????????????????????????????????????????????????????
- According to STASHmaster, the field is "Dust parent soil clay fraction"
 
>  grep 418   STASHmaster_A
1|    1 |    0 |  418 |Dust parent soil clay fraction (anc)|
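
A plain `grep 418` matches any line containing "418", whatever the section. A more targeted lookup can match the section and item columns of the name record directly. The sketch below assumes the pipe-delimited layout of the line shown above (record type, model, section, item, name); `stash_name` is a hypothetical helper name.

```shell
# stash_name SECTION ITEM FILE
# Print the field name from the STASHmaster "1|" record whose section and
# item columns match, with trailing padding removed.
stash_name() {
    awk -F'|' -v sec="$1" -v item="$2" '
        $1 + 0 == 1 && $3 + 0 == sec + 0 && $4 + 0 == item + 0 {
            name = $5
            sub(/[ \t]+$/, "", name)
            print name
        }' "$3"
}

# Example (not executed here):
#   stash_name 0 418 STASHmaster_A
```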
- Studied UMUI job vajdf again and found that the ancil setting to add "SOILDUST" is made through the Scientific section:
 - Model Selection
   - Atmosphere
     - Scientific Parameters and Sections
       - Section by section choices
         - Section 17: Aerosols
           - Follow-up panel "DUST"
 - Turn "dust" on. Enter $UM_ANCIL_SOILDUST_DIR & $UM_ANCIL_SOILDUST_FILE in the relevant boxes
 - The job ran much further but failed due to a memory limit:
 
/projects/access/umdir/vn8.5/linux/scripts/qsrecon: Executing dump reconfiguration program
*********************************************************
RCF Executable : /short/dp9/axs599/UKD/ps34/bin/vajdy_qxreconf
*********************************************************
=>> PBS: job killed: mem 22012688kb exceeded limit 8192000kb
mpiexec: killing job...
======================================================================================
			Resource Usage on 2015-02-25 15:40:07.653319:
	JobId:  9408219.r-man2  
	Project: dp9 
	Exit Status: 271 (Linux Signal 15)
	Service Units: 0.03
	NCPUs Requested: 4				NCPUs Used: 4
							CPU Time Used: 00:00:57
	Memory Requested: 8000mb 			Memory Used: 21497mb
							Vmem Used: 30892mb
	Walltime requested: 00:10:00 			Walltime Used: 00:00:28
	jobfs request: 100mb				jobfs used: 1mb
======================================================================================
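
To size the next memory request, the used and limit values can be extracted from the PBS "job killed" line in the log above and converted from kb to mb. This is an illustrative sketch; `pbs_mem_report` is a hypothetical helper name.

```shell
# pbs_mem_report: read PBS output on stdin and, for a
# "job killed: mem ... exceeded limit ..." line, print the used and
# limit values in mb plus the ratio of used to limit.
pbs_mem_report() {
    awk '/PBS: job killed: mem/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "mem")   used = $(i + 1)
            if ($i == "limit") limit = $(i + 1)
        }
        sub(/kb$/, "", used)
        sub(/kb$/, "", limit)
        printf "used=%.0fmb limit=%.0fmb ratio=%.1f\n", used / 1024, limit / 1024, used / limit
    }'
}

# Applied to the log line shown above:
echo '=>> PBS: job killed: mem 22012688kb exceeded limit 8192000kb' | pbs_mem_report
# prints: used=21497mb limit=8000mb ratio=2.7
```

The converted values agree with the "Memory Used: 21497mb" and "Memory Requested: 8000mb" entries in the resource usage report.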
- Even after a significant increase in memory allocation, the memory problem persists
 - ... to be continued
 
- TO--BE--ADDED
 
- TO--BE--ADDED
 
======================================================================================