access_NewSun_007 - ACCESS-NRI/accessdev-Trac-archive GitHub Wiki
#!html
<h1 style="text-align: center; color: green"> CAWCR-BoM ACCESS NWP Ngamai Migration Working Group</h1>
Meeting 7: Wednesday 21st August 2013, 9E Meeting Room
Present: Michael Naughton, Jim Fraser, Chris Tingwell, Robin Bowen, Ilia Bermous, Wenming Lu, Zhihong Li, Yi Xiao, Joerg Henrichs, Joan Fernon, Asri Sulaiman.
Apologies: Robert Jukic, Martin Dix, Ivor Blockley
- Review meeting notes
- Project management
- Main items in task list
- Other items from Task List
- Other business
- New format of meeting notes
- Task table will be updated at intervals, not necessarily every meeting
- Robin reported progress to the Steering Committee after the previous meeting. In general they are happy with current progress.
- ACTION: Re: relative priorities of Solar switch-off, Project Planning, Configuration Management -- solar switch-off is the #1 priority. The other items are also required within the project, but must not delay solar switch-off timing. The project will continue after solar switch-off until all steps are completed. Still working on drafts of steps.
Item 17: Building executables
- Documented executable builds
- ACTION: Asri to organise a discussion with Ilia, Xiao and Martin to agree and finalise details based on Ilia's email, subj: UMUI7.5 Building procedure 8/8/2013. Initially targeted for Thu 22/8 - delayed, to be rescheduled.
- Have a wiki page for each of the documented builds.
Item 26: APS1 NWP suites
- Xiao's ngamai AG1 test NWP suite has caught up to the current date. Now running daily.
- All verification so far is fine.
- Disk usage issue: The suite is saving terabytes of px files, threatening to use up all of ngamai's disk space.
- It is not practical to save px files to sam, but fields may be archivable to MARS.
- An option is to save as "frame" files -- to be considered, but not yet.
- op-research will have limits of 50 TB of FLUSH and 25 TB of DATADIR on ngamai, but temporarily (1-2 weeks) will be able to use more than that 75 TB total.
- ACTION: Wenming to produce "ACCESS-GN" charts when chart plotting on ngamai is available. REPORT: Work in progress
- ACTION: Look at running Gary Dietachmayer et al.'s prototype diagnostic tools on porting versions. REPORT:
- Joan reports good progress in setting up NMOC's access-G suite. Going through top-level scripts with a few left -- a few more days? When ready, it is to run starting from July data.
- Xiao now has the test cycling running with separate run and fetch jobs.
- Now done 4-5 days.
- Use Research MARS for archiving
- Load impact on MARS needs to be monitored
- Start the run from end of June and continue to the current date
- Verify plots to be included
- Move archiving from MARS7 to MARS1
- Can use up to a maximum of 37 cores N-S, with elapsed time < 1 hr.
- The question of reducing from 2 runs a day to 1, extended by an extra 6 hours, is a discussion for the APS Working Group.
- Domain decomposition cannot be as large as AG2.
- Crash issue with I.C. from Joan -- probably due to an incompatible number of input fields.
- This may be fixable using a Python utility. The reconfig step is another possible workaround.
- "rainval" plot likely useful
- ACTION: Wenming to produce "ACCESS-CN" charts when chart plotting on ngamai is available. REPORT: Work in progress
- ACTION: Wenming to look at run elapsed times to study runtime variation.
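The disk limits noted under Item 26 (50 TB of FLUSH plus 25 TB of DATADIR, 75 TB in total) lend themselves to a simple headroom check. The following is only a minimal sketch: the function name `headroom_tb` and the usage figures are hypothetical, and real usage numbers would have to come from whatever quota or disk-usage report is available on ngamai.

```python
# Limits quoted in the notes for op-research on ngamai, in terabytes.
QUOTA_TB = {"FLUSH": 50.0, "DATADIR": 25.0}


def headroom_tb(usage_tb):
    """Return (per-area headroom, total headroom) in TB.

    usage_tb maps an area name to its current usage in TB.
    A negative value means the corresponding limit is exceeded.
    """
    per_area = {area: QUOTA_TB[area] - usage_tb.get(area, 0.0) for area in QUOTA_TB}
    total = sum(QUOTA_TB.values()) - sum(usage_tb.get(a, 0.0) for a in QUOTA_TB)
    return per_area, total


if __name__ == "__main__":
    # Illustrative usage numbers only, not measurements from the suite.
    per_area, total = headroom_tb({"FLUSH": 42.0, "DATADIR": 27.5})
    for area, free in per_area.items():
        status = "OVER LIMIT" if free < 0 else "ok"
        print(f"{area}: {free:+.1f} TB headroom ({status})")
    print(f"Total headroom: {total:+.1f} TB of {sum(QUOTA_TB.values()):.0f} TB")
```

Reporting the total separately from the per-area figures reflects the note that the combined 75 TB may be exceeded temporarily, so an over-limit area is worth flagging even when the total still looks acceptable.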
Item 18: MARS
- Updates regarding MARS on ngamai were sent out via email.
- MARS is meant for running on dm nodes, not computing nodes.
- ACTION: Robin to document MARS aspects and status in email and on wiki. REPORT: Work in progress.
- Dates regarding MARS7_dev and the status of tape drives for SDC need to wait until R. Oxboro gets back from leave (Sept).
Item 6: v11 software
- No action due on this. However keep this item in the list for now.
- ACTION: Joerg to continue these investigations.
- Variation in UM7.5 narrowed down to swap_bounds_mv
- Up to 100% variation in APS-R between partial/fully committed nodes.
- Elapsed times for VAR down to 17 minutes from 24 minutes. These new times appear solid.
- Reconfiguration can take 2.5 - 20 minutes
- Use of Lustre striping is recommended. Optimal striping configuration to be investigated.
- Part of Xiao's suite requires running 7 qxreconf jobs in parallel (7 different files for different times)
- Joerg to investigate qxreconf performance.
- Still in progress - finishing up stage?
- Perl issue related to SCS reported.
- Consideration of putting MARS in /apps. (Currently MARS utils are in $CWSHARE)
- Different matplotlib version?
- To be investigated further, but this issue is not critical
- Tech Talk ??
- Open MPI icc -vs- gcc -- close and move to raijin porting issues.
- work on Verify in progress (rab)
- Change resolution
- Speed-up required from matplotlib
- ADDITION to note in next meeting (#15 from task list): UI Bigfont issue resolved. Some tidy-ups remain. Migration plan to be drawn up.
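The notes above mention that reconfiguration can take anywhere from 2.5 to 20 minutes and that part of the suite runs 7 qxreconf jobs in parallel. One way to gather per-job timings for that kind of investigation is sketched below. It is an illustration under stated assumptions, not the suite's actual mechanism: `placeholder_command` is a hypothetical stand-in that launches a short Python sleep, and a real test would substitute the suite's own qxreconf invocation.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd):
    """Run one command and return (return code, elapsed seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode, time.perf_counter() - start


def run_parallel(commands, max_workers=7):
    """Run all commands concurrently; each thread only waits on a child process."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))


def placeholder_command(index):
    # Hypothetical stand-in for one reconfiguration job. A real suite would
    # build its own command line here instead of a short sleep.
    return [sys.executable, "-c", f"import time; time.sleep(0.1); print({index})"]


if __name__ == "__main__":
    commands = [placeholder_command(i) for i in range(7)]
    results = run_parallel(commands)
    for i, (rc, elapsed) in enumerate(results):
        print(f"job {i}: rc={rc} elapsed={elapsed:.2f}s")
    times = [elapsed for _, elapsed in results]
    print(f"min={min(times):.2f}s max={max(times):.2f}s")
```

Threads are sufficient here because the work is done in child processes. Comparing the minimum and maximum elapsed times across the jobs gives a first indication of how much of the variation comes from contention between the parallel jobs.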