access_NewSun_011 - ACCESS-NRI/accessdev-Trac-archive GitHub Wiki


<h1  style="text-align: center; color: green"> CAWCR-BoM ACCESS NWP Ngamai Migration Working Group</h1>

CAWCR-BoM ACCESS-NWP Ngamai Porting Working Group Meeting Notes

Meeting 11: Wednesday 2nd October 2013, 9E Meeting Room

Present: Joerg Henrichs, Martin Dix, Jim Fraser, Wenming Lu, Chris Tingwell, Michael Naughton, Robin Bowen, Asri Sulaiman, Joan Fernon (phone)

Apologies: Ed Habjan, Ilia Bermous, Zhihong Li, Yi Xiao


Agenda

  • List from previous meeting notes
  • Task List
  • AOB

AG1

  • Ran 3 months to real time, but a problem was discovered (now fixed).

    • Verification was not run because of MARS unavailability.
    • Problem was noticed by Xiao in ACCESS-TC verification.
    • Troubleshooting was aided by Gary's difference plot.
    • Tracked down to an abort in the SST data update task:
      • SSTs were not updated from their July values at the start of the trial period.
      • The failure was caused by a required directory not being created; the error was not trapped by SMS.
    • Not noticeable over a short timeframe.
    • Impact on result is negligible over Australia.
    • Impact shows up clearly in NH.
    • Overall the problem was quickly spotted and solved.
  • Restart run commenced from 1st Sept with problem fixed.

    • Expect to catch up to real time in ~ 1 week.
  • Gary's Diagnostics - Robin to follow up

    • Python program which goes through all output fields and produces statistics, difference plots, etc.
    • Can include other programs from various people.
    • Plan to document for general use.

AR1

  • Will re-run from 1st September.
  • Ivor working on post-processing tasks.
  • Ingestion to MARS to start soon.
    • Limited fields, levels and timesteps, for limited verification only, to reduce the load on MARS-1.
  • MARS-7 still being debugged.
    • Data from ACCESS-TC with 3-digit MARS IDs have not been flushing, clogging the cache.
    • This was discovered by Arn and Tan; flushing this data will improve MARS-1 capacity by 20%.
  • Verification using files on disk being tried out by Xiaoxi Wu.

MARS / SAM

  • Tan Le to analyse users' use of MARS fields.

    • Identify fields which can be archived to tape more quickly.
    • Email CAWCR users to survey usage (rab).
  • Copying of output from the Daily run to NCI (as part of the RDSI project) may further reduce the MARS load, as some users can access the data at NCI.
    • rab, jrf and Joan to follow up.

  • Recompilation of MARS-7 being done.

  • A crucial meeting on the future of MARS-7 is to be held on Thursday 3rd October; it may decide to revert to the old MARS in the interim, instead of MARS-7, at SDC.

  • Arn will be on leave in November, which will impact MARS development work.

AC1

  • No new updates from Wenming; all is well.
  • NMOC: Joan is about to start on the operational ACCESS-C version.

ATC1

  • Joan may be able to start looking at ATC later in October, continuing to Nov.
  • For now, ACCESS-C is higher priority.
  • Xiao will be away for 2.5 weeks from 18 Oct.
    • Will work on new ACCESS-TC & APS2 ACCESS-G when she returns.
  • Improvement proposed for reconfiguration step for various ACCESS-TC domains:
    • Run re-config over whole domain, then use subset as needed.
    • This would solve the problem of ACCESS-TC domains that contain no land points.
    • Martin to supply the job.

NGAMAI ISSUES

  • Xiao's problem with the obs task on Ngamai has been fixed.
  • More monitoring is being done on Ngamai to spot node problems more quickly.
  • James Mandilas / Rob Jukic preparing for operational support to allow the mid-November NMOC operational switchover.
  • NMOC plans to be ready for operational switchover to Ngamai from 1st November.

Run time variation / Ngamai performance

  • Kernel tuning changes that disable defragmentation of "Transparent Huge Pages" are to be applied to all Ngamai compute nodes.

  • These changes should fix Xiao's problem with slow reconfiguration execution, reducing the run time from 120-1200 s to around 103 s.

  • The problem of the 2nd UM run being substantially slower than the 1st in a two-run test job has been tracked down to a Lustre problem that occurs when the 2nd job simply overwrites the files created by the 1st job. If a fresh directory is used for the 2nd job, the elapsed-time variation disappears.

  • A meeting between BoM/ORACLE and NCI sysadmins is being scheduled.

Executable Build procedures and documentation

  • Work on documentation page continuing (https://trac.nci.org.au/trac/access/wiki/Access_NWP_Build_Procedures)

  • "Official" UMUI job for building operational UM VN7.5 Global Executable is now available (qaaba)

  • Job for the ACCESS-R and ACCESS-C executables still to be done.

  • Dan Cook of Oracle is looking at the ksh issue on Ngamai.

  • SCI cgi monitor not working due to a Perl issue.
    • rab following up; work in progress.

UMUI / SVN / TRAC

  • Email announcing the migration of access applications from Solar to Ngamai was sent to all Solar users; the migration is scheduled for the period 4:00pm Friday 4th October to 7:00am Monday 7th October.
  • All main components were tested, but not everything.

AOB and items from Task List

  • Rose/Cylc set-up on Ngamai
    • rab looking at arranging pre-requisite Python package installations.
    • Xiao to install Rose/Cylc packages and try out SREP implementation after her leave.

  • Work to allow compilation on Ngamai compute nodes being addressed.
    • gcc has been installed.
    • "make" required, not yet installed.
    • /apps installation should now be complete.

  • UM small execs
    • Copy from Raijin now being used.
    • Documentation on their build is requested.
    • Build on Ngamai is still desirable but not urgent.

  • CAP program on Ngamai: Wenming and Martin Dix to follow up.
  • Verify is working on Ngamai; to be done for Raijin.
  • rab sending weekly emails re Solar decommissioning.

  • Documentation from meetings complementary to this group:
    • Desirable but not critical.
    • Some documents are now on the S-Drive and may be made available through this NCI Trac wiki.
    • GANTT charts are not readable without the software.

  • NMOC web pages (available within the BoM internal network only):
    • NMOC pages from here: http://wiki.bom.gov.au/foswiki/NMOC/SoftwareSystems/NgamaiPortIntro
    • "/apps" (modules): http://nmoc-svnop.bom.gov.au:8080/projects/modules/wiki

  • NMOC NWP Configuration Management
    • "Station list" information now in sync with the DA group.
      • To be kept in SVN after the design/structure is completed.
    • Executable build procedures being developed (see the "Executable Build procedures and documentation" section in these notes).
      • Work continuing; NMOC to try out when ready.
  • Fortnightly meetings of this ACCESS NWP Ngamai Migration Working Group to be continued until the shutdown of Solar.


Next Meeting

11am Wednesday 16th October 2013, 9E Meeting Room

[azs, Fri 11/10/2013] First draft.

[azs, Mon 14/10/2013] Updated with feedback from Robin and Joerg.

[mjn, Mon 14/10/2013] Minor editing.
