Opened 4 years ago

Closed 4 years ago

#1649 closed help (answered)

HadGEM2 v6.6.6 on CRAY

Reported by: PUMA_GarryHayman
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: MONSooN
UM Version:

Description

Garry - copying our email trail to the helpdesk so other members of the team can reference it if required while I'm away.

Cheers,
Ros.


Ros/AJ

This is an update and to ask your advice.

I now have working branches at v6.6.6 for the aircraft/satellite emulator (PUMA_GarryHayman/hg6.6.6_aircraft_satellite_emulator), for the nudging of winds and temperatures to ECMWF reanalyses (PUMA_GarryHayman/hg6.6.6_nudging_stdtrop) and for the Gregorian calendar fix (PUMA_GarryHayman/hg6.6.6_Gregorian_Calendar_Fixes). For the nudging and calendar fix, I made the code changes in Mohit Dalvi’s v6.6.3 branches (r5321_HG6.6.3_nudging and r5321_HG663_Gregcal_fixes).

I have set up jobs on the CRAY (xlrlc – nudging off; xlrle – nudging on) to replicate jobs on the IBM with nudging on (xldkn) and off (xldkq). As far as I can tell, these should be equivalent; I cannot compare the jobs directly using the UMUI difference as they are at different versions. My comparison of the runtime output, the dumps and the monthly output suggests large differences. I don’t know whether bit comparison is expected or whether I should expect differences at some level. I switched the nudging off to test whether this was causing the problem; as far as I could tell, it was not. I am in the process of comparing the source code (umbase and ummodel), but this generates a large volume of output ...

On another point, I previously mentioned the advantage of the option ‘to archive to the nerc directory’ (which is no longer available). It converted the outputs to pp format (reducing the output size) and deleted the intermediate daily dumps. Although the runtime output suggests that the dumps are deleted, this does not occur, and my job xlrle (with a run length of 13 months) created 1.5 TB of output, largely the daily dumps. I would rapidly run out of space for a longer run. I would be grateful for advice on how best to proceed. Is there a way of including features in the nerc archiving option?

Regards

Garry

Change History (3)

comment:1 Changed 4 years ago by ros

From Ros:

Jobs on the new Cray won't bit compare with the IBM. As for whether any difference is expected between 6.6.3 and 6.6.6, and how much, I am asking the Met Office for their validation information.
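
When two runs are not expected to bit compare, one common approach is to summarise how far apart the fields are instead of testing for exact equality. The following is a minimal sketch of that idea, not part of the UM or of any tool mentioned in this ticket: it assumes each field has already been read into a flat list of floats, and the function name `compare_fields` and the default tolerance are hypothetical choices for illustration.

```python
import math


def compare_fields(a, b, rel_tol=1e-6, abs_tol=0.0):
    """Summarise differences between two equally sized sequences of floats.

    Hypothetical helper for illustration only; the tolerances are
    arbitrary and would need choosing per field.
    """
    if len(a) != len(b):
        raise ValueError("fields must have the same length")

    diffs = [abs(x - y) for x, y in zip(a, b)]
    # Count points that differ by more than the requested tolerance.
    n_outside = sum(
        1
        for x, y in zip(a, b)
        if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
    )
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs)) if diffs else 0.0

    return {
        "bit_identical": all(x == y for x, y in zip(a, b)),
        "max_abs_diff": max(diffs, default=0.0),
        "rms_diff": rms,
        "n_outside_tol": n_outside,
    }
```

A result with `bit_identical` false but `n_outside_tol` equal to zero would indicate differences at rounding level only; what counts as an acceptable tolerance for a given field is a scientific judgement that this sketch does not make.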

With /nerc archiving removed, a separate job would need to be submitted to convert output to pp format automatically (bottom script insert?). The superseded dumps/files are not being deleted because the archiving has been separated out from the model run. I am currently looking into this and will let you know when I have a fix. One question though: do you really need to be producing daily dumps?


From Garry:

Regarding the daily dumps, I do not know exactly why these are output, as the original nudging branch came from Mohit Dalvi.

On occasion, the nudging code can cause the model to crash. I had this experience recently with a run on the IBM (xldko: Std Trop chemical scheme with nudging) intended to allow comparison with more recent satellite observations. The run crashed with an error in an interpolation routine (at 16th June 2010). In correspondence with Mohit, he said the ‘model can blow up’, implying there had been a numerical problem somewhere (although he indicated to me in this case that the latest daily dump did not seem to show any of the usual culprits, i.e. non-uniform polar rows). He went on to suggest that I try to perturb the run in some way, e.g. changing the convection sub-steps for a few days and then reverting after the model passes the failure point.

I guess that the daily dumps would be useful for diagnosis in the above circumstances and for restarting the model. With the nerc archiving branch in operation, these were then deleted at the end of each resubmission period.

I will be in contact with you regarding the comparison of the IBM and CRAY jobs.


From Mohit:

The creation of daily dumps does not have anything to do with Nudging per se. It is a ‘feature’ of switching to a Gregorian calendar in the base UM job, which is a prerequisite for Nudging.

I think the daily dumping period is required for the proper calculation of monthly means for a non-30 day month.
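
The arithmetic behind this can be sketched as follows. This is only an illustration of why a fixed 30-day period stops lining up with month ends once months have Gregorian lengths; it is not a description of how the UM computes its means, and the function names are hypothetical.

```python
import calendar
from itertools import accumulate


def month_end_days(year):
    """Day-of-year on which each Gregorian month ends (1-based)."""
    lengths = [calendar.monthrange(year, m)[1] for m in range(1, 13)]
    return list(accumulate(lengths))


def boundaries_hit(year, dump_period_days):
    """Month-end days that coincide with a dump taken every N days
    counted from the start of the year."""
    return [d for d in month_end_days(year) if d % dump_period_days == 0]
```

For 2010, `boundaries_hit(2010, 30)` returns only two of the twelve month ends, whereas `boundaries_hit(2010, 1)` returns all twelve. In a 360-day calendar every month end is a multiple of 30, so the mismatch does not arise there.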


From Ros:

I have just now managed to get my 6.6.6 job to delete superseded files without having to switch on archiving. I have only given it limited testing and will be on holiday next week, but you are welcome to give it a try.

You just need to include my script branch: fcm:um-br/dev/ros/hg6.6.6_delete_superseded_files
No need to recompile. Just switch on "Enable build of UM scripts" in the UMUI window Submodel independent → Compilation & Modifications → UM Scripts Build.
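
For anyone who needs to tidy up by hand in the meantime, the general idea of removing superseded dumps can be sketched as below. This is not the script branch above and makes no claim about how it works: the file pattern `*.da*`, the function names, and the rule "keep the newest file by modification time" are all assumptions for illustration. It defaults to a dry run so nothing is deleted until the list has been checked.

```python
from pathlib import Path


def find_superseded(dump_dir, pattern="*.da*", keep=1):
    """Return dump files older than the newest `keep` files.

    The pattern is a hypothetical placeholder; adjust it to the
    actual dump file names of the job.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    files = sorted(
        Path(dump_dir).glob(pattern),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    return files[:-keep]


def delete_superseded(dump_dir, pattern="*.da*", keep=1, dry_run=True):
    """List superseded dumps and delete them only if dry_run is False."""
    superseded = find_superseded(dump_dir, pattern=pattern, keep=keep)
    if not dry_run:
        for path in superseded:
            path.unlink()
    return superseded
```

Calling `delete_superseded(some_dir)` first and inspecting the returned paths, then repeating with `dry_run=False`, keeps the most recent dump available for a restart.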

I have been given the location of the validation notes from the Met Office's tests with 6.6.6.
Unfortunately, I've not had time to look at them today, but Grenville and/or Annette will take a look at them whilst I'm away next week.

comment:2 Changed 4 years ago by ros

Dear Ros

I am not ‘unhappy’ per se. Although you and others have told me that I should not expect bit comparison, I don’t have a real feel for how the same UKCA jobs on the IBM and CRAY compare. All I see are many differences, which might be acceptable. I accept that v6.6.6 is better and indeed I would like to move to a newer HadGEM version. Inevitably it is about understanding whether changes arise from the science or from the change in platform. A colleague of mine at CEH was showing some interesting differences yesterday, which he believed arose from the numerical precision of the two platforms.

I will extend the CRAY run and see how it compares.

Regards

Garry


comment:3 Changed 4 years ago by ros

  • Resolution set to answered
  • Status changed from new to closed

Hi Garry,

I'm going to close this ticket on the Helpdesk for now, but please feel free to contact us if you need any more help or advice.

Regards,
Ros.
