Opened 4 weeks ago

Closed 2 weeks ago

#3319 closed help

North/South halos too small for advection error

Reported by: ggxmy
Owned by: annette
Component: UM Model
Keywords:
Cc:
Platform: Monsoon2
UM Version: 10.7

Description

Dear Helpdesk,

I tried to run my UM v10.7 GC3.1 suite (u-bv563), which is based on a previously running suite (u-br927), but the 'coupled' task crashes after a few minutes of running with the error below.

? Error from routine: LOCATE_HDPS
? Error message: North/South halos too small for advection.
? See the following URL for more information:
? https://code.metoffice.gov.uk/trac/um/wiki/KnownUMFailurePoints

The Wiki page contains some explanations and instructions on this error. But before trying that I tried running u-br927, which ran OK a few months ago (as shown in http://cms.ncas.ac.uk/ticket/3203 ), and got the same (halo) error. Isn't this strange? Has any change been made on Monsoon recently that can cause a problem like this?

Although these suites never show as committed because they contain an additional file, they should basically be up to date. I may have made some minor changes, though. So far I have reduced the domain decomposition a little (36→28), but the result doesn't change.

Masaru

Change History (5)

comment:1 Changed 4 weeks ago by ros

Hi Masaru,

To our knowledge, there haven't been any recent changes to Monsoon that would cause this. I would suggest following the instructions on the Met Office page to diagnose the problem.

Regards,
Ros.

comment:2 Changed 4 weeks ago by ggxmy

If there has been no change on Monsoon, what could have caused the change? I followed the instructions, but they don't seem to give any helpful information. Or maybe I don't know where to look? Could you please check and see if there is any clue?

comment:3 Changed 4 weeks ago by annette

  • Owner changed from um_support to annette
  • Status changed from new to assigned

comment:4 Changed 4 weeks ago by annette

Hi Masaru,

If you are running a global model with the standard halo sizes, then this error usually means that the model has become unstable with unphysically large winds. And given that it fails straight away, it points to an issue with the input data files.

I can't see any logs from your previous runs for this suite so it is hard to see whether anything is different. As Ros says it seems unlikely that a system change would cause a problem like this. It is more likely that something has changed in the suite, or in the input data you are using.

I can see from #3203 that you had problems with the start dump before.

The dump you are using in the suite is symlinked to this file:

xcslc0 um$ ls -l /projects/ukca-leeds/myosh/dumps/bg466a.da20150101_00
lrwxrwxrwx 1 myosh ukca-leeds 47 Mar 27 08:47 /projects/ukca-leeds/myosh/dumps/bg466a.da20150101_00 -> /projects/asci/myosh/dumps/bg466a.da20150101_00

Redoing the diff between the file that Ros retrieved and the dump you are using, it looks like they are different:

xcslc0 um$ diff /projects/umadmin/rhatcher/u-bs160_test/bg466a.da20150101_00 /projects/asci/myosh/dumps/bg466a.da20150101_00
Files /projects/umadmin/rhatcher/u-bs160_test/bg466a.da20150101_00 and /projects/asci/myosh/dumps/bg466a.da20150101_00 differ

This is the issue you had before, so maybe try again using Ros' version of the file.
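
The same comparison can be repeated after every copy so that a damaged dump is caught before the model runs. The following is only a minimal sketch, assuming a POSIX shell with cmp available; check_dump is a hypothetical helper name and is not part of the UM or suite tooling.

```shell
# Hypothetical helper: compare a reference dump with a copy of it.
# Symlinks are followed, so a symlinked dump is compared by content.
check_dump() {
    ref=$1
    copy=$2

    # Report unreadable or missing files separately from a real mismatch.
    if [ ! -r "$ref" ] || [ ! -r "$copy" ]; then
        echo "unreadable"
        return 2
    fi

    # cmp -s prints nothing and exits 0 only if the files are byte-identical.
    if cmp -s "$ref" "$copy"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

For example, calling check_dump with /projects/umadmin/rhatcher/u-bs160_test/bg466a.da20150101_00 as the reference and /projects/asci/myosh/dumps/bg466a.da20150101_00 as the copy would be expected to print "different" for the files as they stood above.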

Best wishes,

Annette

comment:5 Changed 2 weeks ago by ggxmy

  • Status changed from assigned to closed

Thank you, Annette. I don't understand why the data was corrupted again, but I copied it again and the suite ran. I hope it will not be corrupted again.

Masaru
