#2941 closed help (fixed)

Suite u-bj799 failing during couple

Reported by: xd904476 Owned by: um_support
Component: UM Model Keywords:
Cc: Platform:
UM Version:

Description

Hi, I have copied suite u-be699 and restarted it in 2014 with some perturbed initial conditions.
The suite failed during the coupled task, so I switched back to the unperturbed initial conditions, but the model still fails in the same way.
Could you help, please?
thanks,
dani

error is:

Rank 263 [Thu Jun 20 21:44:41 2019] [c7-2c1s12n2] application called MPI_Abort(comm=0xC4000005, 1) - process 263
Rank 192 [Thu Jun 20 21:44:41 2019] [c6-2c2s12n2] application called MPI_Abort(comm=0xC4000009, 1) - process 192
Rank 216 [Thu Jun 20 21:44:41 2019] [c6-2c2s15n0] application called MPI_Abort(comm=0xC4000009, 1) - process 216
_pmiu_daemon(SIGCHLD): [NID 04530] [c7-2c1s12n2] [Thu Jun 20 21:44:41 2019] PE RANK 263 exit signal Aborted
_pmiu_daemon(SIGCHLD): [NID 04412] [c6-2c2s15n0] [Thu Jun 20 21:44:41 2019] PE RANK 216 exit signal Aborted
Rank 193 [Thu Jun 20 21:44:41 2019] [c6-2c2s12n2] application called MPI_Abort(comm=0xC4000003, 1) - process 193
_pmiu_daemon(SIGCHLD): [NID 04402] [c6-2c2s12n2] [Thu Jun 20 21:44:41 2019] PE RANK 192 exit signal Aborted
[NID 04530] 2019-06-20 22:44:41 Apid 36246807: initiated application termination
[FAIL] run_model # return-code=137
Received signal ERR
cylc (scheduler - 2019-06-20T21:44:47Z): CRITICAL Task job script received signal ERR at 2019-06-20T21:44:47Z
cylc (scheduler - 2019-06-20T21:44:47Z): CRITICAL failed at 2019-06-20T21:44:47Z

It happens after reading the iceberg restart file, I believe.

Change History (8)

comment:1 Changed 13 months ago by grenville

Dani

The error is explained in /home/n02/n02/dflocco/cylc-run/u-bj799/work/20140101T0000Z/coupled/ocean.output. It's always worth checking this file for coupled jobs.

===>>> : E R R O R

===========

iom_open ~

File ./restart_trc.nc* not found

Grenville
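For anyone wanting to automate that check, here is a minimal sketch in Python that pulls the error blocks out of an ocean.output file. The function name find_nemo_errors and the context parameter are made up for illustration, and the only thing it assumes about the file is the "E R R O R" banner shown above.

```python
def find_nemo_errors(text, context=6):
    """Return, for each 'E R R O R' banner, the banner line plus the
    non-empty lines among the next `context` lines (hypothetical helper)."""
    lines = text.splitlines()
    blocks = []
    for i, line in enumerate(lines):
        if "E R R O R" in line:
            window = lines[i:i + context + 1]
            blocks.append([l.strip() for l in window if l.strip()])
    return blocks


# Sample text copied from the ocean.output excerpt in this ticket.
sample = (
    " ===>>> : E R R O R\n\n"
    " ===========\n\n"
    " iom_open ~\n\n"
    " File ./restart_trc.nc* not found\n"
)

for block in find_nemo_errors(sample):
    print("\n".join(block))
```

To use it on a real run, read the file first, e.g. open(path, errors="replace").read(), and pass the result in.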

comment:2 Changed 13 months ago by xd904476

sorry, my stupid mistake: the file was there but I missed my username in top_start

thanks a lot

comment:3 Changed 13 months ago by xd904476

Hi again, model fails now in the atmos.exe portion of coupled. I can't find an obvious error, but perhaps I am not looking in the right place.
I see practically no differences now with the old suite.

the job.err says

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 100
? Error from routine: set_thermodynamic
? Error message: A total of 1755 points had negative mass in set_thermodynamic. This indicates the pressure fields are inconsistent between different levels and the model is about to fail.
? Error from processor: 187
? Error number: 11
????????????????????????????????????????????????????????????????????????????????

Is there any other log/err file where I should look?
thanks,
dani

comment:4 Changed 13 months ago by xd904476

Hi Grenville, following your advice in ticket http://cms.ncas.ac.uk/ticket/2794#comment:21 I followed the instructions on how to restart a suite, but I get the same "negative thermodynamics" error as Holly.
I have tried setting the l_clim parameter to false, but I still get the same error.
Unfortunately, the end of Holly's thread mentions that this error was never resolved. Has anything changed in the meantime?

Thanks,
Dani

comment:5 Changed 13 months ago by grenville

Dani

The MO known UM failure pages say:

Points with negative mass in r2_set_thermodynamic
Why?: r2_set_thermodynamic sets up the thermodynamic fields into appropriate columns for the radiation scheme to act on. At one point it uses hydrostatic balance to make an estimate of the grid-box mass, but this can be negative if the input fields are seriously corrupted. This is a catch-all failure point for bad inputs and is not a problem with the radiation scheme.

Can you use a different set of start files (maybe your original set?)

Grenville
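To illustrate why inconsistent pressures give negative mass, here is a rough sketch of the hydrostatic estimate in Python. It is not the UM code: the function names, the value of G and the sample columns are assumptions for illustration only. Under hydrostatic balance the mass per unit area of a layer is (p_lower - p_upper) / g, so any level where pressure increases with height produces a negative value.

```python
G = 9.80665  # standard gravity in m s^-2 (assumed value for this sketch)


def layer_masses(p_levels):
    """Mass per unit area (kg m^-2) of each layer, given pressures (Pa)
    on layer boundaries ordered from the surface upwards."""
    return [(lower - upper) / G for lower, upper in zip(p_levels, p_levels[1:])]


def count_negative_mass(columns):
    """Number of columns that contain at least one negative-mass layer."""
    return sum(1 for col in columns if any(m < 0 for m in layer_masses(col)))


# Hypothetical columns: pressure should fall monotonically with height.
good = [100000.0, 90000.0, 80000.0]
bad = [100000.0, 90000.0, 95000.0]  # pressure rises again at the top level

print(count_negative_mass([good, bad, good]))
```

A check like this on the start dump's pressure fields would flag the same kind of points the model reports, which is why a corrupted or mismatched dump is the usual suspect.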

comment:6 Changed 13 months ago by xd904476

Hi Grenville,
this error comes up when using the original set of start dumps. I changed ASTART to the perturbed one again because that was suggested in a comment.
In any case I am setting up a suite right now with a starting point in 2053, which is when I once restarted suite u-be699.
Thanks

comment:7 Changed 12 months ago by xd904476

Hi, good news: starting the suite in 2053 works!
I have now manually deleted all the folders in dtn02 and on archer for suite u-bj799 and I am restarting it again with 2014 forcing. I have also copied the startdumps again just in case they were corrupted.
Fingers crossed,
Dani

comment:8 Changed 12 months ago by xd904476

  • Resolution set to fixed
  • Status changed from new to closed

Hi, for some unknown reason no suite works with 2014 forcings, but I restarted it in 2015 with start dumps taken from the u-be699 simulations and that works.

Thanks
dani
