Opened 3 weeks ago

Last modified 3 weeks ago

#3086 new help

suite failing at the "coupled" stage

Reported by: yb19052
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: NEXCS
UM Version: 10.7

Description

I am trying to run a suite (u-bo840), but it failed at the "coupled" stage with the following message in "job.err":

Rank 1014 [Sun Nov 24 03:50:19 2019] [c9-2c1s14n1] application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1014
Application 88555350 is crashing. ATP analysis proceeding…
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
_pmiu_daemon(SIGCHLD): [NID 07225] [c9-2c1s14n1] [Sun Nov 24 03:55:21 2019] PE RANK 1014 exit signal Aborted
[NID 07225] 2019-11-24 03:55:21 Apid 88555350: initiated application termination
[FAIL] run_model # return-code=137
2019-11-24T03:55:24Z CRITICAL - failed/EXIT

I checked the "ocean.output" file, but it does not show any error messages.

I also checked some of the output in the "coupled" work directory (~/cylc-run/u-bo840/work/18500101T0000Z/coupled), and there are several suspect netCDF files (e.g. ……error.nc).

Do you know how I can fix this issue?

Thanks
Kenji


Change History (3)

comment:1 follow-up: Changed 3 weeks ago by dcase

I'm not sure if your cice start file exists. You have /projects/ukesm/jwalton/startdumps/cice/u-aj572i.restart.2000-01-01-00000.nc so I would check this.
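One quick way to check is sketched below. This is only a minimal sketch, not part of the suite: the helper name check_start_file is hypothetical, and the path is the one quoted above. It separates "file is missing" from "file is there but you cannot reach or read it", since the two need different fixes.

```python
import os


def check_start_file(path):
    """Classify a start file as 'ok', 'missing', 'permission denied' or 'unreadable'.

    Hypothetical helper for illustration only.
    """
    try:
        # stat() fails if the file is absent or a parent directory is not searchable
        os.stat(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission denied"
    # The file exists; check that the current user may read it
    if not os.access(path, os.R_OK):
        return "unreadable"
    return "ok"


if __name__ == "__main__":
    # Path taken from the suite configuration mentioned in this comment
    path = "/projects/ukesm/jwalton/startdumps/cice/u-aj572i.restart.2000-01-01-00000.nc"
    print(path, "->", check_start_file(path))
```

Run it on the machine where the suite runs, because the /projects filesystem may not be visible elsewhere.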

comment:2 in reply to: ↑ 1 Changed 3 weeks ago by yb19052

Replying to dcase:

I'm not sure if your cice start file exists. You have /projects/ukesm/jwalton/startdumps/cice/u-aj572i.restart.2000-01-01-00000.nc so I would check this.

Hi,
I appreciate your response. I checked the folder, and the cice start file no longer exists. I will look for another start file.
Thanks
Kenji

comment:3 Changed 3 weeks ago by dcase

It's also possible that the permissions now restrict your access, so consider whether you should be in the ukesm group, or talk to colleagues who are.
In any case, good luck getting the data that you need.
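
If it helps, here is a minimal sketch for checking group membership from Python, assuming a POSIX system. The function names are made up for illustration, and the group name ukesm is only inferred from the /projects/ukesm path, so treat it as an assumption.

```python
import grp
import os


def current_group_names():
    """Return the names of all groups the current process belongs to."""
    names = set()
    # Include the primary group as well as the supplementary groups
    for gid in set(os.getgroups()) | {os.getgid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # No entry in the group database; fall back to the numeric id
            names.add(str(gid))
    return names


def in_group(name):
    """Return True if the current process is a member of the named group."""
    return name in current_group_names()


if __name__ == "__main__":
    # 'ukesm' is assumed from the project path; adjust if the group is named differently
    print("in ukesm group:", in_group("ukesm"))
```

If this reports False, the group membership would need to be requested through whoever administers the project.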
