Opened 14 months ago
Closed 13 months ago
#3086 closed help (answered)
suite failing at the "coupled" stage
Reported by: | yb19052 | Owned by: | um_support |
---|---|---|---|
Component: | UM Model | Keywords: | |
Cc: | | Platform: | NEXCS |
UM Version: | 10.7 | | |
Description
I am trying to run a suite (u-bo840), but it has failed at the "coupled" stage with the following messages in "job.err":
Rank 1014 [Sun Nov 24 03:50:19 2019] [c9-2c1s14n1] application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1014
Application 88555350 is crashing. ATP analysis proceeding…
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
_pmiu_daemon(SIGCHLD): [NID 07225] [c9-2c1s14n1] [Sun Nov 24 03:55:21 2019] PE RANK 1014 exit signal Aborted
[NID 07225] 2019-11-24 03:55:21 Apid 88555350: initiated application termination
[FAIL] run_model # return-code=137
2019-11-24T03:55:24Z CRITICAL - failed/EXIT
I checked the "ocean.output" file, but it does not show any error messages.
I also checked some of the output from "coupled" (~/cylc-run/u-bo840/work/18500101T0000Z/coupled), and there are several netCDF files that look wrong (e.g. ……error.nc).
Do you know how I can fix this issue?
Thanks
Kenji
Change History (4)
comment:1 follow-up: ↓ 2 Changed 14 months ago by dcase
comment:2 in reply to: ↑ 1 Changed 14 months ago by yb19052
Replying to dcase:
I'm not sure if your cice start file exists. You have /projects/ukesm/jwalton/startdumps/cice/u-aj572i.restart.2000-01-01-00000.nc so I would check this.
Hi,
I appreciate your response. I checked the folder, and the cice start file no longer exists. I will look for another start file.
Thanks
Kenji
comment:3 Changed 14 months ago by dcase
It's also possible that the permissions now restrict your access, so perhaps consider whether you should be in the ukesm group, or talk to colleagues who are.
In any case, good luck getting the data that you need.
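For anyone hitting the same failure, the two checks suggested in this thread (does the start file exist, and can your account read it?) can be sketched in a few lines of Python. This is only a minimal illustration, not part of the suite or of any UM tooling: the function name check_start_file is made up for this example, and the path is the one quoted above.

```python
import os


def check_start_file(path):
    """Return "missing", "unreadable" or "ok" for a start-file path.

    Hypothetical helper for illustration only; it is not part of the
    suite, Rose, Cylc or the UM.
    """
    if not os.path.exists(path):
        # os.path.exists() is also False when a parent directory cannot
        # be searched by this account, so "missing" may in practice mean
        # "hidden by permissions" rather than "deleted".
        return "missing"
    if not os.access(path, os.R_OK):
        # The file is there, but this account has no read permission.
        return "unreadable"
    return "ok"


if __name__ == "__main__":
    # Path quoted in comment 1 of this ticket.
    start_file = (
        "/projects/ukesm/jwalton/startdumps/cice/"
        "u-aj572i.restart.2000-01-01-00000.nc"
    )
    print(start_file, "->", check_start_file(start_file))
```

A result of "missing" or "unreadable" points to the start file (or access to it) rather than to the model itself, which is consistent with "ocean.output" showing no errors.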
comment:4 Changed 13 months ago by ros
- Resolution set to answered
- Status changed from new to closed