Opened 2 years ago
Closed 2 years ago
#2720 closed help (fixed)
Error in suite u-bd704 on second time step
Reported by: | xd904476 | Owned by: | um_support |
---|---|---|---|
Component: | Coupled model | Keywords: | coupled |
Cc: | | Platform: | ARCHER |
UM Version: | 10.7 |
Description
Hi, I am running suite u-bd704 and it fails at the second time step.
The suite has created a History_data directory on ARCHER, but then it fails with the following messages.
What could be the problem?
Thanks,
Dani
std_err
Rank 198 [Tue Jan 8 17:31:16 2019] [c4-0c1s4n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 198
_pmiu_daemon(SIGCHLD): [NID 00850] [c4-0c1s4n2] [Tue Jan 8 17:31:16 2019] PE RANK 198 exit signal Aborted
[NID 00850] 2019-01-08 17:31:17 Apid 33158802: initiated application termination
[FAIL] run_model # return-code=137
Received signal ERR
cylc (scheduler - 2019-01-08T17:31:42Z): CRITICAL Task job script received signal ERR at 2019-01-08T17:31:42Z
cylc (scheduler - 2019-01-08T17:31:42Z): CRITICAL failed at 2019-01-08T17:31:42Z
std_out
icebergs, read_restart_bergs: # bergs = 0 on PE 42
ice: Error writing time variable
Application 33158802 exit codes: 134
Application 33158802 exit signals: Killed
Application 33158802 resources: utime ~575s, stime ~8
Change History (4)
comment:1 Changed 2 years ago by willie
- Component changed from UM Model to Coupled model
- Keywords coupled added
- Platform set to ARCHER
- UM Version set to 10.7
comment:2 Changed 2 years ago by xd904476
Hi Willie, I did try this, but I get an "UNRECOVERABLE library error". It seems to occur when dealing with icebergs, but I don't know how to fix it.
This is the stderr output I get:
lib-4029 : UNRECOVERABLE library error
An underlying C library read or write request failed.
Encountered during a sequential formatted WRITE to unit 18
Fortran unit 18 is connected to a sequential formatted text file:
"icebergs.stat_0012"
Current format: 200 FORMAT(a19,10(a18,"=",es14.7,x,a2,:,","))
(The same message, interleaved across several MPI ranks, is repeated for "icebergs.stat_0020", "icebergs.stat_0023", "icebergs.stat_0030", "icebergs.stat_0031", "icebergs.stat_0032", "icebergs.stat_0045" and "icebergs.stat_0046".)
_pmiu_daemon(SIGCHLD): [NID 00100] [c0-0c1s9n0] [Thu Jan 10 00:59:00 2019] PE RANK 247 exit signal Aborted
_pmiu_daemon(SIGCHLD): [NID 00099] [c0-0c1s8n3] [Thu Jan 10 00:59:00 2019] PE RANK 228 exit signal Aborted
_pmiu_daemon(SIGCHLD): [NID 00097] [c0-0c1s8n1] [Thu Jan 10 00:59:00 2019] PE RANK 210 exit signal Aborted
[NID 00097] 2019-01-10 00:59:00 Apid 33165286: initiated application termination
[FAIL] run_model # return-code=137
Received signal ERR
cylc (scheduler - 2019-01-10T00:59:07Z): CRITICAL Task job script received signal ERR at 2019-01-10T00:59:07Z
cylc (scheduler - 2019-01-10T00:59:07Z): CRITICAL failed at 2019-01-10T00:59:07Z
Thanks,
Dani
comment:3 Changed 2 years ago by willie
Hi Dani,
In ./18500201T0000Z/coupled/01/job.err you are getting:
BUFFOUT: Write Failed: Disk quota exceeded
So you have run out of quota on ARCHER. Try removing previous runs that you no longer require. You can copy data to the RDF if necessary. Then try again.
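A minimal sketch of how one might find which jobs hit the quota and which old runs take up the most space, assuming the cylc job-log layout seen above (<cycle>/<task>/<submit>/job.err) and a local Python 3. The function names `find_quota_errors` and `dir_sizes` are hypothetical helpers, not part of cylc or Rose; the `lfs quota` command on ARCHER remains the authoritative check of your actual quota.

```python
from pathlib import Path

# Text reported in job.err when the quota is exhausted (taken from this ticket).
QUOTA_MSG = "Disk quota exceeded"


def find_quota_errors(log_root):
    """Return sorted paths of job.err files under log_root that mention QUOTA_MSG.

    Hypothetical helper: log_root is assumed to be a cylc job-log directory
    laid out as <cycle>/<task>/<submit>/job.err,
    e.g. 18500201T0000Z/coupled/01/job.err.
    """
    hits = []
    for err in Path(log_root).rglob("job.err"):
        # errors="replace" so a stray undecodable byte cannot abort the scan
        text = err.read_text(errors="replace")
        if QUOTA_MSG in text:
            hits.append(err)
    return sorted(hits)


def dir_sizes(parent):
    """Return (bytes, name) for each immediate subdirectory, largest first.

    Hypothetical helper: sums the sizes of regular files found by rglob,
    as a rough guide to which previous runs are worth removing or
    copying to the RDF.
    """
    sizes = []
    for entry in Path(parent).iterdir():
        if entry.is_dir():
            total = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
            sizes.append((total, entry.name))
    return sorted(sizes, reverse=True)
```

For example, `find_quota_errors(Path.home() / "cylc-run" / "u-bd704" / "log" / "job")` would list the failing job.err files, assuming the suite was run from the default cylc-run location.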
Willie
comment:4 Changed 2 years ago by willie
- Resolution set to fixed
- Status changed from new to closed
Hi Dani,
It has just completed the second cycle for me, so try doing a fresh start.
Willie