Opened 4 months ago
Closed 8 weeks ago
#3409 closed help (answered)
NEMO timeout problem
Reported by: | smisios | Owned by: | um_support |
---|---|---|---|
Component: | UM Model | Keywords: | NEMO OASIS |
Cc: | | Platform: | ARCHER |
UM Version: | | | |
Description
Hi Grenville,
I am trying to help a user (Dr Stergios Misios, Oxford) who finds that his coupled
NEMO/OASIS jobs have started timing out within the atmosphere code.
It appears that the atmosphere code is creating output files but not writing
any data to them. This looks like an issue between OASIS and XIOS: example error
codes are IOS_QUERYBUFFER and IOS_CONSUME_CLIENT_MEM.
There are no problems with the writing of ocean and ICE files however.
The user is within his n02 file quota.
Do you know of any reasons why OASIS could be prevented from writing output?
I had a brief search on the UM Trac, but couldn't find anything.
If this was an ARCHER issue, I would expect the ocean/ICE files to be affected also.
Regards,
Michael
The ARCHER Service Desk Team
Change History (4)
comment:1 Changed 4 months ago by grenville
comment:2 Changed 4 months ago by grenville
Dear Grenville,
I am struggling with this error and have no idea what it is about. It is strange because the model was running for a couple of years (20+) and then crashes because of memory issues.
The guys from ARCHER are looking into it but have not been very useful yet.
All my jobs are crashing; an example is u-bx142, which is a copy of the DAMIP-NAT.
Have you any idea what is going on?
Best wishes,
Stergios
comment:3 Changed 4 months ago by grenville
Stergios
Several users have reported very slow-running models - the problem with u-bx142 appears to be intermittent - it occurred for u-bz110 cycle 18500101T0000Z, but the model recovered. It looks like you have been affected by some IO issue on the machine.
The problem with u-bz110 is ultimately a disc quota issue; see /home/n02/n02/smisios/cylc-run/u-bz110/log/job/18520101T0000Z/coupled/03/job.out:
filename='bz110o_1d_18520101_18521230_grid_T.nc' error='Disk quota exceeded
I have increased your /work quota - please try running again.
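For anyone wanting to check other cycles for the same failure, the following is a minimal sketch of how a job.out file could be scanned for this message. The pattern is modelled only on the single log line quoted above; the function names `find_quota_errors` and `scan_job_out` are hypothetical, and the assumption that other NEMO/XIOS output errors follow the same `filename='...' error='...'` layout is untested.

```python
import re

# Pattern modelled on the one log line quoted in this ticket.
# Assumption: other quota failures are reported in the same layout.
QUOTA_RE = re.compile(r"filename='([^']+)'\s+error='Disk quota exceeded")


def find_quota_errors(lines):
    """Return the output file names reported with 'Disk quota exceeded'."""
    names = []
    for line in lines:
        match = QUOTA_RE.search(line)
        if match:
            names.append(match.group(1))
    return names


def scan_job_out(path):
    """Scan one job.out file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_quota_errors(handle)
```

Calling `scan_job_out` on the job.out path given above should list the affected NetCDF files if the message is present; an empty list suggests the failure in that cycle has a different cause.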
Please use the helpdesk - emails sent directly to me are apt to get forgotten.
Grenville
comment:4 Changed 8 weeks ago by ros
- Resolution set to answered
- Status changed from new to closed
Hi Stergios
ARCHER forwarded this query. What is the id of the failing suite?
Grenville