Opened 4 months ago

Closed 8 weeks ago

#3409 closed help (answered)

NEMO timeout problem

Reported by: smisios
Owned by: um_support
Component: UM Model
Keywords: NEMO OASIS
Cc:
Platform: ARCHER
UM Version:

Description

Hi Grenville,

I am trying to help a user (Dr Stergios Misios, Oxford) who finds that his coupled
NEMO/OASIS jobs have started timing out within the atmosphere code.

It appears that the atmosphere code is creating output files but not writing
any data to them. This looks like an issue between OASIS and XIOS: example error
codes are IOS_QUERYBUFFER and IOS_CONSUME_CLIENT_MEM.

However, there are no problems with the writing of ocean and ICE files.
The user is within his n02 file quota.

Do you know of any reasons why OASIS could be prevented from writing output?
I had a brief search on the UM Trac, but couldn't find anything.

If this was an ARCHER issue, I would expect the ocean/ICE files to be affected also.

Regards,

Michael
The ARCHER Service Desk Team

Change History (4)

comment:1 Changed 4 months ago by grenville

Hi Stergios

ARCHER forwarded this query. What is the id of the failing suite?

Grenville

comment:2 Changed 4 months ago by grenville

Dear Grenville,
I am struggling with this error and have no idea what it is about. It is strange because the model ran for a couple of years (20+) and then crashed because of memory issues.
The ARCHER team is looking into it but has not been able to help yet.
All my jobs are crashing; an example is u-bx142, which is a copy of DAMIP-NAT.

Do you have any idea what is going on?

Best wishes,
Stergios

comment:3 Changed 4 months ago by grenville

Stergios

Several users have reported very slow-running models. The problem with u-bx142 appears to be intermittent - it occurred for u-bz110 cycle 18500101T0000Z, but the model recovered. It looks like you have been affected by some IO issue on the machine.

The problem with u-bz110 is ultimately a disc quota issue; see /home/n02/n02/smisios/cylc-run/u-bz110/log/job/18520101T0000Z/coupled/03/job.out:

filename='bz110o_1d_18520101_18521230_grid_T.nc' error='Disk quota exceeded

I have increased your /work quota - please try running again.
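
For anyone hitting similar symptoms, a quick way to rule out a quota problem is to search the job.out files for the message above. The following is only a rough sketch, not part of any UM or Rose tooling: the function names are made up for illustration, and it assumes the error lines have the same filename='...' error='...' layout as the one quoted here.

```python
import re
from pathlib import Path

# Marker text taken from the job.out line quoted in this ticket.
QUOTA_MARKER = "Disk quota exceeded"
# Assumed layout: filename='<name>' appears on the same line as the error.
FILENAME_RE = re.compile(r"filename='([^']*)'")


def find_quota_errors(lines):
    """Return (line_number, filename) pairs for lines reporting a quota error.

    filename is None when the line has no filename='...' field.
    """
    hits = []
    for line_no, line in enumerate(lines, start=1):
        if QUOTA_MARKER in line:
            match = FILENAME_RE.search(line)
            hits.append((line_no, match.group(1) if match else None))
    return hits


def scan_job_out(path):
    """Scan one job.out file (hypothetical helper) and return its quota errors."""
    with Path(path).open(errors="replace") as handle:
        return find_quota_errors(handle)
```

Calling scan_job_out on the job.out path given above should list each output file whose write failed for this reason; an empty list means the marker text was not found in that log.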

Please use the helpdesk - emails sent directly to me are apt to get forgotten.

Grenville

comment:4 Changed 8 weeks ago by ros

  • Resolution set to answered
  • Status changed from new to closed