#3038 closed help (fixed)

atmos_main failing with no clear reason

Reported by: taubry
Owned by: um_support
Component: UM Model
Keywords:
Cc: lrm49@…, jjas3@…
Platform: ARCHER
UM Version: 11.2

Description

Dear NCAS team,

One of my runs failed over the weekend. The suite id is u-bm840. I only made minor modifications since I last ran this suite successfully, and several cycles ran successfully before the suite failed. I therefore suspect the problem is related to ARCHER. I checked that I have no disk space or MAU quota problems.
The last two lines of my job.err file are the following:

[NID 00883] 2019-10-11 23:38:42 Apid 37381831: binary transfer gunzip failed, exit value 1
[FAIL] um-atmos # return-code=1

One of my officemates (Lauren Marshall) has the same problem with suite id u-bm515.
Another officemate (Johnny Staunton Sykes) has a very similar problem (on u-bm865), but the last two lines of the job.err show:
[NID 00883] 2019-10-13 19:00:10 Apid 37406754: cannot write placement info file
[FAIL] um-atmos # return-code=1
This seems similar to an ARCHER issue that was supposedly resolved on 8 October.

For both of them, several cycles were run correctly before failing at atmos_main with these errors.
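
For anyone comparing failures like these, the quoted job.err lines can be parsed to see which fields they share. The following is a minimal sketch, assuming the `[NID n] date time Apid n: message` layout seen in the two lines above; `ERR_LINE` and `parse_err_line` are illustrative names, not part of any ARCHER or UM tooling.

```python
import re

# Layout assumed from the two job.err lines quoted in this ticket:
# [NID 00883] 2019-10-11 23:38:42 Apid 37381831: <message>
ERR_LINE = re.compile(
    r"\[NID (?P<nid>\d+)\] (?P<date>\S+) (?P<time>\S+) "
    r"Apid (?P<apid>\d+): (?P<msg>.*)"
)


def parse_err_line(line):
    """Return the fields of one error line as a dict, or None if it does not match."""
    match = ERR_LINE.search(line)
    return match.groupdict() if match else None


# The two error lines exactly as quoted in the ticket.
lines = [
    "[NID 00883] 2019-10-11 23:38:42 Apid 37381831: binary transfer gunzip failed, exit value 1",
    "[NID 00883] 2019-10-13 19:00:10 Apid 37406754: cannot write placement info file",
]

records = [parse_err_line(line) for line in lines]

# Collect the distinct node ids reported across the failures.
nodes = {record["nid"] for record in records}
print(nodes)
```

Both quoted lines report the same node id (00883) with different messages, which may be worth including when reporting such failures.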

Thanks for any help!

Thomas

Change History (3)

comment:1 Changed 12 months ago by grenville

Thomas

This is an ARCHER problem (fixed as far as I know)

[NID 00883] 2019-10-13 19:00:10 Apid 37406754: cannot write placement info file

but that's not the problem reported for your job.

Seems like you're running now.

Grenville

comment:2 Changed 12 months ago by taubry

Hi Grenville,

Yes, we retriggered our jobs this morning and they seem to be running (including Lauren's and Johnny's). We will check that they don't fail again after a few cycles and then close the ticket.

Thanks,

Thomas

comment:3 Changed 12 months ago by taubry

  • Resolution set to fixed
  • Status changed from new to closed