Opened 5 months ago

Closed 4 months ago

#2845 closed help (answered)

job failed in atmos_main

Reported by: zliu
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: ARCHER
UM Version: 10.7

Description

Hi,

I am running suite u-bf152 with UM version 10.7 on ARCHER. Some output is produced under /home/n02/n02/zliu/cylc-run/u-bf152/share/data/History_Data (e.g. nudg_asia_memb01a.pd1981sep), but the output cannot be opened using xconv.
The atmos_main program failed with the error
[FAIL] um-atmos # return-code=137
Received signal ERR
cylc (scheduler - 2019-04-01T08:54:48Z): CRITICAL Task job script received signal ERR at 2019-04-01T
I ran this suite successfully before. The only things I have changed since then are the STASH variables and the run length.
Could you please help me with this problem? Thanks.
zhen

Attachments (1)

log (20.7 KB) - added by zliu 5 months ago.

Change History (9)

comment:1 Changed 5 months ago by willie

  • Platform set to ARCHER

Hi Zhen,

Xconv can read the file; it's just that there is no data in it. More worrying is that the job.err and job.out for the atmos app are missing. Where are you getting the error message from? Return code 137 translates to 'kill' (128 + 9, i.e. signal 9), so it looks like the job was killed before the model could produce output.
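For reference, the mapping from a return code to a signal can be checked with a few lines of Python. This is a minimal sketch, assuming the usual POSIX shell convention that a return code above 128 means the process was terminated by signal number (code - 128). The function name `describe_return_code` is made up for illustration and is not part of the UM, Rose or cylc.

```python
import signal


def describe_return_code(code: int) -> str:
    """Describe a shell-style return code.

    Assumption: codes above 128 follow the "128 + signal number" convention.
    """
    if code == 0:
        return "success"
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by {name} (signal {signum})"
    return f"exited with error status {code}"


# The return code reported for um-atmos in this ticket.
print(describe_return_code(137))
```

On Linux this reports SIGKILL (signal 9) for 137. SIGKILL cannot be caught by the program, and it is often sent by the batch system or the operating system rather than by the user, so a job can be "killed" without anyone killing it by hand.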

Did you run the job for long enough that monthly means could be computed, i.e. for more than one month?

Willie

comment:2 Changed 5 months ago by zliu

Hi Willie,

Morning! Great to receive your replies.
I didn't kill it. I thought that the daily 'pd' output should contain data even if I run for less than a month. Moreover, I changed the run length to more than one month, but atmos_main still failed.
The error file is under /home/n02/n02/zliu/cylc-run/u-bf152/log/job/19810901T0000Z/atmos_main/01. You can grep for ERR to find the problem. Thanks.
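The same search can be scripted so that several patterns are checked in one pass. This is only a sketch: the pattern list is an assumption (ERR and FAIL appear in the messages quoted in this ticket, "Segmentation fault" is a common crash message), and the names `find_error_lines` and `scan_log` are hypothetical helpers, not part of any UM tool.

```python
from pathlib import Path

# Assumed patterns worth flagging; extend the tuple as needed.
PATTERNS = ("ERR", "FAIL", "Segmentation fault")


def find_error_lines(lines, patterns=PATTERNS):
    """Return (line_number, text) for every line containing any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern in line for pattern in patterns):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan a log file such as job.err, tolerating undecodable bytes."""
    with Path(path).open(errors="replace") as handle:
        return find_error_lines(handle)


# Example using two of the messages quoted in this ticket.
sample = [
    "[FAIL] um-atmos # return-code=137",
    "Received signal ERR",
    "some other log line",
]
for number, text in find_error_lines(sample):
    print(number, text)
```

To use it on a real log, pass the path of job.err or job.out in the task's log directory to `scan_log`.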

zhen

comment:3 Changed 5 months ago by zliu

Hi Willie,

I realize that in namelist→Model Input and Output, mean_reftimeim is set to 01/12/1981, whereas my simulation start date is 01/09/1981. Does this mean the model won't produce any output until 01/12/1981? Thanks.

zhen

comment:4 Changed 5 months ago by zliu

Hi Willie,

I changed mean_reftimeim to 01/09/1981 and the run length to more than one month, but I still get the same error. The job.out is under /home/n02/n02/zliu/work/cylc-run/u-bf152/log/job/19810901T0000Z/atmos_main/01
Thanks.
zhen

comment:5 Changed 5 months ago by willie

Hi Zhen,

In the job.err for the atmos task it says that there has been a segmentation fault - this is a major error which has caused the job to terminate. Which STASH items did you modify?

Willie

comment:6 Changed 5 months ago by zliu

Hi Willie,

I added a lot of variables for output. The suite I am running, u-bf152, is a copy of u-av526. I compared the two suites and wrote the differences to the attached log file. Lines 84 to 1016 are the changes I made to STASH items.
If the segmentation fault is caused by a STASH item, is there a way to diagnose which item is responsible? Thanks.

zhen

comment:7 Changed 5 months ago by willie

Hi Zhen,

Switch off all of the new STASH and see if it runs. If it doesn't, then it's not STASH and you should look elsewhere for the error. If it does, then reintroduce the STASH items one at a time until the error re-appears.
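The procedure can be sketched in Python as follows. This is an illustration only: `runs_ok` is a hypothetical callback standing in for "run the suite with these new STASH items enabled and report whether atmos_main succeeded", and the item labels are placeholders, not real STASH codes.

```python
from typing import Callable, List, Optional, Sequence


def find_failing_item(
    new_items: Sequence[str],
    runs_ok: Callable[[List[str]], bool],
) -> Optional[str]:
    """Re-enable new items one at a time and return the first that breaks the run.

    Returns None if the run still succeeds with every new item enabled.
    Raises RuntimeError if the run fails even with all new items switched off.
    """
    # Step 1: all new STASH off. A failure here means STASH is not the cause.
    if not runs_ok([]):
        raise RuntimeError("fails with all new STASH off: look elsewhere")

    # Step 2: reintroduce items cumulatively until the error re-appears.
    enabled: List[str] = []
    for item in new_items:
        enabled.append(item)
        if not runs_ok(list(enabled)):
            return item
    return None


# Stand-in for real model runs: pretend that "item_c" triggers the crash.
def fake_run(enabled: List[str]) -> bool:
    return "item_c" not in enabled


print(find_failing_item(["item_a", "item_b", "item_c", "item_d"], fake_run))
```

Each call to `runs_ok` corresponds to a full test run, so with many new items it may be quicker to reintroduce them in batches and then narrow down within the batch that fails.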

It is better to make small changes, test and then, if it works, commit your suite to the repository.

Willie

comment:8 Changed 4 months ago by willie

  • Resolution set to answered
  • Status changed from new to closed