Opened 5 years ago

Closed 5 years ago

#1284 closed error (fixed)

Run xjvfa crash

Reported by: swsdong
Owned by: willie
Component: UM Model
Keywords:
Cc:
Platform: ARCHER
UM Version: 7.8


Dear Sir/Madam,

I am running a HadGEM3-A N96 experiment (xjvfa). The model runs for 1 month, but then crashes with the following error:

UM ERROR (Model aborting) :
Routine generating error: coex2
Error code: 2
Error message:

COEX: Unable to WGDOS pack to this accuracy

Rank 0 [Fri Apr 25 19:19:48 2014] [c7-1c0s8n2] application called MPI_Abort(comm=0x84000004, 9) - process 0
_pmiu_daemon(SIGCHLD): [NID 02914] [c7-1c0s8n2] [Fri Apr 25 19:19:48 2014] PE RANK 0 exit signal Aborted
[NID 02914] 2014-04-25 19:19:48 Apid 8006925: initiated application termination
qsexecute: Copying /work/n02/n02/bdong/um/xjvfa/xjvfa.thist to backup thist file /work/n02/n02/bdong/um/xjvfa/xjvfa.thist_keep
xjvfa: Run failed

In addition, the climatological monthly mean output xjvfaa.pmi6jun in /work/n02/n02/bdong/um/xjvfa cannot be read using xconv. Can you advise me on what might have caused this crash? Thanks.



Change History (5)

comment:1 Changed 5 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Buwen,

I suspect these are related. The presence of NaNs in the monthly mean output prevents the compression from working. You can check for NaNs by running cumf on the file against itself. For normal files the comparison should show no differences; for files with NaNs, the differences are recorded in the summary file.
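The self-comparison trick relies on a general floating-point property: under IEEE 754, a NaN compares unequal to everything, including itself. The short Python sketch below illustrates only that property. It is not how cumf is implemented, and the function name and sample values are made up for illustration.

```python
import math


def count_self_differences(values):
    """Count elements that compare unequal to themselves.

    Under IEEE 754 only NaN satisfies v != v, so this counts NaNs.
    (Hypothetical helper, for illustration only.)
    """
    return sum(1 for v in values if v != v)


# Made-up sample data: one clean field and one containing NaNs.
clean_field = [0.0, 1.5, 273.15]
bad_field = [0.0, math.nan, 273.15, math.nan]

print(count_self_differences(clean_field))  # 0
print(count_self_differences(bad_field))    # 2
```

Note that an infinity compares equal to itself, so a pure equality test of this kind would flag NaNs but not infinities.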

I can't read your files at the moment, so to help me investigate further, could you please run:

   chmod -R g+rX /home/n02/n02/bdong
   chmod -R g+rX /work/n02/n02/bdong



comment:2 Changed 5 years ago by swsdong

Hi Willie,

I have modified the access permissions as you suggested. Please have a look at the output and help me identify the problem. Thank you very much.



comment:3 Changed 5 years ago by willie

Hi Buwen,

The file xjvfaa.pmi6jun is incomplete: the problem happened earlier. You can see this with

cumf -dOUT ~ xjvfaa.dai6710 xjvfaa.dai6710

The problem seems to be

Field  4588 : Stash Code  1241 : DROPLET NUMBER CONC * LYR CLOUD WGT  : Number of differences =     1560

Field  4589 : Stash Code  1242 : LAYER CLOUD LWC * LAYER CLOUD WEIGHT : Number of differences =     1560

The start dump is OK and the model is converging throughout, yet there are NaNs and infinities in this file. Does that give any clues?
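Once field values have been extracted into ordinary arrays by whatever reader is available, a per-field scan for non-finite values can narrow down which diagnostics are affected. The sketch below assumes the data is already in memory as plain Python lists; the field names and values are invented for illustration and are not taken from xjvfaa.dai6710.

```python
import math


def summarize_non_finite(fields):
    """Return {field_name: (nan_count, inf_count)} for fields with bad values.

    `fields` maps a field name to a list of floats. (Hypothetical helper.)
    """
    report = {}
    for name, values in fields.items():
        nan_count = sum(1 for v in values if math.isnan(v))
        inf_count = sum(1 for v in values if math.isinf(v))
        if nan_count or inf_count:
            report[name] = (nan_count, inf_count)
    return report


# Invented sample data, keyed by made-up labels.
fields = {
    "stash_1241": [1.0e8, math.nan, math.inf],
    "stash_1242": [2.0e-4, -math.inf, 3.0e-4],
    "stash_0024": [287.4, 288.1, 286.9],
}

print(summarize_non_finite(fields))
# {'stash_1241': (1, 1), 'stash_1242': (0, 1)}
```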



comment:4 Changed 5 years ago by swsdong

Hi Willie,

Thanks for your help. The model now runs after I chose "Unpacked, profile 0" in the initialization and processing of standard PP files, instead of "New standard climate packing, profile 5".
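As a rough illustration of why switching to unpacked output sidesteps the error without removing the bad values: packing to a fixed accuracy generally means storing each value as an integer number of steps from a base value, which is impossible when a value is NaN or infinite. The sketch below is a simplified stand-in, not the WGDOS algorithm or the coex2 routine. The function names, the power-of-two accuracy convention, and the 31-bit limit are all assumptions made for illustration.

```python
import math


def pack_to_accuracy(values, accuracy_power, max_bits=31):
    """Quantize values to multiples of 2**accuracy_power above the minimum.

    Simplified stand-in for fixed-accuracy packing (NOT the real WGDOS code).
    Raises ValueError if any value is non-finite or the scaled range
    does not fit in max_bits.
    """
    if not all(math.isfinite(v) for v in values):
        raise ValueError("non-finite value: unable to pack to this accuracy")
    step = 2.0 ** accuracy_power
    base = min(values)
    packed = []
    for v in values:
        q = (v - base) / step
        if not math.isfinite(q) or q >= 2 ** max_bits:
            raise ValueError("range too large: unable to pack to this accuracy")
        packed.append(round(q))
    return base, packed


def unpack(base, packed, accuracy_power):
    """Invert pack_to_accuracy, up to the quantization step."""
    step = 2.0 ** accuracy_power
    return [base + p * step for p in packed]


base, packed = pack_to_accuracy([1.0, 1.25, 2.0], accuracy_power=-2)
print(packed)                    # [0, 1, 4]
print(unpack(base, packed, -2))  # [1.0, 1.25, 2.0]

try:
    pack_to_accuracy([1.0, math.nan], accuracy_power=-2)
except ValueError as err:
    print(err)
```

Writing the data unpacked skips this quantization step, so the run can proceed, but any NaNs or infinities in the fields are still written to the output.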



comment:5 Changed 5 years ago by willie

  • Resolution set to fixed
  • Status changed from accepted to closed