#1721 closed help (fixed)

Model hangs running but with no output

Reported by: michmcr Owned by: willie
Priority: normal Component: UM Model
Keywords: ancillary file creation Cc:
Platform: ARCHER UM Version: 8.4

Description

I have attempted to run my model xlell, which is a direct copy of xlelf except that it uses a different SST ancillary file. It runs on ARCHER and outputs the first month of data but nothing thereafter, and it then hangs in the queue rather than crashing. The only details in the .leave file are MPI_Abort messages from a number of different processes, e.g.:

Rank 35 [Wed Nov 4 13:01:37 2015] [c2-2c2s4n0] application called MPI_Abort(comm=0x84000006, 9) - process 34

I initially thought this might be an intermittent problem, so I tried again, to no avail. I then thought it might be a problem with the ancillary file or with how I created it in xancil, so I took an old netCDF SST file, put it through xancil and set the model running, and this worked fine. I have also looked at the headers of both files and cannot see any obvious problems. Any help on this would be greatly appreciated.

Thanks
Michelle

Change History (3)

comment:1 Changed 18 months ago by willie

  • Keywords ancillary file creation added
  • Owner changed from um_support to willie
  • Platform set to ARCHER
  • Status changed from new to accepted

Hi Michelle,

The SST file /work/n02/n02/michmcr/data/tropic_sstx2_0312.anc has NaNs in it, so it is corrupt. You can see this by looking in xconv > view data on the first entry. Alternatively, you can cumf the file with itself. The file is obviously identical to itself, so there should be a perfect match, but NaNs never compare equal, even to themselves, so the comparison reports failures.
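The self-comparison idea can be illustrated with a minimal Python sketch. It does not read UM ancillary files or reproduce what cumf does internally. The function names and the sample SST values are hypothetical, chosen only to show why a field containing NaNs fails to match itself.

```python
import math


def self_compare(values):
    """Return the indices where a value does not compare equal to itself.

    Under IEEE 754 rules only NaN behaves this way, so any index
    returned here marks a NaN.
    """
    return [i for i, v in enumerate(values) if not (v == v)]


def count_nans(values):
    """Count NaN entries explicitly, as a cross-check."""
    return sum(1 for v in values if math.isnan(v))


# Hypothetical SST values in kelvin; the NaN stands in for a corrupt point.
sst = [271.35, 285.2, float("nan"), 300.1]

print("mismatching indices:", self_compare(sst))  # prints [2]
print("NaN count:", count_nans(sst))              # prints 1
```

A clean field gives an empty list of mismatches, which is the "perfect match" expected when a file is compared with itself.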

The file should be recreated.

Regards,

Willie

comment:2 Changed 18 months ago by michmcr

Hi Willie,

Thank you for your help. I have rectified this and the model is now running properly and I am getting output files.

Many thanks
Michelle

comment:3 Changed 18 months ago by willie

  • Resolution set to fixed
  • Status changed from accepted to closed