Opened 8 years ago

Closed 8 years ago

#806 closed help (fixed)

problem reconfiguring with ecmwf startdump

Reported by: anmcr Owned by: willie
Component: UM Model Keywords: optimisation, reconfiguration
Cc: Platform:
UM Version: 6.1

Description

Hi Willie,

I'm trying to reconfigure an ECMWF startdump using vn6.1 of the UM at N320. The job id is xhasz. This is a copy of a job which worked previously. The job fails with the errors given below. I've exhausted my ideas for getting it to run, such as switching the land-sea mask reconfiguration on and off or changing the type of vertical interpolation. I've even copied the 'umui' sample job over and tried to run that, but it seems to fail with the same error.

Thanks for any help,

Andrew

STOP

/work/n02/n02/hum/vn6.1/cce/scripts/qssetup: Job terminated normally

/work/n02/n02/anmcr/tmp/tmp.hector-xe6-13.21078/modscr_xhasz/qsexecute: Executing dump reconfiguration program /work/n02/n02/anmcr/xhasz/bin/reconf.exe

_pmiu_daemon(SIGCHLD): [NID 01772] [c11-1c0s6n0] [Fri Mar 9 10:21:42 2012] PE RANK 26 exit signal Segmentation fault
[NID 01772] 2012-03-09 10:21:42 Apid 1762888: initiated application termination
/work/n02/n02/anmcr/tmp/tmp.hector-xe6-13.21078/modscr_xhasz/qsexecute: Error in dump reconfiguration - see OUTPUT
*

Ending script : qsexecute
Completion code : 139
Completion time : Fri Mar 9 10:21:45 UTC 2012

*

/work/n02/n02/anmcr/tmp/tmp.hector-xe6-13.21078/modscr_xhasz/qsmaster: Failed in qsexecute in model xhasz

<<<< Information about How Many Lines of Output follow >>>>
29 lines in main OUTPUT file.
0 lines of O/P from pe0.
<<<< Lines of Output Information ends >>>>

[ASCII-art banner reading "OUTPUT"]

updscripts: %UPDATES% output follows:-

PUMSCM: version 1.21 (2003/06/19). © Met Office

Completed with 0 error(s) and 0 warning(s).
qsexecute: %RECONA% Atmosphere reconfiguration step

aprun -n 32 -N 32 -S 8 -ss /work/n02/n02/anmcr/xhasz/bin/reconf.exe

=====================================================
GCOM Version 3.8
HECToR -DMPI
Using precision : 64bit INTEGERs and 64bit REALs
Built at Wed Oct 26 10:17:22 BST 2011
=====================================================

Parallel Reconfiguration using 32 processor(s)
divided into a LPG with nproc_x= 4 and nproc_y= 8

C I/O Error: failed in BUFFIN8
Return code = 1
C I/O Error: failed in BUFFIN8
Return code = 1
C I/O Error: failed in BUFFIN8
Return code = 1
Application 1762888 exit codes: 139
Application 1762888 resources: utime ~88s, stime ~2s
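A note on reading the log above: the completion code 139 is consistent with the "exit signal Segmentation fault" line, because shells conventionally report a process killed by a signal as 128 plus the signal number, and SIGSEGV is signal 11 on Linux. The following is a minimal Python sketch of that decoding; the function name `decode_exit_status` is made up for illustration and is not part of the UM scripts.

```python
import signal


def decode_exit_status(code: int) -> str:
    """Describe a shell-style exit status.

    By convention, a status above 128 means the process was
    terminated by signal number (code - 128).
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name known to this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


# 139 is the completion code reported by qsexecute in the log.
print(decode_exit_status(139))  # on Linux: terminated by SIGSEGV
```

This only explains how the job died, not why. Here the segmentation fault follows the "failed in BUFFIN8" I/O errors in the same log.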


Change History (9)

comment:1 Changed 8 years ago by willie

  • Keywords number of land points added
  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Andrew,

You need to change the number of land points to 104538 and then try again.

Regards

Willie

comment:2 Changed 8 years ago by anmcr

Hi Willie,

The number of land points in xhasz is already set to 104538.

Best wishes,

Andrew

comment:3 Changed 8 years ago by willie

Hi Andrew,

Sorry, I repeated the standard job (see my xhaxb) and it worked. I changed the land points, flushed the output and reduced the optimisation to -O0, so it looks like reducing the optimisation did the trick.

Regards,

Willie

comment:4 Changed 8 years ago by anmcr

Hi Willie,

I replaced my job xhasv with your job xhaxb, and it's still failing. If you diff the two jobs, the changes in xhasv are only minor, such as directory names, and yet it still fails.

Andrew
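For comparing two copies of a job as described above, a small script can list only the settings that differ, which makes "minor" changes such as directory names easier to inspect one by one. This is a sketch using Python's standard difflib. The setting names (`LAND_POINTS`, `OVERRIDES`) and the `/home/...` paths are invented for illustration and are not the real UMUI job file format.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return only the added/removed lines between two texts."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )
    # Skip the two header lines ("--- old", "+++ new"), then keep
    # only lines marked as removed (-) or added (+).
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical settings for two copies of the same job.
job_a = "LAND_POINTS=104538\nOVERRIDES=/home/willie/overrides\n"
job_b = "LAND_POINTS=104538\nOVERRIDES=/home/anmcr/overrides\n"

for line in changed_lines(job_a, job_b):
    print(line)
```

A difference that looks like just a directory name can still matter if it decides which overrides file the job picks up.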

comment:5 Changed 8 years ago by willie

Hi Andrew,

You need to change the path to the overrides file: it is currently in my $HOME/overrides, not yours.

Regards

Willie
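When a job is copied from another user, paths that still point into the original owner's home directory are easy to miss. The following is a minimal Python sketch of a check for that. The function name `outside_home` and the example `/home/...` paths are assumptions for illustration; the ticket only says the overrides file was in Willie's $HOME/overrides.

```python
from pathlib import PurePosixPath


def outside_home(path: str, home: str) -> bool:
    """Return True if `path` is not `home` itself or somewhere under it.

    Pure path arithmetic: nothing is read from the filesystem.
    """
    p = PurePosixPath(path)
    h = PurePosixPath(home)
    return p != h and h not in p.parents


# Hypothetical home directories for the two users in this ticket.
print(outside_home("/home/willie/overrides", "/home/anmcr"))  # True
print(outside_home("/home/anmcr/overrides", "/home/anmcr"))   # False
```

A path flagged this way is not necessarily wrong, since shared files may legitimately live elsewhere, but it is worth confirming that it is readable and intended.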

comment:6 Changed 8 years ago by anmcr

Hi Willie,

I changed the path to the overrides file and I think it's worked; at least, a reconfigured startdump has been created.

I'm trying to run a vn7.1 N320 global model with it; however, I can't get this model to compile. The job id is xhasa. This is a job which worked previously on phase2a.

Are you able to help?

Thanks,

Andrew

comment:7 Changed 8 years ago by willie

  • Keywords optimisation, reconfiguration added; number of land points removed

Hi Andrew,

In addition to the hardware changes for Phase2a, we changed from the Pathscale to the Cray compiler for phase 3, so some changes need to be made to existing jobs. See http://cms.ncas.ac.uk/index.php/component/content/article/22/1583-hectorphase3.

Regards,

Willie

comment:8 Changed 8 years ago by anmcr

Hi Willie,

Thanks for the help. I got the job to run.

Best wishes,

Andrew

comment:9 Changed 8 years ago by willie

  • Resolution set to fixed
  • Status changed from accepted to closed