Opened 9 years ago

Closed 9 years ago

#606 closed error (fixed)

Error in exit processing after model run... Failed in model executable

Reported by: a.elvidge Owned by: um_support
Component: UM Model Keywords:
Cc: Platform:
UM Version: 7.6

Description

Hi,

I am working on hector on job xfxkd. It is a 1.5km LAM running off a start dump and LBCs created by my 4km job xfxkb (which runs fine now, thanks to your help!). I am running the same 1.5km job successfully on MONSooN.
The job fails with the following error message:

execve error: Exec format error
[NID 00536] 2011-04-12 21:11:55 Apid 2543253: cannot execute: exit(107) exec failed
xfxkd: Run failed
*****************************************************************
   Ending script   :   qsexecute
   Completion code :   1
   Completion time :   Tue Apr 12 21:10:21 BST 2011
*****************************************************************

/work/n02/n02/hum/vn7.6/pathscale_quad/scripts/qsmaster: Failed in qsexecute in model xfxkd
*****************************************************************
   Starting script :   qsfinal
   Starting time   :   Tue Apr 12 21:10:21 BST 2011
*****************************************************************

/work/n02/n02/hum/vn7.6/pathscale_quad/scripts/qsfinal: Error in exit processing after model run
Failed in model executable

/work/n02/n02/hum/vn7.6/pathscale_quad/scripts/qsfinal: Model xfxkd - Error: No history files
*****************************************************************
   Ending script   :   qsfinal
   Completion code :   135
   Completion time :   Tue Apr 12 21:10:21 BST 2011
*****************************************************************

/work/n02/n02/hum/vn7.6/pathscale_quad/scripts/qsmaster: failed in final in model xfxkd

I can't work out from the output what this means. Help much appreciated.

Cheers, Andy

Change History (10)

comment:1 Changed 9 years ago by ros

Hi Andy,

Where did the /work/n02/n02/aelvidge/OFCAP/um_build/xfmee.exec come from? Which job did you use to build it on HECToR?

Cheers,
Ros.

comment:2 Changed 9 years ago by a.elvidge

Hi Ros,

It is Stuart Webster's build. xfmee is a MONSooN job. Do I need a hector build for a hector job?

Cheers, Andy

comment:3 Changed 9 years ago by ros

Hi Andy,

Yes, you will need to copy Stuart's job and build the exec on HECToR. MONSooN and HECToR are different architectures and the executables therefore can't be copied between the machines.
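If you hit the same "Exec format error", you can confirm that an executable was built for a different architecture by reading its file header. The Python sketch below decodes the `e_machine` field of an ELF header. It rests on two assumptions that this ticket does not confirm: that HECToR's compute nodes run Linux and expect x86-64 ELF binaries, and that a MONSooN build (an IBM system, judging by the `ibm` prebuild path quoted later in this ticket) will not decode as x86-64. The function name `describe_executable` is hypothetical and not part of any UM or FCM tool.

```python
import struct

# ELF e_machine values for a few architectures (values from the ELF specification).
EM_NAMES = {3: "x86 (32-bit)", 20: "PowerPC (32-bit)", 21: "PowerPC64", 62: "x86-64"}


def describe_executable(path):
    """Describe the binary format of the file at *path* (hypothetical helper).

    Only ELF headers are decoded. Anything else, including shell scripts,
    is reported as "not an ELF file".
    """
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        endian = "<"
    elif header[5] == 2:
        endian = ">"
    else:
        return "ELF file with unknown byte order"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF files.
    (machine,) = struct.unpack(endian + "H", header[18:20])
    return EM_NAMES.get(machine, f"ELF file, e_machine={machine}")


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(f"{name}: {describe_executable(name)}")
```

Run it against the executable named in the job (for example, /work/n02/n02/aelvidge/OFCAP/um_build/xfmee.exec). Under the assumptions above, anything other than "x86-64" suggests that the binary was not built for HECToR and needs rebuilding there.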

Regards,
Ros.

comment:4 Changed 9 years ago by a.elvidge

Hi Ros,

Having copied the job across to HECToR, I am getting this error in the .comp.leave output:

  The maximum number, 100, of fatal errors has been exceeded.
fcm_internal compile failed (512)
gmake: *** [calc_div_ep_flux.o] Error 1
gmake: *** Waiting for unfinished jobs....
gmake -f /home/n02/n02/aelvidge/compile/xfxky/ummodel/Makefile -j 6 all failed (2) at /work/n02/n02/hum/fcm/bin/../lib/Fcm/Build.pm line 597
Build failed on Wed Apr 13 10:50:03 2011.
->Make: 2400 seconds
->TOTAL: 3512 seconds
ATM build failed

Do you know what this is?

Cheers, Andy

comment:5 Changed 9 years ago by a.elvidge

(job xfxky)

comment:6 Changed 9 years ago by ros

Hi Andy,

You need to include the FCM branch

fcm:um_br/pkg/Config/VN7.6_ncas

which includes all the standard code changes needed to run on HECToR.

Regards,
Ros.

comment:7 Changed 9 years ago by a.elvidge

Thanks, that job is now working.
But, back to the original job - xfxkd - I am getting a new error:

 *********************************************************************************
 UM ERROR (Model aborting) :
 Routine generating error: INITIAL
 Error code:  4
 Error message:
error opening shortwave spectral file. iostat=    2
 *********************************************************************************

I'm not sure if this is the source of the problem, but I notice one thing in this job's "FCM Options for UM Atmos and Reconfig" window that could cause a problem.

The following is switched on:
'Use different version of the UM code base from the default for this UMUI version'
and it is set to vn7.4.

Also, 'Use precompiled build' is switched on (this job was copied across from a MONSooN job), with the following defined:
Local prebuild location: ~umbuild/vn7.4/prebuilds
Remote prebuild location: /projects/um1/vn7.4/ibm/prebuilds
Model name: lam_high_noreprod

Do you know how I can convert all this so it's relevant to working on PUMA and HECToR?

Thanks, Andy

comment:8 Changed 9 years ago by ros

Hi Andy,

If you look further down your output file you will find further information about the error:

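{{{
Namelist file: /projects/um1/vn7.6/ctldata/spectral/spec3a_sw_h4_meso2_ice12r

Failure in call to INITPHYS
 *********************************************************************************
UM ERROR (Model aborting) :
Routine generating error: INITIAL
Error code: 4
Error message:
error opening shortwave spectral file. iostat= 2
 *********************************************************************************
}}}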

/projects/um1/vn7.6/ctldata/spectral is a MONSooN path. Correcting the setting of the UM_SPECTRAL variable to /work/n02/n02/hum/vn7.6….. in the window "input/output control and resources → Time convention and environment vars" will fix this error.
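This class of mistake (a path that is valid on one machine but not on another) can be caught before submission by checking, on the target machine, that every path-valued variable the job reads actually exists there. The Python sketch below assumes you list those variables yourself. The function `missing_paths` and its input mapping are hypothetical and not part of the UMUI.

```python
import os


def missing_paths(job_paths):
    """Return (variable, path) pairs whose path does not exist on this machine.

    job_paths is a hypothetical mapping from a job variable name, such as
    "UM_SPECTRAL", to the directory or file the job will read at run time.
    Pairs are returned sorted by variable name.
    """
    return [
        (name, path)
        for name, path in sorted(job_paths.items())
        if not os.path.exists(path)
    ]


if __name__ == "__main__":
    # The MONSooN-only value from this ticket; on HECToR this should be reported.
    settings = {"UM_SPECTRAL": "/projects/um1/vn7.6/ctldata/spectral"}
    for name, path in missing_paths(settings):
        print(f"{name} points at {path}, which does not exist on this machine")
```

Run on HECToR, the check above should flag UM_SPECTRAL, since /projects/um1 is a MONSooN path. The check only reports missing paths and leaves correcting them to the user.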

Regarding the settings in the FCM options window: they are irrelevant for this job, as it is a run-only job. However, for completeness and to avoid confusion, you should set "use different version of the UM code base" to vn7.6. We don't have prebuilds on HECToR, so just switch that off.

Regards,
Ros.

comment:9 Changed 9 years ago by a.elvidge

Thanks Ros,
That was another careless mistake. The job is working now; in fact, all my jobs are working now! Thanks very much.
Cheers, Andy

comment:10 Changed 9 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed