Opened 11 years ago

Closed 11 years ago

#310 closed help (fixed)

HECToR jobs memory fault

Reported by: sobaux
Owned by: jeff
Component: UM Model
Keywords:
Cc: w.mcginty@…
Platform:
UM Version: 6.1

Description

Hi,

A number of jobs on HECToR have recently crashed with errors similar to this:

'/var/spool/PBS/mom_priv/jobs/345405.sdb.SC[324]: .: line 165: 7458: Memory fault
updscripts: Failed to run mkobjxref
updscripts: Output from mkobjxref:'

If the job is resubmitted it then runs fine.

Full output from the most recent failure is in

~sobaux/um/umui_out/xcyfy006.xcyfy.d09221.t225724.leave

Thanks,

Ian

Change History (4)

comment:1 Changed 11 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Ian,

I note that you are running from existing executables. It may be worth recompiling everything: HECToR has recently gone quad core and has had some problems too. There are also a few pages that need to be visited and closed. Do a check setup to find these and clear the errors.

Let me know how you get on.

Regards,

Willie

comment:2 Changed 11 years ago by jeff

  • Cc w.mcginty@… added
  • Owner changed from willie to jeff
  • Status changed from accepted to assigned

Hi Ian

This is the problem that first appeared when they changed the Korn shell on HECToR. It seemed to go away when they upgraded ksh to a later version, but evidently it still happens occasionally. The basic problem is that ksh is rubbish, but HECToR won't change versions and changing shells in the UM would be a big job. I can make this error go away, because running mkobjxref is not needed unless you are installing the UM. Hopefully the error won't occur elsewhere. I'll let you know when I have changed the PUM script mod to do this.
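
Roughly, the idea is to guard the mkobjxref step so it only runs during an install. The snippet below is only a sketch of that idea, not the actual change to the PUM script mod: INSTALLING_UM, run_mkobjxref and maybe_run_mkobjxref are made-up names for illustration, and run_mkobjxref is a stand-in for whatever updscripts really calls.

```shell
# Hypothetical sketch: these names are illustrative assumptions and are
# not taken from updscripts or the PUM script mod.

run_mkobjxref() {
    # Stand-in for the real mkobjxref invocation made by updscripts.
    echo "mkobjxref ran"
}

maybe_run_mkobjxref() {
    # Only build the object cross-reference when installing the UM;
    # ordinary model runs skip the step, so a shell fault there
    # can no longer abort the job.
    if [ "${INSTALLING_UM:-false}" = "true" ]; then
        run_mkobjxref
    else
        echo "mkobjxref skipped"
    fi
}
```

With INSTALLING_UM unset or set to anything other than "true", the step is skipped, which is the behaviour a normal job would want.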

Jeff.

comment:3 Changed 11 years ago by jeff

Hi Ian

I've updated pum_full_6.1.mu to fix this problem; new jobs will pick up the changes. Let me know if it doesn't work or if you get similar problems elsewhere.

Jeff.

comment:4 Changed 11 years ago by jeff

  • Resolution set to fixed
  • Status changed from assigned to closed