Opened 9 years ago

Closed 9 years ago

#631 closed help (fixed)

OOM Killer (?)

Reported by: oma
Owned by: um_support
Component: UM Model
Keywords: OOM killer
Cc:
Platform:
UM Version: 7.3

Description

Hello,

I'm having problems running a modified version of the UM. It is the result of merging two branches that each work fine on their own. I know one of them is prone to Out of Memory errors (OOM killer), but these were resolved when running that branch alone. After merging the branches, I get the following error message in the .leave file:

*********************************************************
UM Executable : /work/n02/n02/oma/xgbib/bin/qxum.my
*********************************************************


[NID 00883] 2011-06-02 13:42:45 Apid 659832: initiated application termination
[NID 00883] 2011-06-02 13:42:52 Apid 659832: OOM killer terminated this process.
[NID 00882] 2011-06-02 13:45:44 Apid 659832: OOM killer terminated this process.
diff: /work/n02/n02/oma/tmp/tmp.hector-xe6-13.14731/xgbib.xhist: No such file or directory
qsexecute: Copying /work/n02/n02/oma/xgbib/xgbib.thist to backup thist file /work/n02/n02/oma/xgbib/xgbib.thist_keep
xgbib: Run failed
*****************************************************************
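The two "OOM killer" lines in the log above show which nodes ran out of memory. As a rough sketch only, a few lines of Python can pull those entries out of a .leave file. The pattern is modelled purely on the two lines shown here, so other message formats may not match, and the function name `find_oom_kills` is hypothetical rather than part of the UM or HECToR tooling.

```python
import re

# Pattern modelled on the two "OOM killer" lines in the log above;
# this is an assumption about the format, not a documented specification.
OOM_LINE = re.compile(
    r"^\[NID (?P<nid>\d+)\] (?P<timestamp>\S+ \S+) Apid (?P<apid>\d+): "
    r"OOM killer terminated this process\."
)


def find_oom_kills(lines):
    """Return (node id, timestamp, application id) for each OOM-kill line."""
    hits = []
    for line in lines:
        match = OOM_LINE.match(line.strip())
        if match:
            hits.append((match["nid"], match["timestamp"], match["apid"]))
    return hits


# Sample input copied from the log in this ticket.
sample = [
    "[NID 00883] 2011-06-02 13:42:45 Apid 659832: initiated application termination",
    "[NID 00883] 2011-06-02 13:42:52 Apid 659832: OOM killer terminated this process.",
    "[NID 00882] 2011-06-02 13:45:44 Apid 659832: OOM killer terminated this process.",
    "xgbib: Run failed",
]

for nid, timestamp, apid in find_oom_kills(sample):
    print(f"node {nid} hit the OOM killer at {timestamp} (Apid {apid})")
```

To scan a real file, the same function can be given an open file object, since it only iterates over lines.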

Both branches require extra memory to work, so I was wondering whether it is possible to increase the memory allocation on HECToR. Is there any other solution to this kind of problem?

Thanks in advance,

Oscar

Change History (4)

comment:1 Changed 9 years ago by willie

Hi Oscar,

You could try running with 12 cores per node: this is set on the job resources and resubmission page. Currently you have 32 GB per node divided among 24 cores, or about 1.3 GB per core; switching to 12 cores per node would double this to about 2.7 GB per core.
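The arithmetic behind this suggestion can be sketched in a few lines of Python. The 32 GB node size and the core counts are the figures quoted above; the function name `memory_per_core_gb` is hypothetical and only serves the illustration.

```python
# Sketch: memory share per core when only some cores on a node are used.
# The 32 GB figure is the node memory quoted in this ticket.
NODE_MEMORY_GB = 32


def memory_per_core_gb(cores_used_per_node, node_memory_gb=NODE_MEMORY_GB):
    """Return the memory (in GB) available to each core in use on a node."""
    if cores_used_per_node <= 0:
        raise ValueError("cores_used_per_node must be positive")
    return node_memory_gb / cores_used_per_node


for cores in (24, 12):
    share = memory_per_core_gb(cores)
    print(f"{cores} cores per node: {share:.2f} GB per core")
```

Note the trade-off: with half the cores in use on each node, the same number of processes needs twice as many nodes.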

Regards,

Willie

comment:2 Changed 9 years ago by oma

Hi Willie,

I will try that.

Thanks,

Oscar

comment:3 Changed 9 years ago by oma

It did work!

comment:4 Changed 9 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed