Opened 10 years ago

Closed 10 years ago

#631 closed help (fixed)

OOM Killer (?)

Reported by: oma
Owned by: um_support
Component: UM Model
Keywords: OOM killer
Cc:
Platform:
UM Version: 7.3



I'm having problems running a modified version of the UM. It is the result of merging two branches that each work fine on their own. I know one of them is prone to out-of-memory errors (OOM killer), but these were solved when running that branch alone. After merging the branches, I'm receiving the following error message in the .leave file:

UM Executable : /work/n02/n02/oma/xgbib/bin/

[NID 00883] 2011-06-02 13:42:45 Apid 659832: initiated application termination
[NID 00883] 2011-06-02 13:42:52 Apid 659832: OOM killer terminated this process.
[NID 00882] 2011-06-02 13:45:44 Apid 659832: OOM killer terminated this process.
diff: /work/n02/n02/oma/tmp/tmp.hector-xe6-13.14731/xgbib.xhist: No such file or directory
qsexecute: Copying /work/n02/n02/oma/xgbib/xgbib.thist to backup thist file /work/n02/n02/oma/xgbib/xgbib.thist_keep
xgbib: Run failed

Both branches require extra memory to work, so I was wondering whether it is possible to increase the memory allocation on HECToR. Is there any other solution to this kind of problem?

Thanks in advance,


Change History (4)

comment:1 Changed 10 years ago by willie

Hi Oscar,

You could try running with 12 cores per node instead of 24; this is set on the job resources and resubmission page. Currently you have 32 GB per node divided by 24 cores, or about 1.3 GB per core; switching to 12 would double this to about 2.7 GB per core.
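
The arithmetic behind this suggestion can be checked with a short sketch. It assumes only the figures quoted above (32 GB per node, 24 or 12 cores in use); the function name `memory_per_core_gb` is made up for illustration and is not part of the UM or any HECToR tool.

```python
def memory_per_core_gb(node_memory_gb: float, cores_used_per_node: int) -> float:
    """Return the memory available to each core when a node's memory is
    shared evenly among the cores actually in use (a simplifying assumption)."""
    if cores_used_per_node <= 0:
        raise ValueError("cores_used_per_node must be positive")
    return node_memory_gb / cores_used_per_node


# Node memory as quoted in the reply above.
NODE_MEMORY_GB = 32.0

for cores in (24, 12):
    share = memory_per_core_gb(NODE_MEMORY_GB, cores)
    print(f"{cores} cores per node: {share:.2f} GB per core")
```

Halving the number of cores used per node doubles the memory each process can use, at the cost of needing twice as many nodes for the same total core count.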



comment:2 Changed 10 years ago by oma

Hi Willie,

I will try that.



comment:3 Changed 10 years ago by oma

It did work!

comment:4 Changed 10 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed