Opened 10 years ago

Closed 8 years ago

#483 closed help (fixed)

A run on the NAE domain with 76 levels

Reported by: oma
Owned by: lois
Component: UM Model
Keywords: NAE
Cc:
Platform:
UM Version: 7.1

Description

Dear All,

I'm trying to run a simulation using vn7.1 on the NAE domain. The main difference from the standard job is that I want to run the model with 76 vertical levels. The global and the standard 38-level NAE simulations have run without problems. However, the new setup produced the following error:

[0] MPICH PtlEQPoll error (PTL_EQ_DROPPED): An event was dropped on the OTHER EQ handle.  
Try increasing the value of env var MPICH_PTL_OTHER_EVENTS (cur size is 2048).
aborting job:
PtlEQPoll/PtlEQGet error
[NID 15643] 2010-08-24 17:21:35 Apid 2156714: initiated application termination
diff: /work/n02/n02/oma/tmp/tmp.nid00004.28187/xffzd.xhist: No such file or directory
qsexecute: Copying /work/n02/n02/oma/xffzd/xffzd.thist to backup thist file /work/n02/n02/oma/xffzd/xffzd.thist_keep
xffzd: Run failed

Do you have any idea what I'm doing wrong?

Thanks in advance,

Oscar

Change History (3)

comment:1 Changed 10 years ago by lois

  • Owner changed from um_support to lois
  • Status changed from new to assigned

Hello Oscar,

It is not something you are doing wrong; you have just hit a system limit. There is an explanation here:

http://ncas-cms.nerc.ac.uk/index.php/hpc-faqs/1412-hector-message-passing-environment-variables

So you will need to set the environment variable MPICH_PTL_OTHER_EVENTS to a larger value in the UMUI (the error message reports a current size of 2048) and experiment to find a value suitable for your problem.
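
As a rough sketch of what raising the limit amounts to, assuming a POSIX shell job script rather than the UMUI itself: the starting size of 2048 is taken from the error message above, and doubling it is only an illustrative first guess, not a recommended value. For a UM job the variable should still be set through the UMUI.

```shell
# Assumption: 2048 is the "cur size" reported in the PTL_EQ_DROPPED error.
current_events=2048

# Illustrative choice only: double the current size as a first attempt,
# then increase further if the same error reappears.
export MPICH_PTL_OTHER_EVENTS=$((current_events * 2))

echo "MPICH_PTL_OTHER_EVENTS=${MPICH_PTL_OTHER_EVENTS}"
```

If the run still aborts with the same PtlEQPoll error, the value is still too small for the 76-level configuration and can be raised again.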

Let us know if there are still problems.

Lois

comment:2 Changed 10 years ago by oma

Dear Lois,

Problem successfully solved.

Thanks a lot

Oscar

comment:3 Changed 8 years ago by ros

  • Keywords NAE added
  • Resolution set to fixed
  • Status changed from assigned to closed