Opened 9 years ago

Closed 7 years ago

#556 closed help (fixed)

Problem with reconfiguration on Phase2b

Reported by: pclark Owned by: lois
Component: UM Model Keywords:
Cc: kerstin Platform: <select platform>
UM Version: <select version>

Description

I have rebuilt the reconfiguration and UM executables (I believe) using job xflva. I am now trying to run a new reconfiguration job, xflvj. It dies with lots of messages like:

user-kernel cdev open failed: Permission denied
0: (portals/_pmi_portals.c:452) PtlNIInit failed : PTL_NOT_REGISTERED(36)
[PE_0]: PtlNIInit failed
[PE_0]: _pmii_portals_start failed
[PE_0]: _pmi_preinit: _pmii_inet_setup returned -1
user-kernel cdev open failed: Permission denied
1: (portals/_pmi_portals.c:452) PtlNIInit failed : PTL_NOT_REGISTERED(36)
[PE_1]: PtlNIInit failed
[PE_1]: _pmii_portals_start failed
[PE_1]: _pmi_preinit: _pmii_inet_setup returned -1
user-kernel cdev open failed: Permission denied
2: (portals/_pmi_portals.c:452) PtlNIInit failed : PTL_NOT_REGISTERED(36)
user-kernel cdev open failed: Permission denied
user-kernel cdev open failed: Permission denied
user-kernel cdev open failed: Permission denied
user-kernel cdev open failed: Permission denied

These mean nothing to me - is this a phase2b issue? Is phase2b safe to use yet?

TIA

Change History (9)

comment:1 Changed 9 years ago by pclark

Version 7.4, BTW.

comment:2 follow-up: Changed 9 years ago by lois

  • Owner changed from um_support to lois
  • Status changed from new to assigned

I would leave phase 2b alone until after Christmas, Peter. It has only just returned (on Friday 17th) from its upgrade from an XT6 to an XE6, which has the same chip but the new Gemini interconnect. We have done some preliminary tests on an XE6 test system, and we need a whole new set of MPI environment variables for the UM, so some serious testing and tuning is needed. However, most of the CMS team are on holiday over the next 2 weeks and the University of Reading will be closed from the 24th December to the 4th January, so we decided to start the XE6 work when we return in the new year. In the meantime I hope that phase2a (the XT4) is still able to serve our community.

Let us know if you are having problems on phase 2a.

Thanks
Lois

comment:3 in reply to: ↑ 2 Changed 9 years ago by pclark

Replying to lois:

I would leave phase 2b alone until after Christmas, Peter. It has only just returned (on Friday 17th) from its upgrade from an XT6 to an XE6, which has the same chip but the new Gemini interconnect. We have done some preliminary tests on an XE6 test system, and we need a whole new set of MPI environment variables for the UM, so some serious testing and tuning is needed. However, most of the CMS team are on holiday over the next 2 weeks and the University of Reading will be closed from the 24th December to the 4th January, so we decided to start the XE6 work when we return in the new year. In the meantime I hope that phase2a (the XT4) is still able to serve our community.

Will do - I suggest you close this. I'll play with 2a for now.

Let us know if you are having problems on phase 2a.

Thanks
Lois

Happy Christmas to you and the team!!

comment:4 follow-up: Changed 9 years ago by lois

  • Resolution set to fixed
  • Status changed from assigned to closed

Instructions for running the UM on phase 2b are awaiting a final decision on which environment variables to use.

comment:5 in reply to: ↑ 4 Changed 9 years ago by kerstin

Hi Lois,
It seems I am having a similar problem to Peter's on phase2b:

[PE_0]: inet_ipaddr_from_dev: ioctl SIOCGIFADDR call failed 19
[PE_0]: WARNING: inet_listen_socket_setup using wildcard bind IP addr
user-kernel cdev open failed: Permission denied

Has the solution already been found to this problem? Or should I move back to phase2a?

Many thanks!
Kerstin

comment:6 Changed 9 years ago by lois

  • Cc kerstin added
  • Resolution fixed deleted
  • Status changed from closed to reopened

I am not sure what this problem is, Kerstin; it looks like a system problem. A few people have been experiencing write problems even from phase2a. While we are investigating, I suggest that you move back to phase 2a.

We are working on the performance issues of all UM versions on phase 2b, but 7.1 was not high on our list. Is this a version you are likely to want to continue using for some time? If so, we will change our priorities.

Lois

comment:7 Changed 9 years ago by kerstin

Thanks, Lois, for coming back to this. I want to continue using UM Version 7.1, so I would appreciate it if the priority could be increased.

Many thanks! Kerstin

comment:8 Changed 8 years ago by a.elvidge

Hi,
Was a solution found for this problem? I am getting the same error as reported by the original poster. This is for a job which I have run successfully before, and have changed very little since.
Thanks, Andy

comment:9 Changed 7 years ago by ros

  • Platform set to <select platform>
  • Resolution set to fixed
  • Status changed from reopened to closed