Opened 3 months ago

Closed 3 months ago

#3236 closed help (fixed)

ILAMB not working on JASMIN

Reported by: pmcguire
Owned by: pmcguire
Component: JULES
Keywords: ILAMB, JULES, JASMIN
Cc:
Platform: JASMIN
UM Version:

Description

Hello CMS Helpdesk,
ILAMB no longer appears to work on JASMIN.
For example, the suite ~/pmcguire/roses/u-bb897GL5a_IF2 now fails with these
errors in the job.err file:

ilamb-run: Rank 0:2: MPI_Init_thread: Unable to open /opt/platform_mpi/lib/linux_amd64/libcoll.so: undefined symbol: hpmp_comm_world - running with failsafe mpi collective algorithms.
ilamb-run: Rank 0:4: MPI_Init_thread: Unable to open /opt/platform_mpi/lib/linux_amd64/libcoll.so: undefined symbol: hpmp_comm_world - running with failsafe mpi collective algorithms.
ilamb-run: Rank 0:6: MPI_Init_thread: Unable to open /opt/platform_mpi/lib/linux_amd64/libcoll.so: undefined symbol: hpmp_comm_world - running with failsafe mpi collective algorithms.
ilamb-run: Rank 0:3: MPI_Init_thread: Unable to open /opt/platform_mpi/lib/linux_amd64/libcoll.so: undefined symbol: hpmp_comm_world - running with failsafe mpi collective algorithms.
ilamb-run: Rank 0:5: MPI_Init_thread: Unable to open /opt/platform_mpi/lib/linux_amd64/libcoll.so: undefined symbol: hpmp_comm_world - running with failsafe mpi collective algorithms.
ilamb-run: Rank 0:1: MPI_Init_thread: Unable to open /opt/platform_mpi/lib/linux_amd64/libcoll.so: undefined symbol: hpmp_comm_world - running with failsafe mpi collective algorithms.
ilamb-run: Rank 0:7: MPI_Init_thread: Unable to open /opt/platform_mpi/lib/linux_amd64/libcoll.so: undefined symbol: hpmp_comm_world - running with failsafe mpi collective algorithms.
ilamb-run: Rank 0:0: MPI_Init_thread: Unable to open /opt/platform_mpi/lib/linux_amd64/libcoll.so: undefined symbol: hpmp_comm_world - running with failsafe mpi collective algorithms.

Traceback (most recent call last):
  File "/usr/bin/ilamb-run", line 555, in <module>
    WorkPost(M,C,W,S,not args.quiet,args.skip_plots)
  File "/usr/bin/ilamb-run", line 392, in WorkPost
    S.createHtml(M)
  File "/usr/lib/python2.7/site-packages/ILAMB/Scoreboard.py", line 274, in createHtml
    rel_tree = GenerateRelationshipTree(self,M)
  File "/usr/lib/python2.7/site-packages/ILAMB/Scoreboard.py", line 660, in GenerateRelationshipTree
    h2 = Node(data.confrontation.longname)
AttributeError: 'NoneType' object has no attribute 'longname'

User defined signal 2

MPI Application rank 3 killed before MPI_Finalize() with signal 12

mpirun: propagating signal 12

2020-03-27T11:52:58Z CRITICAL - failed/SIGUSR2

This suite is identical to an earlier suite, ~/pmcguire/roses/u-bb897GL5a_IF, which ran successfully weeks ago. The suite is a descendant of the MOSRS suite u-bb897.

Do you know what the problem is?
Patrick McGuire

Change History (3)

comment:1 Changed 3 months ago by pmcguire

The location of the jules group workspace has changed recently on JASMIN.
In your rose-suite.conf file, you will need to change:

 ILAMB_ROOT='/group_workspaces/jasmin2/jules/ILAMB/'

to

 ILAMB_ROOT='/gws/nopw/j04/jules/ILAMB/'

And you might need to make a similar change in
app/run_ilamb/file/run_ilamb.sh, if ILAMB_ROOT is defined there.
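
If the old path appears in more than one place, the change could be scripted. The following is only a minimal sketch, not an official tool: the two path strings are the ones quoted above, while the function names and the idea of passing in a suite directory are assumptions made for illustration.

```python
from pathlib import Path

# Old and new locations of the jules group workspace, as quoted in this ticket.
OLD_ROOT = "/group_workspaces/jasmin2/jules/ILAMB/"
NEW_ROOT = "/gws/nopw/j04/jules/ILAMB/"


def update_ilamb_root(text: str) -> str:
    """Return text with every occurrence of the old path replaced by the new one."""
    return text.replace(OLD_ROOT, NEW_ROOT)


def update_file(path: Path) -> bool:
    """Rewrite one file in place; return True only if its contents changed."""
    original = path.read_text()
    updated = update_ilamb_root(original)
    if updated == original:
        return False
    path.write_text(updated)
    return True


def update_suite(suite_dir: Path) -> list:
    """Update the two files named in this comment, skipping any that are absent.

    Returns the list of files that were actually modified.
    """
    candidates = [
        suite_dir / "rose-suite.conf",
        suite_dir / "app" / "run_ilamb" / "file" / "run_ilamb.sh",
    ]
    return [p for p in candidates if p.is_file() and update_file(p)]
```

Calling update_suite(Path("...")) with your own suite directory returns the files it changed; an empty list means the old path was not found. It is probably worth keeping a copy of the originals first, and checking on JASMIN that Path(NEW_ROOT).is_dir() is true before rerunning the suite.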
Patrick

comment:2 Changed 3 months ago by pmcguire

  • Status changed from new to assigned

comment:3 Changed 3 months ago by pmcguire

  • Resolution set to fixed
  • Status changed from assigned to closed