Opened 4 years ago

Closed 4 years ago

#1938 closed help (fixed)

POLARIS UKCA run

Reported by: earhg Owned by: um_support
Component: UM Model Keywords:
Cc: Platform: Other
UM Version: 8.4

Description

hi,
I am getting some strange-looking errors when trying to run my job xmvnm, a copy of Grenville's job xidem, on POLARIS from PUMA for the first time. The .leave file is attached. The errors look as if they might be related to the POLARIS hand edit, but since I am sourcing Grenville's file I don't see how it could cause an issue. Many thanks for any ideas! Hamish

Attachments (1)

xmvnm000.xmvnm.d16221.t150310.leave (9.0 KB) - added by earhg 4 years ago.


Change History (9)

Changed 4 years ago by earhg

comment:1 Changed 4 years ago by ros

Hi Hamish,

I know there has always been something odd with parexe on Polaris, which Grenville managed to fix with some script changes/hand-edits. Unfortunately Grenville is out of the office for another week. I will take a look and see if I can track it down, but it might have to wait until he returns.

Regards,
Ros.

comment:2 Changed 4 years ago by earhg

hi,
Please could someone look at this again? Sorry to be a pain…
many thanks
Hamish

comment:3 Changed 4 years ago by grenville

Hamish

We saw this problem at UM 6.6.3 - the fix at 6.6.3 should work at 8.4.

Please see /home/grenville/um6.6.3/hg6.6.3_polaris_fixes/src/script/control/make_parexe.pl, where I commented out the loop over the environment variables, namely:

# Now run though the environment and, in alphabetical order
# set the variables in print_array
#foreach $key ( sort keys %ENV ){
# $value = $ENV{$key};

# push @print_array, "$key='$value'\n";
#}

The same code appears in the UM 8.4 script (see https://puma.nerc.ac.uk/trac/UM/browser/UM/branches/dev/grenville/vn8.4_polaris/src/script/control/make_parexe.pl)

You can test the fix by commenting out the lines in your local copy of make_parexe.pl. (Best to make a branch subsequently.)
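For a quick local test, the edit could be scripted roughly as below. This is only a sketch: the function name comment_out_env_loop is made up for illustration, and it assumes the loop in your copy starts with a line beginning "foreach $key ( sort keys %ENV )" and ends at the first following line that begins with "}". Check the result by eye before rebuilding.

```shell
#!/bin/sh
# Sketch only (hypothetical helper, not part of the UM scripts):
# comment out the %ENV dump loop in a local copy of make_parexe.pl.
# Assumption: the loop starts at a line beginning
#   foreach $key ( sort keys %ENV )
# and ends at the first following line that begins with "}".
comment_out_env_loop() {
    script="$1"
    # Keep an untouched backup next to the edited file.
    cp "$script" "$script.orig" || return 1
    # Prefix every line in the matched range with '#'.
    sed '/^foreach \$key ( sort keys %ENV )/,/^}/ s/^/#/' "$script.orig" > "$script"
}

# Usage (path is an example only):
#   comment_out_env_loop ./make_parexe.pl
#   diff ./make_parexe.pl.orig ./make_parexe.pl
```

The backup copy makes it easy to diff the change or revert it before committing anything to a branch.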

The parexe file is useful for UM hackers but not so much for the rest of us.

Grenville

comment:4 Changed 4 years ago by earhg

hi Grenville,
Thank you very much for looking at this. I swapped in your make_parexe.pl file but the run failed in roughly the same place with the error here:

/nobackup/earhg/xmvnm/umatmos/bin/xmvnm.exe: error while loading shared libraries: libmpi_f90.so.1: cannot open shared object file: No such file or directory

I tried commenting out the set -a command, but it didn't help. The output from three runs today is at /home/ufaserv1_g/earhg/output/xmvnm000.xmvnm.d16236.t104945.leave

Is this the correct Fortran library? The command module list gives me:

Currently Loaded Modulefiles:
1) licenses 2) sge 3) bit/64 4) intel/12.1.5.339 5) mvapich2/1.8 6) leeds 7) user

thanks again for your help
Hamish

comment:5 Changed 4 years ago by grenville

Hamish

I'm looking at this - it appears to be a module mismatch. It's been years since we installed 8.4 on Polaris (it's not been used since as far as I know), so it's not surprising that there are some snags.
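One way to see which shared libraries the executable cannot find under the currently loaded modules is to filter the output of ldd. The sketch below assumes the usual Linux ldd output format, in which unresolved libraries appear as "name => not found"; the function name missing_libs is made up for illustration.

```shell
#!/bin/sh
# Sketch only (hypothetical helper): read `ldd` output on stdin and print
# the names of the libraries that the dynamic loader could not resolve.
# Assumption: unresolved entries look like "libfoo.so.1 => not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage, with the modules loaded as they are in the job
# (executable path taken from comment:4):
#   ldd /nobackup/earhg/xmvnm/umatmos/bin/xmvnm.exe | missing_libs
```

If a library is listed, the MPI module loaded at run time probably does not match the one the executable (or gcom) was built against.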

Grenville

comment:6 Changed 4 years ago by grenville

Hamish

I have changed the hand edit polaris_8.4.1 so that it loads the latest mvapich2, and rebuilt gcom for consistency. Doing this worked for my job xidem.

Please do a full rebuild (make sure the modification to make_parexe.pl that you made earlier is still in place).

I have not rebuilt other gcoms for different mpi implementations, so please try to stick with mvapich2. We can rebuild the others if needed.
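To guard against running with a different MPI by accident, a check along these lines could go near the top of a job script. It is a sketch, assuming the Environment Modules package is in use and exports the colon-separated LOADEDMODULES variable; the function name module_loaded is made up for illustration.

```shell
#!/bin/sh
# Sketch only (hypothetical helper): succeed if a module with the given
# name, or any version of it, is currently loaded.
# Assumption: Environment Modules maintains LOADEDMODULES as a
# colon-separated list such as "intel/12.1.5.339:mvapich2/1.8".
module_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"*|*":$1/"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   module_loaded mvapich2 || echo "warning: mvapich2 is not loaded" >&2
```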

Grenville

comment:7 Changed 4 years ago by earhg

hi Grenville,
Thank you very much for this - I did as you suggested and the job completed nicely just now.
Hopefully it will work just as well when I try to run my customised jobs.
best
Hamish

comment:8 Changed 4 years ago by grenville

  • Resolution set to fixed
  • Status changed from new to closed