Opened 5 years ago
Closed 5 years ago
#1938 closed help (fixed)
POLARIS UKCA run
| Reported by: | earhg | Owned by: | um_support |
|---|---|---|---|
| Component: | UM Model | Keywords: | |
| Cc: | | Platform: | Other |
| UM Version: | 8.4 | | |
Description
hi,
I am getting some strange-looking errors when running my job xmvnm, a copy of Grenville's job xidem, on POLARIS from PUMA for the first time. The .leave file is attached. The errors look as though they might be related to the POLARIS hand edit, but since I am sourcing Grenville's file I don't see how it could cause an issue. Many thanks for any ideas! Hamish
Attachments (1)
Change History (9)
Changed 5 years ago by earhg
comment:1 Changed 5 years ago by ros
comment:2 Changed 5 years ago by earhg
hi,
Please could someone look at this again? Sorry to be a pain…
many thanks
Hamish
comment:3 Changed 5 years ago by grenville
Hamish
We saw this problem at UM 6.6.3 - the fix at 6.6.3 should work at 8.4.
Please see /home/grenville/um6.6.3/hg6.6.3_polaris_fixes/src/script/control/make_parexe.pl, where I commented out the loop over the environment variables, namely:
# Now run though the environment and, in alphabetical order
# set the variables in print_array
#foreach $key ( sort keys %ENV ){
# $value = $ENV{$key};
# push @print_array, "$key='$value'\n";
#}
The same code appears in the UM 8.4 script (see https://puma.nerc.ac.uk/trac/UM/browser/UM/branches/dev/grenville/vn8.4_polaris/src/script/control/make_parexe.pl)
You can test the fix by commenting out the lines in your local copy of make_parexe.pl. (Best to make a branch subsequently.)
The parexe file is useful for UM hackers but not so much for the rest of us.
Grenville
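A quick way to confirm that a local copy of make_parexe.pl really has the loop commented out is to search for an uncommented `foreach` over `%ENV`. This is only a sketch: the function name `env_loop_active` is made up for illustration, and the pattern assumes the loop header looks like the line quoted above.

```shell
# Return 0 if FILE still contains an uncommented "foreach $key ( sort keys %ENV )"
# loop header, and non-zero if every such line starts with '#'.
# env_loop_active is a hypothetical helper name, not part of the UM scripts.
env_loop_active() {
    grep -Eq '^[[:space:]]*foreach[[:space:]]+\$key[[:space:]]*\([[:space:]]*sort[[:space:]]+keys[[:space:]]+%ENV' "$1"
}

# Example use from the top of a working copy (relative path as in the ticket):
#   if env_loop_active src/script/control/make_parexe.pl; then
#       echo "environment loop is still active"
#   fi
```

The check only looks at the loop header, so it will not notice if the body lines were left uncommented by mistake.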
comment:4 Changed 5 years ago by earhg
hi Grenville,
Thank you very much for looking at this. I swapped in your make_parexe.pl file but the run failed in roughly the same place with the error here:
/nobackup/earhg/xmvnm/umatmos/bin/xmvnm.exe: error while loading shared libraries: libmpi_f90.so.1: cannot open shared object file: No such file or directory
I tried commenting out the set -a command, but it didn't help. The output from three runs today is at /home/ufaserv1_g/earhg/output/xmvnm000.xmvnm.d16236.t104945.leave
Is this the correct Fortran library? module list gives me:
Currently Loaded Modulefiles:
1) licenses 2) sge 3) bit/64 4) intel/12.1.5.339 5) mvapich2/1.8 6) leeds 7) user
thanks again for your help
Hamish
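For "cannot open shared object file" errors like the one above, `ldd` shows which libraries the dynamic loader cannot resolve in the current environment. The sketch below filters its output; `missing_libs` is a hypothetical helper name, and the result is only meaningful if the same modules are loaded as in the job.

```shell
# List the shared libraries that ldd reports as unresolved.
# Reads ldd-style output on stdin and prints one library name per line.
# missing_libs is a hypothetical helper name used for illustration.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example use, with the same modules loaded as in the job
# (the path is the one from the error message above):
#   ldd /nobackup/earhg/xmvnm/umatmos/bin/xmvnm.exe | missing_libs
```

If a library is listed here, the executable was presumably linked against an MPI build whose library directory is not on the loader's search path in this environment.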
comment:5 Changed 5 years ago by grenville
Hamish
I'm looking at this - it appears to be a module mismatch. It's been years since we installed 8.4 on Polaris (it's not been used since as far as I know), so it's not surprising that there are some snags.
Grenville
comment:6 Changed 5 years ago by grenville
Hamish
I have changed the hand edit polaris_8.4.1 so that it loads the latest mvapich2, and rebuilt gcom for consistency. Doing this worked for my job xidem.
Please do a full rebuild (make sure the modification to make_parexe.pl you did earlier is still there).
I have not rebuilt other gcoms for different mpi implementations, so please try to stick with mvapich2. We can rebuild the others if needed.
Grenville
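Since the build now depends on a particular MPI module, it can help to check which MPI implementation is loaded before rebuilding. A minimal sketch, assuming `module list` output of the form shown earlier in this ticket; `loaded_mpi` is a hypothetical helper and the list of MPI module names is a guess, not something taken from the POLARIS setup.

```shell
# Print MPI-related entries from `module list` style text read on stdin.
# The module-name prefixes below are assumptions; adjust them for the system.
loaded_mpi() {
    tr -s ' \t' '\n' | grep -E '^(mvapich2|openmpi|mpich|impi)(/|$)' || true
}

# Example use (module list usually writes to stderr, hence 2>&1):
#   module list 2>&1 | loaded_mpi
```

For the module list quoted in comment 4 this would print `mvapich2/1.8`.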
comment:7 Changed 5 years ago by earhg
hi Grenville,
Thank you very much for this - I did as you suggested and the job completed nicely just now.
Hopefully it will work just as well when I try to run my customised jobs.
best
Hamish
comment:8 Changed 5 years ago by grenville
- Resolution set to fixed
- Status changed from new to closed
Hi Hamish,
I know there has always been something odd with parexe on Polaris, which Grenville managed to fix with some script changes/hand-edits. Unfortunately Grenville is out of the office for another week. I will take a look and see if I can track it down, but it might have to wait until he returns.
Regards,
Ros.