Opened 5 years ago

Closed 5 years ago

#1298 closed help (fixed)

Compile problems on Archer using the GCOM library

Reported by: gn907779 Owned by: um_support
Component: Large Eddy Model Keywords: ARCHER LEM GCOM
Cc: Platform: ARCHER

Description

The LEM used GCOM3.0 on Hector, where the main include file was named gc_com.h. On Archer I have found GCOM3.8, where this file seems to have been renamed gcom.h, so the compile fails. I am trying to run one of the test_cases to familiarise myself with the LEM and don't want to make any code changes at this stage.

Is there a previous version of GCOM on Archer where this file is named gc_com.h? Alternatively, could someone with the necessary privileges create a link to gcom.h named gc_com.h?
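
For illustration, a link of the kind requested here can also be made in a user-owned directory, which needs no special privileges. This is only a sketch: GCOM_INC and COMPAT_INC are hypothetical names, and temporary directories stand in for the real GCOM include directory.

```shell
# Sketch only: GCOM_INC and COMPAT_INC are hypothetical locations.
# Temporary directories stand in for the real GCOM include directory.
GCOM_INC=$(mktemp -d)
COMPAT_INC=$(mktemp -d)
touch "$GCOM_INC/gcom.h"    # stand-in for the real header

# Expose the renamed header under the old name the LEM source expects.
ln -s "$GCOM_INC/gcom.h" "$COMPAT_INC/gc_com.h"

# The compile line would then need an extra include path: -I"$COMPAT_INC"
ls -l "$COMPAT_INC/gc_com.h"
```

Whether the LEM source actually builds against the newer header is a separate question; the link only resolves the file name.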

Change History (14)

comment:1 Changed 5 years ago by willie

Hi Will,

The last I recall is that the LEM required GCOM3.8. You should have a fix, fix_gcom3.8_v_2.4.f, that allows the LEM code to compile. Do you have this?

Regards

Willie

comment:2 Changed 5 years ago by gn907779

Yes, that helps. Thanks. les.f now compiles.
However it fails later on with something that doesn't appear to be source code related.

ftn-2116 crayftn: INTERNAL

"/opt/cray/cce/8.2.1/cftn/x86-64/lib/optcg" was terminated due to receipt of signal 011: Killed.

The last successful compile was resdgs.f. I tried compiling this again manually and it compiled OK. I also tried a manual compile of the next dozen or so files in the list, and they also compiled OK.

So, do you have any idea as to why the job could have been killed?

comment:3 Changed 5 years ago by willie

  • Component changed from ARCHER to Other

Hi Will

If you are running the scripts on the command line, then ARCHER will kill the script if it takes too much resource (> 20 mins, I think). You should submit the compile job to the serial queue.

Regards

Willie
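
A compile job for the serial queue might be set up along the following lines. This is a sketch under assumptions: the job name, the BUDGET_CODE placeholder, the walltime and the bare make command are all hypothetical, and the serial resource request should be checked against the current ARCHER documentation.

```shell
# Sketch only: job name, budget code, walltime and build command are placeholders.
JOBFILE=$(mktemp)
cat > "$JOBFILE" <<'EOF'
#!/bin/bash --login
#PBS -N lem_compile
#PBS -l select=serial=true:ncpus=1
#PBS -l walltime=01:00:00
#PBS -A BUDGET_CODE

# Run the build from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"
make
EOF

echo "wrote $JOBFILE; submit it with: qsub $JOBFILE"
```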

comment:4 Changed 5 years ago by willie

  • Component changed from Other to Large Eddy Model

comment:5 Changed 5 years ago by gn907779

OK, I've changed the run script to submit the compile job to the serial queue. It now builds the object files but fails at the link stage:
/opt/cray/cce/8.2.1/cray-binutils/x86_64-unknown-linux-gnu/bin/ld: cannot find -lgcom_buffered_mpi

Can you confirm that I am using the correct path in my makefile:

GCOM1=-L/home/n02/n02/hum/gcom3.8/archer_cce_mpp/lib \
-I/home/n02/n02/hum/gcom3.8/archer_cce_mpp/inc
GCOM2=-lgcom_buffered_mpi

comment:6 Changed 5 years ago by gn907779

Please could you also confirm that I have the correct netcdf library path. Previously it was failing because it couldn't find the netcdf include file, so I used the following to specify the path to netcdf:

NETCDF=-I/opt/cray/netcdf/4.3.0/CRAY/81/include \
-L/opt/cray/netcdf/4.3.0/CRAY/81/lib -lnetcdf

Now the linker is failing with many undefined references to netcdf variables e.g. nf_def_var_ and nf_create_. I also tried version 4.3.1 of the netcdf libraries with the same result.
Thanks.

comment:7 Changed 5 years ago by gn907779

The solution to the two previous issues is as follows:

  1. The linker option for the later version of GCOM is -lgcom.
  2. When submitting the compile job to the serial queue, you must load the netcdf module in the script with: module load cray-netcdf
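
The two points above can be sketched as follows. A temporary file stands in for the real LEM makefile (MAKEFILE is a hypothetical name), and the module command is shown as a comment because it only exists on systems with environment modules.

```shell
# Sketch only: a throwaway file stands in for the real LEM makefile.
MAKEFILE=$(mktemp)
printf 'GCOM2=-lgcom_buffered_mpi\n' > "$MAKEFILE"

# Point 1: switch the link flag to the library name used by the later GCOM.
sed 's/-lgcom_buffered_mpi/-lgcom/' "$MAKEFILE" > "$MAKEFILE.new"
mv "$MAKEFILE.new" "$MAKEFILE"
cat "$MAKEFILE"

# Point 2 (not run here): in the batch script, before the compile step:
# module load cray-netcdf
```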

The goles executable now builds with the following warnings:

!DIR$ NOSPLIT MOISTRI2.74


ftn-790 crayftn: WARNING MOISTRI2, File = moistri2.f, Line = 561, Column = 7

Unknown or unsupported compiler directive or syntax error.

Cray Fortran : Version 8.2.x.x (u82093f82212i82224p82394a82022e82011z82394)
Cray Fortran : (x8232r82023w82008t8213b82042k82020)
Cray Fortran : Thu May 29, 2014 09:47:52
Cray Fortran : Compile time: 0.1720 seconds
Cray Fortran : 611 source lines
Cray Fortran : 0 errors, 1 warnings, 0 other messages, 0 ansi
Cray Fortran : "explain ftn-message number" gives more information about each message.

WRITE(6,*) MNCFNAME(INC), NCID(INC) MOVIE_NC.454

ftn-7212 crayftn: WARNING MOVIE_NC, File = movie_nc.f, Line = 1047

Variable "ncid()" is used before it is defined.

Cray Fortran : Version 8.2.x.x (u82093f82212i82224p82394a82022e82011z82394)
Cray Fortran : (x8232r82023w82008t8213b82042k82020)
Cray Fortran : Thu May 29, 2014 09:47:53
Cray Fortran : Compile time: 2.1401 seconds
Cray Fortran : 3240 source lines
Cray Fortran : 0 errors, 1 warnings, 0 other messages, 0 ansi
Cray Fortran : "explain ftn-message number" gives more information about each message.

PGmlt=MAX(0.,Ventfac_G*RMELTtemp+(Cwater/RLFUS) RIME.300

ftn-7212 crayftn: WARNING RIME, File = rime.f, Line = 867

Variable "pgshd" is used before it is defined.

Cray Fortran : Version 8.2.x.x (u82093f82212i82224p82394a82022e82011z82394)
Cray Fortran : (x8232r82023w82008t8213b82042k82020)
Cray Fortran : Thu May 29, 2014 09:48:34
Cray Fortran : Compile time: 0.1400 seconds
Cray Fortran : 901 source lines
Cray Fortran : 0 errors, 1 warnings, 0 other messages, 0 ansi
Cray Fortran : "explain ftn-message number" gives more information about each message.

!DIR$ NOSPLIT SETFRI.30


ftn-790 crayftn: WARNING SETFRI, File = setfri.f, Line = 303, Column = 7

Unknown or unsupported compiler directive or syntax error.

Cray Fortran : Version 8.2.x.x (u82093f82212i82224p82394a82022e82011z82394)
Cray Fortran : (x8232r82023w82008t8213b82042k82020)
Cray Fortran : Thu May 29, 2014 09:48:46
Cray Fortran : Compile time: 0.0880 seconds
Cray Fortran : 323 source lines
Cray Fortran : 0 errors, 1 warnings, 0 other messages, 0 ansi
Cray Fortran : "explain ftn-message number" gives more information about each message.

!DIR$ NOSPLIT ULTFLX.113


ftn-790 crayftn: WARNING ULTFLX, File = ultflx.f, Line = 426, Column = 7

Unknown or unsupported compiler directive or syntax error.

!DIR$ NOSPLIT ULTFLX.183


ftn-790 crayftn: WARNING ULTFLX, File = ultflx.f, Line = 496, Column = 7

Unknown or unsupported compiler directive or syntax error.

!DIR$ NOSPLIT ULTFLX.254


ftn-790 crayftn: WARNING ULTFLX, File = ultflx.f, Line = 567, Column = 7

Unknown or unsupported compiler directive or syntax error.

!DIR$ NOSPLIT ULTFLX.311


ftn-790 crayftn: WARNING ULTFLX, File = ultflx.f, Line = 624, Column = 7

Unknown or unsupported compiler directive or syntax error.

!DIR$ NOSPLIT ULTFLX.374


ftn-790 crayftn: WARNING ULTFLX, File = ultflx.f, Line = 687, Column = 7

Unknown or unsupported compiler directive or syntax error.

!DIR$ NOSPLIT ULTFLX.441


ftn-790 crayftn: WARNING ULTFLX, File = ultflx.f, Line = 754, Column = 7

Unknown or unsupported compiler directive or syntax error.

Cray Fortran : Version 8.2.x.x (u82093f82212i82224p82394a82022e82011z82394)
Cray Fortran : (x8232r82023w82008t8213b82042k82020)
Cray Fortran : Thu May 29, 2014 09:48:59
Cray Fortran : Compile time: 0.6960 seconds
Cray Fortran : 836 source lines
Cray Fortran : 0 errors, 6 warnings, 0 other messages, 0 ansi
Cray Fortran : "explain ftn-message number" gives more information about each message.
ftn-1391 crayftn: WARNING in command line

Source file "zzz000.f" contains no Fortran statements.

/opt/cray/hdf5/1.8.11/CRAY/81/lib/libhdf5.a(H5PL.o): In function `H5PLopen$$CFE_id_56395c9c_01603595':
/home/users/seanb/pelibs/hdf5/1.8.11/rpm/BUILD/cray-hdf5-1.8.11-cce1-serial/src/H5PL.c:531: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

However, goles does not run on the compute nodes. It crashes as follows:

_pmiu_daemon(SIGCHLD): [NID 02465] [c4-1c2s8n1] [Thu May 29 11:21:30 2014] PE RANK 3 exit signal Segmentation fault
[NID 02465] 2014-05-29 11:21:31 Apid 8474066: initiated application termination
Application 8474066 exit codes: 139
Application 8474066 resources: utime ~0s, stime ~0s, Rss ~4100, inblocks ~19855, outblocks ~62268

comment:8 Changed 5 years ago by willie

Hi Will,

I now have test cases 1 - 4 running on ARCHER. Don't worry about the warnings, I get them too. Which test cases are you trying to run? If you could

 chmod -R g+rX /home/n02/n02/gn907779
 chmod -R g+rX /work/n02/n02/gn907779

then I'll be able to see your files.

Regards,

Willie

comment:9 Changed 5 years ago by gn907779

Hi Willie,

Thanks for your message. I've run chmod so you should be able to see my files. I am trying to run test case 2. The run that gives the segmentation fault is here:
/work/n02/n02/gn907779/lem/runs/r22

Thanks,
Will

comment:10 Changed 5 years ago by willie

Hi Will,

I have used all the fixes in my runs, not just the gcom fix. I also needed to fix the spectral namelists, as they are not suitable for ARCHER. I used the Cray compiler on ARCHER.

I think it would be preferable if we both used the same code. You can get a copy of mine from PUMA in /home/willie/temp/LEM_archer_r14.tgz. This comes with full provenance from the Met Office (I have a repository on PUMA). It is designed to be launched from PUMA and run on ARCHER, but it would be simple to change it to your details. The output from this for test case 2 can be found on ARCHER in /work/n02/n02/wmcginty/LEM_runs/test_case2.

Let me know if this poses any difficulties.

Regards

Willie

comment:11 Changed 5 years ago by gn907779

Thanks for that, Willie. I copied your files on Fri 30/5/14 and submitted the job. Then Archer was down for maintenance, and when it came back on Fri 6/6/14 the job was still in the queue.

Then I was out of the office for 2 weeks.

The job was still in the queue because I hadn't changed the budget code; since I am not part of your budget group, it would never have run.

I've fixed that now and the job ran, but the compile failed. Standard error says "module not found" and standard output says "cannot open include file netcdf.inc".

I guess this is because of the maintenance upgrade on ARCHER. Can you suggest what needs to be changed here? The scripts are in /home/gn907779/LEM_archer/LEMSUB/RUNFILES/ on puma, and the run directory is /work/n02/n02/gn907779/lem/runs/r24 on archer.

Thanks

comment:12 Changed 5 years ago by willie

Hi Will,

I notice that you don't have .profile and .kshrc on ARCHER. What shell are you running? You can see this by

echo $SHELL

When the LEM is launched remotely, ARCHER needs to be able to find the 'module' command, and this is where the .profile comes in. This is done in the runREMOTE script, but it interacts with your .profile setup; I've never run it without one.

So, if you are using bash, I recommend using chsh to switch to Korn. You can copy my .profile and .kshrc: they are in /home/n02/n02/wmcginty.
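
A quick way to check both points from a login session is sketched below. MODULE_STATUS is a hypothetical variable name, and the check only reflects the shell it runs in, which may differ from the environment a remotely launched job sees.

```shell
# Report the login shell recorded for this account.
echo "login shell: ${SHELL:-unknown}"

# Check whether the 'module' command is defined in this shell session.
if command -v module >/dev/null 2>&1; then
    MODULE_STATUS="available"
else
    MODULE_STATUS="missing"
fi
echo "module command: $MODULE_STATUS"
```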

I just checked: test2 is still working on ARCHER :)

Regards,

Willie

comment:13 Changed 5 years ago by gn907779

Yes, that fixed it. I changed my shell from bash to Korn and test 2 ran on ARCHER!

Thanks, Willie!

comment:14 Changed 5 years ago by grenville

  • Resolution set to fixed
  • Status changed from new to closed