#2478 closed help (fixed)

build failure

Reported by: ggxmy Owned by: um_support
Component: UM Model Keywords:
Cc: m.g.richardson@… Platform: ARCHER
UM Version: 8.2

Description

I now have a different problem compiling the UM vn8.2 regional model. The job, tewne, was created from xkztj. The .leave file contains these messages:

# Start: 2018-06-01 11:52:41=> ftn -o netcdf_mod.o -I/home/n02/n02/masara/um/tewne/umatmos/inc -I/home/n02/n02/masara/um/tewne/baserepos/JULES/inc -I/home/n02/n02/masara/um/tewne/baserepos/UMATMOS/inc -V -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build/include  -I /work/y07/y07/umshared/gcom/cce/gcom4.5/archer_cce_mpp/inc     -h omp -c /home/n02/n02/masara/um/tewne/umatmos/ppsrc/UM/control/misc/netcdf_mod.f90
fcm_internal compile failed (256)
# Time taken:            0 s=> ftn -o ios_decompose.o -I/home/n02/n02/masara/um/tewne/umatmos/inc -I/home/n02/n02/masara/um/tewne/baserepos/JULES/inc -I/home/n02/n02/masara/um/tewne/baserepos/UMATMOS/inc -V -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build/include  -I /work/y07/y07/umshared/gcom/cce/gcom4.5/archer_cce_mpp/inc     -h omp -c /home/n02/n02/masara/um/tewne/umatmos/ppsrc/UM/io_services/common/ios_decompose.f90
gmake: *** [ios_decompose.o] Error 1
gmake: *** Waiting for unfinished jobs....
ftn-714 crayftn: WARNING in command line
  The "noomp" command has been overridden by the "omp" command.
Cray Fortran : Version 8.5.8 (20170217211354_0071b093c0867e302a5c08403fe34f8aa78fc212)
Cray Fortran : Fri Jun 01, 2018  11:52:42
Cray Fortran : Compile time:  0.1120 seconds
Cray Fortran : 80 source lines
Cray Fortran : 0 errors, 1 warnings, 0 other messages, 0 ansi
Cray Fortran : "explain ftn-message number" gives more information about each message.
GNU assembler version 2.26.0 (x86_64-pc-linux-gnu) using BFD version (GNU Binutils) 2.26.0.20160224

I checked the files and directories used here and found that

/home/n02/n02/masara/um/tewne/umatmos/inc and
/work/y07/y07/umshared/gcom/cce/gcom4.5/archer_cce_mpp/inc

exist and contain lots of .mod and .h files.

/home/n02/n02/masara/um/tewne/umatmos/ppsrc/UM/control/misc/netcdf_mod.f90 and
/home/n02/n02/masara/um/tewne/umatmos/ppsrc/UM/io_services/common/ios_decompose.f90

are also present. However,

/home/n02/n02/masara/um/tewne/baserepos/JULES/inc ,
/home/n02/n02/masara/um/tewne/baserepos/UMATMOS/inc , and
/work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build/include

do not exist. Could these be causing the problem? Should I somehow specify my own directory in place of the last one? The full output is in

/home/n02/n02/masara/output/tewne000.tewne.d18152.t113754.comp.leave

Thanks,
Masaru
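The directory check above can be repeated for every -I path on the compile line with a small bash loop. This is a sketch, not part of the UM or FCM tooling: `check_dirs` is a hypothetical helper name, and the paths are copied from the `ftn` command in the .leave file. A missing -I directory is often not fatal by itself. It only matters if a module or header the source needs lives nowhere else.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each directory given as an argument
# exists. Intended for the -I paths copied from an ftn command line.
check_dirs() {
  local d
  for d in "$@"; do
    if [ -d "$d" ]; then
      printf 'present  %s\n' "$d"
    else
      printf 'MISSING  %s\n' "$d"
    fi
  done
}

# The five include directories from the failing ftn command in the .leave file.
check_dirs \
  /home/n02/n02/masara/um/tewne/umatmos/inc \
  /home/n02/n02/masara/um/tewne/baserepos/JULES/inc \
  /home/n02/n02/masara/um/tewne/baserepos/UMATMOS/inc \
  /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build/include \
  /work/y07/y07/umshared/gcom/cce/gcom4.5/archer_cce_mpp/inc
```

A directory that exists but sits inside a parent you cannot search will also show up as MISSING, because `[ -d ]` cannot see it.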

Change History (9)

comment:1 Changed 15 months ago by ggxmy

  • Platform set to ARCHER

comment:2 Changed 15 months ago by willie

Hi Masaru,

I've put the GCOM code back in /work/n02/n02/wmcginty/gcom4.5, so try again.

Regards
Willie

comment:3 Changed 15 months ago by ggxmy

Compilation failed in much the same way as before. Below is an excerpt from /home/n02/n02/masara/output/tewne-152113742.20180601-143051.comp.leave.

/work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/ does not seem to contain a build/include/ folder, so the situation has not changed since the previous attempt, although I'm not sure this is the cause of the problem.

# Time taken:            0 s=> ftn -o grdtypes_mod.o -I/home/n02/n02/masara/um/tewne/umatmos/inc -I/home/n02/n02/masara/um/tewne/baserepos/JULES/inc -I/home/n02/n02/masara/um/tewne/baserepos/UMATMOS/inc -V -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build/include  -I /work/y07/y07/umshared/gcom/cce/gcom4.5/archer_cce_mpp/inc     -h omp -c /home/n02/n02/masara/um/tewne/umatmos/ppsrc/UM/control/top_level/grdtypes_mod.f90
mv grdtypes_mod.o /home/n02/n02/masara/um/tewne/umatmos/obj
fcm_internal compile:F UM__control__mpp /home/n02/n02/masara/um/tewne/umatmos/ppsrc/UM/control/mpp/sterr_mod.f90 sterr_mod.o
ftn-3178 crayftn: LIMIT in command line
  There is a problem with a temp file or a program library file being used in this compilation.
cd /home/n02/n02/masara/um/tewne/umatmos/tmp
# Start: 2018-06-01 14:34:19=> ftn -o sterr_mod.o -I/home/n02/n02/masara/um/tewne/umatmos/inc -I/home/n02/n02/masara/um/tewne/baserepos/JULES/inc -I/home/n02/n02/masara/um/tewne/baserepos/UMATMOS/inc -V -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build/include  -I /work/y07/y07/umshared/gcom/cce/gcom4.5/archer_cce_mpp/inc     -h omp -c /home/n02/n02/masara/um/tewne/umatmos/ppsrc/UM/control/mpp/sterr_mod.f90
Cray Fortran : Version 8.5.8 (20170217211354_0071b093c0867e302a5c08403fe34f8aa78fc212)
Cray Fortran : Fri Jun 01, 2018  14:34:19
Cray Fortran : Compile time:  0.0160 seconds
Cray Fortran : 113 source lines
Cray Fortran : 0 errors, 1 warnings, 0 other messages, 0 ansi
Cray Fortran : "explain ftn-message number" gives more information about each message.
fcm_internal compile failed (256)
# Time taken:            0 s=> ftn -o mask_compression.o -I/home/n02/n02/masara/um/tewne/umatmos/inc -I/home/n02/n02/masara/um/tewne/baserepos/JULES/inc -I/home/n02/n02/masara/um/tewne/baserepos/UMATMOS/inc -V -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build/include  -I /work/y07/y07/umshared/gcom/cce/gcom4.5/archer_cce_mpp/inc     -h omp -c /home/n02/n02/masara/um/tewne/umatmos/ppsrc/UM/control/packing_tools/mask_compression.f90
gmake: *** [mask_compression.o] Error 1
gmake: *** Waiting for unfinished jobs....
ftn-714 crayftn: WARNING in command line
  The "noomp" command has been overridden by the "omp" command.

comment:4 Changed 15 months ago by willie

Hi Masaru,

It's been over three years since xkztj was last compiled, and during that time the ARCHER software has changed significantly. Fortunately, most of the old modules are still available; they just need to be loaded. I have created a hand edit

~willie/hand_edits/set_cce_8.3.3.ed

that makes the required changes. You just need to add this to tewne and recompile.

Regards,
Willie

comment:5 Changed 15 months ago by ggxmy

Thank you Willie.

With that hand edit included I seem to be a step further on, but I'm still having a problem with compilation. I got these messages in /home/n02/n02/masara/output/tewne000.tewne.d18156.t135428.comp.leave:

cray-mpich/7.5.5(34):ERROR:150: Module 'cray-mpich/7.5.5' conflicts with the currently loaded module(s) 'cray-mpich/7.1.1'
cray-mpich/7.5.5(34):ERROR:102: Tcl command execution failed: conflict cray-mpich

/opt/cray/cce/8.3.3/cray-binutils/x86_64-unknown-linux-gnu/bin/ld: cannot find -lgcom
/opt/cray/cce/8.3.3/cray-binutils/x86_64-unknown-linux-gnu/bin/ld: cannot find -lgcom
/opt/cray/hdf5/1.10.0.1/CRAY/8.3/lib/libhdf5.a(H5PL.o): In function `H5PL__open$$CFE_id_56395c9c_a2f1556b':
/b/ulib/hdf5-support/rpm/BUILD/cray-hdf5-1.10.0.1-201612052137.d5c01d2b84e7c-cce1-serial/hdf5-1.10.0-patch1/src/H5PL.c:614: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
fcm_internal load failed (256)
# Time taken:            4 s=> ftn -o tewne.exe /home/n02/n02/masara/um/tewne/umatmos/obj/flumemain.o /home/n02/n02/masara/um/tewne/umatmos/obj/blkdata.o -L/home/n02/n02/masara/um/tewne/umatmos/lib -L/home/n02/n02/masara/um/tewne/baserepos/JULES/lib -L/home/n02/n02/masara/um/tewne/baserepos/UMATMOS/lib -l__fcm__tewne -L. -L /work/n02/n02/wmcginty/gcom4.5/lib -Wl,--warn-unresolved-symbols -Wl,-z,muldefs -s real64 -s integer64 -lgcom -L /work/y07/y07/umshared/lib/cce -L /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build/lib -lgcom -h omp -hsystem_alloc   -L /work/y07/y07/umshared/lib/cce -lgrib  -h omp
gmake: *** [tewne.exe] Error 1
# Time taken:          909 s=> gmake -f /home/n02/n02/masara/um/tewne/umatmos/Makefile -j 4 all
gmake -f /home/n02/n02/masara/um/tewne/umatmos/Makefile -j 4 all failed (2) at /fs2/y07/y07/umshared/software/fcm-2016.12.0/bin/../lib/FCM1/Build.pm line 611
cd /home2/n02/n02/masara
Build failed on Tue Jun  5 14:23:33 2018.
->Make: 909 seconds
->TOTAL: 1190 seconds
UMATMOS build failed

Again, some of the directories are present and others are not:

/home/n02/n02/masara/um/tewne/umatmos/lib is an empty folder.

/home/n02/n02/masara/um/tewne/baserepos/JULES/lib,
/home/n02/n02/masara/um/tewne/baserepos/UMATMOS/lib, and
/work/n02/n02/wmcginty/gcom4.5/lib do not exist.

So far this is similar to (though not the same as) the original problem at the top, but this message caught my attention:
/work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build/lib: Permission denied

In /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/:

drwx------ 5 wmcginty n02 4096 Jun 4 14:41 build

so this parent directory is not accessible to me. I think it was accessible a few days ago. Doesn't it need to be accessible for the job to compile? Even then, I can't remember whether it contained lib. Can you see any other possible cause of the problem, and do you have any idea how it can be solved?

Thank you.
Masaru
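The `ld: cannot find -lgcom` errors mean the linker found no GCOM library in any of the -L directories on the link line, and a directory it cannot search counts as no library. A rough way to check this by hand is a loop that mimics that search. This is a sketch under assumptions: `find_lib` is a hypothetical helper, it only checks the immediate parent for permission problems, and it treats static `.a` and shared `.so` files alike, which the real linker does not always do.

```shell
#!/usr/bin/env bash
# Hypothetical helper that roughly mimics how the linker resolves -lNAME:
# it walks the given -L directories in order and stops at the first one
# containing libNAME.a or libNAME.so. Directories that cannot be searched
# (for example a parent with drwx------ permissions owned by another user)
# are flagged, since the linker cannot look inside them either.
# Only the immediate parent is checked for permission problems.
find_lib() {
  local name=$1
  shift
  local d parent
  for d in "$@"; do
    parent=$(dirname "$d")
    if [ -d "$d" ]; then
      if [ ! -r "$d" ] || [ ! -x "$d" ]; then
        printf 'no access   %s\n' "$d"
      elif [ -e "$d/lib$name.a" ] || [ -e "$d/lib$name.so" ]; then
        printf 'found       %s\n' "$d"
        return 0
      else
        printf 'not here    %s\n' "$d"
      fi
    elif [ -d "$parent" ] && [ ! -x "$parent" ]; then
      # The parent cannot be searched, so we cannot even tell whether $d exists.
      printf 'no access   %s (parent not searchable)\n' "$d"
    else
      printf 'missing     %s\n' "$d"
    fi
  done
  printf 'lib%s not found in any -L directory\n' "$name"
  return 1
}
```

Called with the -L paths from the failing link line, for example `find_lib gcom /work/n02/n02/wmcginty/gcom4.5/lib /work/y07/y07/umshared/lib/cce /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build/lib`, it would be expected to flag the last directory as "no access" while build was drwx------ and owned by another user.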

comment:6 Changed 15 months ago by willie

Hi Masaru,

Sorry about that. I've rebuilt it with more open permissions, so please try again. Longer term, you could copy this entire directory into your own workspace on the Lustre disks. To rebuild it, if necessary (it won't be), just run the build_gcom4.5.pbs script on the serial queue.

Regards,
Willie

comment:7 Changed 15 months ago by ggxmy

Thanks, Willie. The job finally compiled, phew!

I'm not sure what Lustre disks are, but did you mean I should copy /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp/build to my /work/n02/n02/masara/? Or did you actually mean /work/n02/n02/wmcginty/gcom4.5/archer_cce_mpp? I did the latter.

How do I make the compiler search my copy rather than yours?

Masaru

comment:8 Changed 15 months ago by willie

Hi Masaru,

That's great. The Lustre disks are /work, which is a different type of disk from /home. Files needed by the model while it runs on the compute nodes must be on /work. I was proposing that you

cp -r /work/n02/n02/wmcginty/gcom4.5 /work/n02/n02/masara/

That way, your model will still run if I accidentally delete gcom4.5.

Regards
Willie
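The reply above does not say how to point the build at the copy. Because the -I and -L flags on the ftn lines in this ticket spell out the GCOM location literally, one option is to replace the old path wherever the job's compile and link settings contain it. This is a sketch under assumptions: which file in the job holds those settings (for example a personal copy of a hand edit) is not stated in the ticket, and `repoint_gcom` is a hypothetical helper, not part of the UMUI or FCM.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every occurrence of an old directory path
# with a new one in a text file, keeping the original as FILE.bak.
# Which file in your job holds the GCOM -I/-L paths is an assumption.
repoint_gcom() {
  local old=$1 new=$2 file=$3
  local old_re
  # Escape '.' so that, for example, "gcom4.5" only matches literally.
  old_re=$(printf '%s' "$old" | sed 's/[.]/\\./g')
  cp -- "$file" "$file.bak"
  # '#' is the sed delimiter because the paths contain '/'. This assumes the
  # paths contain no other sed-special characters such as '#', '&', '*', '['
  # or a backslash.
  sed "s#${old_re}#${new}#g" "$file.bak" > "$file"
}
```

After `cp -r /work/n02/n02/wmcginty/gcom4.5 /work/n02/n02/masara/`, a call such as `repoint_gcom /work/n02/n02/wmcginty/gcom4.5 /work/n02/n02/masara/gcom4.5 your_settings_file` (the last argument is a placeholder) would switch both the include and library paths in one pass, and the .bak file makes it easy to undo.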

comment:9 Changed 14 months ago by willie

  • Resolution set to fixed
  • Status changed from new to closed