Opened 2 years ago

Closed 22 months ago

Last modified 20 months ago

#1667 closed help (answered)

Compilation error: undefined reference to other routines/scripts

Reported by: dilshadshawki
Owned by: annette
Priority: normal
Component: UM Model
Keywords: compilation, UM
Cc:
Platform: MONSooN
UM Version: 8.2

Description

Hello Helpdesk,

I ran the job xkhqg and found the following error message in the .leave file:

'/projects/ukca-imp/dshawk/xkhqg/umatmos/ppsrc/UM/control/top_level/atm_step.f90:11034: undefined reference to `atmos_physics2_''

/home/dshawk/output/xkhqg000.xkhqg.d15267.t143130.comp.leave
line 37406

It seems to not be able to locate/compile the other routines that it needs to complete the compilation. Has this sort of thing occurred before?

Your help would be very much appreciated!

Best wishes,
Dill

Change History (19)

comment:1 Changed 2 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Dill,

I've just updated the configuration, can you please do the following:

  • In UMUI window FCM Configuration → FCM Extract directories & output levels in the UM Container edit box replace @20056 with @vn8.2_cfg. (Annette had previously said to change this, but I have now updated the keyword to point to the correct revision so please use the keyword instead.)
  • In window Input/Output Control & resources → User hand-edit files add the hand-edit ~umui/hand_edits/nemoioserver.ed and put a Y in the last column.
  • In window Submodel Choices & Coupling → OASIS coupling switches & options remove the IBM compiler option -berok in the first line of the NEMO load flags table.
  • Make sure you also have Force full extract & Force full build switched on in the FCM Extract directories & output levels window.

Save, Process and Submit to compile.

With these changes I have successfully compiled your job.

Cheers,
Ros.

For reference this ticket is a continuation of #1663

comment:2 Changed 23 months ago by dilshadshawki

Hi Ros

After completing the instructions above my job did compile successfully. It then had some reconfiguration errors, but I saw another ticket which suggested that reconfiguration wasn't necessary. I now get an error in the .leave file as follows:

/home/dshawk/output/xkhqg000.xkhqg.d15271.t090527.leave

ERROR: Expected NEMO output files are not all available.

This may be a UM / OASIS / NEMO start-up problem.
The ocean.output file may provide more information.

I saw that another ticket #1186 mentioned this. What does this error mean and where can I find the ocean.output file?

Many thanks,

Dill

comment:3 Changed 23 months ago by annette

  • Owner changed from ros to annette
  • Status changed from accepted to assigned

Hi Dill,

That error message is a bit of a red herring. There is no ocean.output file because the model hasn't started to run yet. The key line in the .leave file is hidden amongst some other messages:

apsched: claim exceeds reservation's resources

This means that the model is trying to run with more processors than were requested in the QSUB header.

The UMUI should work this all out for you so it looks like something has gone awry. I will look into this and get back to you.
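
As a rough illustration of how to pick that line out of a long .leave file, here is a minimal Python sketch. The function name `find_key_lines` and the default pattern list are hypothetical; only the apsched message and the surrounding lines in the sample are taken from this ticket.

```python
def find_key_lines(lines, patterns=("apsched:",)):
    """Return (line_number, text) pairs for lines containing any pattern.

    `patterns` is an illustrative default; extend it with whatever
    messages matter for your job.
    """
    hits = []
    for number, text in enumerate(lines, start=1):
        if any(pattern in text for pattern in patterns):
            hits.append((number, text.rstrip("\n")))
    return hits


# Lines quoted earlier in this ticket, used here as a small sample.
sample = [
    "ERROR: Expected NEMO output files are not all available.\n",
    "apsched: claim exceeds reservation's resources\n",
    "The ocean.output file may provide more information.\n",
]
key_lines = find_key_lines(sample)
```

For a real file, pass an open file handle instead of the sample list, for example `with open(path) as handle: find_key_lines(handle)`.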

Annette

comment:4 Changed 23 months ago by dilshadshawki

Hi Annette,

Thanks for looking into this. Any news so far?

Cheers,
Dill

comment:5 Changed 23 months ago by annette

Hi Dill,

I got your job to run by changing a couple of settings in the "User information → Job submission" panel. On the Cray system, the atmosphere, ocean and coupler executables need to run on separate nodes, therefore it is more efficient to up the number of processors for the ocean to a multiple of 32. The following settings work for me:

  • OpenMP : OFF
  • SMT : OFF
  • Atmos processors EW: 10
  • Atmos processors NS: 16
  • NEMO pes EW : 4
  • NEMO pes NS : 8
  • CICE columns per block EW : 90
  • CICE rows per block NS : 37

You will need to recompile after making these changes.
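
To make the arithmetic behind these settings explicit, here is a small Python sketch. It assumes 32 cores per node, which is inferred from the "multiple of 32" advice above rather than stated anywhere in this ticket, and the helper name `nodes_needed` is hypothetical.

```python
import math

CORES_PER_NODE = 32  # assumption inferred from the "multiple of 32" advice


def nodes_needed(pes_ew, pes_ns, cores_per_node=CORES_PER_NODE):
    """Return (total PEs, whole nodes needed) for an EW x NS decomposition."""
    total = pes_ew * pes_ns
    return total, math.ceil(total / cores_per_node)


atmos_pes, atmos_nodes = nodes_needed(10, 16)  # atmosphere: 10 EW x 16 NS
nemo_pes, nemo_nodes = nodes_needed(4, 8)      # NEMO: 4 EW x 8 NS
```

Under that assumption the atmosphere uses 160 PEs (5 full nodes) and NEMO uses 32 PEs (exactly one node), so the ocean node has no idle cores.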

I also had to change one of the filenames under "Atmosphere → STASH → Initialisation of user prognostics". The file specified for item 190 did not exist so I changed it to:

/home/dshawk/startdumps/xkhqa/xkhqaa.da23000901_00

(I am not sure if that is correct though).

However the job fails with NaNs reported in some of the data fields. I tried reconfiguring the start dump, but the job still fails - this time in the interpolation routine of the SL advection:

/home/aospre/output/xlrde000.xlrde.d15275.t154309.leave

The model often crashes about here when there are NaNs in the data.

Would you expect the job to work? Did it run successfully on the old MONSooN?

Annette

comment:6 Changed 23 months ago by dilshadshawki

Hi Annette,

Many thanks for your help. Yes, the job did work successfully on MONSooN, as I have 6 other jobs under the xkhq experiment. I made the changes, then compiled and ran the model again. I will let you know if I get a similar job failure or errors.

Best,
Dill

comment:7 Changed 23 months ago by dilshadshawki

Hi Annette,

The model failed at the compiling stage. Line 37256 (near the end) in:

 /home/dshawk/output/xkhqg000.xkhqg.d15281.t103238.comp.leave

ftn-855 crayftn: ERROR IOS_INIT, File = ../../../../../projects/ukca-imp/dshawk/xkhqg/umatmos/ppsrc/UM/io_services/server/ios_init.f90, Line = 17, Column = 8
' The compiler has detected errors in module "IOS_INIT". No module information file will be created for this module.'

I looked into the file and line that it suggested but I'm not sure what any of it means.

help!

Dill

comment:8 Changed 23 months ago by annette

Hi Dill,

This is another UMUI issue I'm afraid.

As a work-around go to "Independent Section Options → Miscellaneous Sections 94 - 98". Then under section 98 select "1A" and "Close" the window. Now "Save", "Process" and "Submit".

Do not re-open the Job Submission window before submission, as this causes the variable to be reset.

It is a rather complicated issue that is actually an inconsistency in the code, but we are working on a fix for the UMUI and will patch it shortly.

Annette

comment:9 Changed 23 months ago by dilshadshawki

Hi Annette,

Any news on the fix?

Many thanks,

Dill

comment:10 Changed 23 months ago by annette

Dill,

Did you try the work-around?

Annette

comment:11 Changed 23 months ago by dilshadshawki

Apologies, yes I did try the work-around last week, but I got another error at the compiling stage:

/home/dshawk/output/xkhqg000.xkhqg.d15281.t142706.comp.leave

This time:

/home/users/ulib/hdf5/1.8.13/rpm/BUILD/cray-hdf5-1.8.13-cce1-serial/src/H5PL.c:535: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/projects/ukca-imp/dshawk/xkhqg/umatmos/lib/libfcmxkhqg.a(dd_all_call.o): In function `dd_all_call_':
/projects/ukca-imp/dshawk/xkhqg/umatmos/ppsrc/UM/atmosphere/convection/dd_all_call-ddacall4a.f90:443: undefined reference to `dd_init_'
/projects/ukca-imp/dshawk/xkhqg/umatmos/lib/libfcmxkhqg.a(dd_call.o): In function `dd_call_':
/projects/ukca-imp/dshawk/xkhqg/umatmos/ppsrc/UM/atmosphere/convection/dd_call-ddcall4a.f90:409: undefined reference to `dd_init_'
fcm_internal load failed (256)
# Time taken: 32 s⇒ ftn -o xkhqg.exe /projects/ukca-imp/dshawk/xkhqg/umatmos/obj/flumemain.o /projects/ukca-imp/dshawk/xkhqg/umatmos/obj/blkdata.o -L/projects/ukca-imp/dshawk/xkhqg/umatmos/lib -L/projects/ukca-imp/dshawk/xkhqg/baserepos/JULES/lib -L/projects/ukca-imp/dshawk/xkhqg/baserepos/UMATMOS/lib -lfcmxkhqg -LNetCDFmodule -L/projects/umadmin/ksival/oasis/oasis3_3/20110919_para_MxSeg1500/prism/crayxc40_cce/lib -lanaisg -lanaism -lpsmile.MPI1 -lfscint -lmpp_io -lscrip -L/projects/umadmin/ksival/gcom/cce/gcom4.7/xc40_cce_mpp/build/lib -lgcom -LNetCDFmodule -lnetcdf -L/projects/um1/grib_api/cce-8.3.4/1.13.0/lib -lgrib_api_f90 -lgrib_api -h omp
make: * [xkhqg.exe] Error 1
# Time taken: 640 s⇒ make -f /projects/ukca-imp/dshawk/xkhqg/umatmos/Makefile -j 4 all
make -f /projects/ukca-imp/dshawk/xkhqg/umatmos/Makefile -j 4 all failed (2) at /work/home/fcm/fcm-2015.05.0/bin/../lib/FCM1/Build.pm line 611
cd /work/scratch/jtmp/pbs.265602.xcm00.x8z
Build failed on Thu Oct 8 14:40:55 2015.
→Make: 640 seconds
→TOTAL: 778 seconds
UMATMOS build failed
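
For long build logs like this one, a short script can list which symbols the linker could not resolve. The following is a minimal Python sketch; the function name `missing_symbols` is hypothetical, and it assumes the log uses the GNU-style wording "undefined reference to `name'" seen above. A symbol name that has been wrapped across two lines would need rejoining first.

```python
import re

# Matches linker messages of the form: undefined reference to `name'
UNDEFINED_REF = re.compile(r"undefined reference to\s+`([^']+)'")


def missing_symbols(log_text):
    """Return the sorted, de-duplicated symbols the linker could not resolve."""
    return sorted(set(UNDEFINED_REF.findall(log_text)))


# Two messages taken from the log above (paths shortened for readability).
sample_log = (
    "dd_all_call-ddacall4a.f90:443: undefined reference to `dd_init_'\n"
    "dd_call-ddcall4a.f90:409: undefined reference to `dd_init_'\n"
)
symbols = missing_symbols(sample_log)
```

Here both messages point at the same missing routine, so the result contains a single entry, `dd_init_`.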

Is this another UMUI issue?

Cheers,
Dill

comment:12 Changed 23 months ago by annette

Dill,

I can't see any reason for the build failing again. I re-compiled my copy of your job (xlrde) and it completed OK.

Occasionally FCM can become confused, so can you try deleting the job directory and re-submitting:

rm -rf /projects/ukca-imp/dshawk/xkhqg
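
Because `rm -rf` is unforgiving of typos, a guarded version of the same step may be useful. This is only a sketch in Python: the helper `remove_job_dir` is hypothetical, and the check that the job directory sits inside a given parent directory is an added safety assumption, not part of the advice above.

```python
import shutil
from pathlib import Path


def remove_job_dir(job_dir, allowed_root):
    """Delete job_dir only if it lies strictly inside allowed_root."""
    job = Path(job_dir).resolve()
    root = Path(allowed_root).resolve()
    if root not in job.parents:
        raise ValueError(f"{job} is not inside {root}; refusing to delete")
    shutil.rmtree(job)
```

For the case in this ticket the call would be `remove_job_dir("/projects/ukca-imp/dshawk/xkhqg", "/projects/ukca-imp/dshawk")`.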

Annette

comment:13 Changed 22 months ago by dilshadshawki

Hi Annette,

Thank you, deleting the job directory did work, but then there is an ERROR in the .leave file when it tries to run:

/home/dshawk/output/xkhqg000.xkhqg.d15287.t150522.leave
???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
? Error in routine: glue_conv
? Error Code:     3
? Error Message: Mid conv went to the top of the model at point           39 in seg on call  1
? Error generated from processor:     0
? This run generated   2 warnings
????????????????????????????????????????????????????????????????????????????????

There was another ticket that suggested this could be solved by using a different start dump, but MOOSE/MASS is currently unavailable (this is where I have all the start dumps stored). Could this error be due to something else? I ask because the other ticket, #1639, seems to have had a different problem and I am not sure what the cause of my problem is.

Many thanks,
Dill

comment:14 Changed 22 months ago by annette

Dill,

This error usually means that the model has become unstable which can happen for a number of reasons (but that's why using a different start dump can help.) If you look in the .leave file you will see that many of the fields contain NaNs, and the solver at timestep 73 has not converged.
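
A quick way to see where NaNs first show up in a .leave file is to scan it for the token. The following Python sketch assumes NaNs are printed as the literal text "NaN" (in any letter case); the exact layout of the field diagnostics is not shown in this ticket, and the name `nan_summary` is hypothetical.

```python
import re

# Whole-word match so that words such as "nanosecond" are not counted.
NAN_TOKEN = re.compile(r"\bnan\b", re.IGNORECASE)


def nan_summary(lines):
    """Return (first_line_number, count) of lines mentioning NaN, or (None, 0)."""
    first = None
    count = 0
    for number, text in enumerate(lines, start=1):
        if NAN_TOKEN.search(text):
            count += 1
            if first is None:
                first = number
    return first, count
```

Call it on an open file handle, for example `with open(path) as handle: nan_summary(handle)`, and compare the first line number with the position of the glue_conv error to judge how early the fields went bad.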

I think I had the same error in my copy of your job, and so I tried reconfiguring the start dump, but then got an error in the advection instead.

This can be tricky to debug, but there are some things that you can do…

First try compiling with optimisations off.

  • In "Compile and run options for Atmosphere", make sure "Compile model executable" is selected, and set the level of optimisation to be "debug".
  • In "Compile and run options for NEMO", make sure "Compile and build the executable, then run" is selected.
  • In "FCM options for NEMO" change the NEMO compiler flags file to be:
    ~annette/hadgem3/cfg/nemo7.9_xc30_cce_O0.cfg
  • In "FCM options for CICE" change the CICE machine compiler flags file to:
    ~annette/hadgem3/cfg/cice_xc30_cce_O0.cfg

Perhaps also change the run length to just 10 days or so, so you can see if it runs past the original crash.

Annette

comment:15 Changed 22 months ago by dilshadshawki

Hi Annette,

Thanks for the suggestions. I wanted to try some new start dumps as well as implement the changes mentioned above, but I can't find modify_CICE_header in the usual place: /home/aospre/bin/modify_CICE_header

Has this been moved to somewhere else since the move to CRAY?

Many thanks,
Dill

comment:16 Changed 22 months ago by annette

Hi Dilshad,

I did have a look at this last week but I couldn't find the program installed anywhere, and my version wouldn't compile. I will investigate further and get back to you.

Annette

comment:17 Changed 22 months ago by annette

Dilshad,

You no longer need to reset the CICE header when restarting a run. Instead re-build the code with the following branches:

  • UM: fcm:um-br/dev/jwalton/vn8.2_NEMOCICE_restart_fixes_UKMO
  • CICE: fcm:cice-br/dev/jwalton/vn4.1m1_restart_date_fix_UKMO

Regards,
Annette

Last edited 22 months ago by annette

comment:18 Changed 22 months ago by annette

  • Resolution set to answered
  • Status changed from assigned to closed

Continued in #1719

comment:19 Changed 20 months ago by annette

In reference to comment:7 above, the compile error with the IO server code should now be fixed and you should no longer need to use the hack described in comment:8.

We have updated the central configuration files for vn8.2, so the fix should be picked up automatically next time you submit a compilation job.

Do let us know if you have any issues with this though.

Best regards,
Annette
