Opened 4 years ago

Closed 4 years ago

#1639 closed help (fixed)

global model build job not compiling

Reported by: anmcr
Owned by: um_support
Component: UM Model
Keywords: build job, grid point storm, convection
Cc:
Platform: ARCHER
UM Version: 8.2

Description

Hello,

This is related to previous ticket #1612 which was a request for a global model job. Willie suggested that I copy UMUI experiment xjpq (user 'umui'), which has build, run and reconfig jobs. He said to switch off the hand edit for the Intel compiler. My copy of the build job is xltua.

The problem I have is that the compilation of xltua fails. I am unsure why, and I can't find any clues in the output file or in previous NCAS tickets. A section of the relevant output file (xltua000.xltua.d15245.t114228.comp.leave) is given below.

Thanks for any help.

Andrew

"-t" is an invalid command-line option.

fcm_internal compile failed (256)
gmake: *** [filenamelength_mod.o] Error 1
gmake: *** Waiting for unfinished jobs....
fcm_internal compile failed (256)
fcm_internal compile failed (256)
gmake: *** [ios_mpi_error_handlers.o] Error 1
gmake: *** [ios_decompose.o] Error 1
ftn -o grdtypes_mod.o -I/home/n02/n02/anmcr/xltua/umatmos/inc -I/home/n02/n02/anmcr/xltua/baserepos/JULES/inc -I/home/n02/n02/anmcr/xltua/baserepos/UMATMOS/inc -std95 -i8 -r8 -traceback -fp-model source -I /work/n02/n02/hum/gcom/intel/gcom4.5/archer_intel_mpp/inc -g -I /work/n02/n02/hum/gcom/cce/gcom4.5/archer_cce_mpp/inc -c /home/n02/n02/anmcr/xltua/umatmos/ppsrc/UM/control/top_level/grdtypes_mod.f90
ftn-2105 crayftn: ERROR in command line

"-i" is an invalid command-line option.

ftn-2191 crayftn: ERROR in command line

"8" is an invalid argument to the "-r" option.

ftn-2105 crayftn: ERROR in command line

"-t" is an invalid command-line option.

fcm_internal compile failed (256)
gmake: *** [grdtypes_mod.o] Error 1
gmake -f /home/n02/n02/anmcr/xltua/umatmos/Makefile -j 6 -s all failed (2) at /fs2/n02/n02/hum/software/fcm-2015.05.0/bin/../lib/FCM1/Build.pm line 611
Build failed on Wed Sep 2 11:45:00 2015.
→Make: 0 second
→TOTAL: 12 seconds
UMATMOS build failed

Change History (15)

comment:1 Changed 4 years ago by annette

Andrew,

You need to switch off the Intel compile override as well. Go to "Compilation and Run Options → UM User Override Files".

Annette
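The log excerpt in the description shows Intel-style flags (-std95 -i8 -r8 -traceback -fp-model source) being passed to crayftn, which rejects them. As a rough sketch, the rejected options can be pulled out of a .leave file with a short Python script. The function name `rejected_flags` is hypothetical, and the patterns assume the message wording is exactly as quoted in this ticket.

```python
import re

# Patterns copied from the crayftn messages quoted in this ticket.
INVALID_OPTION = re.compile(r'"([^"]+)" is an invalid command-line option')
INVALID_ARGUMENT = re.compile(
    r'"([^"]+)" is an invalid argument to the "([^"]+)" option'
)


def rejected_flags(log_text):
    """Return the sorted list of options the compiler rejected in log_text."""
    rejected = set()
    for line in log_text.splitlines():
        match = INVALID_OPTION.search(line)
        if match:
            rejected.add(match.group(1))
            continue
        match = INVALID_ARGUMENT.search(line)
        if match:
            # Report the option together with the argument it refused.
            rejected.add(f"{match.group(2)} {match.group(1)}")
    return sorted(rejected)


sample = '''ftn-2105 crayftn: ERROR in command line

"-i" is an invalid command-line option.

ftn-2191 crayftn: ERROR in command line

"8" is an invalid argument to the "-r" option.

ftn-2105 crayftn: ERROR in command line

"-t" is an invalid command-line option.
'''
print(rejected_flags(sample))  # ['-i', '-r 8', '-t']
```

If the list contains flags such as these, an Intel override or hand edit is probably still active in the build job.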

comment:2 Changed 4 years ago by anmcr

Dear Annette,

Thanks for the speedy reply. I got the global model to build, reconfigure and run. However, the run fails after running for 50 minutes. I'm not sure why.

I've looked at both the output file and the files in pe_output. I thought that it might be failing because output was due after 1 hour, so I switched off the LBC generation and STASH; however, it still fails. I would appreciate it if someone could please take a look.

The job is xltub, and the latest output file xltub000.xltub.d15246.t140655.leave.

Thanks,

Andrew

comment:3 Changed 4 years ago by willie

Hi Andrew,

It works for me - see my job xlztb, although I didn't run with your start dump. I also have different STASH output from you. You may need to select "standard STASH output" in the STASH > STASH netcdf options panel.

Regards,

Willie

comment:4 Changed 4 years ago by willie

Hi Andrew,

But it fails with your start dump. How did you create this?

Willie

comment:5 Changed 4 years ago by anmcr

Hi Willie,

I created the start dump by using the 'build' job (xltua) to create the reconfiguration executable.

I checked the output file for the reconfiguration (xltuc000.xltuc.d15246.t111009.rcf.leave) and it seemed fine. I also checked some of the fields using xconv. However, looking again just now I noticed that the start dump contains fields for CLIM biomass burning, CLIM dust size etc.

Andrew

comment:6 Changed 4 years ago by willie

Hi Andrew,

Both your start dump and mine were created in the same way from UM7.9 dumps. When reconfigured they have the same fields. As far as I can tell they are good dumps. The problem is that an error occurs at time step 5:

? Error in routine: glue_conv
? Error Code:     3
? Error Message: Mid conv went to the top of the model at point           63 in seg on call  1
? Error generated from processor:    15

Dumping each time step shows that NaNs appear in the processed dumps in various fields, e.g. STASH 431 Dust division 1 MMR. I have tried various MPI-related fixes and also increasing from 6x16 to 12x16 processors, to no avail. Switching off aerosol modelling has no impact on the problem either.

If it is possible for you to choose a different start dump, that might be the way forward.

Regards,

Willie
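The per-time-step check described in the comment above can be sketched in Python as follows. This is only an illustration: it assumes the field values have already been extracted from each dump into plain lists (reading UM dumps is not shown), and the name `first_nan_step`, the field key and the sample numbers are all made up.

```python
import math


def first_nan_step(fields_by_step):
    """Return (step, field_name) for the earliest NaN, or None if there is none.

    fields_by_step maps a time step number to {field_name: list of values}.
    """
    for step in sorted(fields_by_step):
        for name, values in fields_by_step[step].items():
            if any(math.isnan(v) for v in values):
                return step, name
    return None


# Made-up values for illustration only; not taken from the real dumps.
dumps = {
    1: {"dust_div1_mmr": [1.0e-9, 2.0e-9]},
    2: {"dust_div1_mmr": [1.1e-9, float("nan")]},
}
print(first_nan_step(dumps))  # (2, 'dust_div1_mmr')
```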

comment:7 Changed 4 years ago by anmcr

Hi Willie,

Thanks for investigating this so comprehensively.

I'll try today with a different start dump and let you know how I get on.

Best wishes,

Andrew

comment:8 Changed 4 years ago by willie

  • Keywords job, grid point storm, convection added; job removed

Hi Andrew,

Simon Wilson has pointed out that this is a grid point storm. It occurs at 167.7W, 63.3S, which is just off the coast of Antarctica. The vertical wind velocity exceeds 30 m/s and the temperature at the surface (THETA after TS) exceeds 5,000 K. I tried a run with 4 convection calls per physics time step instead of the usual two, but with no success.

Regards,

Willie
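A grid point storm of this kind can be located by scanning a dump for points where the fields exceed physically plausible limits. The Python sketch below assumes the values are already available as simple records. The `Point` type, the `storm_points` helper and the sample numbers are hypothetical; only the thresholds (30 m/s, 5,000 K) and the location come from the comment above.

```python
from typing import NamedTuple


class Point(NamedTuple):
    lat: float    # degrees north (negative = south)
    lon: float    # degrees east (negative = west)
    w: float      # vertical velocity, m/s
    theta: float  # potential temperature, K


def storm_points(points, w_limit=30.0, theta_limit=5000.0):
    """Return the points whose |w| or theta exceeds the given limits."""
    return [p for p in points if abs(p.w) > w_limit or p.theta > theta_limit]


# Illustrative values only; the first location matches the one reported above.
grid = [
    Point(lat=-63.3, lon=-167.7, w=31.0, theta=5200.0),
    Point(lat=-60.0, lon=-160.0, w=0.2, theta=275.0),
]
for p in storm_points(grid):
    print(p.lat, p.lon)  # -63.3 -167.7
```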

comment:9 Changed 4 years ago by anmcr

Dear Willie,

Thanks for the additional information.

I got the global model to run with another start dump. The actual dates of the run are not important, as I am only interested in calculating the CPU time of various limited area runs on ARCHER (12 km and 4 km).

However, I got a limited area model from Grenville (xltud) which I can't get to build.

Are you able to help further with this?

Thanks,

Andrew

comment:10 Changed 4 years ago by willie

Hi Andrew,

Your job xltud is derived from my base job xlemc, which was a run-only job designed for the SWAMMA project. It uses an executable built in my job xkztj, so take a copy of that, build it, and then point xltud at the executables.

Regards

Willie

comment:11 Changed 4 years ago by anmcr

Dear Willie,

I've done as you suggested. xltue is my build job (copy of xkztj). xltud is the run job (copy of xlemc).

xlemc fails on the reconfiguration with the error 'section 0 item 274: required field is not in input dump'.

The missing field is the mean topographic index, and the problem is similar to ticket #841.

I think it's therefore related to the build job having JULES switched on. I tried switching off the JULES component in the reconfiguration run, but it failed with the error 'error reading namelist jules_nametypes'.

Could you please advise?

Many thanks,

Andrew

comment:12 Changed 4 years ago by anmcr

Hi again Willie,

As I am purely interested in getting an idea of CPU time/cost for regional model runs on ARCHER, would I be better off keeping JULES on and using the same start dump and configuration as you did?

Andrew

comment:13 Changed 4 years ago by anmcr

Dear Willie,

I seem to be making some progress on my own.

I've got the reconfiguration to run (xltud). I'm compiling the run executable now (xltuf).

Andrew

comment:14 Changed 4 years ago by willie

Ok Andrew, I'll close this ticket now.

Willie

comment:15 Changed 4 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed