Opened 2 years ago

Closed 2 years ago

#2641 closed help (fixed)

ITD cleanup error in step_therm2

Reported by: ChrisWells Owned by: um_support
Component: UM Model Keywords: coupled model
Cc: Platform: Monsoon2
UM Version: 8.2



(Following from - I'm running the UM at vn8.2 on xcs, so have been having problems getting it to fit together.

The model now runs for a short time but fails in routine step_therm2, with the error "ITD cleanup error in step_therm2" - I'm unsure what this is. Do you know how I can fix this?

Many thanks,

Change History (15)

comment:1 Changed 2 years ago by grenville


The model is complaining about inconsistent ice and snow energies (see "zerolayer check - wrong ice energy" messages in the leave file.)

Are you sure that the start data is consistent and appropriate - you say the job has run on xcml00; do you have the log files from that run?

Are you in contact with the owner of the job you copied?
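
For anyone checking their own output for the "zerolayer check" messages mentioned above, a minimal sketch along these lines may help. The function names are made up for illustration, and the leave file path you pass in depends on your own job.

```shell
# Count lines containing the exact message quoted in this ticket.
# Usage: count_zerolayer /path/to/job.leave   (path is a placeholder)
count_zerolayer() {
    # -c prints the number of matching lines; -F matches a fixed string
    grep -c -F "zerolayer check - wrong ice energy" "$1"
}

# Show the first few "zerolayer check" lines with their line numbers.
show_zerolayer() {
    grep -n -F "zerolayer check" "$1" | head -n 5
}
```

A non-zero count only confirms that the message is present; it does not say which start dump is inconsistent.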


comment:2 Changed 2 years ago by ChrisWells

Hi Grenville,

The previous owner is in my group so I am in contact; the start dumps are all in /projects/ukca-imp/mkasoa/ukca-projects/start_dumps/

Atmos: xjndaa.da20501201_00_fix


CICE: anqdhi.restart.1999-12-01-00000_reset

These have worked before with start date 01 Dec 1999 (which I am trying to use too). The NEMO and CICE dumps should be consistent because they came from the same simulation.

Unfortunately we no longer have any of the log files from the original coupled runs, but we do have log files from some coupled runs which used mostly the same set-up as the copied runs (including the same NEMO and CICE setup), although the actual start dumps used might have been different.

For example, for job xlzci, the .leave files are in ~dshawk/output/, and the processor output files are in /projects/ukca-imp/dshawk/xlzci/pe_output/.

Hope this is useful - is there anything else I can provide which might help?

Many thanks,

comment:3 Changed 2 years ago by grenville


Could you try running an exact copy of xlzci? If that succeeds, we'll have a better idea of where to start debugging your run.


comment:4 Changed 2 years ago by ChrisWells

Hi Grenville,

Apologies, just to clarify: I'm unsure which machine you mean I should run it on. I think I only have access to xcslc0, but I haven't managed to run any vn8.2 jobs on that yet.


comment:5 Changed 2 years ago by grenville


Yes, xcs. Have you tried to run a previously successful job (exactly as it was configured previously, barring changes needed for the xcs architecture)?


comment:6 Changed 2 years ago by ChrisWells

Hi Grenville,

Sorry, my problem is that I haven't been able to run any vn8.2 job successfully on xcs; the job I refer to in the initial post is itself a copy of a job which was run on the earlier xcs machine (xkcxk). I picked that one to copy because it was run on a closer machine.

I do see that the error is likely not due to the machine, but am unsure why.


comment:7 Changed 2 years ago by grenville


Please take a copy of my job xoewa - it seems to be going. I juggled a few namcouple files until it worked. I eventually used a namcouple file from dshawk/xlzci - I strongly suggest you examine the output before committing to running this seriously.

Please take copies of files currently residing in other users' space - there's no guarantee they won't be deleted.


comment:8 Changed 2 years ago by ChrisWells

Hi Grenville,

Thanks a lot for your help. I've copied that job, copied those files of other users, looked through the job, changed it for my user, and tried to run it, but I get an error which I don't understand in the comp.leave file:

UM__atmosphere__boundary_layer__bl_diags_mod.F90: 161 line(s), 0 auto dependency(ies).
UM__atmosphere__dynamics_advection__ecmwf_quasi_cubic_niv.F90: 222 line(s), 2 auto dependency(ies).
No. of files scanned for dependency: 2920
Generated cfg: /projects/ukca-imp/chwel/xodtf/umatmos/.cache/.bld/.config_dep
/projects/ukca-imp/chwel/xodtf/umatmos/Makefile: updated
->Scan dependency: 64 seconds
->Generate Fortran interface: start
->Generate Fortran interface: 1 second
->Make: start
cd /projects/ukca-imp/chwel/xodtf/umatmos
# Start: 2018-10-23 20:00:05=> make -f /projects/ukca-imp/chwel/xodtf/umatmos/Makefile -j 6 all
make: *** No rule to make target `xodtf.exe', needed by `all'.  Stop.
# Time taken:            0 s=> make -f /projects/ukca-imp/chwel/xodtf/umatmos/Makefile -j 6 all
make -f /projects/ukca-imp/chwel/xodtf/umatmos/Makefile -j 6 all failed (2) at /common/fcm/fcm-2017.10.0/bin/../lib/FCM1/ line 611
cd /scratch/jtmp/pbs.8289648.xcs00.x8z
Build failed on Tue Oct 23 20:00:05 2018.
->Make: 0 second
->TOTAL: 229 seconds
UMATMOS build failed

From looking around it seems I need to add a rule in the Makefile to make the executable, but am unsure what exactly this should look like, or if it is that simple. I think it might be to do with some of the options I changed in Compile and run options - apologies if you didn't mean for me to change them to what they were earlier; I assumed I had to change them to get it to run fully - if it is these that are the problem, then I am unsure what these options do.

Do you know how I can fix this?

Many thanks,

comment:9 Changed 2 years ago by grenville


I just took a copy of xoewa (xoewb) and switched on the atmosphere, reconfig, and ocean builds. It built OK and is running. You'll need to fix the number of nodes selected in the job script (it should be 7, not 6); Willie has addressed this issue earlier.

I can't see any obvious problems with xodtf - please delete /projects/ukca-imp/chwel/xodtf and re-run.

If that fails - take a copy of xoewb, change nothing but the username info. and try running that.


comment:10 Changed 2 years ago by ChrisWells

Hi Grenville,

Thanks for that; I'm taking a look now, but I have one new problem: when I try and submit the run I get a "disk quota exceeded" error - if I run du -csh on /home/ChrisWells, I am only using 1.1G; each FCM_extracts folder is ~300M. Do you know how I can get some more space to be able to submit the job?

Many thanks,

comment:11 Changed 2 years ago by andy

Hi Chris,

I have increased your PUMA quota to 3GB which should help.
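
To see which directories account for the space before asking for more quota, something like the sketch below may be useful. The function name is hypothetical, and how the quota itself is accounted may differ from what du reports.

```shell
# List the ten largest entries directly under a directory, biggest last.
# Usage: largest_dirs "$HOME"
largest_dirs() {
    dir=${1:-$HOME}
    # -s: one total per argument; -k: sizes in KiB so a plain numeric sort works
    du -sk "$dir"/* 2>/dev/null | sort -n | tail -n 10
}
```

The glob does not match hidden dot-directories, so the totals here can add up to less than du -csh run on the directory itself.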


comment:12 Changed 2 years ago by ChrisWells

Thanks for that Andy,

Hi Grenville,

I'm afraid I'm having the same issue I had before ("apsched: claim exceeds reservation's node-count") - Willie's solution to the node problem was a new OASIS_conf file which he gave me, but that fix (to my understanding) deals with the number of cores per node (36 vs 32), whereas this issue is with the number of nodes (7 vs 6), and I can't see where this is set in the OASIS_conf file, or any others - where could I find it?

Many thanks,
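
As a rough illustration of how the node count and the cores-per-node setting interact, the sketch below assumes that each executable in the coupled job is placed on its own whole nodes, so the counts are rounded up per executable and then summed. That placement rule is an assumption here, and the task counts in the usage note are made up, not taken from this job.

```shell
# Nodes needed when every executable occupies whole nodes of its own.
# Usage: nodes_needed CORES_PER_NODE TASKS [TASKS...]
nodes_needed() {
    cores=$1; shift
    total=0
    for tasks in "$@"; do
        # integer ceiling of tasks / cores, added per executable
        total=$(( total + (tasks + cores - 1) / cores ))
    done
    echo "$total"
}
```

With the invented values nodes_needed 36 190 20 the result is 7, whereas pooling the same 210 tasks as nodes_needed 36 210 gives 6. A difference of this kind would be one way for the select value in the job script to end up one node short.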

comment:13 Changed 2 years ago by grenville


A bit of a hack: edit /home/d00/chwel/umui_runs/xodtg-297111505/umuisubmit_run and change

#PBS -l select=6:ompthreads=1

to

#PBS -l select=7:ompthreads=1

Then at the xcs command line type

qsub umuisubmit_run
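
The same edit can be scripted. This is only a sketch: set_select is a hypothetical helper, and it keeps a .bak copy so the original submit script can be restored.

```shell
# Change the PBS node count in a submit script, keeping a backup copy.
# Usage: set_select umuisubmit_run 6 7
set_select() {
    script=$1; old=$2; new=$3
    cp "$script" "$script.bak"
    # Rewrite only the "#PBS -l select=<old>:" directive; other lines pass through
    sed "s/^#PBS -l select=${old}:/#PBS -l select=${new}:/" "$script.bak" > "$script"
}
```

After running it in the umui_runs directory for the job, check the directive with grep select umuisubmit_run before submitting with qsub.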


comment:14 Changed 2 years ago by willie

  • Keywords coupled model added
  • Platform set to Monsoon2
  • UM Version set to 8.2

comment:15 Changed 2 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed