Opened 5 weeks ago

Closed 6 days ago

#2641 closed help (fixed)

ITD cleanup error in step_therm2

Reported by: ChrisWells
Owned by: um_support
Priority: normal
Component: UM Model
Keywords: coupled model
Cc:
Platform: Monsoon2
UM Version: 8.2

Description

Hi,

(Following from http://cms.ncas.ac.uk/ticket/2600) - I'm running the UM at vn8.2 on xcs, so have been having problems getting it to fit together.

The model now runs for a short time but fails in routine step_therm2, with the error "ITD cleanup error in step_therm2" - I'm unsure what this is. Do you know how I can fix this?

Many thanks,
Chris

Change History (15)

comment:1 Changed 5 weeks ago by grenville

Chris

The model is complaining about inconsistent ice and snow energies (see "zerolayer check - wrong ice energy" messages in the leave file.)
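
As a rough aid for checking a leave file for these messages, here is a minimal Python sketch that counts lines containing the two message fragments quoted in this ticket. The function names are hypothetical, and the exact wording or spacing of the messages in a given leave file may differ, so treat the patterns as search hints rather than a definitive list.

```python
# Message fragments quoted in this ticket. The exact text in a real
# leave file may differ slightly, so these are only search hints.
PATTERNS = (
    "ITD cleanup error in step_therm2",
    "zerolayer check - wrong ice energy",
)


def count_messages(lines, patterns=PATTERNS):
    """Return {pattern: number of lines containing it}, ignoring case."""
    counts = {p: 0 for p in patterns}
    for line in lines:
        lowered = line.lower()
        for p in patterns:
            if p.lower() in lowered:
                counts[p] += 1
    return counts


def scan_leave_file(path, patterns=PATTERNS):
    """Count the patterns in a leave file (hypothetical helper)."""
    # errors="replace" avoids failing on stray non-UTF-8 bytes in model output.
    with open(path, errors="replace") as fh:
        return count_messages(fh, patterns)
```

Calling scan_leave_file on the path of a .leave file returns a dictionary of counts. A non-zero count for the zerolayer message ahead of the ITD cleanup error would be consistent with the diagnosis above.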

Are you sure that the start data is consistent and appropriate - you say the job has run on xcml00; do you have the log files from that run?

Are you in contact with the owner of the job you copied?

Grenville

comment:2 Changed 5 weeks ago by ChrisWells

Hi Grenville,

The previous owner is in my group so I am in contact; the start dumps are all in /projects/ukca-imp/mkasoa/ukca-projects/start_dumps/

Atmos: xjndaa.da20501201_00_fix

Ocean: anqdho_19991201_restart.nc

CICE: anqdhi.restart.1999-12-01-00000_reset

These have worked before with start date 01 Dec 1999 (which I am trying to use too). The NEMO and CICE dumps should be consistent because they came from the same simulation.

We no longer have any of the log files from the original coupled runs, unfortunately, but we do have the log files from some coupled runs which used mostly the same set-up as the copied runs (including the same NEMO and CICE setup), although the actual start dumps used might have been different.

e.g. for job xlzci, the .leave files are in ~dshawk/output/, and the processor output files are in /projects/ukca-imp/dshawk/xlzci/pe_output/.

Hope this is useful - is there anything else I can provide which might help?

Many thanks,
Chris

comment:3 Changed 4 weeks ago by grenville

Chris

Could you try running an exact copy of xlzci? If that succeeds, we'll have a better idea of where to start debugging your run.

Grenville

comment:4 Changed 4 weeks ago by ChrisWells

Hi Grenville,

Apologies, just to clarify: I'm unsure which machine you mean I should run it on. I think I only have access to xcslc0, but I haven't managed to run any vn8.2 jobs on that yet.

Cheers,
Chris

comment:5 Changed 4 weeks ago by grenville

Chris

Yes, xcs - have you tried to run a previously successful job (exactly as it was configured previously, barring changes needed for the xcs architecture)?

Grenville

comment:6 Changed 4 weeks ago by ChrisWells

Hi Grenville,

Sorry, my problem is that I haven't been able to run any vn8.2 job successfully on xcs; the job I refer to in the initial post is itself a copy of a job which was run on the earlier xcs machine (xkcxk) - I picked that one to copy because it was run on a closer machine.

I do see that the error is likely not due to the machine, but am unsure why.

Cheers,
Chris

comment:7 Changed 4 weeks ago by grenville

Chris

Please take a copy of my job xoewa - it seems to be going. I juggled a few namcouple files until it worked. I eventually used a namcouple file from dshawk/xlzci - I strongly suggest you examine the output before committing to running this seriously.

Please take copies of files currently residing in other users' space - there's no guarantee they won't be deleted.

Grenville

comment:8 Changed 4 weeks ago by ChrisWells

Hi Grenville,

Thanks a lot for your help. I've copied that job, copied those files of other users, looked through the job, changed it for my user, and tried to run it, but I get an error which I don't understand in the comp.leave file:

UM__atmosphere__boundary_layer__bl_diags_mod.F90: 161 line(s), 0 auto dependency(ies).
UM__atmosphere__dynamics_advection__ecmwf_quasi_cubic_niv.F90: 222 line(s), 2 auto dependency(ies).
No. of files scanned for dependency: 2920
Generated cfg: /projects/ukca-imp/chwel/xodtf/umatmos/.cache/.bld/.config_dep
/projects/ukca-imp/chwel/xodtf/umatmos/Makefile: updated
->Scan dependency: 64 seconds
->Generate Fortran interface: start
->Generate Fortran interface: 1 second
->Make: start
cd /projects/ukca-imp/chwel/xodtf/umatmos
# Start: 2018-10-23 20:00:05=> make -f /projects/ukca-imp/chwel/xodtf/umatmos/Makefile -j 6 all
make: *** No rule to make target `xodtf.exe', needed by `all'.  Stop.
# Time taken:            0 s=> make -f /projects/ukca-imp/chwel/xodtf/umatmos/Makefile -j 6 all
make -f /projects/ukca-imp/chwel/xodtf/umatmos/Makefile -j 6 all failed (2) at /common/fcm/fcm-2017.10.0/bin/../lib/FCM1/Build.pm line 611
cd /scratch/jtmp/pbs.8289648.xcs00.x8z
Build failed on Tue Oct 23 20:00:05 2018.
->Make: 0 second
->TOTAL: 229 seconds
UMATMOS build failed

From looking around it seems I need to add a rule in the Makefile to make the executable, but am unsure what exactly this should look like, or if it is that simple. I think it might be to do with some of the options I changed in Compile and run options - apologies if you didn't mean for me to change them to what they were earlier; I assumed I had to change them to get it to run fully - if it is these that are the problem, then I am unsure what these options do.

Do you know how I can fix this?

Many thanks,
Chris

comment:9 Changed 4 weeks ago by grenville

Chris

I just took a copy of xoewa (xoewb) and switched on the atmosphere, reconfig, and ocean builds - it built OK and is running. You'll need to fix the number of nodes selected in the job script (it should be 7, not 6) - Willie has addressed this issue earlier.

I can't see any obvious problems with xodtf - please delete /projects/ukca-imp/chwel/xodtf and re-run.

If that fails - take a copy of xoewb, change nothing but the username info. and try running that.

Grenville

comment:10 Changed 4 weeks ago by ChrisWells

Hi Grenville,

Thanks for that; I'm taking a look now, but I have one new problem: when I try and submit the run I get a "disk quota exceeded" error - if I run du -csh on /home/ChrisWells, I am only using 1.1G; each FCM_extracts folder is ~300M. Do you know how I can get some more space to be able to submit the job?

Many thanks,
Chris

comment:11 Changed 4 weeks ago by andy

Hi Chris,

I have increased your PUMA quota to 3GB which should help.

Cheers
Andy

comment:12 Changed 4 weeks ago by ChrisWells

Thanks for that Andy,

Hi Grenville,

I'm afraid I'm having the same issue I had before ("apsched: claim exceeds reservation's node-count") - Willie's solution to the node problem was a new OASIS_conf file which he gave me, but that fix (to my understanding) deals with the number of cores per node (36 vs 32), whereas this issue is with the number of nodes (7 vs 6), and I can't see where this is set in the OASIS_conf file, or any others - where could I find it?
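
To illustrate how the cores-per-node setting and the node count can interact, here is a minimal Python sketch of the arithmetic. It is only one plausible way a 6 vs 7 mismatch could arise: the task count of 216 is a hypothetical figure chosen for illustration, not taken from this job, and a coupled job that places each executable on its own nodes may need a different total.

```python
def nodes_needed(total_tasks, cores_per_node):
    """Whole nodes needed to place total_tasks single-threaded MPI tasks."""
    if total_tasks <= 0 or cores_per_node <= 0:
        raise ValueError("total_tasks and cores_per_node must be positive")
    # Integer ceiling division: round up to a whole number of nodes.
    return -(-total_tasks // cores_per_node)


# Hypothetical task count, for illustration only.
HYPOTHETICAL_TASKS = 216

nodes_at_36 = nodes_needed(HYPOTHETICAL_TASKS, 36)  # 216 / 36 = 6 exactly
nodes_at_32 = nodes_needed(HYPOTHETICAL_TASKS, 32)  # 216 / 32 = 6.75, rounds up to 7
```

Under that assumption, a reservation sized for 36 tasks per node would be one node short if the tasks were actually placed 32 per node, which is the kind of situation in which the scheduler could report that a claim exceeds the reservation's node count.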

Many thanks,
Chris

comment:13 Changed 4 weeks ago by grenville

Chris

A bit of a hack; edit /home/d00/chwel/umui_runs/xodtg-297111505/umuisubmit_run and change

#PBS -l select=6:ompthreads=1
to
#PBS -l select=7:ompthreads=1

Then at the ARCHER command line type

qsub umuisubmit_run
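
The manual edit above could also be scripted. Here is a minimal Python sketch, assuming the script contains a line of the form "#PBS -l select=N..." as quoted in this comment; the function name set_select_nodes is hypothetical, and the job still has to be resubmitted with qsub afterwards.

```python
import re

# Matches the start of a line such as "#PBS -l select=6:ompthreads=1"
# and captures everything up to the node count.
_SELECT = re.compile(r"^(#PBS[ \t]+-l[ \t]+select=)\d+", re.MULTILINE)


def set_select_nodes(script_text, nodes):
    """Return script_text with the PBS select node count set to nodes."""
    if not isinstance(nodes, int) or nodes <= 0:
        raise ValueError("nodes must be a positive integer")
    # Replace only the digits after "select="; the rest of the line
    # (for example ":ompthreads=1") is left untouched.
    new_text, n_changed = _SELECT.subn(
        lambda m: f"{m.group(1)}{nodes}", script_text
    )
    if n_changed == 0:
        raise ValueError("no '#PBS -l select=' line found")
    return new_text
```

To use it, read the contents of umuisubmit_run, pass them through set_select_nodes(text, 7), and write the result back. Keeping a copy of the original script first is sensible, since this is a hack around the generated job script rather than a change to the job definition.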

Grenville

comment:14 Changed 4 weeks ago by willie

  • Keywords coupled model added
  • Platform set to Monsoon2
  • UM Version set to 8.2

comment:15 Changed 6 days ago by willie

  • Resolution set to fixed
  • Status changed from new to closed