Opened 4 years ago

Closed 4 years ago

#1489 closed error (answered)

ERROR: COSP in ATM_STEP_4A

Reported by: lsim Owned by: um_support
Component: UM Model Keywords:
Cc: Platform: ARCHER
UM Version: 8.6

Description

Dear CMS,

An error occurs during the first timestep when I try to run some v8.6-GA6.0 jobs:

COSP in ATM_STEP_4A: allocating cosp_crain_3d in tstep 1
COSP in ATM_STEP_4A: allocating cosp_csnow_3d in tstep 1

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!
? Error Code: 9
? Error Message: Input variable out of range, rangevec: 39*0, 48, 48*0
? Error from processor: 3
? Error number: 79
?????????
???????????????????????????????????????????????????????????????????????
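
The rangevec value in the error message appears to use Fortran-style repeat counts (`39*0` meaning thirty-nine zeros). As a minimal sketch, assuming that reading is correct, the following expands the string and reports which position holds the non-zero entry. The helper name `expand_repeats` is hypothetical, and what that position means inside COSP is not stated in this ticket.

```python
def expand_repeats(text):
    """Expand 'r*c' repeat-count items, e.g. '2*0, 5' -> [0, 0, 5]."""
    values = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        if "*" in item:
            # 'count*value' stands for 'value' repeated 'count' times.
            count, value = item.split("*", 1)
            values.extend([int(value)] * int(count))
        else:
            values.append(int(item))
    return values


rangevec = expand_repeats("39*0, 48, 48*0")
# 1-based positions of the non-zero entries.
nonzero = [(pos + 1, val) for pos, val in enumerate(rangevec) if val != 0]
print(len(rangevec), nonzero)  # prints: 88 [(40, 48)]
```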

The jobs affected are run with a set of modified ancillary files, using an altered land-sea mask. They include xkvej and xkvek.

If the orography file is changed, the number of processors that produce this error changes. However, even with the normal qrparm.orog file (job xkvek), the same error occurs on some processors, e.g. /work/n02/n02/lsim/um/xkvek/pe_output/xkvek.fort6.pe03. Because the error does not occur on pe00, it may not show up in the .leave files (in ~/output).
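
Since the error may only appear in some per-processor output files, a quick scan of pe_output can show which processors report it. This is a rough sketch only: the function name `find_error_pes` is hypothetical, and the file name pattern (`<jobid>.fort6.pe*`) and the "Error Code" marker are assumptions taken from the path and error banner quoted above.

```python
from pathlib import Path


def find_error_pes(pe_dir, job_id, marker="Error Code"):
    """Return {file name: [matching lines]} for per-PE files containing marker."""
    hits = {}
    # Assumed naming: <job_id>.fort6.pe00, <job_id>.fort6.pe01, ...
    for path in sorted(Path(pe_dir).glob(job_id + ".fort6.pe*")):
        with open(path, errors="replace") as handle:
            lines = [line.rstrip("\n") for line in handle if marker in line]
        if lines:
            hits[path.name] = lines
    return hits


if __name__ == "__main__":
    # Directory taken from the path quoted in this ticket.
    found = find_error_pes("/work/n02/n02/lsim/um/xkvek/pe_output", "xkvek")
    for name, lines in found.items():
        print(name, lines)
```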

It seems likely that the error is the result of a problem with some of the other modified ancillary files. Reducing the time step (down to 00:02:30) does not help. Any suggestions re: identifying the problem would be very much appreciated.

Many thanks,
Louise

Change History (7)

comment:1 Changed 4 years ago by grenville

Louise

Could you try switching off COSP?

You can switch it off in the UMUI.

→ Model Selection

→ Atmosphere

→ Scientific Parameters and Sections

→ Section by section choices

→ Section 2: LW Radiation

→ COSP

Turn off "Run with COSP"

Grenville

comment:2 Changed 4 years ago by lsim

Hi Grenville,

We wondered if the COSP thing might be indicative of a problem somewhere else in the model. When Peter tried turning off COSP, the model still crashed immediately.

I've tried the same now, but for some reason my version of the job (xkvek, with COSP switched off) exits immediately after being submitted to the short queue, without doing anything at all.

Louise

comment:3 Changed 4 years ago by grenville

Louise

I'm looking in /home/n02/n02/lsim/umui_runs/xkvek-057155827 - assuming this is the last one you ran, but it doesn't indicate short queue anywhere.

There must have been some kind of message to indicate why it failed?

Grenville

comment:4 Changed 4 years ago by lsim

Hi Grenville,

Deleted then submitted as qsub -q short umuisubmit_run. Oddly it simply exits about 1 second after it indicates it has started running.

Having had a little hunt around, I found that ~/um/umui_out/xkvek000.xkvek.d15057.t155839.leave was produced, which simply contains:

xkvek000.xkvek.d15057.t155839.leave: No such file or directory

The location of these files makes me think this immediate exit may be to do with my use of two different .profiles: one for running HadGEM3 jobs and the other for HadCM3 jobs. HadCM3 .leave files are supposed to end up in ~/um/umui_out; HadGEM3 ones in ~/output. I did try to ensure that the correct ~/.profile was applied, but maybe this somehow still caused it to fail. I will try again tomorrow when the short/debug queue is back up, in case that helps.

HadCM3 and HadGEM3 jobs require different paths and compilers, so at the moment I switch the .profile file between .profile_vn8.6 and .profile_vn4.5 as required. If there is a cleaner way to handle this, that would make life simpler and might prevent this problem.

Louise
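
One possible way to avoid swapping ~/.profile by hand is to source the relevant profile explicitly for each command. The sketch below is an illustration under assumptions, not a verified fix: the mapping `PROFILES` and the helpers `build_command` and `run_with_profile` are hypothetical, and whether a batch job inherits the resulting environment depends on how it is submitted.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical mapping from model to profile file; the file names follow
# the .profile_vn8.6 / .profile_vn4.5 naming mentioned in this ticket.
PROFILES = {
    "HadGEM3": Path.home() / ".profile_vn8.6",
    "HadCM3": Path.home() / ".profile_vn4.5",
}


def build_command(profile, command):
    """Return an sh command line that sources profile, then runs command."""
    return ". {} && {}".format(shlex.quote(str(profile)), command)


def run_with_profile(model, command, profiles=PROFILES):
    """Run command in a fresh shell after sourcing the profile for model."""
    profile = Path(profiles[model])
    if not profile.is_file():
        # Fail early instead of running with the wrong environment.
        raise FileNotFoundError("profile not found: {}".format(profile))
    return subprocess.run(
        ["sh", "-c", build_command(profile, command)],
        capture_output=True,
        text=True,
    )
```

For example, `run_with_profile("HadGEM3", "qsub -q short umuisubmit_run")` would source .profile_vn8.6 before submitting, leaving the login shell's own environment untouched.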

comment:5 Changed 4 years ago by grenville

Louise

/home/n02/n02/lsim/um/umui_out/xkvek000.xkvek.d15057.t155839.leave

contains the following

/var/spool/PBS/mom_priv/jobs/2726332.sdb.SC[57]: .: /work/n02/n02/hum/vn8.6/normal/scripts/.umsetvars_8.6: cannot open [No such file or directory]

which is likely the source of your problems

Grenville

comment:6 Changed 4 years ago by lsim

Hi Grenville,

I think the .profile did result in the wrong .umsetvars_8.6 path. With the correct .profile, the COSP-off job does attempt to run, and does at least write some information into the .leave file (~/output/xkvek000.xkvek.d15058.t101655.leave), e.g.

Negative q at start of convection: i,j,k 7 16 85 q_conv NaN q_n 0.556428647904601147E-06 step 1 0

and

lib-4212 : UNRECOVERABLE library error
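
The "Negative q" line above carries the grid indices and values where the problem was detected, which could be compared against the altered land-sea mask. A minimal parsing sketch follows, assuming the line layout is exactly as quoted. The function name `parse_negative_q` is hypothetical, and whether i,j,k are local to a processor's subdomain or global is not stated in this ticket.

```python
import math
import re

# Pattern matches the message layout quoted in this ticket.
PATTERN = re.compile(
    r"Negative q at start of convection:\s*i,j,k\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+q_conv\s+(\S+)\s+q_n\s+(\S+)"
)


def parse_negative_q(line):
    """Return a dict of indices and values, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    i, j, k = (int(match.group(n)) for n in (1, 2, 3))
    return {
        "i": i,
        "j": j,
        "k": k,
        "q_conv": float(match.group(4)),  # float() accepts "NaN"
        "q_n": float(match.group(5)),
    }


line = ("Negative q at start of convection: i,j,k 7 16 85 "
        "q_conv NaN q_n 0.556428647904601147E-06 step 1 0")
record = parse_negative_q(line)
print(record["i"], record["j"], record["k"], math.isnan(record["q_conv"]))
```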

Louise

comment:7 Changed 4 years ago by grenville

  • Resolution set to answered
  • Status changed from new to closed

Louise

I'll close this now.

Grenville
