Opened 7 years ago

Closed 6 years ago

#1116 closed help (fixed)

umui submission failed for STEP=0

Reported by: jonathan
Owned by: ros
Component: UM Model
Platform: HECToR
UM Version: 6.6.3

Description

I tried to submit a UM job from the UMUI at 6.6.3 with STEP=0 in the SUBMIT file. The submission failed. It said the file umuisubmit_clr was missing. This is true; the runs directory contained only umuisubmit_compile. Is this a bug or a mistake of mine?

Thanks

Jonathan

Change History (17)

comment:1 Changed 7 years ago by ros

  • Platform changed from <select platform> to HECToR
  • UM Version changed from <select version> to 6.6.3

Hi Jonathan,

I think the problem is because your .profile on HECToR isn't set up correctly.

Make sure you have the following in your .profile and try again.

export UMDIR=/work/n02/n02/hum
TARGET_MC=cce

# Setup UM variables
VN=6.6.3
if test -f $HOME/.umsetvars_$VN; then
  . $HOME/.umsetvars_$VN
else
  . $UMDIR/vn$VN/$TARGET_MC/scripts/.umsetvars_$VN
fi

Regards,
Ros.

comment:2 Changed 7 years ago by jonathan

Dear Ros

Thanks for the above. I probably should have found it somewhere on the web; I wasn't sure where to look.

This doesn't seem quite right, though, because /work/n02/n02/hum/vn6.6.3 does not exist on HECToR. Or am I misunderstanding?

Cheers

Jonathan

comment:3 Changed 7 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Jonathan,

Oops, sorry. Cut-and-paste issues!

6.6.3 is slightly different: the directory is hg6.6.3, not vn6.6.3.

It's probably best just to do this instead:

# Setup UM variables
VN=6.6.3
. $UMDIR/hg$VN/$TARGET_MC/scripts/.umsetvars_$VN
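If the path is wrong (as with vn6.6.3 above), the `.` line only fails with a terse error at login. A minimal sketch of a more defensive version follows; the function names `umsetvars_path` and `load_umsetvars` are hypothetical and not part of the UM scripts, and it assumes UMDIR, VN and TARGET_MC are set earlier in .profile as shown.

```shell
# Hypothetical helper: build the path to the central .umsetvars file.
# Arguments: $1 = UMDIR, $2 = UM version, $3 = target machine.
# Note the "hg" prefix used for 6.6.3 rather than "vn".
umsetvars_path() {
  printf '%s/hg%s/%s/scripts/.umsetvars_%s\n' "$1" "$2" "$3" "$2"
}

# Hypothetical wrapper: source the settings file only if it exists,
# and say clearly what was missing otherwise.
load_umsetvars() {
  if [ -z "$UMDIR" ]; then
    echo "load_umsetvars: UMDIR is not set" >&2
    return 1
  fi
  f=$(umsetvars_path "$UMDIR" "$VN" "$TARGET_MC")
  if [ -f "$f" ]; then
    . "$f"
  else
    echo "load_umsetvars: $f not found (check UMDIR, VN and TARGET_MC)" >&2
    return 1
  fi
}
```

In .profile, `load_umsetvars` would be called after `export UMDIR=...`, `TARGET_MC=cce` and `VN=6.6.3`, in place of the direct `.` line.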

Cheers,
Ros.

comment:4 Changed 7 years ago by jonathan

Dear Ros

That works, thanks. I am sure it will solve some problems for me that I hadn't yet encountered, but it doesn't solve the problem I first reported. Maybe umsubmit isn't working properly for STEP=0? With STEP=2 I can submit the job successfully.

Best wishes

Jonathan

comment:5 Changed 7 years ago by ros

Hmmm. Very confused. I've just taken a copy of your xiqrc job and it submits OK with both STEP=0 and STEP=2, which points to something in your setup. Other than the .profile settings I just mentioned, the only other prerequisite for using FCM jobs is that ssh-agent is set up so that you don't need to supply a password for HECToR. I assume you have that set up OK?
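For reference, a minimal sketch of making sure an agent is running on the machine where the UMUI runs, assuming OpenSSH's `ssh-agent` and `ssh-add` and that the matching public key is already installed on HECToR. The function name `ensure_ssh_agent` is hypothetical; this is not part of the UMUI or FCM.

```shell
# Hypothetical helper: start an ssh-agent for this shell if none is reachable,
# then load the default key so later ssh calls to HECToR don't prompt.
# "ssh-add -l" exits 0 (keys loaded), 1 (agent running, no keys)
# or 2 (cannot contact an agent).
ensure_ssh_agent() {
  ssh-add -l >/dev/null 2>&1
  status=$?
  if [ "$status" -eq 2 ]; then
    # No agent reachable: start one and export its variables into this shell.
    eval "$(ssh-agent -s)" >/dev/null
    status=1
  fi
  if [ "$status" -eq 1 ]; then
    # Agent running but holding no keys: load the default key (asks once).
    ssh-add
  fi
}
```

Calling `ensure_ssh_agent` once in the shell that starts the UMUI should mean only one passphrase prompt per session.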

Cheers,
Ros.

comment:6 Changed 7 years ago by ros

I think I've got it.

You're missing a

export UMDIR=/work/n02/n02/hum

before the TARGET_MC=cce in your HECToR .profile

Cheers,
Ros.

comment:7 Changed 7 years ago by jonathan

Oh, no I'm not! :-)

It appears in a slightly more complicated form:

if [[ `hostname` == lms1 ]]
  then export UMDIR=/work/n02/n02/simon/lms/ps/um
  else export UMDIR=/work/n02/n02/hum
  module load netcdf
fi

leading to

hector-xe6-3$ echo $UMDIR
/work/n02/n02/hum

It's interesting that you don't have the same problem with STEP=0.

Cheers

Jonathan

comment:8 Changed 7 years ago by jonathan

Dear Ros

I am trying to repeat the test NRUN of one month without recompilation. I switched off compilation and asked it to run from the existing executable, but it fails to submit:

FCM_MAIN: Calling Extract ...
Base extract: failed
See extract output file /home/jonathan/um/um_extracts/xiqrc/umbase/ext.out

which says

[FAIL] /home/jonathan/umui_jobs/xiqrc/FCM_UMUI_BASE_CFG: cannot locate config file, abort at /home/um/fcm/bin/../lib/Fcm/ConfigSystem.pm line 539.

What have I done wrong? Perhaps it is not permitted to have the compilation on /work instead of /home? I have changed nothing from the STEP=2 run, which worked, except to switch off compilation, as far as I know.

Thanks again

Jonathan

comment:9 Changed 7 years ago by ros

Hi Jonathan,

It's because you have compilation of the reconfiguration switched on in Compilations and modifications → mods for the reconfiguration, but in Atmosphere → Ancillary and input data files → Start dump you have "using the reconfiguration" switched off. The two settings contradict each other, and unfortunately the UMUI does not cross-check them.

Assuming you don't want to run the reconfiguration, simply turning off compilation of the reconfiguration should fix your problem.

Cheers,
Ros.

comment:10 Changed 7 years ago by jonathan

Dear Ros

Thanks, that did the trick. It's funny that it wasn't a problem the first time, when I did compile-and-run (STEP=2). It's only a problem with run-only (STEP=4).

In UM 4.5 you don't have to switch off compilation of the recon because you aren't using it. I suppose it is a waste of time, but it doesn't cause a failure.

Cheers

Jonathan

comment:11 Changed 7 years ago by jonathan

Me again. Not really the same problem but I hope you don't mind my carrying on in the same ticket.

I switched on automatic postprocessing in order to have superseded dumps deleted. I didn't switch on archiving, but it failed nonetheless:

/work/n02/n02/gregoryj/xiqrc/bin/qsexecute[784]: /usr/sbin/mknod: not found [No such file or directory]
qsexecute: mknod fails to create pipe to archiving system

Is postprocessing altogether unavailable in 6.6.3? Do people tidy up their superseded dumps manually? Grenville told me archiving was currently unavailable, which of course I would like too :-) The advantages of the archive system are (1) halving the space requirement (2) putting the files into a convenient form for analysis (i.e. .pp) (3) having only "finished" files in the archive directory. It's awkward to pick them up automatically from $DATAW because they might not be the final version.

Cheers

Jonathan

comment:12 Changed 7 years ago by ros

Hi Jonathan,

I think if you include the branch fcm:um_br/dev/jeff/HG6.6.3_hector_monsoon_archiving/src you should find it will delete superseded dumps for you. As you said, automatic archiving is currently unavailable. Because you are running from an existing executable, you will also need to switch on "Enable build of UM scripts" in the window Compilations and modifications → UM Scripts build, to force the system to re-extract the UM scripts.

Regards,
Ros.

comment:13 Changed 7 years ago by jonathan

Dear Ros

I put it in as a script mod. Was that correct? I believe I enabled the script build. This doesn't seem to have worked, though. Can you see what I did wrong? Although it did not delete the superseded dumps, it did create a request file which, however, appears to have faulty contents, namely "59cb0 DELETE". That's only part of the name of one of the files it should have deleted.

Sorry about all these questions. If it can't be made to work, I can live without it, of course. It's nice that the job itself is working!

Cheers

Jonathan

comment:14 Changed 7 years ago by grenville

Dear Jonathan

The branch fcm:um_br/dev/jeff/HG6.6.3_hector_monsoon_archiving/src needs to be added to the branch table (labelled User Modifications), not the script mod table (labelled Central Script Modifications).

I successfully ran a test job with the branch.

Grenville

comment:15 Changed 7 years ago by grenville

Dear Jonathan

I updated fcm:um_br/dev/jeff/HG6.6.3_hector_monsoon_archiving so that it now allows for automatic archiving to the RDF (the /nerc disc). You will need to rebuild to pick up the changes.

To use archiving to /nerc, please set up ssh keys so that the HECToR job-launcher node can communicate with the lms without a passphrase. See section 5 of http://cms.ncas.ac.uk/wiki/Hector/NercArchiving for details, including the required naming of the public key.

Regards

Grenville

comment:16 Changed 6 years ago by ros

Just to note that the original problem, where Jonathan was switching from STEP=0 to STEP=2, was caused by making the change with hand-edits to the SUBMIT file rather than via the UMUI. With FCM it is not possible simply to change STEP in the SUBMIT script: a couple of other flags in FCM_EXTR_SCR also need changing.

comment:17 Changed 6 years ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed