Opened 8 years ago
Closed 7 years ago
#1116 closed help (fixed)
umui submission failed for STEP=0
Reported by: | jonathan | Owned by: | ros |
---|---|---|---|
Component: | UM Model | Keywords: | |
Cc: | | Platform: | HECToR |
UM Version: | 6.6.3 | | |
Description
I tried to submit a UM job from the UMUI at 6.6.3 with STEP=0 in the SUBMIT file. The submission failed. It said the file umuisubmit_clr was missing. This is true; the runs directory contained only umuisubmit_compile. Is this a bug or a mistake of mine?
Thanks
Jonathan
Change History (17)
comment:1 Changed 8 years ago by ros
- Platform changed from <select platform> to HECToR
- UM Version changed from <select version> to 6.6.3
comment:2 Changed 8 years ago by jonathan
Dear Ros
Thanks for the above. Probably I should have found it somewhere on the web - I wasn't sure where to look.
This doesn't seem quite right, though, because /work/n02/n02/hum/vn6.6.3 does not exist on Hector, or am I misunderstanding?
Cheers
Jonathan
comment:3 Changed 8 years ago by ros
- Owner changed from um_support to ros
- Status changed from new to accepted
Hi Jonathan,
Ooops sorry. Cut and paste issues!!
6.6.3 is slightly different: it needs to be hg6.6.3, not vn6.6.3.
It's probably best just to do this instead:
# Setup UM variables
VN=6.6.3
. $UMDIR/hg$VN/$TARGET_MC/scripts/.umsetvars_$VN
Cheers,
Ros.
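Putting this together with the `export UMDIR=/work/n02/n02/hum` line discussed later in this ticket, the UM part of the HECToR .profile might look like the sketch below. It is a minimal sketch, assuming a POSIX-compatible login shell. The `um_setvars` variable and the readability check are illustrative additions, not part of the standard setup; they simply make a wrong path (such as vn6.6.3 instead of hg6.6.3) produce a visible warning.

```sh
# UMDIR must be exported before the setvars script is sourced,
# because the script path below is built from it.
export UMDIR=/work/n02/n02/hum
export TARGET_MC=cce

# Setup UM variables (6.6.3 lives under hg6.6.3, not vn6.6.3)
VN=6.6.3
um_setvars="$UMDIR/hg$VN/$TARGET_MC/scripts/.umsetvars_$VN"

# Illustrative guard: only source the file if it is readable,
# otherwise warn on stderr instead of failing silently.
if [ -r "$um_setvars" ]; then
  . "$um_setvars"
else
  echo "WARNING: UM setvars file not found: $um_setvars" >&2
fi
```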
comment:4 Changed 8 years ago by jonathan
Dear Ros
That works, thanks. I am sure it will solve some problems for me that I hadn't yet encountered, but it doesn't solve the problem I first reported. Maybe umsubmit isn't working properly for STEP=0? With STEP=2 I can submit the job successfully.
Best wishes
Jonathan
comment:5 Changed 8 years ago by ros
Hmmm. Very confused. I've just taken a copy of your xiqrc job and it submits OK with both STEP=0 and STEP=2. That points to something in your setup. Other than the .profile stuff I just mentioned, the only other prerequisite for using FCM jobs is that ssh-agent is set up so that you don't need to supply a password for HECToR. I assume you have that set up OK?
Cheers,
Ros.
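One quick way to confirm the ssh-agent prerequisite is to ask ssh to connect in batch mode, where it fails instead of prompting for a password or passphrase. The sketch below assumes OpenSSH on the machine you submit from; `check_passwordless_ssh` is a hypothetical helper name, and the host in the example is a placeholder, not the real HECToR login address.

```sh
# Start an agent for this login session and load your key once
# (you are prompted for the passphrase a single time):
#   eval "$(ssh-agent -s)"
#   ssh-add
#
# Hypothetical helper: succeeds only if ssh reaches the given host
# without prompting. BatchMode=yes makes ssh fail rather than ask
# for a password or passphrase.
check_passwordless_ssh() {
  target=$1
  if [ -z "$target" ]; then
    echo "usage: check_passwordless_ssh user@host" >&2
    return 2
  fi
  ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" true
}

# Example (placeholder host):
#   check_passwordless_ssh jonathan@hector-login && echo "ssh OK"
```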
comment:6 Changed 8 years ago by ros
I think I've got it….
You're missing the line
export UMDIR=/work/n02/n02/hum
before the TARGET_MC=cce in your HECToR .profile
Cheers,
Ros.
comment:7 Changed 8 years ago by jonathan
Oh, no I'm not!
It appears in a slightly more complicated form:
if [[ `hostname` == lms1 ]]
then
  export UMDIR=/work/n02/n02/simon/lms/ps/um
else
  export UMDIR=/work/n02/n02/hum
  module load netcdf
fi
leading to
hector-xe6-3$ echo $UMDIR
/work/n02/n02/hum
It's interesting that you don't have the same problem with STEP=0.
Cheers
Jonathan
comment:8 Changed 8 years ago by jonathan
Dear Ros
I am trying to repeat the test NRUN of one month without recompilation. I switched off compilation and asked it to run from the existing executable, but it fails to submit:
FCM_MAIN: Calling Extract ...
Base extract: failed
See extract output file /home/jonathan/um/um_extracts/xiqrc/umbase/ext.out
which says
[FAIL] /home/jonathan/umui_jobs/xiqrc/FCM_UMUI_BASE_CFG: cannot locate config file, abort at /home/um/fcm/bin/../lib/Fcm/ConfigSystem.pm line 539.
What have I done wrong? Perhaps it is not permitted to have the compilation on /work instead of /home? As far as I know, I have changed nothing from the STEP=2 run, which worked, except to switch off compilation.
Thanks again
Jonathan
comment:9 Changed 8 years ago by ros
Hi Jonathan,
It's because you have compilation of the reconfiguration switched on in Compilations and modifications → mods for the reconfiguration. But in Atmosphere → Ancillary and input data files → Start dump you have "using the reconfiguration switched off". So it has got a little bit confused and there is unfortunately no cross-checking done on this in the UMUI.
So, assuming you don't want to run the reconfiguration, simply turning off compilation of the reconfiguration should fix your problem.
Cheers,
Ros.
comment:10 Changed 8 years ago by jonathan
Dear Ros
Thanks, that did the trick. It's funny that it wasn't a problem the first time, when I did compile-and-run (STEP=2). It's only a problem with run-only (STEP=4).
In UM 4.5 you don't have to switch off compilation of the recon because you aren't using it. I suppose it is a waste of time, but it doesn't cause a failure.
Cheers
Jonathan
comment:11 Changed 8 years ago by jonathan
Me again. Not really the same problem but I hope you don't mind my carrying on in the same ticket.
I switched on automatic postprocessing in order to have superseded dumps deleted. I didn't switch on archiving, but it nonetheless failed because:
/work/n02/n02/gregoryj/xiqrc/bin/qsexecute[784]: /usr/sbin/mknod: not found [No such file or directory]
qsexecute: mknod fails to create pipe to archiving system
Is postprocessing altogether unavailable in 6.6.3? Do people tidy up their superseded dumps manually? Grenville told me archiving was currently unavailable, though of course I would like that too. The advantages of the archive system are (1) halving the space requirement, (2) putting the files into a convenient form for analysis (i.e. .pp), and (3) having only "finished" files in the archive directory. It's awkward to pick them up automatically from $DATAW because they might not be the final version.
Cheers
Jonathan
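The message suggests that qsexecute calls mknod by an absolute path (/usr/sbin/mknod) that does not exist on the node where the job ran. As an illustration only, and not a change to qsexecute itself, the hypothetical `make_pipe` function below creates a named pipe by looking the tools up on `PATH`, preferring the POSIX `mkfifo` utility and falling back to `mknod NAME p`.

```sh
# Hypothetical helper: create a named pipe (FIFO) at the given path,
# without assuming where mkfifo/mknod are installed.
make_pipe() {
  pipe=$1
  if command -v mkfifo >/dev/null 2>&1; then
    mkfifo "$pipe"
  elif command -v mknod >/dev/null 2>&1; then
    mknod "$pipe" p
  else
    echo "make_pipe: neither mkfifo nor mknod found on PATH" >&2
    return 1
  fi
}
```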
comment:12 Changed 8 years ago by ros
Hi Jonathan,
I think if you include the branch fcm:um_br/dev/jeff/HG6.6.3_hector_monsoon_archiving/src, you should find it will delete superseded dumps for you. As you have said, automatic archiving is currently unavailable. You will need to switch on "Enable build of UM scripts" in the window Compilations and modifications → UM Scripts build, to force the system to re-extract the UM scripts, because you are running from an existing executable.
Regards,
Ros.
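On the question in comment 11 about tidying superseded dumps by hand, a minimal sketch is below. `prune_old_dumps` is a hypothetical helper, not part of the UM scripts: it keeps the newest KEEP files (by modification time) matching a pattern in a directory and deletes the rest. The pattern `xiqrca.da*` in the example is an assumed dump-naming scheme; check the real names in your data directory first, and make sure the dump you need for a restart is the one that survives.

```sh
# Hypothetical helper: delete all but the newest KEEP files matching
# PATTERN (a shell glob) inside DIR. Ordering is by modification time.
prune_old_dumps() {
  dir=$1 pattern=$2 keep=${3:-1}
  (
    cd "$dir" || exit 1
    # $pattern is deliberately unquoted so the glob expands here.
    # ls -1t lists newest first; tail skips the first $keep names.
    ls -1t -- $pattern 2>/dev/null | tail -n +"$((keep + 1))" |
      while IFS= read -r f; do
        rm -f -- "$f"
      done
  )
}

# Example (assumed naming): keep only the latest dump in $DATAW
#   prune_old_dumps "$DATAW" 'xiqrca.da*' 1
```

Sorting by modification time assumes dumps are not touched after they are written; if that does not hold, sorting on the date encoded in the file name would be safer.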
comment:13 Changed 8 years ago by jonathan
Dear Ros
I put it in as a script mod. Was that correct? I believe I enabled the script build. This doesn't seem to have worked, though. Can you see what I did wrong? Although it did not delete the superseded dumps, it did create a request file which, however, appears to have faulty contents, namely "59cb0 DELETE". That's only part of the name of one of the files it should have deleted.
Sorry about all these questions. If it can't be made to work, I can live without it of course. It's nice that the job itself is working!
Cheers
Jonathan
comment:14 Changed 8 years ago by grenville
Dear Jonathan
The branch fcm:um_br/dev/jeff/HG6.6.3_hector_monsoon_archiving/src needs to be added to the branch table (labelled User Modifications), not the script mod table (labelled Central Script Modifications).
I successfully ran a test job with the branch.
Grenville
comment:15 Changed 8 years ago by grenville
Dear Jonathan
I updated fcm:um_br/dev/jeff/HG6.6.3_hector_monsoon_archiving so that it now allows for automatic archiving to the RDF (the /nerc disc). You will need to rebuild to pick up the changes.
To use archiving to /nerc, please set up ssh keys so that the HECToR job-launcher node can communicate with the lms without needing a passphrase. See section 5 of http://cms.ncas.ac.uk/wiki/Hector/NercArchiving for details, including the required naming of the public key.
Regards
Grenville
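Creating a key pair without a passphrase uses the standard OpenSSH `ssh-keygen` options `-N ""` (empty passphrase) and `-f` (output file). The sketch assumes OpenSSH is available on the job-launcher node; `make_nopass_key` is a hypothetical helper, and the key file name you pass it must follow the naming given in section 5 of the wiki page, which is not reproduced here.

```sh
# Hypothetical helper: create an RSA key pair with an empty passphrase
# so that batch jobs can use it without prompting.
make_nopass_key() {
  keyfile=$1
  if [ -z "$keyfile" ]; then
    echo "usage: make_nopass_key /path/to/private_key" >&2
    return 2
  fi
  if [ -e "$keyfile" ]; then
    echo "make_nopass_key: $keyfile already exists, not overwriting" >&2
    return 1
  fi
  # -q: quiet, -t rsa: key type, -N "": empty passphrase, -f: output file
  ssh-keygen -q -t rsa -N "" -f "$keyfile"
}

# Afterwards, append "$keyfile.pub" to ~/.ssh/authorized_keys on the
# remote machine (here, the lms).
```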
comment:16 Changed 8 years ago by ros
Just to note that the original problem, where Jonathan was switching between STEP=0 and STEP=2, was caused by making the change with hand edits to the SUBMIT file rather than via the UMUI. With FCM it is not possible simply to change STEP in the SUBMIT script; there are a couple of other flags in FCM_EXTR_SCR that also need changing.
comment:17 Changed 7 years ago by ros
- Resolution set to fixed
- Status changed from accepted to closed
Hi Jonathan,
I think the problem is that your .profile on HECToR isn't set up correctly.
Make sure you have the following in your .profile and try again.
Regards,
Ros.