Opened 5 years ago
Closed 5 years ago
#1794 closed help (answered)
Jobs not finishing compilation on MONSooN
Reported by: | csteadman | Owned by: | ros |
---|---|---|---|
Component: | UKCA | Keywords: | compiling, compilation |
Cc: | | Platform: | MONSooN |
UM Version: | 8.4 |
Description
Hello,
Four out of the eight jobs I've submitted to MONSooN since Monday haven't finished compiling, but when I resubmit the jobs, they compile, reconfigure, and run fine.
The comp.leave files are all about the same size. I've tried 1) deleting um_extracts/jobID, reprocessing, and submitting and 2) simply resubmitting the job (not deleting um_extracts or reprocessing). I'm not exceeding the quota on puma.
These are the four comp.leave files where the job didn't finish compiling:
xmkfb000.xmkfb.d16025.t132202.comp.leave
xmkfc000.xmkfc.d16025.t144425.comp.leave
xmkfd000.xmkfd.d16026.t233816.comp.leave
xmkfe000.xmkfe.d16027.t000937.comp.leave
Thanks for your help.
Claudia
Change History (5)
comment:1 Changed 5 years ago by csteadman
comment:2 Changed 5 years ago by grenville
Claudia
It's odd to get undefined references when the object files seem to be OK (judging from the successful ftn commands); we don't know why the link stage failed. Since the failure isn't reproducible, it's very difficult to track down. If you do see this regularly, let us know.
Grenville
comment:3 Changed 5 years ago by csteadman
Hi Grenville,
Thanks for your reply. Is there a particular line in the comp.leave file I should look (grep) for, to know that I should just resubmit (rather than spend time trying to figure out what I need to change to fix the code so that it compiles correctly, when I actually don't need to change anything)?
You mentioned the link stage — which line in the leave file corresponds to that?
These are the four comp.leave files where the job didn't finish compiling:
xmkfb000.xmkfb.d16025.t132202.comp.leave
xmkfc000.xmkfc.d16025.t144425.comp.leave
xmkfd000.xmkfd.d16026.t233816.comp.leave
xmkfe000.xmkfe.d16027.t000937.comp.leave
Thank you,
Claudia
comment:4 Changed 5 years ago by ros
- Owner changed from um_support to ros
- Status changed from new to accepted
Hi Claudia,
It's hard to say in advance whether a resubmit will fix a problem. If you get messages like "no rule to build target…", then resubmitting and doing a full build often fixes it. In general, if you see odd behaviour that you don't think is due to a change you've made, it's worth just resubmitting and/or forcing a full rebuild.
The link line looks like this:
ftn -o xkzpb.exe /home/n02/n02/claudia/um/xkzpb/ummodel/obj/flumemain.o /home/n02/n02/claudia/um/xkzpb/ummodel/obj/blkdata.o -L/home/n02/n02/claudia/um/xkzpb/ummodel/lib -L/home/n02/n02/claudia/um/xkzpb/umbase/lib -l__fcm__xkzpb -L. -L /work/n02/n02/hum/gcom/cce/gcom3.8/lib -Wl,--warn-unresolved-symbols -Wl,-z,muldefs -s real64 -s integer64 -lgcom_buffered_mpi -L /work/n02/n02/hum/lib/cce -hsystem_alloc
Cheers,
Ros.
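For the "what should I grep for" question, here is a rough Python sketch that scans a comp.leave file for the link line and for the two symptoms mentioned in this thread. The patterns are assumptions based only on the messages quoted above: a link line is taken to start with "ftn -o" followed by a name ending in ".exe", and the "no rule" pattern accepts both "build target" and "make target" wording. The function names are made up for illustration. A hit is only a hint that a resubmit or full rebuild might be worth trying, not a diagnosis.

```python
import re

# Assumed patterns, based on the messages quoted in this thread.
# The exact wording in a real comp.leave file may differ.
LINK_LINE = re.compile(r"^\s*ftn\s+-o\s+\S+\.exe\b")
SYMPTOMS = {
    "missing_rule": re.compile(r"no rule to (build|make) target", re.IGNORECASE),
    "undefined_reference": re.compile(r"undefined reference", re.IGNORECASE),
}


def scan_leave_text(text):
    """Return 1-based line numbers of the link line and of each symptom."""
    report = {"link_line": [], "missing_rule": [], "undefined_reference": []}
    for number, line in enumerate(text.splitlines(), start=1):
        if LINK_LINE.search(line):
            report["link_line"].append(number)
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                report[name].append(number)
    return report


def scan_leave_file(path):
    """Read a comp.leave file (hypothetical helper) and scan its contents."""
    with open(path, errors="replace") as handle:
        return scan_leave_text(handle.read())
```

For example, `scan_leave_file("xmkfb000.xmkfb.d16025.t132202.comp.leave")` would return a dictionary of line numbers to look at first. An empty "link_line" list would suggest the build stopped before the link stage.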
comment:5 Changed 5 years ago by ros
- Resolution set to answered
- Status changed from accepted to closed
To clarify, when I resubmit the jobs they work fine. However, I'd like to know why they didn't finish compiling the first time I submitted them.
Thanks,
Claudia