Opened 4 years ago

Closed 4 years ago

#1794 closed help (answered)

Jobs not finishing compilation on MONSooN

Reported by: csteadman
Owned by: ros
Component: UKCA
Keywords: compiling, compilation
Cc:
Platform: MONSooN
UM Version: 8.4

Description

Hello,

Four out of the eight jobs I've submitted to MONSooN since Monday haven't finished compiling, but when I resubmit the jobs, they compile, reconfigure, and run fine.

The comp.leave files are all about the same size. I've tried 1) deleting um_extracts/jobID, reprocessing, and submitting and 2) simply resubmitting the job (not deleting um_extracts or reprocessing). I'm not exceeding the quota on puma.

These are the four comp.leave files where the job didn't finish compiling:
xmkfb000.xmkfb.d16025.t132202.comp.leave
xmkfc000.xmkfc.d16025.t144425.comp.leave
xmkfd000.xmkfd.d16026.t233816.comp.leave
xmkfe000.xmkfe.d16027.t000937.comp.leave

Thanks for your help.
Claudia

Change History (5)

comment:1 Changed 4 years ago by csteadman

To clarify, when I resubmit the jobs they work fine. However, I'd like to know why they didn't finish compiling the first time I submitted them.

Thanks,
Claudia

comment:2 Changed 4 years ago by grenville

Claudia

It's odd to get undefined references when the object files seem to be OK (judging from the successful ftn commands); we don't know why the link stage failed. Since the failure isn't reproducible, it is very hard to track down. If you do see this regularly, let us know.

Grenville

comment:3 Changed 4 years ago by csteadman

Hi Grenville,

Thanks for your reply. Is there a particular line in the comp.leave file I should look (grep) for that tells me to simply resubmit? I'd like to avoid spending time working out how to fix code that doesn't actually need changing.

You mentioned the link stage — which line in the leave file corresponds to that?

These are the four comp.leave files where the job didn't finish compiling:
xmkfb000.xmkfb.d16025.t132202.comp.leave
xmkfc000.xmkfc.d16025.t144425.comp.leave
xmkfd000.xmkfd.d16026.t233816.comp.leave
xmkfe000.xmkfe.d16027.t000937.comp.leave

Thank you,
Claudia

comment:4 Changed 4 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Claudia,

It's difficult to say in advance whether a resubmit will fix a problem. If you get messages like "no rule to build target…", resubmitting with a full build often fixes it. In general, if you see odd behaviour that you don't think is due to a change you've made, it's worth just resubmitting and/or forcing a full rebuild.

The link line looks like this:

ftn -o xkzpb.exe /home/n02/n02/claudia/um/xkzpb/ummodel/obj/flumemain.o /home/n02/n02/claudia/um/xkzpb/ummodel/obj/blkdata.o -L/home/n02/n02/claudia/um/xkzpb/ummodel/lib -L/home/n02/n02/claudia/um/xkzpb/umbase/lib -l__fcm__xkzpb -L. -L /work/n02/n02/hum/gcom/cce/gcom3.8/lib -Wl,--warn-unresolved-symbols -Wl,-z,muldefs -s real64 -s integer64 -lgcom_buffered_mpi -L /work/n02/n02/hum/lib/cce -hsystem_alloc
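
If it helps, here is a rough Python sketch of how you could scan a comp.leave file for the link line and for the kinds of messages mentioned in this ticket. The patterns are assumptions rather than an official list: it treats any line starting with "ftn -o <name>.exe" as the link line, and it matches the text "no rule to" and "undefined reference" case-insensitively. The function names are just illustrative.

```python
import re

# All three patterns are assumptions drawn from this ticket, not a
# definitive list of UM build messages.
LINK_LINE = re.compile(r"^\s*ftn\s+-o\s+\S+\.exe\b")  # link step, as in the example above
NO_RULE = re.compile(r"no rule to", re.IGNORECASE)
UNDEFINED_REF = re.compile(r"undefined reference", re.IGNORECASE)


def scan_leave_lines(lines):
    """Return the 1-based line numbers of link lines and suspicious messages."""
    hits = {"link": [], "no_rule": [], "undefined_ref": []}
    for number, line in enumerate(lines, start=1):
        if LINK_LINE.search(line):
            hits["link"].append(number)
        if NO_RULE.search(line):
            hits["no_rule"].append(number)
        if UNDEFINED_REF.search(line):
            hits["undefined_ref"].append(number)
    return hits


def scan_leave_file(path):
    """Scan one comp.leave file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return scan_leave_lines(handle)
```

Calling scan_leave_file() on one of your comp.leave files gives the line numbers to jump to. A hit under "no_rule" or "undefined_ref" in a job whose code you haven't changed suggests trying a resubmit or full rebuild first, though it is no guarantee that this will fix it.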

Cheers,
Ros.

comment:5 Changed 4 years ago by ros

  • Resolution set to answered
  • Status changed from accepted to closed