Opened 5 years ago

Closed 5 years ago

#1321 closed error (fixed)

Compile error

Reported by: swr04ojb Owned by: um_support
Component: UM Model Keywords:
Cc: Platform: ARCHER
UM Version: 8.5

Description

xjlen is a HadGAM3 N96L85 v8.5 job on ARCHER.

I copied it, and ran it (xinib) without changes. That worked.

I made a second copy (xinid), and made some small changes to atm_step.F90 to check that I had set up a new fcm branch and working copy correctly. That job failed to complete its compilation step, with an error message that I could not easily decipher (I will try to attach the log file to this ticket in a moment).

I thought, maybe that's because of a change I made, so I went back to the clean copy (xinib) and changed that from use-the-existing-executable to compile new, and that also failed to complete the compilation step.

Can you advise what I'm missing?

kind regards,

oliver

Change History (6)

comment:1 in reply to: ↑ description Changed 5 years ago by swr04ojb

Hmm, I can't see an 'attach' button. So here's the end of the file:

gmake: *** Waiting for unfinished jobs....
# Time taken:           79 s=> ftn -o ukca_main1.o -I/home/n02/n02/obrowne/build/xinib/umatmos/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/JULES/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/JULES/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/UMATMOS/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom4.5/archer_cce_mpp/inc     -h omp -c /home/n02/n02/obrowne/build/xinib/umatmos/ppsrc/UM/atmosphere/UKCA/ukca_main1-ukca_main1.f90
mv ukca_main1.o /home/n02/n02/obrowne/build/xinib/umatmos/obj
# Time taken:           47 s=> ftn -o iau.o -I/home/n02/n02/obrowne/build/xinib/umatmos/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/JULES/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/JULES/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/UMATMOS/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom4.5/archer_cce_mpp/inc     -h omp -c /home/n02/n02/obrowne/build/xinib/umatmos/ppsrc/UM/atmosphere/AC_assimilation/iau.f90
mv iau.o /home/n02/n02/obrowne/build/xinib/umatmos/obj
# Time taken:          115 s=> ftn -o atmos_physics2.o -I/home/n02/n02/obrowne/build/xinib/umatmos/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/JULES/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/JULES/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/UMATMOS/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom4.5/archer_cce_mpp/inc     -h omp -c /home/n02/n02/obrowne/build/xinib/umatmos/ppsrc/UM/control/top_level/atmos_physics2.f90
mv atmos_physics2.o /home/n02/n02/obrowne/build/xinib/umatmos/obj
LLVM ERROR: IO failure on output stream.
fcm_internal compile failed (256)
# Time taken:          193 s=> ftn -o glue_conv_6a_mod.o -I/home/n02/n02/obrowne/build/xinib/umatmos/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/JULES/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/JULES/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/UMATMOS/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom4.5/archer_cce_mpp/inc     -h omp -c /home/n02/n02/obrowne/build/xinib/umatmos/ppsrc/UM/atmosphere/convection/glue_conv-6a.f90
gmake: *** [glue_conv_6a_mod.o] Error 1
# Time taken:          281 s=> ftn -o u_model.o -I/home/n02/n02/obrowne/build/xinib/umatmos/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/JULES/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/JULES/inc -I/home/n02/n02/obrowne/build/xinib/baserepos/UMATMOS/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -I /work/n02/n02/hum/gcom/cce/gcom4.5/archer_cce_mpp/inc     -h omp -c /home/n02/n02/obrowne/build/xinib/umatmos/ppsrc/UM/control/top_level/u_model.f90
mv u_model.o /home/n02/n02/obrowne/build/xinib/umatmos/obj
# Time taken:          812 s=> gmake -f /home/n02/n02/obrowne/build/xinib/umatmos/Makefile -j 6 all
gmake -f /home/n02/n02/obrowne/build/xinib/umatmos/Makefile -j 6 all failed (2) at /fs2/n02/n02/hum/fcm-2014-02/bin/../lib/FCM1/Build.pm line 611
cd /home2/n02/n02/obrowne
Build failed on Thu Jul  3 14:48:01 2014.
->Make: 812 seconds
->TOTAL: 1029 seconds
UMATMOS build failed
--------------------------------------------------------------------------------

Resources requested: ncpus=1,place=free,walltime=04:00:00
Resources allocated: cpupercent=428,cput=00:57:45,mem=1647876kb,ncpus=1,vmem=2183288kb,walltime=00:17:15

*** obrowne   Job: 381840.sdb   ended: 03/07/14 14:48:02   queue: serial ***
*** obrowne   Job: 381840.sdb   ended: 03/07/14 14:48:02   queue: serial ***
*** obrowne   Job: 381840.sdb   ended: 03/07/14 14:48:02   queue: serial ***
*** obrowne   Job: 381840.sdb   ended: 03/07/14 14:48:02   queue: serial ***
--------------------------------------------------------------------------------

and I've copied both .comp.leave files to puma, so you can see them at either:

puma$ /home/swr04ojb/xini*.comp.leave
archer$ /home/obrowne/output/xini*.comp.leave
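
Because the build runs with "gmake -j 6", output from several compile jobs is interleaved and the real failure is easy to miss. A minimal Python sketch for pulling the failure lines out of a .comp.leave file is below. The patterns are taken only from the excerpt above, and the function name find_failures is hypothetical; other compilers or FCM versions may word their errors differently.

```python
import re

# Patterns copied from the log excerpt in this ticket (assumption: other
# builds may report failures with different wording).
MAKE_ERROR = re.compile(r"^g?make(?:\[\d+\])?: \*\*\* \[([^\]]+)\] Error (\d+)")
OTHER_ERRORS = ("LLVM ERROR", "fcm_internal compile failed")


def find_failures(lines):
    """Return (line_number, text) pairs for lines that look like build failures."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if MAKE_ERROR.match(text) or any(marker in text for marker in OTHER_ERRORS):
            hits.append((number, text))
    return hits


# A few lines from the excerpt above, used as sample input.
SAMPLE = [
    "gmake: *** Waiting for unfinished jobs....",
    "mv atmos_physics2.o /home/n02/n02/obrowne/build/xinib/umatmos/obj",
    "LLVM ERROR: IO failure on output stream.",
    "fcm_internal compile failed (256)",
    "gmake: *** [glue_conv_6a_mod.o] Error 1",
]

if __name__ == "__main__":
    for number, text in find_failures(SAMPLE):
        print(f"{number}: {text}")
```

To scan a real file, pass an open file handle instead of SAMPLE, for example find_failures(open(path)). The "Waiting for unfinished jobs" line is deliberately not reported, since it is a consequence of the failure rather than its cause.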

comment:2 Changed 5 years ago by ros

Hi Oliver,

We can see your output files on ARCHER, so you just need to tell us the file names; there is no need to move them around.

xjlen does include a branch that I know will cause a compile failure, although the one above is not familiar to me. I'm guessing the job you want is xjleo, but will confirm with Karthee first and get back to you.

Cheers,
Ros.

comment:3 Changed 5 years ago by grenville

Oliver

The error "LLVM ERROR: IO failure on output stream" is new to me. Googling it turns up some references to full file systems. Although I don't think this is the problem, I have increased your work quota.

Grenville

comment:4 Changed 5 years ago by swr04ojb

Hi,

Ros → thanks for clarifying that, won't move those again. And thank you for chasing Karthee for me.

Grenville → when I run "quota -s" I am told that I am using just under 2 of 10 GB. When I check my /work via "lfs quota -u obrowne /work" I appear to be using 100 of 500 GB. Both seem plenty? But perhaps the work one was low before (I know the home one was okay), so thanks for expanding the workspace.

I just tried resubmitting the simple copy (xinib) and it compiled okay this time. I assume it was something to do with space. I'm trying the other job (xinid) again now…

Ros → do let me know though if you think I should be using xjleo instead.

Oliver
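
Since the failure looked like a write problem, one quick check before resubmitting is to try writing a small file into the build directory and see what error, if any, comes back. The following is only a sketch under that assumption: the function name probe_write is hypothetical, and a successful 1 MB write does not prove that a full compile will fit within quota.

```python
import errno
import os
import tempfile


def probe_write(directory, nbytes=1024 * 1024):
    """Try to write nbytes to a temporary file in directory.

    Returns "ok", "no space", "quota exceeded", or "error: <message>".
    The temporary file is deleted automatically when it is closed.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"\0" * nbytes)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to the file system
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return "no space"
        # EDQUOT is not defined on every platform, hence the getattr.
        if exc.errno == getattr(errno, "EDQUOT", None):
            return "quota exceeded"
        return f"error: {exc}"
    return "ok"


if __name__ == "__main__":
    # Replace with the directory to test, e.g. the build directory
    # /home/n02/n02/obrowne/build/xinib from the log in this ticket.
    print(probe_write(tempfile.gettempdir()))
```

This complements "quota -s" and "lfs quota" rather than replacing them: the quota commands report usage against limits, while the probe shows whether a write actually succeeds at that moment.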


comment:5 Changed 5 years ago by ros

Hi Oliver,

xjleo is the job you should copy; it has been ported to ARCHER and tested by several users.

Cheers,
Ros.

comment:6 Changed 5 years ago by ros

  • Resolution set to fixed
  • Status changed from new to closed