Opened 4 years ago

Closed 4 years ago

#1452 closed help (fixed)

tutorial job builds fine but then doesn't run

Reported by: jsabrooke
Owned by: ros
Component: UM Model
Keywords: tutorial run
Cc:
Platform: ARCHER
UM Version: 8.2

Description

Hi, I've been trying the UM tutorial with ARCHER, and I can't seem to run the test job. It seems to build successfully, but doesn't attempt to run the actual model executable. The tutorial points me to $DATAW/pe_output (/work/n02/n02/jsab500/um/jobid/pe_output), but there's only a /bin directory within ./jobid/, which has the executable and some other files inside.

I tried submitting it a second time, and it's the same. I watched the job with qstat and serialJobs, and it finished the build job, and then there was nothing else running for my user. I definitely have the option selected to run the model executable, and I'm on the n02-ncas account that has ~1700 AUs left.

The tutorial also points me to the ~/output directory, and two .leave files. It seems that one of these is for the build and one for the main job. I only have the one for the build, which seemed to finish fine.

I've tried changing a few different settings and lots of submits now, including trying to only run the model executable and not do the build, and also starting again with a fresh job, but nothing helps.

I don't know if it's relevant, but sometimes when submitting, it does all the extract stuff fine, then says "submit failed". If I then just submit again without changing anything, it normally submits fine. Unfortunately I've lost this occasional error message, but I don't think it's descriptive at all. When it does submit fine, it says:

Your job directory on host login.archer.ac.uk is: /home/n02/n02/jsab500/umui_runs/xkzdb-029101713

umui_runs/xkzdb-029101713/SUBMIT[29]: .[61]: .[369]: .: line 72: PROMPT_COMMAND: is read only
Total PEs requested: 96
NOTE: The following has been selected for running on the CRAY XC30

96 MPI task(s)
4 node(s)
24 MPI task(s) per node



It does seem to have an issue with something being read-only (the PROMPT_COMMAND line above). Could this be the problem, or is there some obvious error that I'm likely to be making?

thank you very much for your help and sorry to bother you,
James

Change History (3)

comment:1 Changed 4 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi James,

The problem is that the run job is trying to submit to an unknown queue (see the end of the .comp.leave file). You have a hand-edit in the job that switches it to a special reservation queue, which was only valid for the UM training courses. Please go to the UMUI window Input/Output control and resources → User hand-edit files and switch off the hand-edit by changing the Y to N in the last column.

Regards,
Ros.
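
For anyone hitting the same symptom, the check Ros describes (looking at the end of the .comp.leave file) could be scripted roughly as below. This is only a sketch: the function names are made up for illustration, and because the ticket does not quote the exact error text, the search for the word "queue" is an assumption about how the failure is reported.

```python
import sys
from pathlib import Path


def tail_lines(path, n=20):
    """Return the last n lines of a text file."""
    text = Path(path).read_text(errors="replace")
    return text.splitlines()[-n:]


def find_queue_messages(lines, hints=("queue",)):
    """Return the lines that mention any hint string (case-insensitive).

    The default hint "queue" is an assumption; the real message in a
    .leave file may be worded differently.
    """
    return [ln for ln in lines if any(h in ln.lower() for h in hints)]


if __name__ == "__main__":
    # Usage: pass one or more .leave file paths on the command line.
    for leave_file in sys.argv[1:]:
        hits = find_queue_messages(tail_lines(leave_file))
        if hits:
            print(f"{leave_file}: possible queue problem")
            for ln in hits:
                print("   ", ln)
        else:
            print(f"{leave_file}: no queue-related lines in the tail")
```

Run it with the path to the build .leave file in ~/output as the argument. If nothing is reported, it may still be worth reading the end of the file by hand, since the hint string is a guess.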

comment:2 Changed 4 years ago by jsabrooke

Hi Ros,

thanks very much, it runs fine now. I should have just asked you straight away instead of trying so many things myself…

thanks a lot,
James

comment:3 Changed 4 years ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed