Opened 6 years ago

Closed 5 years ago

#1145 closed help (completed)

Having trouble running a UM job adapted from Tim Graham (tigrah)

Reported by: bs Owned by: annette
Component: UM Model Keywords: hadgem3, coupled model, CRUN, moose, archiving
Cc: Platform: MONSooN
UM Version: 8.0

Description

This is a HadGEM3 job ported to MONSooN by Jeremy Walton, and I think it has been run on MONSooN. When I try to run it, PUMA says the job is submitted, but there is no entry in the job queue on MONSooN and no output file either. The job files are copied across, and so is the UM source code, but no executable is compiled. Hope you can help; the job identifier is xizqa.

Change History (12)

comment:1 Changed 6 years ago by ros

Hi Bablu,

I think it is failing because a hand-edit is forcing the job to run under the "umadmin" account, which you do not have permission to use.

Please remove the hand-edit:

 ~jwalton/handedits/monsoon/umadmin_group.ed

Then in window "User information and submit method" → "General details"
Select "Override Met office default account group"
Select "External" and then enter your MONSooN account group, which I assume is "nemo".

Hopefully that will solve the problem.

Cheers,
Ros.

comment:2 Changed 6 years ago by bs

Hi Ros, the model is in the queue and compiling, so it looks promising! I will let you know if it runs and restarts OK, and hopefully we can then close the ticket. Many thanks for your help so far.

Bablu

comment:3 Changed 6 years ago by bs

Hi Ros, it compiles and runs for an NRUN, but dies when I switch to a CRUN, saying that the required NEMO files aren't there (I had this problem with the ¼ degree version I have run). The output file is /home/basinh/output/xizqa000.xizqa.d13281.t151358.leave.

Best wishes, Bablu

comment:4 Changed 6 years ago by annette

Hi Bablu,

It looks like something has gone wrong with the scripts that set up the UM job. I will just check with Jeremy whether he tested CRUNs.

(I think the error about NEMO files is a bit of a red herring: it arises from the fact that no output was written for this job.)

Annette

comment:5 Changed 6 years ago by annette

  • Keywords hadgem3, coupled model, CRUN added
  • Owner changed from um_support to annette
  • Status changed from new to assigned

comment:6 Changed 6 years ago by annette

Hi Bablu,

Jeremy hadn't tried running a CRUN with this job so he's going to look into it for you. I will be in touch when I hear back from him.

Annette

comment:7 Changed 6 years ago by bs

Hi Annette, great, many thanks for looking into this. Bablu

comment:8 Changed 6 years ago by bs

Hi Annette, did Jeremy get anywhere with this? It would be good to be able to run more than one month at a time! Bablu

comment:9 Changed 6 years ago by annette

  • Keywords CRUN, moose, archiving added; CRUN removed

Response from Jeremy Walton:


Hi Bablu,

My apologies for the delay here; fixing this xipvd problem has taken longer than it should have, partly because I was out of the office and partly because it was a tricky problem to track down. Anyway, here's what I've found.

The initial error message was

cp: /u/m20/cprod/opstartinfo/xizqa-281161347: No such file or directory

which complains about copying from a non-existent location. However, the reason the job reaches this point has nothing to do with what it is trying to do here; instead, it has fallen through to this point because its attempt to start archiving its results failed.

The job tries to start archiving by first making a MOOSE data set in MASS using the command:

moo mkset -v moose:crum/xipvd

This command works on our internal HPC systems, but fails on MONSooN because each data set archived from that machine has to be associated with a project. Hence this error message:

mkset command-id=121170058 failed: (SSC_TASK_REJECTION) one or more tasks are rejected.
moose:/crum/xipvd: (TSSC_PROJECT_NAME_REQUIRED) A project name must be specified.
mkset: failed (2)

More information on this problem is given at: http://collab.metoffice.gov.uk/twiki/bin/view/Support/MooseDataOwnership.
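
The failure can be recognised from the error codes in the moo output. Below is a minimal sketch, assuming the log text has the form of the excerpt above. It defines a hypothetical helper, diagnose_mkset_output, that scans captured output for the two codes shown and returns a hint for each. The hint wording is illustrative only and is not part of MOOSE or the UM scripts.

```python
# Hypothetical helper (not part of MOOSE or the UM scripts): classify
# captured `moo mkset` output. The error codes are copied from the log
# excerpt in this ticket; the hint text is illustrative only.
KNOWN_CODES = {
    "TSSC_PROJECT_NAME_REQUIRED": "specify a project name (e.g. -p project-<name>)",
    "SSC_TASK_REJECTION": "one or more MOOSE tasks were rejected",
}


def diagnose_mkset_output(output: str) -> list[str]:
    """Return a hint for each known error code found, in table order."""
    # Match the parenthesised form so SSC_... is not confused with TSSC_...
    return [hint for code, hint in KNOWN_CODES.items() if f"({code})" in output]


if __name__ == "__main__":
    log = (
        "mkset command-id=121170058 failed: (SSC_TASK_REJECTION) one or more tasks are rejected.\n"
        "moose:/crum/xipvd: (TSSC_PROJECT_NAME_REQUIRED) A project name must be specified.\n"
        "mkset: failed (2)"
    )
    for hint in diagnose_mkset_output(log):
        print(hint)
```
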

The solution for this job is to:

  • specify the project name in the "Monsoon project group name" box of the panel Post-Processing > Main Switch + General Questions; and
  • include the script branch fcm:um_br/dev/jeff/VN8.0_hector_monsoon_archiving @ 9649 in FCM Configuration > FCM Options for UM Atmosphere and Reconfiguration.

I've updated xipvd to include this solution (note that the project name I've used is umadmin, which will be different for Bablu), and the job (NRUN & CRUN) now appears to be working correctly (including archiving results).

More specifically, here's what I did:

A) Created the MOOSE data set by hand using the command:

% moo mkset -v -p project-umadmin moose:crum/xipvd

[Bablu: you'll have to replace umadmin with the name of your project, and xipvd with the name of your job]
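
Because the command has to be adapted to each user's project and job, a small wrapper can assemble it. Below is a minimal sketch, assuming the form shown above (-v -p project-<name> moose:crum/<jobid>) is all that is needed. The function names and the "myproject" value are hypothetical, and only the flags that appear in this ticket are used. By default the wrapper only prints the command (a dry run).

```python
import shlex
import subprocess


def mkset_command(project: str, job_id: str) -> list[str]:
    """Build the `moo mkset` argv used in this ticket (hypothetical helper)."""
    if not project or not job_id:
        raise ValueError("both a MONSooN project name and a job id are required")
    return ["moo", "mkset", "-v", "-p", f"project-{project}", f"moose:crum/{job_id}"]


def create_dataset(project: str, job_id: str, dry_run: bool = True) -> int:
    """Print the command; only execute it when dry_run is False."""
    argv = mkset_command(project, job_id)
    print(shlex.join(argv))
    if dry_run:
        return 0
    # Return moo's exit status; non-zero means the set was not created.
    return subprocess.run(argv).returncode


if __name__ == "__main__":
    # "myproject" is a placeholder: substitute your own MONSooN project name.
    create_dataset("myproject", "xizqa")
```

Keeping the dry run as the default lets you check the exact command before anything is written to MASS.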

B) Restarted the xipvd job from scratch by rebuilding, reconfiguring and performing an NRUN:

FCM Configuration > FCM Extract directories and Output levels >

Force FULL Extract
Force FULL Build

Compilation and Run Options > Compile options for Atmosphere and Reconfiguration >

Compile Model executable
Compile Reconfiguration executable
Run the model
Run the reconfiguration

Compilation and Run Options > Compile options for NEMO >

Compile and build the executable named below, then run

Compilation and Run Options > Compile options for CICE >

Compile and build the executable named below, then run

Compilation and Run Options > UM Scripts Build >

Enable build of UM scripts

Input/Output Control and Resources > User hand edit files >

don't include ~jwalton/jobfiles/xipvd/handedits/crun.ed

Then save, process and submit.

C) Then performed a CRUN:

Compilation and Run Options > Compile options for Atmosphere and Reconfiguration >

turn off Compile Model executable
turn off Compile Reconfiguration executable
Run the model
turn off Run the reconfiguration

Compilation and Run Options > Compile options for NEMO >

Run from existing executable, as named below

Compilation and Run Options > Compile options for CICE >

Run from existing executable, as named below

Input/Output Control and Resources > User hand edit files >

include ~jwalton/jobfiles/xipvd/handedits/crun.ed

Then save, process and submit.
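
The NRUN and CRUN setups in steps B and C differ in only a few switches. As a cross-check, the sketch below records those settings as Python dictionaries and prints only the options that change between the two runs. This is a hypothetical representation for comparison only; the UMUI is configured through its panels, not through this code. The keys are shortened panel labels. Options that step C does not mention (the FCM extract/build and UM scripts build switches) are left out of the CRUN dictionary rather than guessed.

```python
# Hypothetical summary of the UMUI settings listed in steps B (NRUN) and C (CRUN).
NRUN = {
    "FCM Extract: Force FULL Extract": True,
    "FCM Extract: Force FULL Build": True,
    "Atmos/Recon: Compile Model executable": True,
    "Atmos/Recon: Compile Reconfiguration executable": True,
    "Atmos/Recon: Run the model": True,
    "Atmos/Recon: Run the reconfiguration": True,
    "NEMO": "Compile and build the executable named below, then run",
    "CICE": "Compile and build the executable named below, then run",
    "UM Scripts Build: Enable build of UM scripts": True,
    "Hand edits: include crun.ed": False,
}

# Only the options that step C mentions explicitly.
CRUN = {
    "Atmos/Recon: Compile Model executable": False,
    "Atmos/Recon: Compile Reconfiguration executable": False,
    "Atmos/Recon: Run the model": True,
    "Atmos/Recon: Run the reconfiguration": False,
    "NEMO": "Run from existing executable, as named below",
    "CICE": "Run from existing executable, as named below",
    "Hand edits: include crun.ed": True,
}


def changed_settings(before: dict, after: dict) -> dict:
    """Return {option: (before, after)} for options present in both that differ."""
    return {k: (before[k], after[k]) for k in after if k in before and before[k] != after[k]}


if __name__ == "__main__":
    for option, (old, new) in changed_settings(NRUN, CRUN).items():
        print(f"{option}: {old} -> {new}")
```
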

xipvd is currently running this CRUN, and all seems to be working okay.

Hope this is helpful so far; please let me know if you need anything more here.

Cheers,

Jeremy

comment:10 Changed 6 years ago by bs

Hi Jeremy, it turns out I have no MOOSE file in .moosedir. I guess either I lost it or was never given one in the first place. Where can I get one from? Cheers, Bablu

comment:11 Changed 6 years ago by annette

Hi Bablu,

This question needs to be directed to MONSooN support; I see you've already done that.

Annette

comment:12 Changed 5 years ago by annette

  • Resolution set to completed
  • Status changed from assigned to closed