Opened 6 months ago

Closed 4 months ago

#2396 closed help (fixed)

Job failing at archive stage

Reported by: a.elvidge Owned by: um_support
Priority: normal Component: UM Model
Keywords: archiving, rose_arch Cc:
Platform: Monsoon2 UM Version: 10.6

Description

Hi,

I am just getting started using Monsoon again.
My test job u-au901 is getting all the way to the archiving stage, then failing. It seems to me that no data is being produced locally (despite setting up STASH to do so), hence the failure when attempting to send to MASS. Here is the error message:

[FAIL] moo put -F -c umpp /home/d00/aelvidge/cylc-run/u-au901/work/20151125T0000Z/IGP_4p0_GA7_archive/tmpHc7QZ7/20151125T0000Z_IGP_4p0_GA7_pa000.pp /home/d00/aelvidge/cylc-run/u-au901/work/20151125T0000Z/IGP_4p0_GA7_archive/tmpHc7QZ7/20151125T0000Z_IGP_4p0_GA7_pb000.pp /home/d00/aelvidge/cylc-run/u-au901/work/20151125T0000Z/IGP_4p0_GA7_archive/tmpHc7QZ7/20151125T0000Z_IGP_4p0_GA7_pc000.pp moose:/devfc/u-au901/field.pp/ # return-code=2, stderr=
[FAIL] put command-id=497626574 failed: (SSC_TASK_REJECTION) one or more tasks are rejected.
[FAIL]   /home/d00/aelvidge/cylc-run/u-au901/work/20151125T0000Z/IGP_4p0_GA7_archive/tmpHc7QZ7/20151125T0000Z_IGP_4p0_GA7_pa000.pp -> moose:/devfc/u-au901/field.pp/20151125T0000Z_IGP_4p0_GA7_pa000.pp: (TSSC_SET_DOES_NOT_EXIST) no such data set.
[FAIL] put: failed (2)
[FAIL] ! moose:/devfc/u-au901/field.pp/ [compress=None, t(init)=2018-02-12T15:19:43Z, dt(tran)=0s, dt(arch)=2s, ret-code=2]
[FAIL] !	20151125T0000Z_IGP_4p0_GA7_pa000.pp (umnsaa_pa000)
[FAIL] !	20151125T0000Z_IGP_4p0_GA7_pb000.pp (umnsaa_pb000)
[FAIL] !	20151125T0000Z_IGP_4p0_GA7_pc000.pp (umnsaa_pc000)

Any help much appreciated.

Cheers, Andy

Change History (16)

comment:1 Changed 6 months ago by willie

Hi Andy,

Do you have permission to write to this location?

frmy@xcs-c$ moo ls moose:/devfc/u-au901
ls command-id=498175736 failed: (SSC_TASK_REJECTION) one or more tasks are rejected.
  moose:/devfc/u-au901: (TSSC_SET_DOES_NOT_EXIST) no such data set.

Did the install_cold app fail? It should have created this data set.
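A quick pre-flight check along these lines could catch a missing archive location before the archive task runs. This is a minimal sketch: `check_archive_target` is a hypothetical helper name, and it assumes that `moo ls` exits with a non-zero status when the target does not exist (as the failed `ls` above suggests) and that `ROSE_SUITE_NAME` is set by Rose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a MOOSE archive target already exists.
# Assumes `moo ls` exits non-zero when the target is missing, as with the
# TSSC_SET_DOES_NOT_EXIST failure above.
check_archive_target() {
  local target="$1"
  if moo ls "$target" >/dev/null 2>&1; then
    echo "OK: $target exists"
  else
    echo "MISSING: $target (create it before archiving)" >&2
    return 1
  fi
}

# Example, assuming ROSE_SUITE_NAME is set (e.g. u-au901):
# check_archive_target "moose:/devfc/${ROSE_SUITE_NAME}"
```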

Regards
Willie

comment:2 Changed 6 months ago by a.elvidge

Hi Willie,

Ah yes, I hadn't noticed that. However, when I change my archive location to one I can access, I still get an error:

[FAIL] moo put -F -c umpp /home/d00/aelvidge/cylc-run/u-au901/work/20151125T0000Z/IGP_4p0_GA7_archive/tmpZo88dR/20151125T0000Z_IGP_4p0_GA7_pb000.pp /home/d00/aelvidge/cylc-run/u-au901/work/20151125T0000Z/IGP_4p0_GA7_archive/tmpZo88dR/20151125T0000Z_IGP_4p0_GA7_pa000.pp /home/d00/aelvidge/cylc-run/u-au901/work/20151125T0000Z/IGP_4p0_GA7_archive/tmpZo88dR/20151125T0000Z_IGP_4p0_GA7_pc000.pp moose:/adhoc/projects/accacia/aelvidge/u-au901/field.pp/ # return-code=2, stderr=
[FAIL] put command-id=499302094 failed: (SSC_TASK_REJECTION) one or more tasks are rejected.
[FAIL]   /home/d00/aelvidge/cylc-run/u-au901/work/20151125T0000Z/IGP_4p0_GA7_archive/tmpZo88dR/20151125T0000Z_IGP_4p0_GA7_pb000.pp -> moose:/adhoc/projects/accacia/aelvidge/u-au901/field.pp: (TSSC_IS_NOT_DIRECTORY) target does not resolve to a directory.
[FAIL] put: failed (2)
[FAIL] ! moose:/adhoc/projects/accacia/aelvidge/u-au901/field.pp/ [compress=None, t(init)=2018-02-15T10:47:06Z, dt(tran)=0s, dt(arch)=1s, ret-code=2]
[FAIL] !	20151125T0000Z_IGP_4p0_GA7_pa000.pp (umnsaa_pa000)
[FAIL] !	20151125T0000Z_IGP_4p0_GA7_pb000.pp (umnsaa_pb000)
[FAIL] !	20151125T0000Z_IGP_4p0_GA7_pc000.pp (umnsaa_pc000)
2018-02-15T10:47:08Z CRITICAL - Task job script received signal EXIT

It is true that moose:/adhoc/projects/accacia/aelvidge/u-au901/field.pp is not yet a directory, but I would have expected this directory to be created automatically. Note that moose:/adhoc/projects/accacia/aelvidge/ does exist, and I have permission to write there.

Any help much appreciated.

Cheers, Andy

comment:3 Changed 6 months ago by willie

Hi Andy,

Monsoon is having issues with MASS at the moment; see the Monsoon collaboration channel on Yammer.

Willie

comment:4 Changed 5 months ago by willie

Hi Andy,

Is this still an issue?

Regards
Willie

comment:5 Changed 5 months ago by a.elvidge

Hi Willie,
Yes, this is still an issue. My job u-au328 is failing with the same error message.
Cheers, Andy

comment:6 Changed 5 months ago by willie

  • Keywords archiving, rose_arch added
  • Platform set to Monsoon2
  • Type changed from error to help

Hi Andy,

You need to add the default command to the install_cold app and set it to

moo mkset moose:/devfc/$ROSE_SUITE_NAME || true
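For context, here is a minimal sketch of what that one-line command does, wrapped in a function so the role of `|| true` is visible. The function name `create_devfc_set` is hypothetical, and the sketch assumes `ROSE_SUITE_NAME` is set by Rose, as it is inside a suite task.

```shell
#!/usr/bin/env bash
# The install_cold command above, wrapped in a function (hypothetical name).
# Assumes ROSE_SUITE_NAME is exported by Rose inside the suite (e.g. u-au901).
create_devfc_set() {
  # `|| true` discards any non-zero exit from `moo mkset`, so the
  # install_cold task succeeds even if the set could not be created
  # (for example, because it already exists on a rerun).
  moo mkset "moose:/devfc/${ROSE_SUITE_NAME}" || true
}
```

The trade-off of `|| true` is that it also hides genuine failures (a permissions problem, say), which would then only show up later, when the archive task runs `moo put`.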

Last edited 5 months ago by willie

comment:7 Changed 5 months ago by a.elvidge

deleted

Last edited 5 months ago by a.elvidge

comment:8 Changed 5 months ago by a.elvidge

Hi Willie,

Thanks for your reply, but I'm not entirely sure what you mean. What is the default command? Could you please clarify?

Thanks, Andy

Last edited 5 months ago by a.elvidge

comment:9 Changed 5 months ago by willie

Hi Andy,

In the Rose GUI, go to install_cold and expand it. The 'command' section is dimmed; double-click on it. Next to 'command default', press the '+' sign and select 'add to configuration'. Then enter the command above in the box.

Regards
Willie

comment:10 Changed 5 months ago by a.elvidge

Hi Willie,

I don't see the command section after expanding install_cold; I see only env. In the box to the right I see all the files in the install_cold/opt/ directory. This is a nesting suite job, so perhaps the setup is different from usual?

Thanks, Andy

comment:11 Changed 5 months ago by willie

Hi Andy,

Click the View menu and select 'View latent pages'.

Regards
Willie

comment:12 Changed 5 months ago by a.elvidge

deleted

Last edited 5 months ago by a.elvidge

comment:13 Changed 5 months ago by a.elvidge

Hi Willie,

Thanks for this. However, I'm still unable to get this to work; I'm still getting the same error. Is the location in the command

moo mkset moose:/devfc/$ROSE_SUITE_NAME || true

correct? Shouldn't it be the same location I've set for archiving, i.e. moose:/adhoc/projects/accacia/aelvidge/$ROSE_SUITE_NAME? In any case, I've also tried this (as well as moose:/crum/$ROSE_SUITE_NAME, for both this and my archive location) with no luck.

Any further suggestions much appreciated.

Thanks, Andy

Last edited 5 months ago by willie

comment:14 Changed 5 months ago by willie

Hi Andy,

You need to use mkdir for unstructured data sets like 'adhoc'; 'devfc' is structured and takes mkset. See the MOOSE user guide at http://collab.metoffice.gov.uk/twiki/bin/viewfile/Static/MASS/monsoon_user_guide.html.
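The rule above can be captured in a small helper. This is a sketch covering only the two data classes named in this ticket ('devfc' as structured, 'adhoc' as unstructured); the function names are hypothetical, and any other class is rejected rather than guessed, so check the MOOSE user guide before extending it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the moo subcommand for a MOOSE URI.
# Only the two data classes discussed in this ticket are handled.
moo_create_verb() {
  case "$1" in
    moose:/devfc/*) echo "mkset" ;;   # structured data class: create a set
    moose:/adhoc/*) echo "mkdir" ;;   # unstructured data class: create a directory
    *) echo "unsupported data class: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: create an archive location with the matching command.
create_archive_root() {
  local uri="$1" verb
  verb=$(moo_create_verb "$uri") || return 1
  moo "$verb" "$uri"
}

# Example: create_archive_root moose:/adhoc/projects/accacia/aelvidge/u-au901
```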

Regards
Willie

comment:15 Changed 5 months ago by a.elvidge

Hi Willie,

Ah yes, thanks. However, even with this change it didn't work: although the job directory was created, the field.pp directory was not. I ended up manually creating the correct directory and then running the job. It's now working, though clearly this is a bit of a fudge!

Cheers, Andy

comment:16 Changed 4 months ago by willie

  • Resolution set to fixed
  • Status changed from new to closed

Hi Andy,

Glad you got it to work. Creating the archive structure before running the suite is the right thing to do.
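As a sketch of that advice for the adhoc location used in this ticket, the tree can be created one level at a time before the suite runs, since `moo put` needed `field.pp/` to exist already. The base path, suite name and `field.pp` come from the ticket; the function name is hypothetical, and the sketch assumes `moo ls` exits non-zero for a missing directory. Suites that archive other streams may need further subdirectories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the adhoc archive tree for a suite, one level
# at a time, skipping levels that `moo ls` reports as already present.
# Assumes `moo ls` exits non-zero when a directory does not exist.
create_adhoc_archive_tree() {
  local base="$1" suite="$2" dir
  for dir in "${base}/${suite}" "${base}/${suite}/field.pp"; do
    if moo ls "$dir" >/dev/null 2>&1; then
      echo "exists: $dir"
    else
      moo mkdir "$dir" || return 1
      echo "created: $dir"
    fi
  done
}

# Example using the location from this ticket:
# create_adhoc_archive_tree moose:/adhoc/projects/accacia/aelvidge u-au901
```

Checking with `moo ls` first, rather than appending `|| true` to `moo mkdir`, means a real failure (such as missing write permission) stops the script instead of surfacing later at the archive stage.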

Regards
Willie
