Opened 7 months ago

Closed 6 months ago

#3113 closed help (fixed)

UKESM-AMIP pptransfer

Reported by: eelrm Owned by: ros
Component: UKESM Keywords: pptransfer
Cc: Platform: ARCHER
UM Version: 11.1

Description

Hello,

I'm running several suites (e.g., u-bp572) that seem to be stuck on pptransfer (> 4 days for one cycle). I have previously re-triggered these tasks after they failed when xfer2 was restarted. Is this overload, or is there anything else that I can do?

Many thanks,

Lauren

Change History (8)

comment:1 Changed 7 months ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Lauren,

I would suggest stopping and restarting the suite (rose suite-run --restart). Then retrigger the "stuck" pptransfer task.

Please also make sure that you can ssh from PUMA to dtn02 without being prompted for a password or passphrase. I can see a couple of login failures to dtn02 recently in the suite log file.
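One way to test this non-interactively is sketched below. It was not posted in the original thread. It assumes the OpenSSH client is on PATH on PUMA. The option BatchMode=yes makes ssh fail instead of prompting, so a non-zero exit suggests a key or ssh-agent problem. The host name dtn02 is as written in the ticket, and the fully qualified name may differ. The helper names are hypothetical.

```python
import subprocess
from typing import Callable, List


def ssh_check_command(host: str) -> List[str]:
    # BatchMode=yes makes ssh fail instead of prompting for a password or passphrase;
    # "true" is a no-op remote command, so only the login itself is tested.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"]


def can_ssh_without_prompt(host: str, run: Callable = subprocess.run) -> bool:
    """Return True if a non-interactive ssh login to host succeeds (hypothetical helper)."""
    try:
        result = run(ssh_check_command(host), capture_output=True, timeout=30)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # No ssh client on PATH, or the connection hung.
        return False
    return result.returncode == 0
```

For example, if can_ssh_without_prompt("dtn02") returns False when run on PUMA, that points to the login problem being worth fixing before retriggering pptransfer. The run parameter exists only so the check can be exercised without a real connection.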

We are finding that transfers via xfer2 are running slower than usual at the moment.

Regards,
Ros.

comment:2 Changed 7 months ago by ros

Sorry, that should have said rose suite-restart, not rose suite-run --restart.

comment:3 Changed 7 months ago by eelrm

Hi Ros,

Ok, thank you. I'll give it a go. What option should I use when stopping the suite?

Thanks,

Lauren

comment:4 Changed 7 months ago by ros

Through the cylc GUI select the "Stop now (restart will follow up on orphaned tasks)" option.

comment:5 Changed 7 months ago by eelrm

Hi Ros,

I have stopped and restarted, but pptransfer displays as still running so I can't retrigger. Should I leave it?

Thanks,

Lauren

comment:6 Changed 7 months ago by ros

Hi Lauren,

As far as I can see, that task is doing nothing on the RDF, as its log files are still datestamped 13th December. So I would kill that pptransfer task (right-click and select kill) and then retrigger it.

If that doesn't work: as far as I can see, there are no other tasks for this suite running at the moment, is that right? If so, I would stop the suite again, but this time choose the "stop after killing active tasks" option, then do a restart and retrigger.
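If you work from the command line rather than the cylc GUI, the steps above roughly correspond to the commands this sketch prints. The sketch was not posted in the ticket. It assumes cylc 7 syntax, where a task ID is NAME.CYCLE_POINT. It also assumes that cylc stop --kill matches the GUI's "stop after killing active tasks" option and that rose suite-restart is run from the suite's working copy. The cycle point in the example task ID is hypothetical. The sketch only prints the commands so they can be checked before anything is run.

```python
import shlex
from typing import List

# Assumes cylc 7 command syntax, where a task ID is NAME.CYCLE_POINT.
Command = List[str]


def kill_and_retrigger(suite: str, task_id: str) -> List[Command]:
    """CLI equivalent of right-click kill, then trigger, in the cylc GUI."""
    return [
        ["cylc", "kill", suite, task_id],
        ["cylc", "trigger", suite, task_id],
    ]


def stop_restart_retrigger(suite: str, task_id: str) -> List[Command]:
    """Fallback: stop after killing active tasks, restart, then retrigger."""
    return [
        # Wait for the suite to shut down fully before running the next step.
        ["cylc", "stop", "--kill", suite],
        # Run from the suite's working copy (e.g. ~/roses/u-bp572).
        ["rose", "suite-restart"],
        ["cylc", "trigger", suite, task_id],
    ]


if __name__ == "__main__":
    suite = "u-bp572"
    task_id = "pptransfer.20000101T0000Z"  # hypothetical cycle point
    print("# First try:")
    for cmd in kill_and_retrigger(suite, task_id):
        print(shlex.join(cmd))
    print("# If that doesn't work:")
    for cmd in stop_restart_retrigger(suite, task_id):
        print(shlex.join(cmd))
```

The two sequences are kept separate because, as in this thread, killing and retriggering the stuck task on its own may be enough. The full stop and restart is only needed if it is not.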

Hopefully that will kick it back into action.

Cheers,
Ros.

comment:7 Changed 7 months ago by eelrm

Hi Ros,

Killing and re-triggering has worked and the suite has now succeeded. I'll proceed with the other suites.

Many thanks,

Lauren

comment:8 Changed 6 months ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed