#3139 closed help (completed)

u-bq336: suite clogs my archer processes

Reported by: xd904476 Owned by: um_support
Component: UM Model Keywords:
Cc: Platform:
UM Version:


I am running this suite with 30 ensemble members, with limits in suite.rc on how many tasks can run at the same time:
{% if SITE == 'archer' %}



limit = 10
members = COUPLED


limit = 4

{% endif %}

This doesn't seem to be enough: the suite submits all the coupled tasks right after the perturbations and even before the recon.
This is generally ok with only a few ensembles, but with this number, all the jobs on archer are now clogged. I have set all the coupled tasks to "failed" to run the perturbations and then the recon manually.
I'll try to sort this out for this case, but could you please tell me how to add some "order" in suite.rc so that the RECON task is only triggered after all the perturbations have succeeded?

Change History (7)

comment:1 Changed 14 months ago by xd904476

Update: I have manually created all the perturbed initial conditions and I have retriggered the RECON task.
The suite now does not recognise ASTART, and from the job.out file I am also not sure that it is reading the right ice initial condition (it should read /work/n02/n02/dflocco/startdump/be699i.restart.2015-01-01-00000.nc).
Could you help, please? Otherwise I'll have to quickly start running the 30 suites manually for the experiment.

comment:2 Changed 14 months ago by dcase

Dani, I can see that you have 30 start dumps, and your job files for the coupled runs include a variable to pick these up. Could you trigger one of the coupled jobs, or point me to a job.err file which shows the ASTART problem?


comment:3 Changed 14 months ago by xd904476

Hi Dave,
The suite stops at the RECON tasks. The job.err tells me that astart is unbound.

I can trigger a coupled task, but I don't know what it would use as initial conditions.

Shall I trigger the recon again or a coupled task?

comment:4 Changed 14 months ago by dcase

Ok. The recon step only happens once, and then the ensembles will do their perturbations and make their own files. If you give a variable in suite.rc ([[recon]][[[environment]]]) you can put:


as was the case for perturb<ensemble> step.

Last edited 14 months ago by dcase

comment:5 Changed 14 months ago by xd904476

Ah ok, I've got it. So the recon needs to be run before all the perturbations. It was confusing with all the perturbations starting at the same time.
I'll try again by holding the perturbations and the coupled tasks.


comment:6 Changed 14 months ago by xd904476

Hi Dave, no luck.
The only thing I changed compared to last week, when the suite ran properly, is the number of ensemble members.
Shall I try deleting everything on archer rather than only restarting it as new? Perhaps something is stuck?

In any case, I have now added astart also in the recon environment and I'll restart it.

comment:7 Changed 13 months ago by xd904476

  • Resolution set to completed
  • Status changed from new to closed

The ensemble size needs to be smaller than 16. Probably more changes to suite.rc are needed to overcome this issue and let the ensemble handle the queues by itself.
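
For reference, the ordering discussed in this ticket (recon once, then each member's perturbation, then that member's coupled task) could be written in suite.rc roughly as below. This is only a sketch and not a confirmed fix for the limit of 16. It assumes Cylc 7 syntax, a non-cycling graph, and a task parameter called "ensemble". The names coupled_queue and coupled<ensemble> are hypothetical; recon, perturb<ensemble> and the COUPLED family are the names used above. The real suite's graph and runtime sections will differ.

```ini
# Sketch only: Cylc 7 suite.rc fragment, names marked "hypothetical" are not from the suite.
[cylc]
    [[parameters]]
        ensemble = 1..30            # one value per ensemble member (assumed parameter name)

[scheduling]
    [[queues]]
        [[[coupled_queue]]]         # hypothetical queue name
            limit = 10              # at most 10 COUPLED tasks submitted or running at once
            members = COUPLED
    [[dependencies]]
        graph = """
            # recon runs once; each member's perturbation waits for it,
            # and each coupled task waits for its own perturbation.
            recon => perturb<ensemble> => coupled<ensemble>
        """

[runtime]
    [[COUPLED]]                     # family that the queue limit applies to
    [[recon]]
    [[perturb<ensemble>]]
    [[coupled<ensemble>]]           # hypothetical task name
        inherit = COUPLED
```

With explicit dependencies like these, a coupled task is not submitted until its own perturbation has succeeded, so the queue limit only has to throttle tasks that are genuinely ready to run.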

