Opened 5 months ago

Last modified 5 months ago

#3159 new help

check suite u-bq532p on puma/ARCHER?

Reported by: pmcguire
Owned by: um_support
Component: UM Model
Keywords: ARCHER, cycling, queueing
Cc: p.l.vidale@…
Platform: ARCHER
UM Version: 11.5

Description

Hi CMS Helpdesk
The 20.5-year acclimation run on ARCHER (suite u-bq532p on puma), using 1,152 processors, has made it through the first 6-month cycle of atmos-main, for which I had requested 12 hours of ARCHER time.
It took about 48 hours after submission to the queue before it started running, and then about 7 hours to run.

I didn't expect 48 hours of waiting for each cycle. So I have held the run and restarted it for the next cycle with 18-month cycles and 24 hours of ARCHER time per cycle, with the intention of increasing the amount of simulated calendar time covered by each cycle, and so by each long wait in the queue.
I think I did an appropriate thing (please advise if I didn't), and hopefully the waiting time in the queue won't be too much longer for the next cycles.
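
As a rough illustration of the trade-off described above, the sketch below estimates the total elapsed time for 6-month versus 18-month cycles. It assumes (these are not measured figures beyond the first cycle) that every cycle waits about 48 hours in the queue regardless of the wallclock requested, and that run time scales linearly from about 7 hours per 6 model months. The function name total_days is made up for this example.

```python
import math


def total_days(sim_months, cycle_months, queue_hours, run_hours_per_6_months=7.0):
    """Estimate cycle count and elapsed days, assuming one queue wait per cycle."""
    n_cycles = math.ceil(sim_months / cycle_months)
    # Assumption: run time scales linearly with the length of the cycle.
    run_hours_per_cycle = run_hours_per_6_months * cycle_months / 6.0
    total_hours = n_cycles * (queue_hours + run_hours_per_cycle)
    return n_cycles, total_hours / 24.0


SIM_MONTHS = 246  # 20.5 years

for cycle_months in (6, 18):
    n_cycles, days = total_days(SIM_MONTHS, cycle_months, queue_hours=48.0)
    print(f"{cycle_months}M cycles: {n_cycles} cycles, about {days:.0f} days")
```

Under these assumptions, 18-month cycles need 14 queue waits instead of 41, and the estimated 21 hours of run time per cycle fits within the 24-hour request. A longer wallclock request may itself queue for longer, which this estimate ignores.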

When I restarted the next cycle, I also added daily output to STASH for the time-averaged temperature variable. I had intended to add that earlier; until now I had only had it in the monthly dumps.

After I did this, my PI suggested that I request that somebody look at the suite, to see if I did this cycling change correctly. Can you do that?
When I made the cycling change, I waited until the first cycle of atmos-main was complete, then I held the suite, made the changes to the suite, and did a rose suite-run --reload, after which I released the suite and retriggered the next cycle's atmos-main task.
The changes I made include the extra STASH output entry mentioned above, changing the cycling time from 6M to 18M, and changing the Wallclock from 12 hours to 24 hours.

It's still waiting in the queue. It might take more than 48 hours to start running.

I should note that when I did this, the suite tried to start a future cycle that is not needed (19890901) and a future cycle that is needed but might be started too early (19900301). So maybe I did do something wrong. I told the suite to kill those future cycles and I put them on hold. I can retrigger the latter cycle (19900301) as needed later.
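
To help pin down where the two unexpected cycle points might come from, here is a small date-arithmetic sketch. It assumes an initial cycle point of 19880901 (inferred from the 19890301 dump at the end of the first 6-month cycle, not confirmed) and simply lists the points on a 6M grid and on an 18M grid. It makes no claim about how the suite actually computes cycle points after a reload; add_months and cycle_points are helper names invented here.

```python
from datetime import date


def add_months(d, months):
    """Shift a date by whole months (safe here because the day is always 1)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


def cycle_points(start, period_months, count):
    """Return `count` cycle points spaced `period_months` apart from `start`."""
    return [add_months(start, i * period_months) for i in range(count)]


def fmt(points):
    return [p.strftime("%Y%m%d") for p in points]


INITIAL = date(1988, 9, 1)    # assumed initial cycle point, not confirmed
RESTART = date(1989, 3, 1)    # start of the second cycle (from the dump date)

print("6M from initial: ", fmt(cycle_points(INITIAL, 6, 5)))
print("18M from initial:", fmt(cycle_points(INITIAL, 18, 3)))
print("18M from restart:", fmt(cycle_points(RESTART, 18, 2)))
```

With these assumptions, 19890901 lies only on the old 6M grid, and 19900301 lies on an 18M grid counted from the assumed initial point, whereas 18 months after 19890301 would be 19900901.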

Is there any way of reducing the queueing time? This 48-hour queueing time for 1,152 processors was for a submission around 3pm on a Friday afternoon, which meant it was queueing over the weekend, when I might expect the machine to be less busy.

Patrick

Change History (4)

comment:1 Changed 5 months ago by grenville

Patrick

The machine is unlikely to be less busy before it ends - see Queue Status for the past month at https://www.archer.ac.uk/status/, showing that it has got progressively busier.

The suite "looks" OK, but that doesn't mean much. I can't follow the "the suite tried to start a future cycle that is not needed (19890901)…" paragraph.

Grenville

comment:2 Changed 5 months ago by pmcguire

Thanks, Grenville
The 2nd cycle just started. But it failed because it couldn't find the dump file. Maybe the dump file was deleted when I did the reload? Or maybe it was deleted when it did the pptransfer of the 1st cycle?
Given the 40-48 hour queueing time, is it best that I restart the whole thing from scratch? Or is there some shortcut I could take (e.g. by copying dump files to the right place and doing a --restart)? Is there anything I need to change before resuming or restarting the run?
Patrick

comment:3 Changed 5 months ago by pmcguire

Hi Grenville
I had decided to copy the 19890301 dump for the 1st cycle from the /nerc archive directory back to the ~/cylc-run directory and then retrigger the task, but then I looked at the log files for the 2nd cycle, and saw in the atmos_main stdout log file that there was a WARNING - REQUESTED AND ACTUAL THREADING LEVEL DIFFERENT.

So I changed my mind and decided to start from scratch: I copied ~/roses/u-bq532p to ~/roses/u-bq532q and started a new run. I might lose a day or so, but maybe this way it will run past the 1st cycle properly, and maybe the dump file from the end of the first cycle won't be deleted.

I still have the setting to delete superseded dump files switched on. Hopefully this works properly and doesn't cause a problem.
Patrick

comment:4 Changed 5 months ago by grenville

Patrick

OK - there's a little less urgency now (I doubt the machine will be less congested). Given your experience, our advice to others should be not to change the cycling frequency mid-stream.

WARNING - REQUESTED AND ACTUAL THREADING LEVEL DIFFERENT - is not a problem.

Grenville
