Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#2181 closed help (fixed)

simulation stops 'successfully' before the end date

Reported by: ggxmy Owned by: um_support
Component: UM Model Keywords:
Cc: Platform:
UM Version: 10.6

Description

Dear CMS,

As I wrote at the end of #2178, my vn10.6.1 job u-al564 stopped before the end date. I can see no indication of failure or error. The attached image shows the rose GUI and cylc panels.


The end date should be the end of 2002, but the simulation stops 'happily' at April 1999. Running rose suite-run --restart does not change the situation. Could you please help me restart the simulation?

There is another interesting thing that might help you figure out the situation. In /home/d03/myosh/cylc-run/u-al564/log/job/ there are folders as listed below. These show that the simulation once ran as far as 20011101. I'm not sure why the job tried to simulate 1999 again. Maybe I have messed something up? I wonder if this is related to the simulation stopping. What do you think?

drwxr-xr-x. 4 myosh mo_users 4096 May 24 16:36 19990101T0000Z
drwxr-xr-x. 3 myosh mo_users 4096 May 24 16:19 19990201T0000Z
drwxr-xr-x. 3 myosh mo_users 4096 May 24 16:21 19990301T0000Z
drwxr-xr-x. 3 myosh mo_users 4096 May 24 16:36 19990401T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 02:30 20001101T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 05:14 20001201T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 07:55 20010101T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 10:32 20010201T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 13:17 20010301T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 15:53 20010401T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 18:37 20010501T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 21:20 20010601T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  9 00:02 20010701T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  9 02:56 20010801T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  9 05:34 20010901T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  9 08:14 20011001T0000Z
drwxr-xr-x. 3 myosh mo_users 4096 May  9 08:12 20011101T0000Z

Thanks,
Masaru
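
The listing above can be split by modification date to separate the directories written by the earlier run from those written by the later one. The following is a minimal Python sketch of that grouping; the function name `group_cycles_by_date` is hypothetical, and the parsing assumes the nine-column `ls -l` layout shown in the listing.

```python
from collections import defaultdict

# Directory listing copied from the ticket description.
LISTING = """\
drwxr-xr-x. 4 myosh mo_users 4096 May 24 16:36 19990101T0000Z
drwxr-xr-x. 3 myosh mo_users 4096 May 24 16:19 19990201T0000Z
drwxr-xr-x. 3 myosh mo_users 4096 May 24 16:21 19990301T0000Z
drwxr-xr-x. 3 myosh mo_users 4096 May 24 16:36 19990401T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 02:30 20001101T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 05:14 20001201T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 07:55 20010101T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 10:32 20010201T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 13:17 20010301T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 15:53 20010401T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 18:37 20010501T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  8 21:20 20010601T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  9 00:02 20010701T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  9 02:56 20010801T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  9 05:34 20010901T0000Z
drwxr-xr-x. 5 myosh mo_users 4096 May  9 08:14 20011001T0000Z
drwxr-xr-x. 3 myosh mo_users 4096 May  9 08:12 20011101T0000Z
"""


def group_cycles_by_date(listing):
    """Map 'Mon D' (modification date) to the cycle directories written that day."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Expect: mode, links, owner, group, size, month, day, time, name.
        if len(fields) != 9 or not fields[0].startswith("d"):
            continue
        month, day, name = fields[5], fields[6], fields[8]
        groups[f"{month} {int(day)}"].append(name)
    return dict(groups)


groups = group_cycles_by_date(LISTING)
for day, cycles in groups.items():
    print(f"{day}: {cycles[0]} .. {cycles[-1]} ({len(cycles)} cycles)")
```

Grouped this way, the 1999 directories all carry the May 24 date, while the 2000 and 2001 directories date from May 8 and May 9.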

Attachments (1)

stopped suite.jpg (119.5 KB) - added by ggxmy 3 years ago.


Change History (4)

Changed 3 years ago by ggxmy

comment:1 Changed 3 years ago by ros

Hi Masaru,

Looking at the image in #2178, the run is currently held from progressing any further because of the waiting housekeeping task from 19920201. There is a limit on the number of cycles in a suite that can be active at any one time, so while that old cycle is still waiting, no later cycles can start.

You will need to right-click on the waiting housekeeping task and either change its status to ready or trigger it (run now).

Once the housekeeping task has completed the next atmos task should start up along with the remaining housekeeping tasks.

As for why there are directories for 2001… these are from May 9th, so I would surmise that when you first resubmitted the suite after the reboot you did not do a restart but just a plain rose suite-run, which would cause it to go back to the beginning.

Regards,
Ros.

comment:2 Changed 3 years ago by ggxmy

  • Resolution set to fixed
  • Status changed from new to closed

Thank you, Ros, for your help.

Here is what happened for future reference.

I tried that and 'housekeeping' ran. Then the status of 'housekeeping' for the next month became 'waiting'. Each time I changed its status to 'ready', 'housekeeping' for the following month became 'waiting' in turn. Meanwhile the simulation started to run, but each month of simulation completed in less than a minute, so I had to keep changing the status of 'housekeeping' to 'ready' manually… This continued until the furthest point the simulation had previously reached (2001.11). Then the run started to behave normally.

Masaru

comment:3 Changed 3 years ago by ggxmy

Actually I had to change the status of 'housekeeping' to 'ready' manually until the simulation ended…
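
For anyone facing the same month-by-month manual step, it may help to list the affected task identifiers up front. The sketch below generates `housekeeping.<cycle>` names for each monthly cycle, assuming the run resumes at April 1999 and ends at the end of 2002 as described in this ticket. The helper names are hypothetical, and the `NAME.CYCLE` form is an assumption about how the installed cylc version identifies tasks; check `cylc trigger --help` before passing these names to any command.

```python
from datetime import date


def monthly_cycle_points(start, end):
    """Yield cycle point strings for the first of each month, start <= point < end.

    Assumes `start` falls on the first day of a month.
    """
    year, month = start.year, start.month
    while date(year, month, 1) < end:
        yield f"{year:04d}{month:02d}01T0000Z"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def housekeeping_task_ids(start, end, task="housekeeping"):
    """Build NAME.CYCLE identifiers (assumed format) for every monthly cycle."""
    return [f"{task}.{point}" for point in monthly_cycle_points(start, end)]


# April 1999 up to, but not including, January 2003 (i.e. the end of 2002).
ids = housekeeping_task_ids(date(1999, 4, 1), date(2003, 1, 1))
for task_id in ids:
    print(task_id)
```

The script only prints the identifiers and does not contact the suite, so it is safe to run anywhere.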
