Opened 8 years ago

Closed 8 years ago

#941 closed error (fixed)

Global model fails after 1 day

Reported by: cbirch
Owned by: willie
Component: UM Model
Keywords: NaN, time step
Cc:
Platform: HECToR
UM Version: 7.3

Description

Hi,

I'm setting up 3 new nested grids (global-12km-4km) over Oman, India and Mozambique and I am running different case studies (i.e. different dates) over each of them. The global vn7.3 job used to make the LBC's for the 12km nest runs for the required 48 hours for the Oman and India case studies, but for the Mozambique case study the job falls over after about 24 hours. The job ID is xhrnc, the start dump is /work/n02/n02/cbirch/start_files/wave_clouds/20100926_qwqg00.T+0 and the .leave file is xhrnc000.xhrnc.d12286.t113329.leave.

I have tried halving the timestep - this got it to run for more like 30 than 24 hours but then failed.
I also tried reconfiguring the 24 hour restart dump and running the second 24 hours separately. This didn't work - the model fell over about 6 hours into the second day.

I'm not sure what else to try, other than starting from an analysis at a different time?

Thanks,
Cathryn

Change History (13)

comment:1 Changed 8 years ago by willie

  • Keywords NaN, time step added

Hi Cathryn,

There are NaN's in the vertical wind at time step 561. I think you're doing the right thing by halving the time step. Just halve it again - so quarter your original value - and try again.
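
As a rough guide to what quartering the time step means for the run, here is a minimal sketch of the step-count arithmetic. The 1200 s base time step is purely illustrative and is not taken from job xhrnc; the function names are hypothetical.

```python
def steps_for_run(run_hours, timestep_seconds):
    """Return the number of time steps needed to cover run_hours."""
    total_seconds = run_hours * 3600
    if total_seconds % timestep_seconds != 0:
        raise ValueError("run length is not a whole number of time steps")
    return total_seconds // timestep_seconds


def elapsed_hours(step, timestep_seconds):
    """Return the model time in hours reached after `step` time steps."""
    return step * timestep_seconds / 3600


# Illustrative base time step (an assumption, not the value used in xhrnc).
base_timestep = 1200
for divisor in (1, 2, 4):
    dt = base_timestep // divisor
    print(f"dt = {dt:4d} s -> {steps_for_run(48, dt)} steps for a 48 hour run")
```

Each halving doubles the number of steps, and so roughly doubles the cost of the run, which is worth bearing in mind against the job's time limit.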

Regards,

Willie

comment:2 Changed 8 years ago by cbirch

Hi Willie,

I tried halving the timestep again and it didn't work - it failed at about the same point (24 hours into the run). Any other suggestions?

Cathryn

comment:3 Changed 8 years ago by willie

Hi Cathryn,

Try reducing the optimization level - Compilation and Modifications > Compile options for the UM model and then select "safe" instead of "high".

Regards,

Willie

comment:4 Changed 8 years ago by cbirch

Hi Willie,

That didn't work either, it failed on exactly the same timestep (870) as when 'high' was used (xhrnc000.xhrnc.d12299.t114342.leave).

Cathryn

comment:5 Changed 8 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Cathryn,

I'm still investigating. At the moment we have:

  • Your original job failed at various time steps (NaNs?)
  • a 7.3 job with the start dump advanced by 12 hours failed even quicker (NaNs?)
  • a 7.5 job with original start dump was killed (segmentation violation)
  • a 7.6 job might have worked but ran out of time at the end
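
Whether each of these failures really is a NaN problem can be checked by finding the first step at which a monitored field goes non-finite. This is a minimal sketch, assuming per-step values (for example the maximum vertical wind) have already been extracted into a Python list. The function name and the input format are hypothetical and are not part of the UM or its output.

```python
import math


def first_nonfinite_step(values_by_step):
    """Return the 1-based step at which a value is first NaN or infinite.

    values_by_step holds one number per time step, in order.
    Returns None if every value is finite.
    """
    for step, value in enumerate(values_by_step, start=1):
        if not math.isfinite(value):
            return step
    return None


# Made-up values for illustration only.
sample = [0.8, 1.1, 2.7, float("nan"), float("nan")]
print(first_nonfinite_step(sample))
```

Comparing the step returned for each job with the step at which it was killed would show whether the NaNs appear well before the crash or at the same point.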

My idea was that the start dump is at 7.5 but you were processing it with an older model vn7.3. Is it worth getting it to work at a later version?

Regards,

Willie

comment:6 Changed 8 years ago by cbirch

If that makes sense - to me it doesn't make that much difference. I just need the LBC's and probably the astart file from the global model to run the vn7.3 12km LAM nest. I'm not interested in any diagnostics from the global model so I don't really mind if the global model is a different version as long as it works.

Cathryn

comment:7 Changed 8 years ago by willie

Hi Cathryn,

My 7.6 job xhtfc (UMUI user willie) runs for just over 44 hours - is that enough? The reconfigured start dump is in xhtfb. The files are on /work/n02/n02/wmcginty. I don't know why this particular start dump is so problematic.

Regards,

Willie

comment:8 Changed 8 years ago by cbirch

Hi Willie,

Thanks for doing this.

44 hours should be enough for now. I have copied over the LBC and astart file. I will try and run the 12km and 4km nests.

Thanks,
Cathryn

comment:9 Changed 8 years ago by cbirch

Hi Willie,

I just tried to run the 12km nest but the LBC's in xhtfc (xhtfc.alabcou2) are not correct for my 12km nest. My nest is over Mozambique and I think the LBC's that were created were for a NAE LAM. Should I copy over the job and run it again to get the correct LBC's?

Cathryn

comment:10 Changed 8 years ago by cbirch

Hi Willie,

I've just tried to find your xhtf experiment on the umui (to copy the global 7.6 job so I can create LBC's for the 12km nest) but it doesn't seem to exist?

Cathryn

comment:11 Changed 8 years ago by willie

  • Platform set to <select platform>

Sorry Cathryn, I deleted it. It is now called xhwpc. Please take a copy. It is the standard umui PS25 job xhxbc with your start dump.

regards,

Willie

comment:12 Changed 8 years ago by cbirch

Hi Willie,

I copied it across and I managed to get it to run for the full 48 hours. I have also successfully run the 12 and 4km nests so you can close this now.

Thanks for your help,
Cathryn

comment:13 Changed 8 years ago by willie

  • Platform changed from <select platform> to HECToR
  • Resolution set to fixed
  • Status changed from accepted to closed