Opened 4 years ago
Closed 4 years ago
#2062 closed help (fixed)
UM 6.6.3 jobs hanging
Reported by: | jscreen | Owned by: | willie |
---|---|---|---|
Component: | UM Model | Keywords: | |
Cc: | | Platform: | ARCHER |
UM Version: | 6.6.3 | | |
Description
Hi
I'm trying to run HadGEM2-ES (6.6.3). I have a number of so-called test runs that derive from different existing jobs:
xnbgc is modified from a Met Office CMIP5 RCP8.5 simulation
xnbgd is modified from a Met Office CMIP5 historical simulation
xnbge and xnbgf are modified from Grenville's HadGEM2-ES run (xgaja)
These runs have different ancillaries and STASH (amongst other things) but they all suffer from a common problem. The jobs submit OK and the reconfiguration proceeds fine. The jobs then appear to run but output no data (beyond the initial creation of the first set of output files). I've played around with the job length and dumping frequency, and from what I can tell the jobs aren't even completing 1 day (despite running for up to 5 hours). Eventually the jobs crash when they hit the walltime limit. The .leave files don't contain anything obvious that points to the problem, but the fact that the same thing is happening for all four jobs must mean something (I'm just not sure what!).
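One rough way to tell a hanging job from a slow one is to check whether anything in the job's output directory has been written recently. The sketch below is only an illustration: the function names and the idle threshold are hypothetical, and it assumes the output files sit directly in a single directory.

```python
import os
import time


def newest_mtime(directory):
    """Return the latest modification time of any regular file in
    `directory`, or None if the directory holds no regular files."""
    mtimes = [
        entry.stat().st_mtime
        for entry in os.scandir(directory)
        if entry.is_file()
    ]
    return max(mtimes) if mtimes else None


def looks_stalled(directory, max_idle_seconds, now=None):
    """Return True if no file in `directory` has been modified within
    the last `max_idle_seconds` (hypothetical threshold)."""
    now = time.time() if now is None else now
    latest = newest_mtime(directory)
    if latest is None:
        # No output at all counts as stalled for this check.
        return True
    return (now - latest) > max_idle_seconds
```

For example, `looks_stalled("/path/to/job/output", 1800)` (a placeholder path) would return True if nothing had been written for more than 30 minutes. This only flags the symptom; it says nothing about the cause.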
James
Change History (8)
comment:1 Changed 4 years ago by willie
comment:2 Changed 4 years ago by grenville
James
xgaja is a HECToR job (that shouldn't make much difference); xgada is the standard HadGEM2-ES job. It also ran on HECToR, but setting the machine to login.archer.ac.uk definitely works.
Willie appears to have found the difference since I started writing.
Grenville
comment:3 Changed 4 years ago by jscreen
Willie
Argh yes, that error arose when I fiddled with something or other (can't remember quite what) while trying to solve the hanging issue. I don't think that is the cause of the common hanging problem. I haven't seen that message for either xnbge or xnbgf, which are also hanging.
Please could you look at xnbge and xnbgf to diagnose the problem.
Thanks, James
comment:4 Changed 4 years ago by jscreen
If it helps, the job xnbgd is "running" now and appears to be hanging as we speak.
comment:5 Changed 4 years ago by willie
Hi James,
This is a problem with the processor configuration. Your job xnbge has 16x12 for both the model and reconfiguration. If you revert to 12 EW X 8 NS for the model and 8x8 for the reconfiguration it should work. You also need to "override year in dump with year in model" for both the atmosphere and ocean start dumps.
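For reference, the PE counts implied by these decompositions can be tallied with a short sketch. The figure of 24 cores per ARCHER compute node, the helper name `pe_layout`, and reading "16x12" as EW x NS are assumptions for illustration, not taken from the job setup; the sketch counts only the decompositions quoted above.

```python
import math

CORES_PER_NODE = 24  # assumption: ARCHER compute node size


def pe_layout(ew, ns, cores_per_node=CORES_PER_NODE):
    """Return (total PEs, nodes needed) for an EW x NS decomposition."""
    total = ew * ns
    nodes = math.ceil(total / cores_per_node)
    return total, nodes


# Decompositions quoted in this ticket; "16x12" is assumed to be EW x NS.
layouts = {
    "xnbge as submitted (model and reconfiguration)": (16, 12),
    "suggested model": (12, 8),
    "suggested reconfiguration": (8, 8),
}

for label, (ew, ns) in layouts.items():
    total, nodes = pe_layout(ew, ns)
    print(f"{label}: {ew} EW x {ns} NS = {total} PEs, {nodes} node(s)")
```

Under these assumptions the submitted layout asks for 192 PEs, against 96 for the suggested model layout and 64 for the suggested reconfiguration layout.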
Regards
Willie
comment:6 Changed 4 years ago by willie
- Owner changed from um_support to willie
- Status changed from new to accepted
comment:7 Changed 4 years ago by jscreen
Thanks Willie, they are running fine now.
I'm sure I've run with that processor configuration before (but maybe only for an atmosphere-only job). At least it was a simple problem to fix!
James
comment:8 Changed 4 years ago by willie
- Resolution set to fixed
- Status changed from accepted to closed
Hi James,
You're getting
So it looks like you've extended a namelist array somewhere and possibly added the values in the wrong place?
Regards
Willie