Opened 4 years ago
Closed 4 years ago
#1701 closed help (fixed)
Job running but no output appears
Reported by: | charlie | Owned by: | willie
---|---|---|---
Component: | UM Model | Keywords: |
Cc: | | Platform: | ARCHER
UM Version: | 6.6.3 | |
Description
Hi,
Sorry to bother you, but I'm having trouble running some jobs on version 6.6.3.
I'm running 4 jobs at once. In every job, the model runs for its full 6 hours but only produces output for the first month, e.g. /work/n02/n02/cjrw09/result/xlyhaa.pah1jan. Although these files have non-zero size, inspection with xconv shows that they are all empty. The jobs are also not producing any restart dumps, which is unusual.
My jobs are exact replicas of other jobs I have, all of which work fine. They have all been recompiled correctly, and they are based on a brand-new full extract. They are basically set up from scratch. I have followed exactly the same method to get them running as with my old jobs.
The only difference between my new jobs and the old ones is the input ancillary files, namely SST, sea ice and soil moisture - not how they are read, but the files themselves. So, unless I'm missing something silly, the obvious assumption is that it's one of these which is causing the problem.
The only obvious error in the .leave files is the following, but I don't know what this means or even if it's relevant:
=>> PBS: job killed: walltime 21626 exceeded limit 21600
aprun: Apid 18381595: Caught signal Terminated, sending to application
Application 18381595 is crashing. ATP analysis proceeding...
/var/spool/PBS/mom_priv/jobs/3224075.sdb.SC[305]: .: line 277: 30224: Terminated
Please can you advise?
Many thanks,
Charlie
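For reference, the PBS message quoted above only says that the job hit its walltime limit. A minimal Python sketch of reading those two numbers follows; the helper name `parse_walltime` is made up for illustration and is not part of PBS or the UM.

```python
import re

# Line copied from the .leave file quoted in the description.
LOG_LINE = "=>> PBS: job killed: walltime 21626 exceeded limit 21600"


def parse_walltime(line):
    """Return (used_seconds, limit_seconds) from a PBS walltime kill message.

    Returns None if the line does not contain such a message.
    """
    match = re.search(r"walltime (\d+) exceeded limit (\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


used, limit = parse_walltime(LOG_LINE)
limit_hours = limit / 3600   # 21600 s is the 6-hour job limit
overrun = used - limit       # seconds past the limit when PBS killed the job

print(f"limit = {limit_hours:g} h, overrun = {overrun} s")
# prints: limit = 6 h, overrun = 26 s
```

So the "Terminated" lines are what you would expect when a job is still running at the 6-hour limit. On their own they do not say why the model produced no usable output in that time.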
Change History (4)
comment:1 Changed 4 years ago by willie
- Owner changed from um_support to willie
- Status changed from new to accepted
comment:2 Changed 4 years ago by charlie
Dear Willie,
Thanks a lot, yes I feared it was to do with that ancillary file.
However, I don't think it's because of the spatial coverage. That's exactly what I want it to look like - values over northern India and no values elsewhere - and this has worked in the past. If you compare
/work/n02/n02/cjrw09/ancil/hydro.d/smow_jules_1971-2004
with
/work/n02/n02/cjrw09/ancil/hydro.d/smow_jules_repeatingcyc_first5
you will see that spatially they are identical. If I run with the former, it works absolutely fine.
The only difference between them is temporal. The first ancillary file, which works, is a normal timeseries of values - 12240 days (= 34 years), where each day is different. The second ancillary file, which doesn't work, is a timeseries with the same total number of days, but the same 360 days (i.e. one year) are repeated throughout. So, for example, 10 January is always the same.
Is this likely to be causing the problem? If so, why - the data are all there, so how does the model know it is repeating?
One thing that occurred to me: when creating the ancillary file (using xancil), I said No to "Is ancillary data periodic in time?" Should I have said Yes? If so, how can this make a difference?
Charlie
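The layout of the repeating file described above can be sketched in Python as follows. This is only an illustration of the day arithmetic on a 360-day calendar. The placeholder values and the function names are made up, and it says nothing about how xancil or the UM treat the "periodic in time" setting.

```python
DAYS_PER_YEAR = 360   # 360-day calendar, as in the description above
TOTAL_DAYS = 12240    # length of both ancillary timeseries

# 12240 days / 360 days per year = 34 years
n_years = TOTAL_DAYS // DAYS_PER_YEAR


def day_in_cycle(day_index):
    """Map a 0-based day in the full series to its 0-based day of the year."""
    return day_index % DAYS_PER_YEAR


def build_repeating_series(one_year):
    """Tile a single year of daily values out to the full series length."""
    if len(one_year) != DAYS_PER_YEAR:
        raise ValueError("expected exactly one 360-day year of values")
    return [one_year[day_in_cycle(d)] for d in range(TOTAL_DAYS)]


# Placeholder data only: one value per day of the year.
one_year = [float(d) for d in range(DAYS_PER_YEAR)]
series = build_repeating_series(one_year)

jan10 = 9  # 0-based index of 10 January in the first year
```

With this layout, `series[jan10]`, `series[jan10 + 360]` and every later 10 January hold the same value, while the file still contains all 12240 daily records.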
comment:3 Changed 4 years ago by willie
Hi Charlie,
It's not getting as far as time step 2 according to the pe0 file. The problem is that the soil moisture ancillary file contains NaNs. This is evident in the xlyha.out file, but you can also see it for yourself by comparing the file with itself using cumf. The summary file should show no differences for a normal file, but will flag up the NaNs in a file that contains them. Also, xconv > view data struggles to display the field, which is another indicator. I think the simplest thing to do is to re-create this file.
Regards
Willie
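The self-comparison check above works because, in IEEE 754 floating point, a NaN never compares equal to anything, including itself. A minimal Python sketch of that property follows. It is not how cumf is implemented, the sample values are made up, and reading the real ancillary file would need a separate tool that is not shown here.

```python
import math


def count_nans(values):
    """Count the NaN entries in a sequence of floats."""
    return sum(1 for v in values if math.isnan(v))


def self_differences(values):
    """Return the indices where a value differs from itself.

    For ordinary numbers v != v is always False, so a clean field gives an
    empty list. For NaN it is True, so every NaN shows up as a "difference".
    """
    return [i for i, v in enumerate(values) if v != v]


# Made-up soil moisture values, purely for illustration.
clean = [0.21, 0.35, 0.0]
broken = [0.21, float("nan"), 0.0, float("nan")]

print(self_differences(clean))    # prints: []
print(self_differences(broken))   # prints: [1, 3]
print(count_nans(broken))         # prints: 2
```

A field that differs from itself anywhere therefore contains NaNs, which is what the comparison summary reports.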
comment:4 Changed 4 years ago by willie
- Resolution set to fixed
- Status changed from accepted to closed
Hi Charlie,
There is not much UM output for xlyha, but the file xlyha.out complains a lot about the soil moisture. Looking at it in xconv, it is all in one tiny clump, rather than a smoothly changing global distribution. So I think you need to regenerate this ancillary and try again.
Regards,
Willie