Opened 6 years ago
Closed 6 years ago
#1446 closed help (fixed)
Run failing, no error message
Reported by: | webber24 | Owned by: | annette |
---|---|---|---|
Component: | UM Model | Keywords: | |
Cc: | | Platform: | ARCHER |
UM Version: | 8.4 | | |
Description
Dear CMS,
The job xkyor, which I submitted last night and again this morning, has failed to complete twice, but when I look in pe_output on archer:/work/n02/n02/webber24/xkyor and in my .leave files there is no mention of the words "error" or "fail". Do you have any idea why it could be failing?
Best,
Chris
Change History (9)
comment:1 Changed 6 years ago by annette
comment:2 Changed 6 years ago by annette
- Owner changed from um_support to annette
- Status changed from new to assigned
comment:3 Changed 6 years ago by webber24
Hi Annette,
Thanks for your help, but I've looked through all of the changes I've made since the last successful run and I can't see an error. I have checked the input files that I changed and there are no NaNs. Furthermore, I am now trying to output the Theta on PV2 field, which was giving some domain errors that have now gone (these showed up in the pe_output 48 file).
Any further ideas?
Chris
comment:4 Changed 6 years ago by webber24
I just came across what I believe could be causing the error: my ancillary files in "user single_level ancillary file & fields" all show large negative values in the .astart file in my work/n02/n02/webber24 directory.
This is contrary to the values you see if you xconv the ancillary files from source, which give respectable values. What could be causing this overwrite?
Many Thanks,
Chris
comment:5 Changed 6 years ago by annette
Hi Chris,
I'm looking at this now - I'll get back to you shortly.
Annette
comment:6 Changed 6 years ago by webber24
Hi Annette,
I've found the issue: my jobs now run happily when I remove one of the stash fields from the UPA usage profile. The culprit is Theta on PV2, and it seems to be an error visible in the start dumps of job xkyoo (the same job run without the Theta-PV2 field). The output for this field is erratic, with one huge outlier. I believe this was causing the instabilities, and my next question was going to be whether you know a way of outputting this field stably.
Chris
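For anyone wanting to check a field for this kind of problem, here is a rough sketch of flagging non-finite values and extreme outliers. It assumes the field values have already been extracted into a plain Python list (reading the dump itself is not shown), and the function name and the threshold of 10 are illustrative choices, not anything taken from the UM.

```python
import math
import statistics

def flag_suspect_values(values, threshold=10.0):
    """Return (non_finite_indices, outlier_indices) for a list of numbers.

    A finite value counts as an outlier when its distance from the median
    exceeds `threshold` times the median absolute deviation (MAD).
    The threshold is an arbitrary illustrative default.
    """
    non_finite = [i for i, v in enumerate(values) if not math.isfinite(v)]
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return non_finite, []
    median = statistics.median(finite)
    mad = statistics.median(abs(v - median) for v in finite)
    outliers = [
        i for i, v in enumerate(values)
        if math.isfinite(v) and mad > 0 and abs(v - median) > threshold * mad
    ]
    return non_finite, outliers
```

The median-based test is used instead of mean and standard deviation because a single huge value would inflate both of those and could hide itself.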
comment:7 Changed 6 years ago by annette
Chris,
The only thing I can think of is to switch packing off for this stream.
As to your user single-level ancillaries, I noticed a warning in the reconfiguration that your ancillaries use a 360-day calendar rather than a 365-day one, but I don't know if that would have caused the problem.
Annette
comment:8 Changed 6 years ago by webber24
Thanks Annette,
I have a feeling I know why theta-PV2 is not working: it may be something to do with an alteration I had to make to fix a bug for nudging in vn8.4. I have been assured that the next release job, with the bug fixed, is imminent, so I guess I will have to wait until then. I am not sure what you mean by switching packing off for this stream, though. Do you mean excluding this field from the output?
Many Thanks,
Chris
comment:9 Changed 6 years ago by annette
- Resolution set to fixed
- Status changed from assigned to closed
Chris,
I think I misunderstood what you were asking - packing is to do with writing the fields out, rather than calculating them.
Since you now seem to have got over your crash, I will close the ticket. But do get in touch if you have further questions.
Annette
Hi Chris,
The leave file indicates that it was pe 48 that crashed.
So having a look at file 48 in pe_output shows an error message:
This can be caused by NaNs in the data due to numerical instabilities or errors with input data.
It is worth checking that your start dump looks OK, and doesn't contain NaNs (you can do this by cumf-ing the file with itself).
And look through the warnings and diagnostic messages to see if anything is going awry.
Annette
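When the .leave file gives no obvious clue, a quick way to look through every per-PE output file at once is a case-insensitive keyword scan. The following is only a sketch: the keyword list is a guess at what might be worth searching for, and the function name is made up for illustration.

```python
import re
from pathlib import Path

# Hypothetical keyword list; extend it with whatever the model prints on failure.
# \bnan\b avoids matching "nan" inside longer words.
PATTERN = re.compile(r"error|fail|abort|\bnan\b", re.IGNORECASE)

def scan_logs(directory, pattern=PATTERN):
    """Return (file name, line number, text) for every matching line."""
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has stray bytes.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    hits.append((path.name, number, line.rstrip()))
    return hits
```

Calling `scan_logs("pe_output")` from inside the job directory would list every line mentioning one of the keywords, together with the file it came from.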