Opened 6 years ago
Closed 6 years ago
#1419 closed help (fixed)
Model crash relating to .xhist / .thist files
Reported by: | James | Owned by: | annette |
---|---|---|---|
Component: | UM Model | Keywords: | unit 12, UKCA, file open |
Cc: | | Platform: | ARCHER |
UM Version: | 7.3 | | |
Description
I'm running UM-UKCA vn7.3 (JobID: xkpib) and the model's falling over with the following error message written to the .leave file.
----
sys-2 : UNRECOVERABLE error on system request
No such file or directory
Encountered during an OPEN of unit 12
Fortran unit 12 is not connected
_pmiu_daemon(SIGCHLD): [NID 00484] [c2-0c1s9n0] [Thu Dec 11 03:35:10 2014] PE RANK 0 exit signal Aborted
[NID 00484] 2014-12-11 03:35:10 Apid 12200682: initiated application termination
diff: /work/n02/n02/jgl22/tmp/tmp.mom4.9816/xkpib.xhist: No such file or directory
qsexecute: Copying /work/n02/n02/jgl22/um/xkpib/xkpib.thist to backup thist file /work/n02/n02/jgl22/um/xkpib/xkpib.thist_keep
xkpib: Run failed
----
It seems the model's compiling OK but falling over close to the start of the run, and I'm at a bit of a loss as to what's wrong.
Luke Abraham kindly took a look at the output with me and suggested I raise a ticket. I'd really appreciate your help.
Thanks,
James
Change History (9)
comment:1 Changed 6 years ago by annette
comment:2 Changed 6 years ago by annette
Comment from James:
I've changed the permissions, and the full path and name of the .leave file is as follows:
/home/n02/n02/jgl22/output/xkpib000.xkpib.d14344.t124349.leave
Many thanks - really appreciate your help,
James
comment:3 Changed 6 years ago by James
Hi,
I understand that Annette passed this on to one of her colleagues ahead of leave around mid December, and was just wondering if there'd been any progress?
Many thanks,
James
comment:4 Changed 6 years ago by annette
- Owner changed from um_support to annette
- Status changed from new to assigned
Hi James,
Go to Input/Output Control → Script Inserts and Modifications and add the following environment variable to the table: ATP_ENABLED, with value 1.
Then re-run and this should provide a stack trace to help hunt down where the model crashed.
Annette
comment:5 Changed 6 years ago by James
Thanks Annette,
I've added the environment variable and resubmitted the job:
The job directory on host login.archer.ac.uk is:
/home/n02/n02/jgl22/umui_runs/xkpib-013163017
The compilation output will be sent to file:
/home/n02/n02/jgl22/output/xkpib000.xkpib.d15013.t163027.comp.leave
The model output will be sent to file:
/home/n02/n02/jgl22/output/xkpib000.xkpib.d15013.t163027.leave
Really appreciate your help Annette,
James
comment:6 Changed 6 years ago by annette
Hi James,
Just for future reference, you don't need to recompile the UM to get the stack trace.
Take a look at the output yourself to see what is happening. Near the top of the leave file is the following error message:
sys-2 : UNRECOVERABLE error on system request
No such file or directory
And below this is the call path of the routines that were executing when the model failed (and so produced this error):
ATP Stack walkback for Rank 0 starting:
_start@start.S:113
__libc_start_main@libc-start.c:242
flumemain_@flumeMain.f90:38
um_shell_@um_shell.f90:3817
u_model_@u_model.f90:5505
ukca_main1_@ukca_main1-ukca_main1.f90:7279
ukca_read_aerosol_@ukca_read_aerosol.f90:469
_OPEN@0x100c30d
__OPN@0x100c0bc
_f_open@0x1009f34
_ferr@0x1005bfa
abort@abort.c:92
raise@pt-raise.c:42
ATP Stack walkback for Rank 0 done
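For anyone reading a walkback like this for the first time, the frame to look at is usually the last one that still points at a model source file, just before the trace drops into the runtime library's OPEN routines. The following is only a rough sketch of that reading, assuming frames always take the `routine@file:line` or `routine@0xaddress` form seen above; the helper names `parse_frames` and `last_fortran_frame` are made up for illustration and are not part of ATP or the UM.

```python
import re

# Walkback text copied from the .leave file quoted in this ticket.
WALKBACK = (
    "ATP Stack walkback for Rank 0 starting: "
    "_start@start.S:113 __libc_start_main@libc-start.c:242 "
    "flumemain_@flumeMain.f90:38 um_shell_@um_shell.f90:3817 "
    "u_model_@u_model.f90:5505 "
    "ukca_main1_@ukca_main1-ukca_main1.f90:7279 "
    "ukca_read_aerosol_@ukca_read_aerosol.f90:469 "
    "_OPEN@0x100c30d __OPN@0x100c0bc _f_open@0x1009f34 _ferr@0x1005bfa "
    "abort@abort.c:92 raise@pt-raise.c:42 "
    "ATP Stack walkback for Rank 0 done"
)

# Assumed frame shape: <routine>@<location>, separated by whitespace.
FRAME_RE = re.compile(r"(?P<routine>\S+)@(?P<location>\S+)")


def parse_frames(text):
    """Return (routine, source_file, line) tuples in call order.

    Frames that only carry a hex address get source_file and line of None.
    """
    frames = []
    for match in FRAME_RE.finditer(text):
        routine = match.group("routine")
        location = match.group("location")
        source, sep, line = location.rpartition(":")
        if sep and line.isdigit() and not location.startswith("0x"):
            frames.append((routine, source, int(line)))
        else:
            frames.append((routine, None, None))
    return frames


def last_fortran_frame(frames):
    """Return the deepest frame whose source file ends in .f90, or None."""
    fortran = [f for f in frames if f[1] and f[1].lower().endswith(".f90")]
    return fortran[-1] if fortran else None


if __name__ == "__main__":
    print(last_fortran_frame(parse_frames(WALKBACK)))
```

For this trace the sketch picks out `ukca_read_aerosol_` in `ukca_read_aerosol.f90` at line 469, which is the routine that issued the failing OPEN.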
The file ukca_read_aerosol.f90 can be found in the compilation directory for the job:
~jgl22/um/xkpib/ummodel/ppsrc/UM/atmosphere/UKCA
By looking at the code it can be deduced that it is trying to open either Sulfate_SAD_SPARC_1950-2100.asc or Sulfate_SAD_SPARC_Background.asc from directory:
/work/n02/n02/luke/DATA/QESM/
This directory, however, doesn't exist on ARCHER. I have emailed Luke about this…
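A quick check before submitting can catch this kind of failure without waiting for the model to abort. The snippet below is a minimal sketch rather than anything the UM provides: `missing_inputs` is a hypothetical helper, and the directory and file names are simply the ones read off from ukca_read_aerosol.f90 above.

```python
import os


def missing_inputs(directory, filenames):
    """Return full paths that are not readable regular files."""
    missing = []
    for name in filenames:
        path = os.path.join(directory, name)
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Directory and file names as quoted in this ticket.
    data_dir = "/work/n02/n02/luke/DATA/QESM/"
    names = [
        "Sulfate_SAD_SPARC_1950-2100.asc",
        "Sulfate_SAD_SPARC_Background.asc",
    ]
    for path in missing_inputs(data_dir, names):
        print("missing or unreadable:", path)
```

Run on a login node that can see /work, this would list both files as missing while the hard-coded directory does not exist.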
Annette
comment:7 Changed 6 years ago by annette
Hi James,
Luke has replied that the files are in:
/work/n02/n02/ukca/ANCILS/QESM/
So you just need to edit the appropriate line in your branch:
https://puma.nerc.ac.uk/trac/UM/browser/UM/branches/dev/james/vn7.3_CheT2_Base/src/atmosphere/UKCA/ukca_read_aerosol.F90?rev=17394
Hopefully this makes sense.
Annette
comment:8 Changed 6 years ago by James
That's absolutely fantastic Annette, and Luke - many thanks to you both!
I've changed the relevant line in the code and will try rerunning.
Really appreciate your help,
James
comment:9 Changed 6 years ago by annette
- Keywords 12, UKCA, file open added; xhist thist 12 removed
- Resolution set to fixed
- Status changed from assigned to closed
Hi James,
Can you change the permissions on your directories please?
Then let us know the full path and file name of your .leave file.
Thanks,
Annette