Opened 9 years ago
Closed 9 years ago
#878 closed help (fixed)
xconv crashing during conversion
Reported by: | a.elvidge | Owned by: | jeff |
---|---|---|---|
Component: | Disk Space | Keywords: | |
Cc: | g.m.s.lister@… | Platform: | |
UM Version: | | | |
Description
Hi,
xconv crashes while converting pp files to a netCDF file. Because my .nc file is large, I am using what I believe is a beta version of xconv (/home/n02/n02/jwc/um/xconv/xconv1.92), which can write the 64-bit offset .nc output format.
I have used this many times before for exactly the same kind of job without problems. I have tried deleting the file and retrying, and I have noticed that it appears to crash when the new .nc file reaches a certain size (about 9 GB). Could this simply be a disk quota issue? On SAFE, my personal usage appears to be fine, but the n02 group usage may be at or over its limit.
Thanks, Andy
Change History (11)
comment:1 Changed 9 years ago by grenville
comment:2 Changed 9 years ago by a.elvidge
Hi Grenville,
I'm trying to convert all the xfxkla_pa* files to 060106_1.5km.nc in /work/n02/n02/aelvidge/xfxkq_060106
I've just tried once more with the same result: the file reaches about 9 GB and then xconv (/home/n02/n02/jwc/um/xconv/xconv1.92) crashes (disappears).
Thanks, Andy
comment:3 Changed 9 years ago by grenville
- Owner changed from grenville to jeff
- Status changed from new to assigned
comment:4 Changed 9 years ago by a.elvidge
Jeff,
I have used xconv1.92 on exactly the same STASH files (same job) but for different runs, and it has suddenly stopped working. It crashes and a 'killed' message appears on the HECToR command line. It is not just failing on this one job: I have just tried it, unsuccessfully, on a set of UM output files that I had successfully converted to 64-bit offset format about 2 weeks ago.
Thanks, Andy
comment:5 Changed 9 years ago by jeff
Hi Andy
You shouldn't be using /home/n02/n02/jwc/um/xconv/xconv1.92, but it hasn't changed for almost 2 years so I don't know why it stopped working. Try using /home/n02/n02/jwc/bin/xconv1.92 instead and see if that works.
Jeff.
comment:6 Changed 9 years ago by a.elvidge
Hi Jeff,
You advised me to try /home/n02/n02/jwc/um/xconv/xconv1.92 in response to a previous query I raised here, and I have been using it ever since. Unfortunately /home/n02/n02/jwc/bin/xconv1.92 is not working for me either; it has exactly the same problem. I have also tried from a colleague's computer (they are on Linux; I am on Windows using Xming), and the problem still occurs.
Perhaps you could try it on a similar job, or copy my files across and try them? Whatever you think would be the best way to investigate the problem further.
Thanks, Andy
comment:7 Changed 9 years ago by a.elvidge
I copied xconv1.92 across to the UEA cluster along with my UM output files, and the conversion works there. So it seems likely to be a problem either with HECToR or with my account in particular. It's as if the program times out and is forced to close after it has been running for a certain time (but only while it is running a conversion; when left idle it stays open).
Andy
comment:8 Changed 9 years ago by grenville
Andy
Try running xconv on the large memory server (ssh -Y lms from a HECToR login node; see http://www.hector.ac.uk/support/documentation/guides/lms/ for details). You should be able to run interactively for a while on that machine.
You will need to apply for an account on the lms through SAFE, but this should be quick.
Grenville
comment:9 Changed 9 years ago by a.elvidge
Hi Grenville,
Thanks, this works. I tried again without logging into the lms and it is still not working. So it would appear that, at least in my case, running xconv on the lms is now a requirement for large conversions.
Thanks, Andy
comment:10 Changed 9 years ago by jeff
- Cc g.m.s.lister@… added
Hi Andy
It's not an xconv problem but something they must have changed on HECToR. You could try asking the HECToR helpdesk, but I suspect they will say you shouldn't be doing this sort of processing on a HECToR login node. This is what the lms is for, so using it is definitely the best idea.
Jeff.
comment:11 Changed 9 years ago by grenville
- Resolution set to fixed
- Status changed from assigned to closed
Andy
It's not a disc quota problem. Please let us know which file(s) are causing the problem.
Grenville