Opened 8 years ago

Closed 8 years ago

#878 closed help (fixed)

xconv crashing during conversion

Reported by: a.elvidge
Owned by: jeff
Component: Disk Space
Keywords:
Cc: g.m.s.lister@…
Platform:
UM Version: <select version>

Description

Hi,

xconv crashes while converting pp files to a netCDF file. Because my .nc file is large, I am using what I believe is a beta version of xconv (/home/n02/n02/jwc/um/xconv/xconv1.92) to enable the 64-bit offset .nc output format.
I have used this many times before for exactly the same kind of job without problems. I have tried deleting the file and retrying, and have noticed that it appears to crash when the new .nc file reaches a certain size (about 9 GB). Could this simply be a disk quota issue? On SAFE, it appears that my personal usage is fine but that the n02 group usage may be at or over its limit.

Thanks, Andy
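
When a conversion dies partway, it can help to confirm that the partial output really was being written in the 64-bit offset format rather than classic netCDF, which cannot hold a file this large. The following is a minimal sketch, not part of xconv: the function name `netcdf_variant` is hypothetical, and it only inspects the first four bytes of the file, assuming the signature sits at offset 0.

```python
# Leading bytes ("magic numbers") of the on-disk netCDF variants.
NETCDF_MAGIC = {
    b"CDF\x01": "classic (32-bit offsets)",
    b"CDF\x02": "64-bit offset",
    b"CDF\x05": "64-bit data (CDF-5)",
    b"\x89HDF": "netCDF-4 (HDF5)",
}


def netcdf_variant(path):
    """Return a label for the netCDF variant of `path`, judged by its first 4 bytes.

    Hypothetical helper for illustration; it does not validate the rest of the file.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
    return NETCDF_MAGIC.get(magic, "not recognised as netCDF")
```

For example, `netcdf_variant("060106_1.5km.nc")` run in the output directory would be expected to report "64-bit offset" if xconv wrote the header in that format before it was killed.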

Change History (11)

comment:1 Changed 8 years ago by grenville

Andy

It's not a disc quota problem. Please let us know which file(s) are causing the problem.

Grenville

comment:2 Changed 8 years ago by a.elvidge

Hi Grenville,

I'm trying to convert all the xfxkla_pa* files to 060106_1.5km.nc in /work/n02/n02/aelvidge/xfxkq_060106

I've just tried once more with the same result: the file reaches about 9 GB, then xconv (/home/n02/n02/jwc/um/xconv/xconv1.92) crashes (the window disappears).

Thanks, Andy

comment:3 Changed 8 years ago by grenville

  • Owner changed from grenville to jeff
  • Status changed from new to assigned

comment:4 Changed 8 years ago by a.elvidge

Jeff,

I have used xconv1.92 on exactly the same stash files (same job) but for different runs. It's suddenly stopped working. It crashes and a 'killed' message appears on the hector command line. It is not just failing on this one particular job; I just unsuccessfully tried it on a set of UM output files I had previously (about 2 weeks ago) successfully converted to 64-bit offset.

Thanks, Andy

comment:5 Changed 8 years ago by jeff

Hi Andy

You shouldn't be using /home/n02/n02/jwc/um/xconv/xconv1.92, but it hasn't changed for almost 2 years so I don't know why it stopped working. Try using /home/n02/n02/jwc/bin/xconv1.92 instead and see if that works.

Jeff.

comment:6 Changed 8 years ago by a.elvidge

Hi Jeff,

You advised I try /home/n02/n02/jwc/um/xconv/xconv1.92 in response to a previous query I raised on here, and I have been using it ever since. Unfortunately /home/n02/n02/jwc/bin/xconv1.92 is not working for me either - exactly the same problem. I have also tried a colleague's computer (who is on Linux, I am on Windows using XMing), and the problem still occurs.

Perhaps you could try using it for a similar job, or copy across my files to try? Whatever you think would be the best way to investigate the problem further.

Thanks, Andy

comment:7 Changed 8 years ago by a.elvidge

I copied xconv1.92 across to the UEA cluster along with my UM output files, and the conversion works. So it appears likely to be a problem either with hector or with my account in particular. It's as if the program times out and is forced to close after it has been running for a certain time (but only when running a job; when left idle it stays open).

Andy
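
A process that is killed after running for a while, but survives when idle, is consistent with a per-process resource limit such as CPU time, although the thread does not confirm which limit (if any) was involved. As a rough diagnostic sketch, the limits inherited by processes started from the current shell can be listed with Python's standard `resource` module. The names `LIMITS`, `describe` and `current_limits` are hypothetical, and a site may also enforce policy by other means that this will not show.

```python
import resource

# Limits most likely to matter for a long-running conversion (an assumption,
# not something the thread identifies).
LIMITS = {
    "CPU time (s)": resource.RLIMIT_CPU,
    "address space (bytes)": resource.RLIMIT_AS,
    "file size (bytes)": resource.RLIMIT_FSIZE,
}


def describe(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the limits listed in LIMITS."""
    return {
        name: tuple(describe(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

Running this on both a login node and the machine where the conversion succeeds, and comparing the output, would be one way to see whether a finite CPU-time or memory limit differs between them.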

comment:8 Changed 8 years ago by grenville

Andy

Try running xconv on the larger memory server: ssh -Y lms from a HECToR login node (see http://www.hector.ac.uk/support/documentation/guides/lms/ for details). You should be able to run interactively for a while on that machine.

You will need to apply for an account on the lms through SAFE, but this should be quick.

Grenville

comment:9 Changed 8 years ago by a.elvidge

Hi Grenville,

Thanks, this works. I tried again without logging into lms and it is still not working. So it would appear that, at least in my case, running xconv in lms is now a requirement for large conversions.

Thanks, Andy

comment:10 Changed 8 years ago by jeff

  • Cc g.m.s.lister@… added

Hi Andy

It's not an xconv problem but something they must have changed on hector. You could try asking the hector helpdesk, but I suspect they will say you shouldn't be doing this sort of processing on a hector login node. This is what the lms is for, so using it is definitely the best idea.

Jeff.

comment:11 Changed 8 years ago by grenville

  • Resolution set to fixed
  • Status changed from assigned to closed