Opened 10 years ago

Closed 9 years ago

#385 closed help (fixed)

xconv job for pp to nc processing running out of time in serial queue

Reported by: seruth Owned by: lois
Component: UM Tools Keywords: convsh
Cc: Platform:
UM Version: 4.5

Description

I'm not able to get a fairly simple script to run on HECToR. The same
sort of script ran fine on HPCx. I'm processing UM output files to netCDF
format. I can process one year of files in 15 minutes in the serial
queue, but when I try to process all 6 years, even a 6-hour job in the
serial queue doesn't finish. We tried changing the memory allocation too,
but that hasn't worked.

Change History (3)

comment:1 Changed 10 years ago by lois

  • Owner changed from um_support to lois
  • Status changed from new to assigned

Sorry for the delay in replying, Ruth; there was a bit of confusion as to who would reply. This is not a new problem on HECToR; it is really a feature of the system!

If the files that you are converting are on /work, that is, on the Lustre file system, then conversion can be excruciatingly slow. The design of the Lustre file system is perhaps not HECToR's finest feature: it has only one metadata server, which is a bottleneck. So converting files on /work is not a good idea.

One alternative would be to copy the data to /home on HECToR and convert it there; however, we don't have a large /home allocation. Another is to ftp your raw data back to your own local workstation and convert it there.

There is an ongoing discussion that NERC should have a workstation with lots of disk, local to HECToR, to resolve this issue. Everyone agrees and there is money, but progress is slow.

Lois

comment:2 Changed 9 years ago by lois

A post-processing service should be available on HECToR by March 2011; the negotiations have taken a very long time!

comment:3 Changed 9 years ago by lois

  • Resolution set to fixed
  • Status changed from assigned to closed