Opened 3 months ago

Closed 3 months ago

#2363 closed error (fixed)

Disk quota exceeded - but where?

Reported by: s1374103
Owned by: um_support
Priority: highest
Component: UM Model
Keywords:
Cc:
Platform: Monsoon2
UM Version: 8.4

Description

Hi Helpdesk,

I have some simulations running and they have all failed.

Jobs - xnvqc, xnvqd and xnvqf

Each simulation failed today or yesterday because a disk quota was exceeded.

e.g.

sys-122 : UNRECOVERABLE error on system request 
  Disk quota exceeded

Encountered during an I/O operation on unit 6
Fortran unit 6 is connected to a sequential formatted text file:
  "/projects/ukca-ed/kjamie/xnvqc/pe_output/xnvqc.fort6.pe132"

sys-122 : UNRECOVERABLE error on system request 
  Disk quota exceeded

Encountered during an I/O operation on unit 6
Fortran unit 6 is 
sys-122 : UNRECOVERABLE error on system request 
  Disk quota exceeded

Encountered during an I/O operation on unit 6
Fortran unit 6 is connected to a sequential formatted text file:
  "/projects/ukca-ed/kjamie/xnvqc/pe_output/xnvqc.fort6.pe224"
Application 13674080 is crashing. ATP analysis proceeding...
connected to a sequential formatted text file:
  "/projects/ukca-ed/kjamie/xnvqc/pe_output/xnvqc.fort6.pe120"

sys-122 : UNRECOVERABLE error on system request 
  Disk quota exceeded

or…

lib-4029 : UNRECOVERABLE library error 
  An underlying C library read or write request failed.

Encountered during a list-directed WRITE to unit 6
Fortran unit 6 is connected to a sequential formatted text file:
  "/projects/ukca-ed/kjamie/xnvqf/pe_output/xnvqf.fort6.pe183"
Application 13679579 is crashing. ATP analysis proceeding...
basename: missing operand
Try `basename --help' for more information.
basename: missing operand
Try `basename --help' for more information.

ATP Stack walkback for Rank 183 starting:
  _start@start.S:113
  __libc_start_main@libc-start.c:242
  flumemain_@flumeMain.f90:48
  um_shell_@um_shell.f90:1865
  u_model_@u_model.f90:2688
  atm_step_@atm_step.f90:10120
  atmos_physics2_@atmos_physics2.f90:3538
  ni_bl_ctl_@ni_bl_ctl.f90:2088
  bl_intct_@bl_intct.f90:1099
  bdy_layr_@bdy_layr.f90:1345
  sf_expl_l_@sf_expl_jls.f90:914
  physiol_@physiol_jls.f90:503
  sf_stom_@sf_stom_jls.f90:1003
  bvoc_emissions_@bvoc_emissions.f90:227
  _FWF@0x1d580e5
  _sw_endrec@0x1d56cf1
  _ferr@0x1d4499b
  abort@abort.c:92
  raise@pt-raise.c:42

Where exactly is the disk quota being exceeded? I checked PUMA and can see that it's not there. But on Monsoon, I am unsure whether the problem is coming from /home, /projects/ukca-ed/kjamie, or somewhere else altogether.

Regards,

Jamie

Change History (2)

comment:1 Changed 3 months ago by willie

Hi Jamie,

The problem is that you have run out of disk space on /projects/ukca-ed/kjamie. You've used more than 1.9TB there, and the job xnvqc alone is taking 286GB. It is failing when writing to pe_output, which by itself is taking about 200GB.

So the solution is to try to remove old runs that you don't need, or failing that to increase the quota. If the latter, it would be useful to make an estimate of the amount of space you need to complete your work.
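
To decide which old runs to remove, it can help to list how much space each run directory takes. The following is only a rough sketch, not a site-provided tool: the function names dir_size_bytes and usage_by_subdir are made up for illustration, and it sums apparent file sizes, which may differ somewhat from the figure the quota system actually charges.

```python
import os


def dir_size_bytes(path):
    """Sum the apparent sizes of all files under path (symlinks not followed).

    Note: this is st_size, not allocated blocks, so it only approximates
    what a quota system counts.
    """
    total = 0
    # Unreadable directories are skipped rather than aborting the scan.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def usage_by_subdir(top):
    """Return (name, bytes) for each immediate subdirectory, largest first."""
    sizes = []
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, dir_size_bytes(entry.path)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)


# Example use (the path is the one from this ticket):
#   for name, size in usage_by_subdir("/projects/ukca-ed/kjamie"):
#       print(f"{size / 1e9:10.2f} GB  {name}")
```

Running the same function on a single job directory (for example the xnvqc directory) would show whether pe_output is the largest item inside it.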

Regards
Willie

comment:2 Changed 3 months ago by willie

  • Resolution set to fixed
  • Status changed from new to closed