Opened 3 years ago
Closed 3 years ago
#2363 closed error (fixed)
Disk quota exceeded - but where?
| Reported by: | s1374103 | Owned by: | um_support |
|---|---|---|---|
| Component: | UM Model | Keywords: | |
| Cc: | | Platform: | Monsoon2 |
| UM Version: | 8.4 | | |
Description
Hi Helpdesk,
I have some simulations running and they have all failed.
Jobs - xnvqc, xnvqd and xnvqf
Each simulation failed today/yesterday due to disk quota exceedance.
e.g.
sys-122 : UNRECOVERABLE error on system request
Disk quota exceeded
Encountered during an I/O operation on unit 6
Fortran unit 6 is connected to a sequential formatted text file:
"/projects/ukca-ed/kjamie/xnvqc/pe_output/xnvqc.fort6.pe132"
sys-122 : UNRECOVERABLE error on system request
Disk quota exceeded
Encountered during an I/O operation on unit 6
Fortran unit 6 is sys-122 : UNRECOVERABLE error on system request
Disk quota exceeded
Encountered during an I/O operation on unit 6
Fortran unit 6 is connected to a sequential formatted text file:
"/projects/ukca-ed/kjamie/xnvqc/pe_output/xnvqc.fort6.pe224"
Application 13674080 is crashing. ATP analysis proceeding...
connected to a sequential formatted text file:
"/projects/ukca-ed/kjamie/xnvqc/pe_output/xnvqc.fort6.pe120"
sys-122 : UNRECOVERABLE error on system request
Disk quota exceeded
or…
lib-4029 : UNRECOVERABLE library error
An underlying C library read or write request failed.
Encountered during a list-directed WRITE to unit 6
Fortran unit 6 is connected to a sequential formatted text file:
"/projects/ukca-ed/kjamie/xnvqf/pe_output/xnvqf.fort6.pe183"
Application 13679579 is crashing. ATP analysis proceeding...
basename: missing operand
Try `basename --help' for more information.
basename: missing operand
Try `basename --help' for more information.
ATP Stack walkback for Rank 183 starting:
_start@start.S:113
__libc_start_main@libc-start.c:242
flumemain_@flumeMain.f90:48
um_shell_@um_shell.f90:1865
u_model_@u_model.f90:2688
atm_step_@atm_step.f90:10120
atmos_physics2_@atmos_physics2.f90:3538
ni_bl_ctl_@ni_bl_ctl.f90:2088
bl_intct_@bl_intct.f90:1099
bdy_layr_@bdy_layr.f90:1345
sf_expl_l_@sf_expl_jls.f90:914
physiol_@physiol_jls.f90:503
sf_stom_@sf_stom_jls.f90:1003
bvoc_emissions_@bvoc_emissions.f90:227
_FWF@0x1d580e5
_sw_endrec@0x1d56cf1
_ferr@0x1d4499b
abort@abort.c:92
raise@pt-raise.c:42
Where exactly is the disk quota being exceeded? I checked PUMA and can see that it's not there. On Monsoon, though, I am unsure whether the problem is coming from /home, /projects/ukca-ed/kjamie, or somewhere else altogether.
Regards,
Jamie
Change History (2)
comment:1 Changed 3 years ago by willie
comment:2 Changed 3 years ago by willie
- Resolution set to fixed
- Status changed from new to closed
Hi Jamie,
The problem is that you have run out of disk space on /projects/ukca-ed/kjamie. You've used more than 1.9 TB there, and the job xnvqc alone is taking 286 GB. It fails when writing to pe_output, which by itself is taking about 200 GB.
So the solution is to try to remove old runs that you don't need, or failing that to increase the quota. If the latter, it would be useful to make an estimate of the amount of space you need to complete your work.
Regards
Willie
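
For anyone hitting the same error, here is a minimal sketch of one way to see which runs are taking the space before deciding what to remove. It is not the method used to produce the figures above. The function names `usage_by_subdir` and `format_report` are made up for illustration, and the totals are apparent file sizes, which may differ from what the filesystem's quota accounting actually charges.

```python
import os


def usage_by_subdir(root):
    """Return {entry name: total bytes} for each immediate child of root.

    Sizes are apparent file sizes (st_size); quota accounting on the real
    filesystem may count allocated blocks instead, so treat these as a guide.
    """
    totals = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            # Sum every file below this child directory (symlinks not followed).
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
            totals[entry.name] = total
        else:
            try:
                totals[entry.name] = entry.stat(follow_symlinks=False).st_size
            except OSError:
                pass
    return totals


def format_report(totals):
    """Return report lines, largest entry first, with sizes in GB (10^9 bytes)."""
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{size / 1e9:10.2f} GB  {name}" for name, size in ordered]
```

Printing `"\n".join(format_report(usage_by_subdir("/projects/ukca-ed/kjamie")))` would list each run directory with its total, largest first. Running it again on a single job directory, such as the one for xnvqc, would show how much of that run is pe_output.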