Opened 9 months ago

Closed 9 months ago

#2707 closed help (answered)

Disk Quota Exceeded while running tutorial jobs

Reported by: jjas3
Owned by: grenville
Component: Disk Space
Keywords:
Cc:
Platform:
UM Version:

Description

Hi CMS,

I am running the UM job u-be064, a copy of u-ag761, following the UM tutorial at:

http://cms.ncas.ac.uk/documents/training/December2017/UM_practicals/solving-problems.html

In the final section (where the job is supposed to run smoothly) I get an error from atmos-main saying that I have run out of disk space. I think the quota being hit is the one on my work/n02/n02/jjas3 directory on Archer, where I have 10 GB of space. Can this be extended?

Thanks

Here is the error from the job stderr output file:

sys-122 : UNRECOVERABLE error on system request

Disk quota exceeded

Encountered during an I/O operation on unit 6
Fortran unit 6 is connected to a sequential formatted text file:

"pe_output/be064.fort6.pe11"

Application 33050519 is crashing. ATP analysis proceeding…

ATP Stack walkback for Rank 11 starting:

_start@…:113
libc_start_main@…:242
main@…:19
main@…:19
um_shell_@…:641
u_model_4a_@…:349
atm_step_4a_@…:1568
umprintflush$umprintmgr_@…:394
umflush$umflush_mod_@…:49
_FLUSH@0x318b33a
_ferr@0x319f6ab
abort@…:92
raise@…:42

ATP Stack walkback for Rank 11 done
Process died with signal 6: 'Aborted'
Forcing core dumps of ranks 11, 0
View application merged backtrace tree with: stat-view atpMergedBT.dot
You may need to: module load stat

_pmiu_daemon(SIGCHLD): [NID 03988] [c4-2c2s5n0] [Mon Dec 17 18:35:03 2018] PE RANK 24 exit signal Killed
_pmiu_daemon(SIGCHLD): [NID 04196] [c5-2c2s9n0] [Mon Dec 17 18:35:03 2018] PE RANK 36 exit signal Killed
_pmiu_daemon(SIGCHLD): [NID 04387] [c6-2c2s8n3] [Mon Dec 17 18:35:03 2018] PE RANK 48 exit signal Killed
_pmiu_daemon(SIGCHLD): [NID 04836] [c1-3c0s9n0] [Mon Dec 17 18:35:03 2018] PE RANK 78 exit signal Killed
_pmiu_daemon(SIGCHLD): [NID 03847] [c4-2c0s1n3] [Mon Dec 17 18:35:03 2018] PE RANK 12 exit signal Killed
_pmiu_daemon(SIGCHLD): [NID 04835] [c1-3c0s8n3] [Mon Dec 17 18:35:03 2018] PE RANK 60 exit signal Killed
[NID 03988] 2018-12-17 18:35:03 Apid 33050519: initiated application termination
[FAIL] um-atmos # return-code=137
Received signal ERR
cylc (scheduler - 2018-12-17T18:35:06Z): CRITICAL Task job script received signal ERR at 2018-12-17T18:35:06Z
cylc (scheduler - 2018-12-17T18:35:06Z): CRITICAL failed at 2018-12-17T18:35:0

Many thanks!
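
For anyone reading a similar log, the key lines are "Disk quota exceeded" and the quoted file name a few lines later, which shows what the model was writing when it failed. Below is a minimal Python sketch of pulling those out of a stderr file. The function name find_quota_failures and the eight-line look-ahead window are illustrative assumptions, not part of the UM, Rose, or cylc tooling.

```python
import re

QUOTA_MSG = "Disk quota exceeded"
# Matches a double-quoted file name such as "pe_output/be064.fort6.pe11".
QUOTED_PATH = re.compile(r'"([^"]+)"')


def find_quota_failures(log_text, lookahead=8):
    """Return the first quoted file name found after each quota error line.

    `lookahead` is an assumed window size; adjust it if the runtime
    prints more lines between the error and the file name.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if QUOTA_MSG not in line:
            continue
        # Scan the next few lines for the file that was being written.
        for follow in lines[i + 1 : i + 1 + lookahead]:
            match = QUOTED_PATH.search(follow)
            if match:
                hits.append(match.group(1))
                break
    return hits


# Excerpt copied from the job stderr quoted in this ticket.
sample = """sys-122 : UNRECOVERABLE error on system request

Disk quota exceeded

Encountered during an I/O operation on unit 6
Fortran unit 6 is connected to a sequential formatted text file:

"pe_output/be064.fort6.pe11"
"""

print(find_quota_failures(sample))
```

Here the file is a per-PE output file under pe_output, relative to the directory the job was running in, which is consistent with the quota being hit on the work directory rather than elsewhere.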

Change History (2)

comment:1 Changed 9 months ago by grenville

Quota increased to 100 GB. It may take a short time before the extra space is usable.
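
To see how close a directory is to its quota before resubmitting, something like the following rough Python sketch can help. It sums apparent file sizes, which may differ from what the filesystem's own quota accounting reports, so treat the result as an estimate. The function names, the example path, and the reading of "100 GB" as 100 * 1024**3 bytes are all assumptions made for illustration.

```python
import os


def dir_usage_bytes(root):
    """Sum the apparent sizes of all files under root.

    Symbolic links are not followed; each link contributes only
    its own (small) size.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def usage_report(root, quota_bytes):
    """Return (bytes used, fraction of quota used) for root."""
    used = dir_usage_bytes(root)
    return used, used / quota_bytes


if __name__ == "__main__":
    # Hypothetical path and quota: substitute your own work directory.
    work_dir = os.path.expanduser("~")
    quota = 100 * 1024**3  # assumes 1 GB = 1024**3 bytes
    used, fraction = usage_report(work_dir, quota)
    print(f"{used / 1024**3:.2f} GB used ({fraction:.1%} of quota)")
```

Walking a large directory tree this way can be slow on a parallel filesystem, so it is best run occasionally rather than from inside a job.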

Grenville

comment:2 Changed 9 months ago by grenville

  • Resolution set to answered
  • Status changed from new to closed