Opened 8 months ago

Closed 8 months ago

#3064 closed help (fixed)

Disk quota exceeded but quotas look fine

Reported by: taubry
Owned by: um_support
Component: UM Model
Keywords: disk quota exceeded
Cc:
Platform: ARCHER
UM Version: 11.2

Description

Dear NCAS CMS team,

Both of my runs on ARCHER (u-bo142 and u-bm840) failed today after running through multiple cycles without problems. In both cases the error occurs in atmos_main and is:

sys-122 : UNRECOVERABLE error on system request
Disk quota exceeded

I am well below my quota on /work on ARCHER. I am currently using 2 TB on /nerc and see no quota specified (I expect these runs to produce 4 TB more). I am running from pumatest, where I use 1.1 GB, which I think is below my quota. I had 3 GB on puma when my pumatest account was created, but I am not sure how much I have on pumatest (it doesn't show the quota when I log in, whereas puma used to).

Many thanks for any advice.

Thomas

Change History (5)

comment:1 Changed 8 months ago by willie

Hi Thomas,

What do you get for the command

quota -v | awk 'END {printf " Using %6.2f%% of quota, %6.2f GB\n",100* ($2/$3), ($3/(1024*1024))}'

run on pumatest?

Willie
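
For readers unfamiliar with the one-liner above, here is a minimal sketch of what it computes. It assumes the last line of the `quota -v` output has the columns filesystem, blocks used, soft quota, hard limit (sizes in KiB); if the filesystem name wraps onto its own line, the fields shift and the numbers are not meaningful. The sample line and the `summarise_quota` name are hypothetical, not taken from pumatest. Note that under this assumption the GB figure printed by the original command is the quota size ($3), not the space in use, so the sketch prints both.

```shell
# Hypothetical sample of the last line of `quota -v` output (not from the ticket).
# Assumed columns: filesystem, blocks used (KiB), soft quota (KiB), hard limit (KiB), ...
sample='/dev/sda1 1113600 3000000 3300000 1200 0 0'

summarise_quota() {
  # Reads quota-style lines on stdin and reports on the last one,
  # as the original one-liner does with its END block.
  awk '
    { used = $2; quota = $3 }   # remember the fields of the most recent line
    END {
      if (quota > 0)
        printf "Using %.2f%% of quota (%.2f GB used of %.2f GB)\n",
               100 * used / quota, used / (1024 * 1024), quota / (1024 * 1024)
      else
        print "No block quota set"
    }'
}

printf '%s\n' "$sample" | summarise_quota
```

On a real system the function would be fed from the live command, for example `quota -v | summarise_quota`.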

comment:2 Changed 8 months ago by taubry

Hi Willie,

Thanks, I get:
Using 37.12% of quota, 2.86 GB
So it looks like it's not a pumatest problem. Do I have a quota on /nerc that would not appear on ARCHER SAFE?

Thanks,

Thomas

comment:3 Changed 8 months ago by willie

Hi Thomas,

The error occurs in the atmos_main job.err file for cycle 19620701, so it is definitely a UM issue on ARCHER. It is a good idea to have a look in the pe_output file to see what it was doing. It looks like a system problem to me, but without access to the pe_output it is hard to tell. I see that you are re-running both jobs. If it happens again, don't modify anything and let us know.

Willie

comment:4 Changed 8 months ago by taubry

Hi Willie,

Sounds good! Yes, I tried to re-trigger them as there was no obvious disk quota problem. I will leave them untouched if they fail again, and close the ticket if they go through the next cycle.

Thanks,

Thomas

comment:5 Changed 8 months ago by taubry

  • Resolution set to fixed
  • Status changed from new to closed

It's running fine now, and Grenville has since sent an email saying that multiple jobs failed because of too many files on /work, so I guess that was the issue. Thanks!

Thomas
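
Since the cause turned out to be the number of files rather than the volume of data, a "Disk quota exceeded" error can appear even when space usage looks fine. The sketch below is one rough way to see where files accumulate under a directory. The function names are hypothetical and nothing here is ARCHER-specific. It counts regular files only, whereas a file-count quota typically counts inodes (directories and links included), so treat the numbers as a lower bound. If the file system is Lustre (an assumption here), `lfs quota` reports a file count alongside block usage and is the more direct check.

```shell
count_files() {
  # Count regular files beneath the given directory (default: current directory).
  # Filenames containing newlines would be over-counted; fine for a rough check.
  find "${1:-.}" -type f | wc -l | tr -d ' '
}

files_per_subdir() {
  # Print "<count> <dir>" for each immediate subdirectory, largest first.
  for d in "${1:-.}"/*/; do
    [ -d "$d" ] || continue
    printf '%s %s\n' "$(count_files "$d")" "$d"
  done | sort -rn
}

# Example usage (the path is a placeholder):
#   files_per_subdir /path/to/run/output | head
```

Walking a large directory tree this way can itself be slow and heavy on a shared file system, so it is best pointed at a specific run directory rather than the top of /work.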
