Opened 2 months ago

Closed 7 weeks ago

#2484 closed help (fixed)

cray-netcdf(3):ERROR:105: Unable to locate a modulefile for 'cray-netcdf/4.4.0'

Reported by: laurahb
Owned by: um_support
Priority: normal
Component: UM Model
Keywords:
Cc:
Platform: ARCHER
UM Version:

Description

Hi,
I'm getting the above error message when I try to run my suite (u-ay059). I've found a couple of tickets with the same issue and they were advised to change this to
cray-netcdf/4.4.1.1
cray-hdf5/1.10.0.1
in the suite.rc file. However my suite.rc file doesn't have cray-netcdf/4.4.0 in it so I'm not sure where I should add these lines (I assumed I'd be replacing a similar line).

Change History (11)

comment:1 Changed 2 months ago by ros

Hi Laura,

Your suite is set up differently to the examples you found (yours is now the norm). All ARCHER-specific settings are in the site/archer.rc file, so that is where the module lines need changing.

Cheers,
Ros.
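
To find where a suite loads these modules, one option is to search the whole suite directory rather than only suite.rc. The sketch below is a minimal helper for that; the function name find_module_lines is made up for illustration, and the suite location in the usage line (~/roses/u-ay059) is an assumption, not something stated in this ticket.

```shell
#!/bin/sh
# find_module_lines DIR
# Hypothetical helper: list every line under DIR that mentions a
# cray-netcdf or cray-hdf5 module, as file:line-number:text.
find_module_lines() {
    # -r recurse into subdirectories (e.g. site/), -n show line numbers,
    # -E use extended regular expressions
    grep -rnE 'cray-(netcdf|hdf5)' "$1"
}
```

Usage, assuming the suite is checked out in ~/roses/u-ay059: `find_module_lines ~/roses/u-ay059`. Any hit in site/archer.rc is the line to update to the versions quoted in the description.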

comment:2 Changed 2 months ago by laurahb

Thanks, that seems to have fixed it.
However, I'm now getting a problem that it's timing out in the atmos_main step. I tried increasing the wallclock time from 2 to 3 hours but it still timed out. Does that mean it's getting stuck somewhere? Presumably since it's a standard job it should run within the specified time?

http://puma.nerc.ac.uk/rose-bush/view/laurahb/u-ay059?&no_fuzzy_time=0&path=log/job/19810901T0000Z/atmos_main/01/job.err

comment:3 Changed 2 months ago by willie

  • Platform set to ARCHER

Hi Laura,

It has run out of disk space on ARCHER:

sys-122 : UNRECOVERABLE error on system request 
  Disk quota exceeded

Encountered during an I/O operation on unit 6
Fortran unit 6 is connected to a sequential formatted text file:
  "pe_output/ay059.fort6.pe057"

So if you can delete some files you can continue your run. Longer term you may need to ask for an increase in quota.

Regards
Willie
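
Before deleting anything, it can help to see which directories are using the space. The following is a minimal sketch using standard du, sort and head; the function name largest_entries is hypothetical, and which filesystem the quota applies to is not stated in this ticket.

```shell
#!/bin/sh
# largest_entries DIR [N]
# Hypothetical helper: print the N largest entries directly under DIR
# (default 10), with sizes in KiB, largest first.
largest_entries() {
    # du -sk gives one total per entry in KiB; errors from unreadable
    # entries are discarded so they do not clutter the listing.
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

Usage: `largest_entries /work/n02/n02/lbaker/u-ay059 5` would list the five largest entries in that directory.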

comment:4 Changed 2 months ago by laurahb

Hi,
Thanks, this was because I increased the wallclock time further after submitting that ticket, and it ran (so presumably it did just need more time), but then I got the disk quota issue. As this is the first thing I've run, I have basically no files stored on ARCHER: is it meant to put the output in /home/n02/n02/lbaker/cylc-run/u-ay059/ or is there somewhere else that I should set it to go that has more space?

comment:5 Changed 2 months ago by willie

Hi Laura,

Output from the compute nodes ends up in /work/n02/n02/lbaker/u-ay059. Rose will pull back a subset of this to your PUMA cylc-run directory to support diagnostics.

Grenville and Ros are currently away at meetings but will be able to increase your ARCHER quota if necessary.

In the meantime, could you run

chmod g+rX /work/n02/n02/lbaker
chmod g+rX /home/n02/n02/lbaker

This will help us if there are future issues.

Also please check your PUMA quota: use quota -v

Regards
Willie

comment:6 Changed 2 months ago by willie

Sorry, those commands should be recursive:

chmod -R g+rX /work/n02/n02/lbaker
chmod -R g+rX /home/n02/n02/lbaker

Willie
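
For reference, the capital X in g+rX adds group execute permission only to directories and to files that are already executable by someone, so ordinary data files become group-readable without becoming executable. The sketch below demonstrates this on a throwaway directory created with mktemp; the file names and the group_bits helper are made up for illustration.

```shell
#!/bin/sh
# Build a scratch tree with owner-only permissions.
demo=$(mktemp -d)
mkdir "$demo/run"
touch "$demo/run/output.txt" "$demo/run/script.sh"
chmod 700 "$demo" "$demo/run" "$demo/run/script.sh"
chmod 600 "$demo/run/output.txt"

# Apply the same mode change as in the comment above.
chmod -R g+rX "$demo"

# group_bits PATH: print the three group permission characters of PATH
# (hypothetical helper; characters 5-7 of the ls -l mode string).
group_bits() {
    ls -ld "$1" | cut -c5-7
}

group_bits "$demo/run"             # directory
group_bits "$demo/run/output.txt"  # plain data file
group_bits "$demo/run/script.sh"   # file that was already executable
```

The directory and the script gain group read and execute, while the data file gains group read only.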

comment:7 Changed 2 months ago by ros

Hi Laura,

I can sort out your ARCHER quota - you currently have very little. How much space are you likely to need?

Regards,
Ros.

comment:8 Changed 2 months ago by laurahb

Hi Ros,
Thanks for offering to increase my quota.
I don't have a very good sense of how much space I'll need… I plan to run (initially) 100 10-month simulations and will need monthly fields for all months and daily fields for 3-4 months of this (daily fields I need should just be surface fields). You probably have more of an idea than me of how much space this would need. If that's a lot then I can run in smaller batches and move stuff locally before running the next batch.
Thanks,
Laura
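
A rough estimate of this kind is just multiplication once per-file sizes are known. The sketch below shows the arithmetic only: the function name estimate_mb is hypothetical, and every size passed to it is a placeholder to be replaced with sizes measured from a first completed run, not a figure from this ticket.

```shell
#!/bin/sh
# estimate_mb RUNS MONTHS MONTHLY_MB DAILY_DAYS DAILY_MB
# Hypothetical helper: total output in MB for RUNS simulations, each
# writing MONTHS monthly files of MONTHLY_MB and DAILY_DAYS daily files
# of DAILY_MB. All arguments are whole numbers.
estimate_mb() {
    per_run=$(( $2 * $3 + $4 * $5 ))
    echo $(( $1 * per_run ))
}
```

Usage: `estimate_mb 100 10 MONTHLY_MB DAILY_DAYS DAILY_MB`, where 100 runs and 10 months come from the plan above and the remaining three values must be filled in from real output.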

comment:9 Changed 8 weeks ago by laurahb

Hi,
Just following up on this… please let me know if the information above is not enough and I can try to do some estimations of how much space I'm likely to need.

Laura

comment:10 Changed 8 weeks ago by grenville

Laura

I have increased your quota to 1 TB - let's see how you get on.

Grenville

comment:11 Changed 7 weeks ago by willie

  • Resolution set to fixed
  • Status changed from new to closed