Opened 5 years ago

Closed 5 years ago

#1506 closed help (fixed)

crun issues

Reported by: s1374103 Owned by: ros
Component: UM Model Keywords:
Cc: Platform:
UM Version: 7.3

Description

Dear Helpdesk,

I have copied job xjnjn (vn7.3 with extended nitrate) and am trying to get the model to run to completion (1 year 4 months), but am experiencing difficulties.

I have followed ticket #1473 and carried out what I thought was an nrun and a crun, but each time I only generate .pm files.

Can I clarify the process for nruns and cruns, please?


For the nrun

  • model selection → compilation and modifications → compile options for UM model

button set to compile and build the executable named below, then run

  • model selection → compilation and modifications → compile options for UM reconfiguration

button set to compile and build the executable named below


save, process, and submit.


For the crun

  • model selection → compilation and modifications → compile options for UM model

button set to run from existing executable named below

  • model selection → compilation and modifications → compile options for UM reconfiguration

button set to run from existing executable named below

  • Crun hand edit (~ros/HadGEM3-A/vn7.3/HGPKG1/crun.ed) turned on

Set to ‘Y’


Save, process and submit.
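
The two setups listed above can be summarised as a small table of settings, which makes the differences between them explicit. This is only a sketch: the dictionary keys are shorthand invented here, not UMUI variable names, and the hand edit being off for the nrun is an assumption (the list only mentions it for the crun).

```python
# Settings as described in the two lists above.
# Keys are illustrative shorthand, not real UMUI variable names.
NRUN = {
    "model_executable": "compile and build, then run",
    "reconfiguration_executable": "compile and build",
    "crun_hand_edit": False,  # assumed off for the nrun; not stated explicitly
}
CRUN = {
    "model_executable": "run from existing executable",
    "reconfiguration_executable": "run from existing executable",
    "crun_hand_edit": True,  # ~ros/HadGEM3-A/vn7.3/HGPKG1/crun.ed set to 'Y'
}

def changed_settings(before, after):
    """Return {setting: (old, new)} for every setting whose value differs."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}

for key, (old, new) in changed_settings(NRUN, CRUN).items():
    print(f"{key}: {old!r} -> {new!r}")
```
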


Is this the correct procedure?

I have tried multiple combinations of model setup and each time I have only one .pm file.

The job time was originally set to 3 hours, which is long enough for the nrun to complete. Do I need to increase the time limit before I submit the crun? If so, scaling would make it 48 hours, but when I enter this into the UMUI it returns something like 'job time limit exceeded'. How do I get around this?

Many thanks,

Jamie

Change History (9)

comment:1 Changed 5 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted
  • UM Version changed from <select version> to 7.3

Hi Jamie,

Yes, this is the correct procedure. If you let me know the job id I can help better. I can see you have submitted xldpg (is this the job?) for a 1-month NRUN, which has completed successfully, and the number of files output in your $DATADIR looks fine to me. You have three 10-day dumps and a .pm file, of which I would only expect one per month. Your resubmission period is 1 month, so if the NRUN completes in 3 hours, each CRUN should also complete in 3 hours. The setup of this job looks fine. What happens when you submit the CRUN?

Cheers,
Ros.
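
The timing argument in the comment above can be sketched as a quick calculation. It is only an illustration, assuming each one-month chunk takes about as long as the one-month NRUN; the function name `plan_chunks` and its parameters are hypothetical and not part of the UM or the UMUI.

```python
import math

def plan_chunks(total_months, resubmit_months, hours_per_chunk):
    """Return (number of jobs, wallclock limit per job, total wallclock hours).

    Assumes every resubmitted chunk costs about the same as the first one.
    """
    n_jobs = math.ceil(total_months / resubmit_months)
    return n_jobs, hours_per_chunk, n_jobs * hours_per_chunk

# 1 year 4 months = 16 months, resubmitted monthly, roughly 3 hours per month
n_jobs, limit, total = plan_chunks(16, 1, 3)
print(f"{n_jobs} jobs, {limit} h limit each, {total} h in total")
```

The 48 hours is spread over 16 separate jobs, so under these assumptions the time limit for each individual job stays at 3 hours.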

comment:2 Changed 5 years ago by s1374103

Hi Ros,

OK. So if the resubmission period for the crun is the same length as the nrun, then the time limit should be the same?

xldpg is a job with reduced run time I was using to test the submission. I am now using xldph and have just submitted it for nrun and will get back to you on how it goes.

Thanks,

Jamie

comment:3 Changed 5 years ago by s1374103

Hi Ros,

The job (xldph) failed because the disk quota was exceeded. I've checked on Archer and Puma and they both appear to be within their limits.

aprun: Apid 13232875: Write failure to stdout of 1874 bytes, ret -1: Disk quota exceeded
aprun: Apid 13232875: Exiting due to errors. Application aborted
/work/n02/n02/jimbo/um/xldph/bin/qsexecute[1091]: echo: write to 1 failed [Disk quota exceeded]
diff: /work/n02/n02/jimbo/tmp/tmp.mom5.15150/xldph.xhist: No such file or directory
qsexecute: Copying /work/n02/n02/jimbo/um/xldph/xldph.thist to backup thist file /work/n02/n02/jimbo/um/xldph/xldph.thist_keep
xldph: Run failed

Is this due to a network problem or is there something I can do?

Regards,

Jamie

comment:4 Changed 5 years ago by ros

Hi Jamie,

You are not far off your /work quota. Are there some files on /work that you can delete or move to another system? You have ~413Gb there currently.

If you really do need more disk space this may be possible.

Regards,
Ros.
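
One way to see where the space is going before deleting anything is to total the size of each top-level directory. The following is a minimal Python sketch, assuming it is pointed at a directory you own. The function names are illustrative, and the totals are apparent file sizes, which may differ slightly from what the quota system counts.

```python
import os

def dir_sizes(root):
    """Return {subdirectory name: total bytes of regular files beneath it}."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only report real subdirectories of root
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue  # do not count symlink targets
                try:
                    total += os.path.getsize(path)
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        sizes[entry.name] = total
    return sizes

def report(root, top=10):
    """Print the largest subdirectories of root, biggest first."""
    ranked = sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    for name, nbytes in ranked[:top]:
        print(f"{nbytes / 1e9:8.2f} GB  {name}")
```

For example, `report("/work/n02/n02/jimbo/um")` would list the ten largest job directories under the path that appears in the log in comment:3.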

comment:5 Changed 5 years ago by s1374103

Hi Ros,

Is there a difference between my /work and /home quotas?

When I do 'quota -s' is that my quota for both work and home?

Regards,

Jamie

comment:6 Changed 5 years ago by ros

Hi Jamie,

Yes, you have a different quota for /home & /work. I think quota -s just shows your /home quota. The man page does imply that you should be able to get it to show the quota for a different filesystem, but I can't make it work. You can also see your usage & quotas in the ARCHER SAFE pages: http://www.archer.ac.uk/safe

Regards,
Ros.

comment:7 Changed 5 years ago by s1374103

Hi Ros,

OK, thanks.

If I was using 413/500 Gb, would this be reason enough for my job to fail through disk space issues? Does Archer anticipate that the output of the run will be too large, or would it run the job anyway and fill up my disk until the quota is exceeded?

Regards,

Jamie

comment:8 Changed 5 years ago by ros

Hi Jamie,

Your quota is 420Gb and apparently what's shown in SAFE can be up to 5 hours behind. You can run

lfs quota /fs2

to show your up-to-date usage and quota on /work.

I see you have now deleted a lot of files so I presume your job is now running ok?

Regards,
Ros.
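
As far as the log in comment:3 shows, the job was allowed to start and only failed when a write pushed usage over the limit, which suggests the quota is enforced at write time rather than anticipated at submission. A rough check before submitting is therefore simple arithmetic. The sketch below assumes you supply the figures yourself (for example from the `lfs quota` output); the function name and the 20 Gb output estimate are hypothetical.

```python
def fits_in_quota(quota_gb, used_gb, expected_output_gb):
    """Return (fits, headroom_gb) for a run expected to write expected_output_gb."""
    headroom = quota_gb - used_gb
    return expected_output_gb <= headroom, headroom

# Figures from this ticket: 420 Gb quota, about 413 Gb in use.
# The 20 Gb output estimate is an assumed value for illustration only.
ok, headroom = fits_in_quota(420, 413, 20)
print(f"headroom: {headroom} Gb, enough space: {ok}")
```

With only 7 Gb of headroom, any run expected to write more than that would hit the quota partway through.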

comment:9 Changed 5 years ago by annette

  • Resolution set to fixed
  • Status changed from accepted to closed

Hi Jamie,

I assume this fixed your problem, so I'm closing the ticket.

Best regards,
Annette
