Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#1220 closed help (fixed)

running UM on ARCHER

Reported by: anmcr Owned by: um_support
Component: UM Model Keywords: running UM
Cc: Platform: ARCHER
UM Version: 7.3

Description

Good morning,

I submitted a vn7.3 UM job to ARCHER yesterday for the first time (identifier xhldo).

However, it has been stuck in the queue for almost 24 hours.

I believe I have made the correct changes for running on ARCHER
(i.e. as suggested by http://cms.ncas.ac.uk/wiki/Archer).

I would be very grateful for any help.

Best wishes,

Andrew

Change History (19)

comment:1 Changed 6 years ago by grenville

Andrew

Please make your ARCHER home and work spaces readable.

Grenville
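For reference, making a directory tree readable to others usually means recursively adding group read permission, plus execute on directories so they can be traversed. That is roughly what `chmod -R g+rX` does. The minimal Python sketch below does the same. It assumes that group-level access is what the support team needs on ARCHER, which this ticket does not confirm, and the path in the usage line is taken from this ticket only as an illustration.

```python
import os
import stat

# Execute bits for any class; chmod's `X` only adds execute where one of these is already set.
ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def add_group_rX(path: str) -> None:
    """Apply the equivalent of `chmod g+rX` to a single non-symlink path."""
    mode = stat.S_IMODE(os.lstat(path).st_mode)
    new_mode = mode | stat.S_IRGRP
    # Directories always get group execute (needed to traverse them);
    # regular files only if they are already executable by someone.
    if os.path.isdir(path) or mode & ANY_EXEC:
        new_mode |= stat.S_IXGRP
    if new_mode != mode:
        os.chmod(path, new_mode)


def make_group_readable(root: str) -> None:
    """Recursively apply `g+rX` under root, skipping symbolic links.

    Assumption: group access is sufficient; use S_IROTH/S_IXOTH for world access instead.
    """
    add_group_rX(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # never change a link's target
                add_group_rX(path)
```

Usage (illustrative path): `make_group_readable("/work/n02/n02/anmcr")`. Symlinks are skipped so that the permissions of files owned by other users are not touched.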

comment:2 Changed 6 years ago by anmcr

Dear Grenville,

I've changed the permissions.

I also raised a ticket yesterday because I had run out of disk space on ARCHER, so perhaps this is connected? However, the fact that the job was simply queuing for so long made me suspect something else was happening.

Thanks for your help,

Andrew

comment:3 Changed 6 years ago by grenville

Andrew

Please delete the queued job and resubmit.

Grenville

comment:4 Changed 6 years ago by anmcr

Hi Grenville,

I deleted the job and resubmitted. However, I get a 'model build: failed' error during the compilation (/home/n02/n02/anmcr/output). I looked through the output file but was unable to identify the problem.

Andrew

comment:5 Changed 6 years ago by anmcr

Hi Grenville,

I looked through some of the other jobs submitted to ARCHER, and a number of them had included the hand edit '~ros/umui_jobs/hand_edits/archer/cce-7.3.ed'. It was not much better than tinkering, but I tried running the model with this hand edit included. However, I then got a 'permission denied' error when I tried to submit. I also get the same error with /work/n02/n02/anmcr on ARCHER. I'm not quite sure why.

Thanks,

Andrew

comment:6 Changed 6 years ago by willie

Hi Andrew,

You could switch off the hand edit

~ros/HadGEM3-A/vn7.3/HGPKG1/ibm_modules.scr

this might be confusing things.

Regards

Willie

comment:7 Changed 6 years ago by ros

Hi Andrew,

For some reason your job is trying to link to the wrong version of GCOM. I'll investigate and get back to you.

Cheers,
Ros.

comment:8 Changed 6 years ago by ros

Hi Andrew,

Sorry, I just realised I misread one of your previous emails and read 'override file' for 'hand edit'! :-(

You need to change the User Override files from

/home/umui/overrides/hector_cce_7.3_machine

to

/home/umui/overrides/archer_cce_7.3_machine

Similarly, change the User file override to archer_cce_7.3_file.

You don't need my hand edit ~ros/umui_jobs/hand_edits/archer/cce-7.3.ed. (In fact, I deleted it a while ago, hence the 'permission denied' error you got.)

Hopefully that will fix your problem.

Cheers,
Ros.

comment:9 Changed 6 years ago by anmcr

Dear Ros and Willie,

Thank you for your help. I've made some progress. The changes I made were

1) Changed the User machine overrides to '/home/umui/overrides/archer_cce_7.3_machine'.

However, what do I do regarding the 'User file overrides' which are still being used (/home/umui/overrides/hector_cce_7.3_file and /home/ukca/comp_overrides_hector_cce_7.3_asad_chem_diags)?

2) Removed the hand edit '~ros/HadGEM3-A/vn7.3/HGPKG1/ibm_modules.scr'

3) Removed the hand edit '~ros/umui_jobs/hand_edits/archer/cce-7.3.ed'

However, the reconfiguration fails with the error:

User Prognostic File does not exist.
File : /work/n02/n02/annette/hadgem3/ancil/land/ORCA2_N48/topog_index_mean
ERROR!!! in reconfiguration in routine Rcf_Create_Dump

This file/path is on HECToR. Unfortunately, it must be specified somewhere by its full path, since I have already changed all the environment variables to point to ARCHER. Could you please tell me in which UMUI panel it is set, so that I can point it to a file on ARCHER?

Thank you,

Andrew
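When it is not obvious which UMUI panel sets a path like the HECToR one in the reconfiguration error above, a crude alternative is to search the job's processed files for absolute paths that do not exist on the new machine. The minimal Python sketch below does this. The directory you point it at (wherever your processed copy of the job lives) and the path pattern are assumptions, not part of the UM or UMUI.

```python
import os
import re
from pathlib import Path

# Absolute paths under /home or /work, e.g. /work/n02/n02/annette/hadgem3/ancil/...
# (assumed pattern; widen it if your files live elsewhere)
PATH_RE = re.compile(r"/(?:home|work)/[\w./+-]+")


def find_missing_paths(job_dir: str) -> dict[str, set[str]]:
    """Map each file under job_dir to the absolute paths it mentions that do not exist here."""
    missing: dict[str, set[str]] = {}
    for file in Path(job_dir).rglob("*"):
        if not file.is_file():
            continue
        text = file.read_text(errors="replace")
        for match in PATH_RE.findall(text):
            path = match.rstrip(".")  # drop a sentence-ending full stop picked up by the pattern
            if not os.path.exists(path):
                missing.setdefault(str(file), set()).add(path)
    return missing
```

Any file in the result that still mentions a HECToR-only path is a candidate for the setting that needs changing; the matching UMUI panel can then be found from the variable name near that path.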

comment:10 Changed 6 years ago by ros

Hi Andrew,

  • You need to change the file override to archer_cce_7.3_file
  • Leave /home/ukca/comp_overrides_hector_cce_7.3_asad_chem_diags as it is. We are still using the same compiler, and having looked at this file I can confirm it will work just fine.
  • The paths to the user prognostic files are set in UMUI window Atmos → STASH → Initialisation of User Prognostics

Cheers,
Ros.

comment:11 Changed 6 years ago by anmcr

Dear Ros,

I've made these changes, and the reconfiguration now seems to succeed. However, when I run the model executable, it runs for a few time steps and then fails with the error below. Unfortunately, I was unable to get to the bottom of it. Are you able to advise?

Many thanks,

Andrew

*
UM Executable : /work/n02/n02/anmcr/xhldo/bin/xhldo.exe
*

sys-2 : UNRECOVERABLE error on system request

No such file or directory

Encountered during an OPEN of unit 99
Fortran unit 99 is not connected

comment:12 Changed 6 years ago by anmcr

Hi again,

I don't know what file 'unit 99' refers to. I've seen similar errors in tickets #771 and #778. Check Setup is complaining about broken codes in USERLIST_A (the user STASHmaster files), which #778 suggests could be relevant.

Andrew

comment:13 Changed 6 years ago by anmcr

Hi again,

The reference to 'USERLIST_A' when Check Setup is run is discussed in #957. Willie said there that it was not an error, simply a nuisance.

Unfortunately, I'm still no closer to identifying what file 'unit 99' refers to.

Andrew

comment:14 Changed 6 years ago by anmcr

Hi again,

I differenced my job (xhldo) with a standard UKCA job on ARCHER (xfvgd).

The only significant differences that I could see were:

1) hand edits ('N' in xhldo and 'Y' in xfvgd)

~ros/HadGEM3-A/vn7.3/HGPKG1/ibm_modules.scr

2) branches (both 'N' in xhldo and 'Y' in xfvgd)

fcm:um_br/dev/Annette/VN7.3_improve_gbl_comms_ukmo/src
fcm:um_br/dev/jeff/VN7.3_HadGEM3-A_r2.0_hector_monsson_archiving/src

I changed these to 'Y', but the error remains, so I guess that whatever file unit 99 refers to is on HECToR and has not been copied to ARCHER.

Andrew
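The job comparison above was presumably made with the UMUI's own tools. If you instead have the processed files of two jobs in two directories, a minimal Python sketch using the standard `difflib` module can list every difference, including files present in only one job. The directory layout is an assumption; nothing here is a UMUI interface.

```python
import difflib
from pathlib import Path


def diff_jobs(job_a: str, job_b: str) -> list[str]:
    """Unified diff of all files in two job directories (files missing from one side diff against empty)."""
    a, b = Path(job_a), Path(job_b)
    names = sorted(
        {p.relative_to(a) for p in a.rglob("*") if p.is_file()}
        | {p.relative_to(b) for p in b.rglob("*") if p.is_file()}
    )
    out: list[str] = []
    for name in names:
        fa, fb = a / name, b / name
        lines_a = fa.read_text(errors="replace").splitlines(keepends=True) if fa.exists() else []
        lines_b = fb.read_text(errors="replace").splitlines(keepends=True) if fb.exists() else []
        # Identical files produce no output at all.
        out.extend(difflib.unified_diff(lines_a, lines_b, fromfile=str(fa), tofile=str(fb)))
    return out
```

Printing `"".join(diff_jobs(dir_a, dir_b))` gives a patch-style listing that is easy to scan for switched hand edits or branches.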


comment:15 Changed 6 years ago by anmcr

I think this is the problem.

A file is being opened on unit 99 in ukca_read_era.F90.

Its location is on HECToR at /work/n02/n02/jmk64/tbias_data/era_2000.asc.

This is in the branch fcm:um_br/dev/jmk64/VN7.3_UKCA_CheM_PSC_production_run/src.

Andrew
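Tracking down which source file opens a given Fortran unit, as done above for unit 99, can be automated with a simple text search over an extracted copy of the source. The minimal Python sketch below assumes the source is available as plain files with `.F90`/`.f90`/`.f`/`.F` extensions. It only finds literal `OPEN` statements: a unit held in a variable, or an `OPEN` split across continuation lines, will be missed.

```python
import re
from pathlib import Path


def find_unit_opens(src_dir: str, unit: int) -> list[tuple[str, int, str]]:
    """Return (file, line number, stripped line) for each literal OPEN on the given unit.

    Matches both OPEN(99, ...) and OPEN(UNIT=99, ...), case-insensitively;
    the trailing \\b stops unit 99 from matching 990.
    """
    pattern = re.compile(rf"\bOPEN\s*\(\s*(?:UNIT\s*=\s*)?{unit}\b", re.IGNORECASE)
    hits = []
    for path in sorted(Path(src_dir).rglob("*")):
        # Assumed Fortran extensions; adjust for your source tree.
        if not path.is_file() or path.suffix.lower() not in {".f90", ".f"}:
            continue
        lines = path.read_text(errors="replace").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Once the opening statement is found, the `FILE=` argument on that line (or the variable it refers to) shows which path the model expects, which is how a hard-coded HECToR path such as the one for era_2000.asc shows up.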

comment:16 Changed 6 years ago by ros

Hi Andrew,

Sounds like you have tracked down the problem; I was just beginning to investigate. As far as I can tell, James doesn't yet have an account on ARCHER, so you will have to copy the era_2000.asc file over to ARCHER and then create a new branch for yourself, merge James's changes into it, and modify the path to this file. If you need any help with this, let me know.

Cheers,
Ros.

comment:17 Changed 6 years ago by anmcr

Hi Ros,

I've got this to run now, so please close this ticket.

Thanks very much for your help.

Andrew

comment:18 Changed 6 years ago by ros

  • Component changed from Other to UM Model
  • Resolution set to fixed
  • Status changed from new to closed

comment:19 Changed 6 years ago by annette

Hi Andrew and Ros,

Just to say that everything that was under /work/n02/n02/annette/hadgem3/ on HECToR is now in the same location on Archer.

Annette
