Opened 2 months ago

Last modified 6 days ago

#2291 accepted help

HadGEM3-GC3.1 on Archer

Reported by: mvguarino
Owned by: ros
Priority: normal
Component: ARCHER
Keywords:
Cc: lsim@…
Platform: ARCHER
UM Version:

Description

Hello,

My name is Maria Vittoria Guarino, I work at the British Antarctic Survey (BAS) in Cambridge on a NERC funded project, in collaboration with the Met Office, whose main PI is Dr Louise Sime (Project reference: NE/P013279/1).

I will be running HadGEM3-GC3.1 simulations, at N96 resolution, using the CMIP6 piControl run (which should be ready to use in about two weeks).
Can this model (the CMIP6 version of GC3.1) be run on Archer? If so, could you advise me on the steps to follow?

Many thanks,

Vittoria

Change History (22)

comment:1 Changed 2 months ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted
  • UM Version <select version> deleted

Hi Vittoria,

Once the suite you require is ready, please let us know the suite id and then we can advise on the best path to take.

Regards,
Ros.

comment:2 Changed 5 weeks ago by mvguarino

Hi Ros,

The PI control is ready and the suite number is u-ar766 (https://code.metoffice.gov.uk/trac/ukcmip6/wiki/runs/u-ar766)

Thanks,

Vittoria

comment:3 Changed 5 weeks ago by ros

Hi Vittoria,

This suite is already on our list of standard ones to port to ARCHER. I will let you know when it is ready.

Cheers,
Ros.

comment:4 Changed 5 weeks ago by mvguarino

That's great, thank you.

Maybe in the meantime you could help me with this:

I am trying to create a branch of an existing branch using this command line:

fcm bc -t pkg lig127k_CMIP6 fcm:um.x_br/pkg/Share/vn10.7_CMIP6_production_mods

However this is the error I get:

[FAIL] svn info --xml https://code.metoffice.gov.uk/svn/um/main/branches/pkg/Share/vn10.7_CMIP6_production_mods # rc=1
[FAIL] svn: E215004: Authentication failed and interactive prompting is disabled; see the --force-interactive option
[FAIL] svn: E215004: Unable to connect to a repository at URL 'https://code.metoffice.gov.uk/svn/um/main/branches/pkg/Share/vn10.7_CMIP6_production_mods'
[FAIL] svn: E215004: No more credentials or we tried too many times.
[FAIL] Authentication failed

What am I doing wrong?

Thank you.

Vittoria


comment:5 follow-up: Changed 4 weeks ago by ros

Hi Vittoria,

Have you cached your MOSRS password on PUMA? Your password must be cached in order to run rose and fcm commands.

Instructions for getting set up (MOSRS password caching and ssh-agent) can be found in our training document here: http://cms.ncas.ac.uk/documents/training/March2017/UM_practicals/getting-setup.html

Regarding the branch you are trying to create: I presume you are going to be changing code that was modified/added in the vn10.7_CMIP6_production_mods branch?

If NOT then you can just create a new branch, make your changes and then include both the vn10.7_CMIP6_production_mods branch and your new branch in the UM suite.

If the answer is YES, then you need to create a new branch and then merge in the changes from the vn10.7_CMIP6_production_mods branch. Creating a branch of a branch is not recommended as it can lead to unpredictable results. Your development branch needs to be under the /dev (default) area of the repository and not /pkg, which is a reserved area.

To create your branch run:

fcm bc lig127k_CMIP6 fcm:um.x-tr@vn10.7

Check out a working copy with:

fcm co fcm:um.x-br/dev/mariavittoriaguarino/vn10.7_lig127k_CMIP6

Then, if you need to merge in the CMIP6 branch, cd to the working copy directory created in the step above and run:

fcm merge fcm:um.x_br/pkg/Share/vn10.7_CMIP6_production_mods

fcm commit

Cheers,
Ros.

comment:6 in reply to: ↑ 5 Changed 4 weeks ago by mvguarino

Hi Ros,

By mistake, I must have commented out the last line of my .kshrc file, so the password wasn't cached.

I essentially need to run simulations using the same source code of vn10.7_CMIP6_production_mods except for some modifications that I will make to the planet_constants_mod.f90 and astro_constants_mod.f90 modules.

I created my own branch from the trunk and then I merged the changes from vn10.7_CMIP6_production_mods into my working copy.

Thank you,

Vittoria

comment:7 Changed 4 weeks ago by ros

Hi Vittoria,

I have now ported suite u-ar766 to ARCHER as suite id u-as037. You can now take a copy of u-as037 and change username, ARCHER account code, etc before running.

Cheers,
Ros.

comment:8 follow-up: Changed 4 weeks ago by ros

P.S. Just to add this suite does not have ARCHER post-processing/archiving in it. I assume as you are doing development you won't need this (at least not yet). If you do eventually need it please let us know and it can be added - this is not trivial at the current time.


comment:9 Changed 4 weeks ago by mvguarino

Hi Ros,

Thanks for porting the suite.

I had a quick look at it and there seems to be a problem with the "ios_offset" entry (UM/namelist/IO system settings). At the moment ios_offset = 1278, but this value seems to be out of range.
Should I just decrease it to 1024? What would this imply?

Thanks,

Vittoria

comment:10 in reply to: ↑ 8 Changed 4 weeks ago by mvguarino

Replying to ros:

P.S. Just to add this suite does not have ARCHER post-processing/archiving in it. I assume as you are doing development you won't need this (at least not yet). If you do eventually need it please let us know and it can be added - this is not trivial at the current time.

I do actually need this. Would you be able to tell me how long it would take (if it is possible at all)?

comment:11 Changed 4 weeks ago by ros

Hi Vittoria,

UM/namelist/IO system settings is for tuning the I/O server, which I just left as per the Met Office, but a more sensible set of values for ARCHER would be:

ios_tasks_per_server: 6
ios_spacing: 12
ios_offset: 0

As for the archiving, I'm in the process of working with the Met Office to get the ARCHER stuff into the trunk, which will be released soon (timescale currently unknown but needed for CMIP6). I'm trying to avoid spending time upgrading suites twice and/or supporting multiple versions of ARCHER post-processing.

Once you've got your suite working and ready for production, please let me know and we can then decide how to proceed.

Regards,
Ros.

comment:12 Changed 2 weeks ago by mvguarino

Hi Ros,

Although post-processing is not ready yet, I am trying to run the suite u-as245. This is just a copy of u-as037 with the more appropriate IO settings you suggested.

Something seems to be wrong with my ARCHER .profile file, as I get this error:

/home/n02/n02/vittoria/.profile: line 37:
: command not found
/home/n02/n02/vittoria/.profile: line 38:
: command not found
/home/n02/n02/vittoria/.profile: line 41:
: command not found

/home/n02/n02/vittoria/.profile: line 48: syntax error near unexpected token `fi'
/home/n02/n02/vittoria/.profile: line 48: `fi'

bash: rose: command not found

I set the ARCHER environment as described here: http://cms.ncas.ac.uk/wiki/RoseCylc.

Thanks,

Vittoria

comment:13 follow-up: Changed 2 weeks ago by ros

Hi Vittoria,

Can you please run the following on ARCHER to give us read-access to your files?

chmod -R g+rX /home/n02/n02/vittoria
chmod -R g+rX /work/n02/n02/vittoria
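
For reference, the capital X in g+rX adds group execute (search) permission only to directories and to files that already have an execute bit set, so ordinary data files become group-readable without becoming executable. A small sketch on a throwaway directory shows the effect; the directory and file names are made up for illustration, and `stat -c` assumes GNU coreutils (as on Linux):

```shell
#!/bin/bash
# Demonstrate "chmod -R g+rX" on a scratch directory (hypothetical names).
demo=$(mktemp -d)
mkdir "$demo/run"
echo "data" > "$demo/run/output.txt"
printf '#!/bin/sh\necho hi\n' > "$demo/run/script.sh"

# Start from owner-only permissions.
chmod 700 "$demo/run" "$demo/run/script.sh"
chmod 600 "$demo/run/output.txt"

# Give the group read access; X adds execute only to directories and
# to files that are already executable by someone.
chmod -R g+rX "$demo"

dir_mode=$(stat -c %a "$demo/run")              # directory: group can list and enter
file_mode=$(stat -c %a "$demo/run/output.txt")  # plain file: group can read only
exec_mode=$(stat -c %a "$demo/run/script.sh")   # executable: group can read and run
echo "dir=$dir_mode file=$file_mode script=$exec_mode"

rm -rf "$demo"
```

Using a lowercase x instead would mark every file as group-executable, which is usually not what is wanted for model output.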

Cheers,
Ros.

comment:14 in reply to: ↑ 13 Changed 2 weeks ago by mvguarino

Replying to ros:

Done it.


comment:15 Changed 2 weeks ago by ros

Hi Vittoria,

Did you edit your ~/.profile on, or copy the lines from, a Windows-based system? It contains a lot of Windows control characters, ^M (carriage returns), at the end of each line you've added. These are causing the problem.
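
One way to confirm this kind of problem is to count the lines that end in a carriage return and then strip them. The following is a minimal sketch that works on a temporary file rather than the real ~/.profile; the function names and the sample file contents are made up for illustration.

```shell
#!/bin/bash
# Count lines ending in a carriage return (CR, shown as ^M by some editors).
# grep exits non-zero when nothing matches, hence the "|| true".
count_cr_lines() {
    grep -c $'\r$' "$1" || true
}

# Write a copy of file $1 to $2 with the trailing CR removed from every line.
strip_cr() {
    sed $'s/\r$//' "$1" > "$2"
}

# Demonstration on a throwaway file with Windows (CRLF) line endings.
tmp=$(mktemp -d)
printf 'export FOO=1\r\nif true; then\r\n  echo ok\r\nfi\r\n' > "$tmp/profile.crlf"

before=$(count_cr_lines "$tmp/profile.crlf")
strip_cr "$tmp/profile.crlf" "$tmp/profile.unix"
after=$(count_cr_lines "$tmp/profile.unix")
echo "lines with CR: before=$before after=$after"

rm -rf "$tmp"
```

To apply this for real, it would be sensible to keep a backup of ~/.profile first and only then replace the file with the stripped copy.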

However, you've reminded me we need to update the information on that page. The most up-to-date set up for running UM Rose suites can be found on our training page:
http://cms.ncas.ac.uk/documents/training/March2017/UM_practicals/getting-setup.html

You may have already done a lot of this, but section 1.5 details how to set up your ARCHER environment:

http://cms.ncas.ac.uk/documents/training/March2017/UM_practicals/getting-setup.html#set-up-your-archer-environment

Regards,
Ros.

comment:16 Changed 2 weeks ago by mvguarino

That worked, thanks.

However, the suite still doesn't run:

1) validate_suite_info fails, the error is:

/bin/sh: python2.7: command not found
[FAIL] python2.7 $CYLC_SUITE_RUN_DIR/bin/validate_suite_info.py $CYLC_SUITE_RUN_DIR # return-code=127
Received signal ERR
cylc (scheduler - 2017-11-24T15:15:32Z): CRITICAL Task job script received signal ERR at 2017-11-24T15:15:32Z
cylc (scheduler - 2017-11-24T15:15:32Z): CRITICAL failed at 2017-11-24T15:15:32Z

2) After that, fcm_make_um, fcm_make_ocean and fcm_make_drivers also fail, all with the same error message:

/home/mvguarino/cylc-run/u-as245/log/job/18500101T0000Z/fcm_make_ocean/01/job: line 155: SCRATCH: unbound variable
Received signal EXIT
cylc (scheduler - 2017-11-24T15:15:32Z): CRITICAL Task job script received signal EXIT at 2017-11-24T15:15:32Z
cylc (scheduler - 2017-11-24T15:15:32Z): CRITICAL failed at 2017-11-24T15:15:32Z

Vittoria

comment:17 Changed 2 weeks ago by ros

Hi Vittoria,

1) My fault, there is an incorrect path (I've updated the original suite):

In suite.rc change:

    [[validate_suite_info]]
        pre-script = "export PATH=/home/andy/Enthought/Canopy_64bit/User/bin/python:$PATH"

to be

    [[validate_suite_info]]
        pre-script = "export PATH=/home/andy/Enthought/Canopy_64bit/User/bin:$PATH"

2) I've just created a SCRATCH directory for you. Please add the line:

export SCRATCH=/export/puma/data-01/mvguarino

to your ~/.profile on PUMA.
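
Both failures can be reproduced in isolation, which may help when checking a similar suite. The sketch below uses made-up names (`mytool`, `DEMO_SCRATCH`) and a temporary directory. It only illustrates the two shell behaviours involved and is not part of the suite; the assumption that the job script runs with `set -u` is inferred from the "unbound variable" message.

```shell
#!/bin/bash
tmp=$(mktemp -d)
printf '#!/bin/sh\necho "mytool ran"\n' > "$tmp/mytool"
chmod +x "$tmp/mytool"

# 1) PATH entries must be directories. If an entry points at the
#    executable itself, the shell never finds the command.
lookup() {
    # Run the lookup in a subshell so the caller's PATH is untouched.
    ( PATH="$1"; command -v mytool >/dev/null 2>&1 ) && echo found || echo missing
}
wrong_path=$(lookup "$tmp/mytool:/usr/bin:/bin")   # entry is a file
right_path=$(lookup "$tmp:/usr/bin:/bin")          # entry is the directory

# 2) Under "set -u", expanding an unset variable aborts the script,
#    which is consistent with "SCRATCH: unbound variable".
unset_result=$(unset DEMO_SCRATCH
    bash -c 'set -u; echo "$DEMO_SCRATCH"' >/dev/null 2>&1 && echo ok || echo failed)
set_result=$(DEMO_SCRATCH="$tmp" \
    bash -c 'set -u; echo "$DEMO_SCRATCH"' >/dev/null 2>&1 && echo ok || echo failed)

echo "PATH entry is a file: $wrong_path, PATH entry is a directory: $right_path"
echo "variable unset: $unset_result, variable exported: $set_result"

rm -rf "$tmp"
```

The same pattern applies to the real case: the PATH entry should be the directory that contains python2.7, and SCRATCH has to be exported before the job script runs.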

Cheers,
Ros.


comment:18 Changed 2 weeks ago by mvguarino

Great, the suite seems to be running (for now...).

Thank you,

Vittoria

comment:19 Changed 2 weeks ago by mvguarino

Hi Ros,

I need your help once more.
The coupling failed, the job.err file is: http://puma.nerc.ac.uk/rose-bush/view/mvguarino/u-as245?&no_fuzzy_time=0&path=log/job/18500101T0000Z/coupled/03/job.err

The error seems to be:

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
?  Error code: 1
?  Error from routine: io:file_open
?  Error message: Failed to open file /work/n02/n02/vittoria/cylc-run/u-as245/share/data/as245.astart
?  Error from processor: 0
?  Error number: 11
????????????????????????????????????????????????????????????????????????????????

Indeed, the as245.astart file hasn't been created.

What should I do to solve the problem?

Thank you very much,

Vittoria

comment:20 Changed 2 weeks ago by ros

Hi Vittoria,

You need to turn on reconfiguration to produce the start dump. Go to panel suite conf → Build and Run and turn on "Run reconfiguration".

Regards,
Ros.

comment:21 Changed 10 days ago by mvguarino

Hi Ros,

My test run was successful and I am now ready to run the suite u-as245 for production.
Please let me know when post-processing is ready.

Thank you,

Vittoria

comment:22 Changed 6 days ago by ros

Hi Vittoria,

We're hoping that post-processing will be released at the end of this week.

Cheers,
Ros.
