Opened 3 years ago
Closed 3 years ago
#2291 closed help (completed)
HadGEM3-GC3.1 on Archer
Reported by: | mvguarino | Owned by: | ros |
---|---|---|---|
Component: | ARCHER | Keywords: | |
Cc: | lsim@… | Platform: | ARCHER |
UM Version: | 10.7 |
Description
Hello,
My name is Maria Vittoria Guarino, I work at the British Antarctic Survey (BAS) in Cambridge on a NERC funded project, in collaboration with the Met Office, whose main PI is Dr Louise Sime (Project reference: NE/P013279/1).
I will be running HadGEM3-GC3.1 simulations, at N96 resolution, using the CMIP6 piControl run (which should be ready to use in about two weeks).
Can this model (the CMIP6 version of GC3.1) be run on Archer? If so, could you advise me on the steps to follow?
Many thanks,
Vittoria
Change History (24)
comment:1 Changed 3 years ago by ros
- Owner changed from um_support to ros
- Status changed from new to accepted
- UM Version <select version> deleted
comment:2 Changed 3 years ago by mvguarino
Hi Ros,
The PI control is ready and the suite number is u-ar766 (https://code.metoffice.gov.uk/trac/ukcmip6/wiki/runs/u-ar766)
Thanks,
Vittoria
comment:3 Changed 3 years ago by ros
Hi Vittoria,
This suite is already on our list of standard ones to port to ARCHER. I will let you know when it is ready.
Cheers,
Ros.
comment:4 Changed 3 years ago by mvguarino
That's great, thank you.
Maybe in the meantime you could help me with this :
I am trying to create a branch of an existing branch using this command line:
fcm bc -t pkg lig127k_CMIP6 fcm:um.x_br/pkg/Share/vn10.7_CMIP6_production_mods
However this is the error I get:
[FAIL] svn info --xml https://code.metoffice.gov.uk/svn/um/main/branches/pkg/Share/vn10.7_CMIP6_production_mods # rc=1
[FAIL] svn: E215004: Authentication failed and interactive prompting is disabled; see the --force-interactive option
[FAIL] svn: E215004: Unable to connect to a repository at URL 'https://code.metoffice.gov.uk/svn/um/main/branches/pkg/Share/vn10.7_CMIP6_production_mods'
[FAIL] svn: E215004: No more credentials or we tried too many times.
[FAIL] Authentication failed
What am I doing wrong?
Thank you.
Vittoria
comment:5 follow-up: ↓ 6 Changed 3 years ago by ros
Hi Vittoria,
Have you cached your MOSRS password on PUMA? Your password must be cached in order to run rose and fcm commands.
Instructions for getting setup (MOSRS password caching and ssh-agent) can be found in our training document here: http://cms.ncas.ac.uk/documents/training/March2017/UM_practicals/getting-setup.html
Regarding the branch you are trying to create. I presume you are going to be changing code that was modified/added in the vn10.7_CMIP6_production_mods branch?
If NOT then you can just create a new branch, make your changes and then include both the vn10.7_CMIP6_production_mods branch and your new branch in the UM suite.
If the answer is YES, then you need to create a new branch and then merge in the changes from the vn10.7_CMIP6_production_mods branch. Creating a branch of a branch is not recommended as it can lead to unpredictable results. Your development branch needs to be under the /dev (default) area of the repository and not /pkg, which is a reserved area.
To create your branch run:
fcm bc lig127k_CMIP6 fcm:um.x-tr@vn10.7
Checkout a working copy with:
fcm co fcm:um.x-br/dev/mariavittoriaguarino/vn10.7_lig127k_CMIP6
Then, if you need to merge in the CMIP6 branch, cd to the working copy directory created in the step above and run:
fcm merge fcm:um.x_br/pkg/Share/vn10.7_CMIP6_production_mods
fcm commit
Cheers,
Ros.
comment:6 in reply to: ↑ 5 Changed 3 years ago by mvguarino
Hi Ros,
By mistake, I must have commented out the last line of my .kshrc file, so the password wasn't cached.
I essentially need to run simulations using the same source code as vn10.7_CMIP6_production_mods, except for some modifications that I will make to the planet_constants_mod.f90 and astro_constants_mod.f90 modules.
I created my own branch from the trunk and then I merged the changes from vn10.7_CMIP6_production_mods into my working copy.
Thank you,
Vittoria
comment:7 Changed 3 years ago by ros
Hi Vittoria,
I have now ported suite u-ar766 to ARCHER as suite id u-as037. You can now take a copy of u-as037 and change username, ARCHER account code, etc before running.
Cheers,
Ros.
comment:8 follow-up: ↓ 10 Changed 3 years ago by ros
P.S. Just to add this suite does not have ARCHER post-processing/archiving in it. I assume as you are doing development you won't need this (at least not yet). If you do eventually need it please let us know and it can be added - this is not trivial at the current time.
comment:9 Changed 3 years ago by mvguarino
Hi Ros,
Thanks for porting the suite.
I had a quick look at it and there seems to be a problem with the "ios_offset" entry (UM/namelist/IO system settings), at the moment ios_offset = 1278, but this value seems to be out of range.
Should I just decrease it to 1024? What would this imply?
Thanks,
Vittoria
comment:10 in reply to: ↑ 8 Changed 3 years ago by mvguarino
Replying to ros:
P.S. Just to add this suite does not have ARCHER post-processing/archiving in it. I assume as you are doing development you won't need this (at least not yet). If you do eventually need it please let us know and it can be added - this is not trivial at the current time.
I do actually need this. Would you be able to tell me how long it would take (if it is possible at all)?
comment:11 Changed 3 years ago by ros
Hi Vittoria,
UM/namelist/IO system settings is for tuning the I/O server, which I just left as per the Met Office, but a more sensible set of values for ARCHER would be:
ios_tasks_per_server: 6
ios_spacing: 12
ios_offset: 0
As for the archiving, I'm in the process of working with the Met Office to get the ARCHER stuff into the trunk which will be released soon (timescale currently unknown but needed for CMIP6). I'm trying to avoid spending time upgrading suites twice and/or supporting multiple versions of Archer postprocessing.
Once you've got your suite working and ready for production, please let me know and we can then decide how to proceed.
Regards,
Ros.
comment:12 Changed 3 years ago by mvguarino
Hi Ros,
Although post-processing is not ready yet, I am trying to run the suite u-as245. This is just a copy of u-as037 with the more appropriate IO settings you suggested.
Something seems to be wrong with my ARCHER .profile file, as I get this error:
/home/n02/n02/vittoria/.profile: line 37:
: command not found
/home/n02/n02/vittoria/.profile: line 38:
: command not found
/home/n02/n02/vittoria/.profile: line 41:
: command not found
/home/n02/n02/vittoria/.profile: line 48: syntax error near unexpected token `fi'
/home/n02/n02/vittoria/.profile: line 48: `fi
bash: rose: command not found
I set the ARCHER environment as described here: http://cms.ncas.ac.uk/wiki/RoseCylc.
Thanks,
Vittoria
comment:13 follow-up: ↓ 14 Changed 3 years ago by ros
Hi Vittoria,
Can you please run the following on ARCHER to give us read-access to your files?
chmod -R g+rX /home/n02/n02/vittoria
chmod -R g+rX /work/n02/n02/vittoria
Cheers,
Ros.
comment:14 in reply to: ↑ 13 Changed 3 years ago by mvguarino
Replying to ros:
Done it.
Hi Vittoria,
Can you please run the following on ARCHER to give us read-access to your files?
chmod -R g+rX /home/n02/n02/vittoria
chmod -R g+rX /work/n02/n02/vittoria
Cheers,
Ros.
comment:15 Changed 3 years ago by ros
Hi Vittoria,
Did you edit your ~/.profile on, or copy the lines from, a Windows-based system? It contains a lot of Windows control characters, ^M (carriage returns), at the end of each line you've added. These are causing the problem.
However, you've reminded me we need to update the information on that page. The most up-to-date set up for running UM Rose suites can be found on our training page:
http://cms.ncas.ac.uk/documents/training/March2017/UM_practicals/getting-setup.html
You may have already done a lot of this; section 1.5 details how to set up your ARCHER environment.
Regards,
Ros.
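A minimal sketch of one way to clear the ^M characters described above (not necessarily how it was fixed in this ticket): the standard tr utility can delete carriage returns. The helper name strip_cr and the throwaway demo file are hypothetical; on a real system the target would be ~/.profile, ideally after taking a backup copy.

```shell
# Hypothetical helper: remove every carriage return (CR) from a file in place.
strip_cr() {
    # Write the cleaned text to a temporary file first, so the original is
    # only overwritten if tr succeeded.
    tmp=$(mktemp) || return 1
    if tr -d '\r' < "$1" > "$tmp"; then
        cat "$tmp" > "$1"      # cat keeps the original file's permissions
    fi
    rm -f "$tmp"
}

# Demonstration on a throwaway file rather than a real ~/.profile.
demo=$(mktemp)
printf 'export FOO=bar\r\nif true; then\r\n  :\r\nfi\r\n' > "$demo"

cr=$(printf '\r')
echo "lines with CR before: $(grep -c "$cr" "$demo")"
strip_cr "$demo"
echo "lines with CR after:  $(grep -c "$cr" "$demo")"
```

Where it is installed, the dos2unix utility does the same job.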
comment:16 Changed 3 years ago by mvguarino
That worked, thanks.
However, the suite still doesn't run:
1) validate_suite_info fails, the error is:
/bin/sh: python2.7: command not found
[FAIL] python2.7 $CYLC_SUITE_RUN_DIR/bin/validate_suite_info.py $CYLC_SUITE_RUN_DIR # return-code=127
Received signal ERR
cylc (scheduler - 2017-11-24T15:15:32Z): CRITICAL Task job script received signal ERR at 2017-11-24T15:15:32Z
cylc (scheduler - 2017-11-24T15:15:32Z): CRITICAL failed at 2017-11-24T15:15:32Z
2) after that also fcm_make_um, fcm_make_ocean and fcm_make_drivers fail, for all of them the error message is the same:
/home/mvguarino/cylc-run/u-as245/log/job/18500101T0000Z/fcm_make_ocean/01/job: line 155: SCRATCH: unbound variable
Received signal EXIT
cylc (scheduler - 2017-11-24T15:15:32Z): CRITICAL Task job script received signal EXIT at 2017-11-24T15:15:32Z
cylc (scheduler - 2017-11-24T15:15:32Z): CRITICAL failed at 2017-11-24T15:15:32Z
Vittoria
comment:17 Changed 3 years ago by ros
Hi Vittoria,
1) My fault, there is an incorrect path (I've updated the original suite). PATH must list the directory that contains python2.7, not the executable itself.
In suite.rc change:
[[validate_suite_info]]
    pre-script = "export PATH=/home/andy/Enthought/Canopy_64bit/User/bin/python:$PATH"
to be
[[validate_suite_info]]
    pre-script = "export PATH=/home/andy/Enthought/Canopy_64bit/User/bin:$PATH"
2) I've just created a SCRATCH directory for you. Please add the line:
export SCRATCH=/export/puma/data-01/mvguarino
to your ~/.profile on PUMA.
Cheers,
Ros.
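Both fixes above can be illustrated from a shell. The following is only a sketch under stated assumptions, not part of the suite: the temporary directory and the fake python2.7 script stand in for the real Canopy installation, and the helper names lookup and check_scratch are hypothetical.

```shell
# 1) PATH entries must be directories, not paths to the executable itself.
fake_bin=$(mktemp -d)                     # stand-in for .../User/bin
printf '#!/bin/sh\necho fake-python\n' > "$fake_bin/python2.7"
chmod +x "$fake_bin/python2.7"

lookup() {
    # Resolve python2.7 using only the PATH passed as $1. The subshell keeps
    # the caller's own PATH untouched.
    ( PATH="$1"; command -v python2.7 )
}

lookup "$fake_bin/python2.7" || echo "not found: PATH entry points at a file"
lookup "$fake_bin"                        # found: PATH entry is the directory

# 2) An "unbound variable" error means the script ran with unset variables
#    treated as errors (set -u) and SCRATCH was not defined.
check_scratch() {
    # ${SCRATCH:?msg} aborts when SCRATCH is unset or empty; running it in a
    # subshell turns that abort into a non-zero return code.
    ( : "${SCRATCH:?SCRATCH is not set}" ) 2>/dev/null
}

if check_scratch; then
    echo "SCRATCH is set to $SCRATCH"
else
    echo "SCRATCH is unset: export it in ~/.profile"
fi
```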
comment:18 Changed 3 years ago by mvguarino
Great, suite seems to be running (for now.. ).
Thank you,
Vittoria
comment:19 Changed 3 years ago by mvguarino
Hi Ros,
I need your help once more.
The coupling failed, the job.err file is: http://puma.nerc.ac.uk/rose-bush/view/mvguarino/u-as245?&no_fuzzy_time=0&path=log/job/18500101T0000Z/coupled/03/job.err
The error seems to be:
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 1
? Error from routine: io:file_open
? Error message: Failed to open file /work/n02/n02/vittoria/cylc-run/u-as245/share/data/as245.astart
? Error from processor: 0
? Error number: 11
????????????????????????????????????????????????????????????????????????????????
Indeed, the as245.astart file hasn't been created.
What should I do to solve the problem?
Thank you very much,
Vittoria
comment:20 Changed 3 years ago by ros
Hi Vittoria,
You need to turn on reconfiguration to produce the start dump. Go to panel suite conf → Build and Run and turn on "Run reconfiguration".
Regards,
Ros.
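A quick way to confirm whether the start dump is in place before resubmitting the coupled task might look like the sketch below. The helper name have_start_dump is hypothetical; the path is the one quoted in the error message in comment:19.

```shell
# Hypothetical helper: succeed only if the given file exists and is non-empty.
have_start_dump() {
    [ -s "$1" ]
}

# Path taken from the io:file_open error message.
astart=/work/n02/n02/vittoria/cylc-run/u-as245/share/data/as245.astart

if have_start_dump "$astart"; then
    echo "start dump present: $astart"
else
    echo "start dump missing: turn on 'Run reconfiguration'"
fi
```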
comment:21 Changed 3 years ago by mvguarino
Hi Ros,
My test run was successful and I am now ready to run the suite u-as245 for production.
Please let me know when post-processing will be ready.
Thank you,
Vittoria
comment:22 Changed 3 years ago by ros
Hi Vittoria,
We're hoping that post-processing will be released at the end of this week.
Cheers,
Ros.
comment:23 Changed 3 years ago by ros
Hi Vittoria,
Sorry for the delay, I have now updated the instructions for how to upgrade the post-processing app to pick up Archer archiving:
http://cms.ncas.ac.uk/wiki/Docs/PostProcessingApp
If you have any problems let me know. If it helps you can also refer to suite u-as037 to see the changes that were made when I upgraded that suite.
Cheers,
Ros.
comment:24 Changed 3 years ago by ros
- Resolution set to completed
- Status changed from accepted to closed
- UM Version set to 10.7
I'm closing this ticket now - I believe you are now up and running with the post-processing app. Any further problems, please do open another ticket.
Cheers,
Ros.
Hi Vittoria,
Once the suite you require is ready, please let us know the suite id and then we can advise on the best path to take.
Regards,
Ros.