Opened 9 years ago

Closed 7 years ago

#688 closed help (fixed)

Running the SCM (v7.8)

Reported by: njharvey
Owned by: ros
Component: UM Model
Keywords: SCM
Cc: grenville
Platform: Other
UM Version: 7.8

Description

Hi there,

I am trying to run the SCM (v7.8) on Hector and need some help!

I have an SCM job, also v7.8, from the Met Office. As it is from the Met Office, there are a few things I need to update (in the UMUI) to get it to run on Hector, but I don't know what they should be:

Target machine user-id [model selection/user information and submit method]

submission method

branch location [FCM configuration/FCM options for Atmosphere and Reconfiguration], the branch I need is in Trac - UM/branches/dev/Share/VN7.8_SCM_tracers_ukmo

I know that the UMUI can't run the SCM and can only create the executable files, so how do I go about running it on Hector? I have a namelist from the Met Office. I also know I need a start dump.

Sorry for all the questions but I haven't used anything like this before. Any help you can give me would be great.

Thanks,

Natalie

Attachments (3)

Screenshot.png (64.3 KB) - added by njharvey 9 years ago.
What submission box displays before it freezes
Screenshot-2.png (55.2 KB) - added by njharvey 9 years ago.
What it looks like when it goes blank
fcm_gui.png (182.3 KB) - added by njharvey 8 years ago.


Change History (61)

comment:1 follow-ups: Changed 9 years ago by ros

Hi Natalie,

The settings in the UMUI you will definitely need to change are:

1) General Details - Enter your HECToR userid in target m/c userid. Set email address. Turn on "Override user's default Acct Grp", select "other" and enter your HECToR Account group.

2) Job Submission - "Define submission method" select qsub. Enter hostname "phase2b.hector.ac.uk"
Click the "Qsub" button at the bottom of the window and enter the Job time limit (e.g. 3600 for 1 hour)

3) FCM Configuration → FCM Extract Directories:
Set paths for Local and Target root extract directories.
(We recommend that you set the Target root extract directory to a location in your /home directory on HECToR as compilation will be quicker than in /work)

4) FCM Configuration → FCM Options for UM Atmos and Recon:
The subversion URL: fcm:um_tr
Bindings Location: fcm:um_br/dev/um/VN7.8_machine_cfg/src/configs/bindings
Container file name: $UM_SVN_BIND/container.cfg@vn7.8_cfg

Add the following branches to the "User Modifications" Table:

fcm:um_br/pkg/Config/VN7.8_ncas/src and place a Y in the last column.
fcm:um_br/dev/Share/VN7.8_SCM_tracers_ukmo/src and place a Y in the last column.

Hopefully, that's everything you need to change to compile the job.
Click Save, then Process, then Submit to compile the executable.

I've never run the SCM so have no idea how it is run - I thought it was submitted via the UMUI, but you say not. Do the Met Office have a script that is used to run it? If so can we take a copy and modify it accordingly for HECToR and our submission system?

If you let us know the details of the start dump you require then Willie will hopefully be able to get hold of it for you.

Regards,
Ros.

comment:2 Changed 9 years ago by willie

Hi Natalie,

There is some advice at http://cms.ncas.ac.uk/index.php/mesoscale-modelling/running-the-um/1521-idealized-um

There is also a UM7.5 idealised job, xfqca, that runs on HECToR.

Regards,

Willie

comment:3 in reply to: ↑ 1 Changed 9 years ago by njharvey

comment:4 in reply to: ↑ 1 Changed 9 years ago by njharvey

Hi Ros and Willie,

I think I have made all the changes you suggested above but when I submit I get the following error message. Any idea what this might mean?

NDS_MAIN: Calling Extract …
create umbase
mkdir: cannot create directory `/SCM': Permission denied
Extracting BASE …
/home/njharvey/umui_jobs/xglfb/NDS_EXTR_SCR[75]: /SCM/njharvey/xglfb/umbase/ext.out: cannot create [No such file or directory]
BASE extract failed
NDS_MAIN: Extract failed
NDS_MAIN stopped with return code 1

Many thanks,

Natalie

comment:5 follow-up: Changed 9 years ago by ros

Hi Natalie,

In UMUI window FCM Configuration → FCM Extract Directories
You need to set the paths for Local and Target root extract directories. The variables $LOCALDATA and $USER aren't available on PUMA.

Local extract directory needs to be a path on PUMA - something like $HOME/SCM

and

Target extract directory, a path on HECToR, something like /home/n02/n02/<your hector userid>/SCM

Regards,
Ros.
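
The `/SCM` in the earlier `mkdir` error is what an unset variable produces when it prefixes a path. A minimal sketch, assuming the extract directory had been set to something like `$LOCALDATA/SCM` (the original setting is not shown in this ticket):

```shell
# Assumption: the UMUI field held a path such as $LOCALDATA/SCM.
unset LOCALDATA

# An unset variable expands to an empty string, so the path starts at "/".
extract_dir="${LOCALDATA}/SCM"
echo "with LOCALDATA unset: ${extract_dir}"

# A path built on a variable that is set, such as HOME, stays in your own space.
safe_dir="${HOME}/SCM"
echo "with HOME: ${safe_dir}"
```

Creating a directory directly under `/` needs root privileges, which would explain the "Permission denied" message.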

comment:6 in reply to: ↑ 5 Changed 9 years ago by njharvey

Thanks Ros. I have now updated that but have a different error message!

NDS_MAIN: Calling Extract …
Extracting BASE …
BASE extract failed
NDS_MAIN: Extract failed
NDS_MAIN stopped with return code 25

Any ideas?

Many thanks for all your help and patience!

Natalie

comment:7 follow-up: Changed 9 years ago by ros

Can you remove the 30289 from the Revision column in UMUI window FCM Configuration → FCM Opts for atmos and recon? That's a Met Office revision number which doesn't exist for the VN7.8_ncas branch. I guess that got left over from the copy of the job from the Met Office.

If the extract fails again you will find output in
/home/njharvey/SCM/xglfb/umbase/ext.out (or ummodel/ext.out or umrecon/ext.out in the same job directory) that might help you track down the problem yourself, although the messages can be rather cryptic.

Regards,
Ros.

Changed 9 years ago by njharvey

What submission box displays before it freezes

Changed 9 years ago by njharvey

What it looks like when it goes blank

comment:8 in reply to: ↑ 7 Changed 9 years ago by njharvey

Hi again Ros,

Removing the revision number seems to have got rid of the return code 25 error. Now when I submit, the submission box looks like attachment 1 above and then after a while it all goes blank, along with everything else in the UMUI (attachment 2). A folder is created on Hector but the job doesn't run.

Many thanks,

Natalie

comment:9 Changed 9 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Oops! Sorry, knew there would be something I'd forget to tell you. You need to have ssh-agent set up to allow login to HECToR without the need to enter a password. See instructions at: http://puma.nerc.ac.uk/trac/UM_TUTORIAL/wiki/Ros/sshAgent

Also make sure you have the following code in your $HOME/.profile on HECToR

# UM Specific set up
#-------------------

export UMDIR=/work/n02/n02/hum
TARGET_MC=pathscale_quad

# Setup UM variables
VN=7.8
if test -f $HOME/.umsetvars_$VN; then
  . $HOME/.umsetvars_$VN
else
  . $UMDIR/vn$VN/$TARGET_MC/scripts/.umsetvars_$VN
fi

loadcomp $TARGET_MC

Hopefully that will solve the remaining submission problems.

Regards,
Ros.
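
After editing `$HOME/.profile`, it can help to confirm in a fresh login shell that the variables are actually set. A small sketch; the function name `check_um_env` is made up for illustration and is not part of the UM scripts:

```shell
# check_um_env: hypothetical helper that reports whether the variables
# set by the .profile fragment above are non-empty in the current shell.
check_um_env() {
    status=0
    for name in UMDIR TARGET_MC VN; do
        # Indirectly read the variable whose name is held in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        else
            echo "$name=$value"
        fi
    done
    return "$status"
}
```

Running `check_um_env` after logging in to HECToR prints one line per variable; any "is not set" line suggests the `.profile` fragment was not read or was not copied in fully.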

comment:10 Changed 9 years ago by njharvey

Hi again,

I have done what you suggested above and no longer get the hanging, which is great. I now have the following message:

umui_runs/xglfb-265095556/SUBMIT[29]: .[31]: {{{: not found [No such file or directory]
PathScale PrgEnv loaded
xtpe-network-gemini
PrgEnv-pathscale/3.1.49A
xt-mpt/5.3.0
pathscale/3.2.99
xtpe-mc12
umui_runs/xglfb-265095556/SUBMIT[29]: .[47]: }}}: not found [No such file or directory]

And have just got an email from pbs_adm@…

PBS Job Id: 382289.sdb
Job Name: xglfb_build
Aborted by PBS Server
Job cannot be executed
See job standard error file

Where can I find the standard error file?

Many thanks,

Natalie

comment:11 Changed 9 years ago by ros

Hi Natalie,

Output files are put in $HOME/um/umui_out by default on HECToR (named .comp.leave for compilation output and .leave for run output). If the job didn't even submit there may not be one.

Can you give me read permissions on both your /home and /work directories on HECToR, please? Then I can take a look at the submission scripts.

chmod -R g+r /home/n02/n02/njharvey

and similarly for /work

Thanks.

comment:12 Changed 9 years ago by njharvey

Hi Ros,

It is as you thought: there are no .comp.leave or .leave files in $HOME/um/umui_out.

I think I have changed the permissions now.

Natalie

comment:13 Changed 9 years ago by ros

Argh. Sorry the directories need execute permission too.

chmod -R g+x /home/n02/n02/njharvey
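
The two `chmod` commands above can be combined. A sketch using the symbolic mode `g+rX`, in which the capital `X` adds execute permission only to directories and to files that are already executable, so ordinary data files are not marked executable. The helper name `grant_group_read` is illustrative:

```shell
# grant_group_read: hypothetical helper; gives the group read access to a
# tree, plus search (execute) permission on directories only.
grant_group_read() {
    chmod -R g+rX "$1"
}

# Example (path taken from the comments above):
# grant_group_read /home/n02/n02/njharvey
```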

comment:14 Changed 9 years ago by njharvey

All done.

comment:15 Changed 9 years ago by ros

I can't immediately see anything wrong with the scripts themselves; however, I did notice that you have an invalid account code specified in the UMUI. Your accounting code is n02-ncas. This needs to be entered in the UMUI panel User info → General details in the "Specify other account group" box instead of NCAS, which you currently have.

Give that another try and I'll take a copy of your job and give it a try.

Regards,
Ros.

comment:16 Changed 9 years ago by njharvey

Hi Ros,

I just realised that I didn't update my .profile file properly. It appears that I now have a job in the queue on Hector!

Job ID      Username  Queue     Jobname     SessID  NDS  TSK  Memory  Time   S  Time
382579.sdb  njharvey  serial_1  xglfb_buil  10935   1    1    --      01:00  R  00:00
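
The `{{{: not found` and `}}}: not found` messages in the earlier SUBMIT output are what a shell prints when it meets Trac wiki code-block markers, which suggests the markers were pasted into `.profile` along with the code. A sketch for spotting such lines; the function name is illustrative:

```shell
# find_wiki_markers: hypothetical helper; lists lines consisting only of
# the Trac code-block markers {{{ or }}} (surrounding blanks allowed).
find_wiki_markers() {
    grep -n -E '^[[:space:]]*(\{\{\{|\}\}\})[[:space:]]*$' "$1"
}

# Example:
# find_wiki_markers "$HOME/.profile"
```

Any line it reports can simply be deleted from the file.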

comment:17 Changed 8 years ago by njharvey

Hi again Ros,

Do you have any advice on how to find what is making the model crash? I know that I need to look in the comp.leave file but it is massive!

Thanks,

Natalie

comment:18 Changed 8 years ago by ros

Hi Natalie,

Errors in the compilation file will be somewhere near the bottom. Try searching for words like Error or ERROR.

I tried to find your .comp.leave file but it doesn't seem to be in the usual place on HECToR (i.e. $HOME/um/umui_out). If you let me know where the output files are I'll take a look.

Regards,
Ros.
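
The suggestion to search near the bottom of the file can be sketched as a case-insensitive search restricted to the last part of the output. The function name and the 200-line window are arbitrary choices for illustration:

```shell
# last_errors: hypothetical helper; shows lines mentioning "error" (any
# case) in the last 200 lines of a compilation output file. The number
# printed before each match counts from the start of that window.
last_errors() {
    tail -n 200 "$1" | grep -n -i 'error'
}

# Example:
# last_errors path/to/job.comp.leave
```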

comment:19 Changed 8 years ago by njharvey

The .comp.leave file is in $HOME/umui_out

The latest one is xglfb000.xglfb.d11297.t161715.comp.leave

Thanks,

Natalie

comment:20 Changed 8 years ago by ros

Hi Natalie,

From your .comp.leave output it looks like the SCM needs the netCDF module.
I've just made a modification to the UMUI to load the netcdf module for the SCM.

Please close, re-open and then process your job to pick up the change, before resubmitting it to compile again. Hopefully that will fix the problem.

Regards,
Ros.

comment:21 Changed 8 years ago by njharvey

Hi Ros,

That seems to have fixed the netCDF problem but it still crashed. The latest .comp.leave file is xglfb000.xglfb.d11298.t153942.comp.leave

Thanks,

Natalie

comment:22 Changed 8 years ago by ros

Hi Natalie,

It looks like there is an error in the Fortran code. Sometimes we find that the compiler on HECToR is a lot less forgiving than the one on the Met Office IBM.

/home/n02/n02/njharvey/SCM/xglfb/ummodel/ppsrc/UM/scm/obs_forcing/s_logic.f90 declares land_sea_mask, land_ice_mask, etc as LOGICALs but is trying to assign an INTEGER to them (imdi - declared in scm_utils.f90) causing the compile to fail:

 LOGICAL, PRIVATE ::                      &
    land_sea_mask (mx_rw_lng*mx_rw) = imdi &! land point flag (any type)
  , land_ice_mask (mx_rw_lng*mx_rw) = imdi &! land point flag (ice)
  , soil_mask     (mx_rw_lng*mx_rw) = imdi &! land point flag (soil)
  , cumulus       (mx_rw_lng*mx_rw) = imdi  ! bl convection flag

I'd suggest that the above variables should be INTEGERs not LOGICALs, but I don't know enough about the SCM code to say whether simply changing these declarations would have a knock-on effect elsewhere. I'd recommend that you talk to the SCM guys at the Met Office (Adrian Lock?).

Hope that helps.

Ros.

comment:23 Changed 8 years ago by ros

  • Owner changed from ros to um_support
  • Status changed from accepted to assigned

Hi Natalie,

I'm re-assigning this ticket to the CMS support team, so that they can help with any further problems as I will be out of the office for a while.

Ros.

comment:24 Changed 8 years ago by njharvey

Hi there,

I have just tried to ssh-add in puma and got this message

njharvey@puma:/home/njharvey> ssh-add
Could not open a connection to your authentication agent.

Do I need to update something because of the upgrade?

Thanks,

Natalie

comment:25 Changed 8 years ago by grenville

Natalie

On PUMA please cd to .ssh and delete the file called environment.puma, log out and log in again. That should fix the problem.

Grenville

comment:26 Changed 8 years ago by njharvey

Hello,

I am still battling to get the SCM to run.

Any idea how to overcome the following error?

pathf95-1055 pathf90-4.0.9: ERROR RUN_INIT, File = /home/n02/n02/njharvey/SCM/xglfb/ummodel/ppsrc/UM/scm/initialise/run_init.f90, Line = 73, Column = 7

Module file "/home/n02/n02/njharvey/SCM/xglfb/ummodel/inc/WATER_CONSTANTS_MOD.mod" is incompatible with this compiling system. Recompile the module with this compiling system.

The comp.leave file that corresponds to this can be found at /home/n02/n02/njharvey/umui_out/xglfb000.xglfb.d11308.t103247.comp.leave

Thanks,

Natalie

comment:27 Changed 8 years ago by njharvey

Sorry - I have just realised that I hadn't fully updated everything to use phase 3. I think I have done this now but have some different error messages.

The comp.leave file that corresponds to this can be found at /home/n02/n02/njharvey/umui_out/xglfb000.xglfb.d11321.t112216.comp.leave

Any help on this would be great.

Thanks,

Natalie

comment:28 Changed 8 years ago by grenville

Natalie

The upgrade to Phase 3 has not taken place yet. Please see http://cms.ncas.ac.uk/index.php/component/content/1583?task=view. We are working on a fix so that UM users will be able to compile on the current machine and will inform the community when that is in place.

Grenville

comment:29 Changed 8 years ago by ros

Hi Natalie,

In case you didn't see the email that Grenville sent out yesterday regarding the interim solution for compiling while we wait for the HECToR phase 3 upgrade, you can now try recompiling your model.

Full instructions and health warnings are available at: http://cms.ncas.ac.uk/index.php/component/content/1583?task=view

In summary all you should need to do is edit your $HOME/.profile on HECToR to change TARGET_MC=pathscale_quad to be TARGET_MC=cce.

Then in UMUI window "Submodel Independent → User Hand Edit Files" include the following hand edit file

/home/grenville/hand_edits/pathscale-cce-7.8

Cheers,
Ros.
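
The `.profile` edit described above can also be scripted. A sketch that assumes the line reads exactly `TARGET_MC=pathscale_quad`, as in the fragment given earlier in this ticket; the function name is illustrative:

```shell
# switch_target_mc: hypothetical helper; rewrites the TARGET_MC line in a
# profile file from pathscale_quad to cce, keeping the original as .bak.
switch_target_mc() {
    profile=$1
    cp "$profile" "$profile.bak"
    sed 's/^TARGET_MC=pathscale_quad$/TARGET_MC=cce/' "$profile.bak" > "$profile"
}

# Example:
# switch_target_mc "$HOME/.profile"
```

Log out and back in afterwards so that the new setting is picked up.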

comment:30 Changed 8 years ago by njharvey

Thanks Ros.

I don't seem to have a Submodel Independent option. I have Sub-Model Choices and Coupling or Independent Section Options??

Thanks,

Natalie

comment:31 Changed 8 years ago by ros

The window location varies slightly from version to version; at 7.8 it's under "Input/Output Control and Resources → User hand edit files".

comment:32 Changed 8 years ago by njharvey

Thanks Ros.

I have set that up now and there seem to be fewer compilation errors than before! However I still get this one:

Start: 2011-11-23 10:40:41⇒ ftn -o sf_exch.o -I/home/n02/n02/njharvey/SCM/xglfb/ummodel/inc -I/home/n02/n02/njharvey/SCM/xglfb/umbase/inc -e m -h noomp -s real64 -s integer64 -hflex_mp=intolerant -g -I /work/n02/n02/hum/gcom/cce/gcom3.8/hector_cce_mpp/inc -c /home/n02/n02/njharvey/SCM/xglfb/ummodel/ppsrc/UM/atmosphere/boundary_layer/sf_exch.f90

IF (l_emcorr_opt == .FALSE.) THEN


ftn-303 crayftn: ERROR ATMOS_PHYSICS2, File = /home/n02/n02/njharvey/SCM/xglfb/ummodel/ppsrc/UM/control/top_level/atmos_physics2.f90, Line = 3953, Column = 25

Data type LOGICAL is not allowed with LOGICAL for the operation "eq".

IF (l_emcorr_opt == .FALSE.) THEN


ftn-303 crayftn: ERROR ATMOS_PHYSICS2, File = /home/n02/n02/njharvey/SCM/xglfb/ummodel/ppsrc/UM/control/top_level/atmos_physics2.f90, Line = 6485, Column = 24

Data type LOGICAL is not allowed with LOGICAL for the operation "eq".

Any ideas what I can do to fix this? The comp.leave file can be found at /home/n02/n02/njharvey/umui_out/xglfb000.xglfb.d11327.t103832.comp.leave

Many thanks,

Natalie

comment:33 Changed 8 years ago by ros

  • Cc grenville added
  • Owner changed from um_support to ros
  • Status changed from assigned to accepted

Hi Natalie,

This looks like a bug in the code and these statements should be

IF ( .NOT. l_emcorr_opt ) THEN

I've fixed these in the branch fcm:um_br/dev/ros/VN7.8_SCM_fixes.

If you add this branch to the table in FCM config → FCM options for atmos and recon and put a Y in the last column you'll pick up the fixes I've just lodged.

Unfortunately we then get back to the problems with the declaration of variables in s_logic.f90 we had before. I tried changing the type of the land_sea_mask, land_ice_mask, etc. but that just caused further problems so that isn't the answer. Could you perhaps have a word with Adrian at the Met Office? Meanwhile I'll take another look and see if I can come up with anything else.

Cheers,
Ros.
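
In standard Fortran, logical values are compared with `.EQV.` and `.NEQV.` rather than `==`, and a test against `.FALSE.` is more simply written with `.NOT.`, as in the fix above. A sketch for finding other occurrences of the rejected pattern in a checked-out source tree; the function name is illustrative and the pattern only covers the `==` and `/=` spellings:

```shell
# find_logical_eq: hypothetical helper; recursively lists lines that
# compare something against .TRUE. or .FALSE. using == or /=.
find_logical_eq() {
    grep -r -n -i -E '(==|/=)[[:space:]]*\.(true|false)\.' "$1"
}

# Example, run from the top of a working copy:
# find_logical_eq src
```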

comment:34 Changed 8 years ago by njharvey

Thanks again Ros.

I have added that branch and that removes those errors.

I have emailed Adrian about the declaration of variables. I will let you know what he suggests.

Cheers,

Natalie

comment:35 Changed 8 years ago by njharvey

Hi Ros,

I have just heard back from Adrian and he said that the problems are being caused by checking whether the SCM namelist has been written consistently. So it ought to be OK initialising these to false and removing the checks on /= imdi.

Can I do this by just going into the file or do I need another branch?

Thanks,

Natalie

comment:36 Changed 8 years ago by ros

Hi Natalie,

Unfortunately you can't just edit the file; the change needs to be in a branch. You're welcome to add the change to the VN7.8_SCM_fixes branch I created if you want, otherwise you'll need to create yourself a new branch.

If you decide to modify my branch all you need to do on PUMA is:

# In your home directory check out the code by running the command
puma$ fcm co fcm:um_br/dev/ros/VN7.8_SCM_fixes

# Edit the relevant file(s)

# Commit the change to the repository by running the command
puma$ fcm commit
# You'll be prompted to add a message describing the change
# You'll also probably be prompted for a password just press <return>

# If you left the revision number column blank in the UMUI table where 
# you added the branch, your changes will be picked up the next time 
# you compile.

If you would prefer to create your own branch that's great. The UM Tutorial instructions should enable you to do that: http://puma.nerc.ac.uk/trac/UM_TUTORIAL/wiki/UmTutorial/vn7.1/CodeChanges#Branches
When following them, replace fcm:um_tutorial with fcm:um and vn7.1 with vn7.8.

If you need any help let us know.

Regards,
Ros.

comment:37 Changed 8 years ago by njharvey

Hi Ros,

Thanks for the info.

I had a go at modifying your branch but in the process I appear to have managed to copy some (it was too big to copy all) of um_br/dev/ros/ into my puma directory.

Is it OK to just delete this from puma?

Sorry for being a pain!

Natalie

comment:38 Changed 8 years ago by ros

Hi Natalie,

Yes that's fine to just delete it from your puma directory.

Regards,
Ros.

Changed 8 years ago by njharvey

comment:39 Changed 8 years ago by njharvey

Hi Ros,

I managed to modify your branch once but now when I try and commit further changes I get this error message. Any idea what I am doing wrong?

njharvey@puma:/home/njharvey/VN7.8_SCM_fixes/src/scm/obs_forcing> fcm commit
/home/njharvey/VN7.8_SCM_fixes: working directory changed to top of working copy.
Starting nedit to edit commit message …
Change summary:


[Project: UM]
[Branch : branches/dev/ros/VN7.8_SCM_fixes]
[Sub-dir: <top>]

M src/scm/obs_forcing/s_logic.F90


Commit message is as follows:


removed continuation &


* WARNING: YOU ARE COMMITTING TO A BRANCH NOT OWNED BY YOU.
* Please ensure that you have the owner's permission.

Would you like to commit this change?
Enter "y" or "n" (or just press <return> for "n"): Use of uninitialized value $answer in scalar chomp at /home/um/fcm/bin/../lib/Fcm/Interactive/InputGetter/CLI.pm line 34.
[ABORT] commit: abort by user.
[2] + Done emacs s_logic.F90 &

Also I have had a go at the tutorial to make my own branch as I will need to do that eventually. I can't seem to get it to work. The screenshot attached above shows the settings and the error message I get.

Thanks,

Natalie

comment:40 Changed 8 years ago by ros

Hi Natalie,

Every now and then we get this "use of uninitialized value" error, and we've not been able to track down its cause as it is very intermittent. If you log out of PUMA and back in again and then run "fcm commit" again, it should wait for your response.

The reason the tutorial didn't work is that it needs to be --password "" the is important.

Regards,
Ros.

comment:41 Changed 8 years ago by njharvey

Hi again Ros,

Sorry for the influx of emails today but I think that the model has run! I now have a .leave file not a .comp.leave file but I still don't have any output. I think that this is because it can't write the data to where it wants to put it. There seems to be a in the filepath.

The .leave file can be found here
/home/n02/n02/njharvey/umui_out/xglfb000.xglfb.d11332.t145004.leave

Thanks,

Natalie

comment:42 Changed 8 years ago by ros

Hi Natalie,

Ok, I know what that problem is. In the UMUI window Input/Output control and resources → Time convention and SCRIPT environment var, you need to change the settings for DATAW and DATAM to be something like $WORKDIR/SCM/$RUNID (this is the location on HECToR where you want your model output data to go).

Submit the job to compile & run again. Once the compile has gone through you should then have an executable in $WORKDIR/SCM/xglfb/bin and the model will hopefully run then.

Regards,
Ros.

comment:43 Changed 8 years ago by njharvey

Hi Ros,

I changed the above as suggested but now get the error message below, even though the file is definitely in that location and seems to open fine in emacs.

qsexecute: %MODEL% output follows:-

Error opening /home/njharvey/gabls2_murkem_tracer_L63_UM78.nml
Checking for namelist.scm in current directory

*
UM ERROR (Model aborting) :
Routine generating error: scm_shell
Error code: 500
Error message:

Error opening namelist.scm

*
gc_abort (Processor 0 ): Error opening namelist.scm

Thanks,

Natalie

comment:44 Changed 8 years ago by ros

Hi Natalie,

You need to make sure all input files and namelists are in the /work disk space on HECToR as the parallel nodes cannot see the /home filespace.

Regards,
Ros.
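
Moving the inputs across can be sketched as a copy followed by a readability check. The helper name and the `inputs` subdirectory are illustrative, and the example assumes `WORKDIR` points at your /work space on HECToR:

```shell
# stage_inputs: hypothetical helper; copies input files into a directory
# (for example one under /work) and checks that each copy is readable.
stage_inputs() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        cp "$f" "$dest/" || return 1
        [ -r "$dest/$(basename "$f")" ] || return 1
    done
}

# Example:
# stage_inputs "$WORKDIR/SCM/inputs" gabls2_murkem_tracer_L63_UM78.nml
```

Any paths set in the job then need to point at the new location rather than at /home.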

comment:45 Changed 8 years ago by njharvey

Hi Ros,

I have now corrected that and got a missing file from Adrian at the Met Office. But I now have a memory fault

/work/n02/n02/njharvey/SCM/xglfb/bin/qsexecute: line 1112: 28574: Memory fault(coredump)
rm: missing operand
Try `rm --help' for more information.
xglfb: Run failed

(/home/n02/n02/njharvey/umui_out/xglfb000.xglfb.d11335.t104724.leave)

Any ideas?

Thanks,

Natalie

comment:46 Changed 8 years ago by njharvey

Hi again,

I have resubmitted the job but still get the same error message.

/home/n02/n02/njharvey/umui_out/xglfb000.xglfb.d11335.t114042.leave

Thanks,

Natalie

comment:47 Changed 8 years ago by ros

Hi Natalie,

Having now established it is running the exec correctly, I had a little delve around in the core file that got dumped in your $DATADIR/SCM/xglfb directory (gdb bin/xglfb.exe core) and it appears to have crashed in r2_lw_specin which reads the lw spectral files.

Looking in the UMUI window Atmos → Scientific Sections → section by section choices → LW Radiation, then the Gen2 button at bottom of window, the directory specified is not a HECToR directory (similarly for SW Radiation). Hopefully correcting the paths here will get rid of the segmentation fault.

Cheers,
Ros.

comment:48 Changed 8 years ago by njharvey

Thanks for looking into this for me Ros.

I don't have those files so I have requested them from Adrian.

I will let you know how I get on!

Natalie

comment:49 Changed 8 years ago by njharvey

Hello again,

I have the files from Adrian now and have resubmitted the job but still get the same error message. Do you think there are some more files it can't find?

Thanks,

Natalie

comment:50 Changed 8 years ago by njharvey

Hi Ros,

Sorry to pester you about this but do you have any ideas about why this won't run?

Thanks,

Natalie

comment:51 Changed 8 years ago by ros

Hi Natalie,

Sorry for the delay. Grenville and I were out of the office on Friday, unfortunately. We are, however, both looking at this problem today. It looks like it's crashing on the actual call to r2_lw_specin and not even getting into the routine, but currently we can't see why; all the arguments appear to contain sensible values.

We'll update you as soon as we have any further news.

Regards,
Ros.

comment:52 Changed 8 years ago by njharvey

Hi,

I have been running the SCM locally for a while now and have successfully put in a tracer source and decay rate. I am now trying to run it for several days (to get my source/decay rate to equilibrium) but as I extend the length of the run I get a "memory fault".

Any ideas what this might be and how to remove it?

Thanks,

Natalie

comment:53 Changed 8 years ago by grenville

Natalie

Could you point us to the leave file for this run?

Grenville

comment:54 Changed 8 years ago by njharvey

Hi Grenville,

I can't seem to find a leave file….
The executable I am using is here: /home/mr028229/scm/xglfi/ummodel/bin/xglfi.exe
The run I am using is /home/mr028229/umui_runs/xglfi-236112157.

Thanks,

Natalie

comment:55 Changed 8 years ago by grenville

Natalie

Please run the model again and, after typing the executable name, add:

> results 2>&1

Then let me know where results is.

Thanks

Grenville
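
The redirection `> results 2>&1` sends standard output to a file and then points standard error at the same place; the order of the two parts matters. A sketch with a stand-in function in place of the model executable, writing to a temporary file rather than to `results`:

```shell
# fake_model: stand-in for the executable; writes one line to each stream.
fake_model() {
    echo "normal output"
    echo "error output" >&2
}

# Temporary file used here in place of "results".
results=$(mktemp)

# stdout goes to the file first, then stderr is duplicated onto stdout.
fake_model > "$results" 2>&1

cat "$results"
```

Messages printed by the shell itself after the program has died, such as "Memory fault", are not part of the program's own output and so may still appear on the terminal.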

comment:56 Changed 8 years ago by njharvey

Hi Grenville,

Sorry for the delay in doing this, I was working from home yesterday.

The results file is in /home/mr028229/umui_runs/xglfi-236112157/ but the memory fault error message was displayed on the command line.

Natalie

comment:57 Changed 8 years ago by grenville

Natalie

Not much help there. Before running the job type:

ulimit -s unlimited

This should raise the stack size limit, which, if too small, can lead to memory problems.

Grenville
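
The limit applies to the shell in which it is set and to programs started from it, so it needs to be set in the same session that runs the model. A sketch that raises only the soft limit and falls back to the hard limit if "unlimited" is refused; the function name is illustrative:

```shell
# raise_stack_limit: hypothetical helper; lifts the soft stack limit as
# far as the hard limit allows.
raise_stack_limit() {
    ulimit -Ss unlimited 2>/dev/null || ulimit -Ss "$(ulimit -Hs)"
}

echo "before: soft=$(ulimit -Ss) hard=$(ulimit -Hs)"
raise_stack_limit
echo "after:  soft=$(ulimit -Ss)"
```

If the "after" value is still a number rather than "unlimited", the hard limit on that machine is capping the stack.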

comment:58 Changed 7 years ago by ros

  • Component changed from UMUI to UM Model
  • Platform set to Other
  • Resolution set to fixed
  • Status changed from accepted to closed

No further response has been received on this ticket for several months so it is now being closed. If the problem still exists please raise a new ticket or reopen this one.
