#1686 closed help (fixed)

puma permission denied during submit

Reported by: s1374103
Owned by: ros
Priority: normal
Component: PUMA
Keywords: puma permission denied
Cc:
Platform: MONSooN
UM Version: 8.4

Description

Dear CMS,

I have copied a job (xlsjc) which has successfully been copied over to the Cray (http://www.ukca.ac.uk/wiki/index.php/MONSooN_IBM_to_Cray_Transition). When I submit the job, my MONSooN passcode is accepted, but when I type in my PUMA password it says:

ERROR: puma.nerc.ac.uk: Permission denied while attempting to access account jakel on host  xcml00. Note that repeated failures may result in  expiry of password due to security procedures on some  machines. Check user id, hostname and password  for your account on the host machine.

I have previously been using Archer and this is my first job on monsoon. Have I set up monsoon incorrectly?

Regards,

Jamie

Change History (12)

comment:1 Changed 19 months ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Jamie,

Have you set up your ssh-agent for MONSooN as per the instructions here: http://cms.ncas.ac.uk/wiki/MonsoonSshAgent ?

Regards,
Ros.

comment:2 Changed 19 months ago by s1374103

Hi,

I hadn't followed the instructions correctly and I'd placed the config file in the wrong location. The job has submitted now.

Thanks,

Jamie

comment:3 Changed 19 months ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed

Hi Jamie,

Glad you have it working now.

Cheers,
Ros.

comment:4 Changed 19 months ago by s1374103

Hi Ros,

My jobs are failing to submit today and giving me the same error as before. Any suggestions as to why it was working yesterday but not today?

Regards,

Jamie

comment:5 Changed 19 months ago by ros

Hi Jamie,

Is your ssh-key still attached to your ssh-agent? Try running ssh-add.

If it prompts for your passphrase then your agent has probably restarted when you logged in this morning. The ssh-agent may not persist between PUMA sessions, so you may need to re-run ssh-add.
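To check the state of the agent before submitting, something along these lines may help. This is only a rough sketch: it assumes OpenSSH's ssh-add, where `ssh-add -l` lists the loaded keys and exits with 0 if keys are listed, 1 if the command fails (for example, the agent holds no keys) and 2 if no agent can be contacted. The function names are made up for illustration.

```shell
# Turn the exit status of `ssh-add -l` into a short description.
# describe_agent_status is a hypothetical helper name.
describe_agent_status() {
    case "$1" in
        0) echo "keys loaded" ;;
        1) echo "agent running but no keys loaded: run ssh-add" ;;
        2) echo "no agent reachable: start ssh-agent, then run ssh-add" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Query the agent quietly and report what was found.
check_agent() {
    ssh-add -l >/dev/null 2>&1
    describe_agent_status "$?"
}
```

Running `check_agent` on PUMA before submitting should then show whether ssh-add needs to be re-run.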

Cheers,
Ros.

comment:6 Changed 19 months ago by s1374103

Hi Ros,

Sorry about that, I was submitting the wrong job. Everything is fine.

Regards,

Jamie

comment:7 Changed 19 months ago by s1374103

Hi,

I am trying to run this job for 1 year using monthly automatic resubmission, but it has failed.

job id - xlxcr

In the model selection → compile and run options → compile and run options for atmosphere and reconfiguration window, how is this supposed to be set up for an nrun and what needs to be changed when the crun is submitted?

I thought that this was set up to do an nrun; however, the job failed and the .leave file contained:

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
? Error in routine: io:buffin
? Error Code:    24
? Error Message: Error in buffin errorCode=3.00 len=       256/       256
? Error generated from processor:     0
? This run generated   4 warnings
????????????????????????????????????????????????????????????????????????????????

Rank 0 [Mon Oct 19 11:36:46 2015] [c0-0c1s6n3] application called MPI_Abort(MPI_COMM_WORLD, 9) - process 0
Application 181893 is crashing. ATP analysis proceeding...

ATP Stack walkback for Rank 0 starting:
  _start@start.S:113
  __libc_start_main@libc-start.c:242
  flumemain_@flumeMain.f90:48
  um_shell_@um_shell.f90:1668
  readsize_@readsize.f90:402


I thought that the job as currently set up should give an nrun, and that switching off the radio button for "compile model executable" would then allow the crun.

Regards,

Jamie

comment:8 Changed 19 months ago by ros

Hi Jamie,

Yes, your job is indeed set up as an NRUN and the NRUN has been submitted, but it fails with a BUFFIN error. If you look further down the .leave file you will see that it has failed to find your start dump /projects/ukca-ed/jakel/xlxcr/xlxcr.astart. With the job configured as it is now, you will need to run the reconfiguration to generate this start dump.
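If it is useful, you can confirm whether the start dump is there before resubmitting. A minimal shell sketch, assuming only that the dump should be a non-empty file at the path reported in the .leave file; `have_start_dump` is a made-up helper name, not part of the UM scripts:

```shell
# Report whether a start dump exists and is non-empty.
# have_start_dump is a hypothetical helper, not a UM or UMUI command.
have_start_dump() {
    if [ -s "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1 (run the reconfiguration first)"
        return 1
    fi
}
```

For this job that would be `have_start_dump /projects/ukca-ed/jakel/xlxcr/xlxcr.astart`, run on the machine where the model runs.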

In panel model selection → compile and run options → compile and run options for atmosphere and reconfiguration you can now switch off the compilation of model and reconfiguration executables, and switch on Run the reconfiguration.

Submit the job and it should then run the reconfiguration to generate your new start dump and then submit the NRUN.

Once that is complete, you then need to switch off the reconfiguration and select CRUN for the model run. Then set the "job step run length" and "time limit" in the Input/Output Control & Resources → Resubmission pattern window, which it looks like you have already done.

Cheers,
Ros.

comment:9 Changed 19 months ago by ros

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:10 Changed 19 months ago by s1374103

Hi Ros,

Thanks for that, I'll give it a go.

What is reconfiguration?

Regards,

Jamie

comment:11 Changed 19 months ago by ros

Hi Jamie,

The reconfiguration is a standalone program which modifies (“reconfigures”) the UM atmosphere start file to produce a new start file. You can use the reconfiguration to, for example:

  • Upgrade a start file from an earlier code version
  • Add fields including user-defined fields
  • Overwrite existing fields
  • Change resolution

Regards,
Ros.

comment:12 Changed 18 months ago by ros

  • Resolution set to fixed
  • Status changed from reopened to closed