Opened 4 years ago

Closed 4 years ago

#1669 closed help (answered)

Error submitting jobs on ARCHER

Reported by: webber24
Owned by: um_support
Component: ARCHER
Keywords: xancil
Cc:
Platform: ARCHER
UM Version: 8.4

Description

Dear CMS,

When I attempt to submit a job using the UMUI I get the following error:

ERROR: Timed out, login.archer.ac.uk not responding while attempting to access account webber24 on host login.archer.ac.uk. Note that repeated failures may result in expiry of password due to security procedures on some machines. Check user id, hostname and password for your account on the host machine.

Would you have any idea how to resolve the issue?

Chris

Change History (13)

comment:1 Changed 4 years ago by annette

Hi Chris,

We seem to be having some connection issues between PUMA and ARCHER. Andy and the ARCHER team are investigating.

Annette

comment:2 Changed 4 years ago by annette

Just to add that it does seem to be intermittent, so leave it a short while then try again and you may find that it works.

Annette

comment:3 Changed 4 years ago by ros

  • Status changed from new to pending

The JANET engineers have identified a network issue and put a fix in place. Please let us know if you see any further network stalling between PUMA and ARCHER.

Regards,
Ros.

comment:4 Changed 4 years ago by webber24

Dear Ros,

I am having an issue that I believe is related to the network issues between ARCHER and Reading, although it has persisted since the connection issues were resolved. It concerns XANCIL and the creation of ancillary files. XANCIL opens and loads my .job file, but it will not create ancillaries: when I select the option to make ancillaries, XANCIL freezes and eventually crashes. This was not happening before the connection issues, and I am unsure what could be causing it.

All the best,

Chris

comment:5 Changed 4 years ago by webber24

Dear CMS,

I just wondered whether there has been any progress with this issue.

All the best,

Chris

comment:6 Changed 4 years ago by willie

Hi Chris,

Could you let us know the name and location of the job file, and describe any error messages that occur?

Regards

Willie

comment:7 Changed 4 years ago by webber24

Dear Willie,

The .job file is the only .job file in the directory:

/work/n02/n02/webber24/Start_Files_Clim

There are no error messages, but when I click "Create Anc. files", Xancil freezes.

All the best,

Chris

comment:8 Changed 4 years ago by willie

  • Keywords xancil added

Hi Chris,

Does this need a land-sea mask? It "freezes" for me too, but this could be due to the compute demand. You should check the setup of the job carefully. The network issues have been resolved so this is an xancil/driving data problem. Has a previous job worked?

If the job is going to take a long time, you'll need to launch it in the serial queue, using

  xancil -j SST_model.job -x

Regards,

Willie

comment:9 Changed 4 years ago by webber24

Hi Willie,

I shouldn't need the land-sea mask, but even without it xancil still freezes. I will submit the job to the serial queue and see if it runs that way.

Thanks for your help,

Chris

comment:10 Changed 4 years ago by webber24

Hi Willie,

I tried submitting the xancil job that way and it just timed out. Any ideas why this is so slow now? It ran perfectly well before the connection issues.

All the best,

Chris

comment:11 Changed 4 years ago by willie

Hi Chris,

Which ancils are you trying to produce? You need to switch them on in SST_Model.job. Make sure that the netcdf file names agree with the ones in Xancil.

Regards

Willie

comment:12 Changed 4 years ago by jeff

Hi Chris

It looks like you have managed to find a bug in xancil. This has nothing to do with the ARCHER problems, so you must have been doing something different when it worked previously. I will fix the problem with xancil, but in the meantime you should be able to get it to work by making sure the SST and ice NetCDF files have the same number of time values.

Jeff.
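
For anyone applying the workaround above, a quick way to confirm that the SST and ice files have the same number of time values is to compare the length of the time dimension in each file. The following is only a minimal sketch, not part of xancil: it assumes the third-party netCDF4 Python package is installed and that the time dimension is called "time" in both files (both are assumptions; check the actual dimension name with ncdump -h). The function names are made up for illustration.

```python
def count_time_values(path, time_name="time"):
    """Return the length of the time dimension in a NetCDF file.

    Assumption: the dimension is named "time"; pass time_name otherwise.
    """
    # Third-party package, imported here so the helpers below work without it.
    from netCDF4 import Dataset

    with Dataset(path) as ds:
        return len(ds.dimensions[time_name])


def time_counts_agree(counts):
    """True if every file in {label: n_times} has the same number of times."""
    return len(set(counts.values())) <= 1


def describe_counts(counts):
    """Build a one-line report from a {label: n_times} mapping."""
    listing = ", ".join(f"{label}={n}" for label, n in counts.items())
    if time_counts_agree(counts):
        return f"OK: time counts agree ({listing})"
    return f"MISMATCH: time counts differ ({listing})"
```

To use it, call count_time_values() on the SST file and on the ice file, put the two results in a dictionary such as {"SST": n_sst, "ice": n_ice}, and print describe_counts() of that dictionary. If it reports a mismatch, trim or extend one of the input files so that the counts agree before running xancil again.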

comment:13 Changed 4 years ago by jeff

  • Resolution set to answered
  • Status changed from pending to closed