Opened 2 years ago

Closed 23 months ago

#2274 closed help (fixed)

Rose Stem tests fail on MONSooN

Reported by: mhollaway
Owned by: ros
Component: Monsoon
Keywords: rose-stem, ukca, MONSooN
Cc:
Platform: Monsoon2
UM Version: 10.8

Description

Hi,

I am currently trying to run the rose-stem tests at vn10.8 on MONSooN. Some of the tests run fine (e.g. the SCRIPTS set).

However, the fcm_make_meto_xc40_install_ctldata task keeps crashing with the following error:

[FAIL] bash -ec H=$(rose\ host-select\ xcsc);\ echo\ $H # return-code=1, stderr=
[FAIL] [WARN] xcslc1: (ssh failed)
[FAIL] [WARN] xcslc0: (ssh failed)
[FAIL] [FAIL] No hosts selected.
Received signal EXIT
2017-09-15T12:20:26Z CRITICAL - Task job script received signal EXIT
2017-09-15T12:20:26Z CRITICAL - failed

I cannot find the source of this error, as all of the other fcm_make tests in the rose-stem suite run fine. This failure prevents the rest of the rose-stem apps from running.

Are there any changes that I may have missed in order to run the rose-stem suites on MONSooN?

Best Regards,

Michael.

Change History (9)

comment:1 Changed 2 years ago by ros

  • Status changed from new to pending

This query has been sent to the Monsoon team.

I will update here for reference to others once I've seen their response.


comment:2 Changed 2 years ago by ros

Hi Michael,

Can you try running rose host-select xcsc on the command line on exvmsrose please?

I've just tried running rose stem --group=xc40_developer, and the fcm_make_meto_xc40_install_ctldata task submitted and ran fine on xcslc1.

Cheers,
Ros.

comment:3 Changed 2 years ago by mhollaway

Hi Ros,

If I run rose host-select xcsc at the command line on exvmsrose I get one of the following responses: xcslc0 or xcslc1.

I have tried to ssh into both from exvmsrose and can log in fine. Somebody at the Met Office mentioned that the fcm_make_meto_xc40_install_ctldata task tries to ssh to localhost. When I try this I get prompted for a password. I don't know whether this could be causing any underlying issues.

If I recall, when we switched to the new XCS machine I did initially have issues with ssh between the machines on MONSooN, so I don't know if my problems could be a hangover from this.
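For anyone repeating this check, a rough sketch of a non-interactive test is given here. It relies on ssh's BatchMode option, which makes ssh fail instead of prompting for a password, so a host that would prompt shows up as FAILED. The function name check_ssh_hosts and the SSH_CMD override are hypothetical and exist only for illustration; the host names passed in are up to the caller.

```shell
# Sketch only: report which hosts accept a non-interactive (key-based) login.
# check_ssh_hosts and SSH_CMD are hypothetical names, not part of rose or ssh.
check_ssh_hosts() {
    local failed=0
    local host
    for host in "$@"; do
        # BatchMode=yes: fail rather than prompt for a password or passphrase.
        # ConnectTimeout=10: give up after 10 seconds if the host is unreachable.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=10 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    # Non-zero return if any host could not be reached without a prompt.
    return "$failed"
}
```

It could be called as check_ssh_hosts localhost xcslc0 xcslc1 from exvmsrose. A FAILED entry for localhost would be consistent with the password prompt described above, although a host that is missing from known_hosts can also cause a failure in batch mode.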

Cheers

Michael.

comment:4 Changed 2 years ago by ros

Hi Michael,

ssh localhost was going to be my next question, as I have heard this too. You'll need to create a passwordless ssh-key and add the public key to your ~/.ssh/authorized_keys file.

If you need help doing this let us know.

Cheers,
Ros.

comment:5 Changed 2 years ago by mhollaway

Hi Ros,

Would it be possible for you to advise on how to set up the passwordless ssh-key please? I did do it a long time ago between Puma and old MONSooN (is the procedure the same?) but it has been a while so I am guessing the procedure may have changed a little?

Do I need to know a password (other than my standard MONSooN passcode) to do this, or is ssh localhost just asking for one because the ssh-key is not in place?

Cheers,

Michael.

comment:6 Changed 2 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from pending to assigned

Hi Michael,

You don't need to know any passwords to do this as it is all within the Monsoon domain.

  1. ssh-keygen -C "<username>@monsoon"
  2. Press return when prompted for passphrase.
  3. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

You should now be able to ssh localhost.
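The steps above could also be scripted roughly as follows. This is a sketch under a few assumptions rather than the procedure used on Monsoon: it assumes an RSA key at id_rsa (as in the cat command above), passes -N "" to ssh-keygen to set an empty passphrase instead of pressing return at the prompt, and adds permission fixes that the steps above do not mention. The function name setup_localhost_key and its optional directory argument are hypothetical.

```shell
# Sketch only: create a passphrase-less key and authorise it for logins to
# this same account. setup_localhost_key is a hypothetical helper name.
setup_localhost_key() {
    # Optional first argument: the ssh directory (defaults to ~/.ssh).
    local ssh_dir="${1:-$HOME/.ssh}"
    local key="$ssh_dir/id_rsa"

    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1

    # Generate a key only if none exists, so an existing key is never overwritten.
    # -N "" sets an empty passphrase; -C sets the comment stored in the public key.
    if [ ! -f "$key" ]; then
        ssh-keygen -q -t rsa -N "" -C "${USER:-user}@monsoon" -f "$key" || return 1
    fi

    # Append the public key only if that exact line is not already present.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF "$(cat "$key.pub")" "$ssh_dir/authorized_keys"; then
        cat "$key.pub" >> "$ssh_dir/authorized_keys"
    fi

    # sshd commonly ignores an authorized_keys file that others can write to.
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Running setup_localhost_key with no argument acts on ~/.ssh. Afterwards, ssh -o BatchMode=yes localhost true should return without prompting; if localhost is not yet in known_hosts, one interactive ssh localhost may be needed first to accept the host key.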

Cheers,
Ros.

comment:7 Changed 2 years ago by mhollaway

Hi Ros,

Thanks for the info. I have followed the instructions and everything seems to work fine. I am now able to ssh to localhost.

This works on both the xcs and exvmsrose machines. I have yet to try running the rose stem tests but now that I am able to move around the system without a password they should hopefully work ok.

Many thanks again.

Cheers

Michael.

comment:8 Changed 2 years ago by mhollaway

Hi Ros,

I have now had a chance to run the rose-stem tests and they now run without issue. I have tried both the xc40_developer and the xc40_ukca groups and all tests complete and pass.

Thanks again for all your help on this. I think this ticket can now be closed.

Cheers

Michael.

comment:9 Changed 23 months ago by ros

  • Resolution set to fixed
  • Status changed from assigned to closed