Opened 4 years ago

Closed 4 years ago

#1697 closed help (fixed)

Can't get any UKCA job to run on ARCHER..... again

Reported by: jsabrooke
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: ARCHER
UM Version: 8.4


Hi, really sorry to be causing trouble again (it's the third ticket I've made this year about not being able to run a tutorial job).

I've tried testing with both the UM FCM and UKCA tutorial jobs, and in both cases, as soon as the job starts to run, it fails. The output files in pe_output/ are created, but all have no content.

My last problem was solved by starting from scratch at ​, and redoing all the ssh stuff. I've done all this again, but it hasn't helped this time.

The first problem I had was caused by a hand edit trying to use a different queue, but this is turned off.

Normally, the build and reconfiguration seem ok, although when I last ran it on Friday, the reconfiguration seemed to fail because it ran out of time….

When it does get to the run stage, it fails and the .leave file shows a segmentation fault.

Please let me know if you know what the problem is. Thanks very much for your help, and sorry to bother you again,
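
For anyone trying to confirm the same two symptoms (empty files in pe_output/ and a segmentation fault reported in the .leave file), a minimal Python sketch of the check follows. The function names are hypothetical, and the search string is an assumption: the exact wording in a given .leave file may differ, so adjust it to match what the file actually contains.

```python
from pathlib import Path


def empty_files(directory):
    """Return the sorted names of zero-byte regular files in `directory`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )


def mentions_segfault(leave_file, phrase="segmentation fault"):
    """Return True if `leave_file` contains `phrase`, ignoring case.

    The default phrase is an assumption about how the failure is reported.
    """
    text = Path(leave_file).read_text(errors="replace")
    return phrase.lower() in text.lower()
```

Calling `empty_files` on the job's pe_output directory and `mentions_segfault` on the most recent .leave file gives a quick yes/no on each symptom before raising a ticket.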

Change History (5)

comment:1 Changed 4 years ago by grenville


When you say "Normally, the build and reconfiguration seem ok," - this seems to imply that you have run successfully at some point - is this the case?


comment:2 Changed 4 years ago by jsabrooke

Hi Grenville,

sorry I wasn't that clear. What I meant was that initially, the build worked, the reconfiguration worked, and then the actual run failed immediately and produced empty output files.

I then had other issues and the build and reconfiguration started failing too, but I've fixed that, and I'm back to the issue above.

Sorry I didn't notice this before, but the .leave file says:

I looked for possible relevant settings in the umui and found:
Use OpenMP, where I changed the number of threads to 1, and also the two threading options in the COMMs menu, which I changed to serialized. I'm guessing that this multithreading refers to using one or two threads per core, and not the use of multiple cores? Anyway, these changes produced the same result, but removed the warning message, so I assume that's not the issue…

The last two .leave files at /home/n02/n02/jsab500/output/ are:
xldlc000.xldlc.d15293.t104114.leave (with the threading changes mentioned above)
xldlc000.xldlc.d15293.t110301.leave (another attempt back with the default threading settings)

Thanks a lot,

comment:3 Changed 4 years ago by grenville


The prebuild is causing the problem - go to FCM Config..→FCM Options for Atm... and uncheck "Use prebuild" — then do a full build and run. I did this in a copy of your job (xlvkz) and it worked OK.


comment:4 Changed 4 years ago by jsabrooke

Hi Grenville,

thanks a lot, that worked. Just so you know, the same occurred for the UM FCM tutorial job.


comment:5 Changed 4 years ago by grenville

  • Resolution set to fixed
  • Status changed from new to closed


Thanks - prebuilds are supposed to increase efficiency (they do for running jobs in a UM course). Probably best to check in any jobs you copy in future and switch them off. The problem arises when the prebuild is created with a different version of the compiler from the one you're using — there's no simple way of knowing what was used for the prebuild.

