Opened 4 years ago

Closed 3 years ago

#1789 closed help (fixed)

UKCA release job exceeding job time limit after changing STASH codes

Reported by: dilshadshawki
Owned by: annette
Component: UM Model
Keywords: UKCA, UMUI, time, archiving
Cc:
Platform: MONSooN
UM Version: 8.4

Description

Dear Helpdesk,

The job xlzyj exceeds the job time limit with very little information in the .leave file:

/home/ds5912/ouput/xlzyj000.xlzyj.d16013.t174138.leave

I simply made a copy of a job that I know works (xlytb), removed its STASH diagnostics and added my own, but it only managed to output one monthly file:

/projects/ukca-imp/dshawk/xlzyj/xlzyja.pm1999dec

and then it crashed.

There is no pe_output folder, which is where the errors were found in previous tickets!

Can anyone help me with this issue?

Many thanks,
Dill

Change History (17)

comment:1 Changed 4 years ago by dilshadshawki

Sorry, the first path above should read:

/home/dshawk/output/xlzyj000.xlzyj.d16013.t174138.leave

comment:2 Changed 3 years ago by annette

  • Owner changed from um_support to annette
  • Status changed from new to assigned

Hi Dill,

I think the reason that the pe_output files are not being written is that the variable $DATAW is set incorrectly in your job.

In the UMUI go to Input/output control and resources → Time convention and SCRIPT environment variables

It looks like a typo with "dhsawk" instead of "dshawk".

Annette
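
A transposed user name in a path variable such as $DATAW is easy to miss by eye. As a minimal sketch (not part of the UM or the UMUI), the Python below checks whether the expected user name appears as a component of a path and, if not, reports the component that most resembles it. The function name `check_user_in_path` and the example path are hypothetical and only illustrate the "dhsawk"/"dshawk" slip.

```python
import difflib
from pathlib import PurePosixPath


def check_user_in_path(path, expected_user):
    """Return None if expected_user is a component of path.

    Otherwise return the path component that most resembles
    expected_user (a likely typo), or "" if nothing resembles it.
    """
    parts = PurePosixPath(path).parts
    if expected_user in parts:
        return None
    # cutoff=0.6 is an arbitrary similarity threshold for this sketch.
    close = difflib.get_close_matches(expected_user, parts, n=1, cutoff=0.6)
    return close[0] if close else ""


# Hypothetical $DATAW value containing the typo described above.
suspect = check_user_in_path("/projects/ukca-imp/dhsawk/xlzyj", "dshawk")
print(suspect)
```

A return value of None means the user name was found in the path; any other value is worth checking against the setting in the UMUI.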

comment:3 Changed 3 years ago by annette

  • Status changed from assigned to pending

comment:4 Changed 3 years ago by dilshadshawki

Hi Annette,

Thanks for pointing this out. I tried running it again after correcting the typo, but I still get the same issue where it exceeds the time limit:

/home/dshawk/output/xlzyj000.xlzyj.d16018.t172623.leave

But this time I do get the pe_output folder.

Although I must admit, the penny hasn't dropped as to which file I should search in to find an informative error message!

/projects/ukca-imp/dshawk/xlzyj/pe_output

Please help!

Many thanks,
Dill

comment:6 Changed 3 years ago by annette

  • Status changed from pending to assigned

Dill,

From the pe_output files it looks like the model has finished running, but the scripts aren't finishing. This looks the same as ticket:1748#comment:8

I'm not sure what is causing the problem here, but will look into it.

Annette
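
On the earlier question of which pe_output file to search: rather than opening each file in turn, one option is to scan the whole directory for lines that look like error messages. The sketch below assumes that informative messages contain the word "error" (in any case), which may not hold for every failure. The function name `find_error_lines` is hypothetical.

```python
from pathlib import Path


def find_error_lines(pe_output_dir, keyword="error"):
    """Yield (file name, line number, text) for lines containing keyword.

    The match is case-insensitive. Assumption: useful messages contain
    the word "error"; pass a different keyword if they do not.
    """
    needle = keyword.lower()
    for path in sorted(Path(pe_output_dir).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" avoids a crash on any non-text bytes.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.name, number, line.rstrip()
```

For the directory in this ticket, usage would look like `for hit in find_error_lines("/projects/ukca-imp/dshawk/xlzyj/pe_output"): print(hit)`. An empty result is consistent with the model itself having finished cleanly.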

comment:7 Changed 3 years ago by dilshadshawki

Hi Annette,

Many thanks for looking into this. I look forward to hearing from you.

Dill

comment:8 Changed 3 years ago by annette

Hi Dill,

I think the problem is that you have switched on post-processing but are missing the required branch:

fcm:um-br/dev/ros/vn8.4_MetoCray_arch/src

See here for full instructions:

http://collab.metoffice.gov.uk/twiki/bin/view/Support/CrayUMInstall#Archiving

If you have compilation switched off, you can re-build the scripts by selecting the option "Enable build UM scripts" in Compilation and Run options → UM Scripts Build.

I have tested this on Sunil's job (from ticket #1748 referenced above) which I think shows the same error.

Do let me know how you get on.

Annette

comment:9 Changed 3 years ago by annette

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:10 Changed 3 years ago by dilshadshawki

Hi Annette,

Apologies, shortly after our last message I went abroad. I've added the branch as you mentioned and I will let you know how I get on.

Cheers,
Dill

comment:11 Changed 3 years ago by dilshadshawki

Hi Annette,

I made the changes you suggested; however, I now get an error in the dump reconfiguration:

/home/dshawk/output/xlzyj000.xlzyj.d16036.t120303.rcf.leave

Any other ideas?

Cheers,
Dill


comment:12 Changed 3 years ago by annette

Hi Dill,

As you have probably seen, there's nothing really helpful in the leave file about why it failed. It also looks as though, since this failure (dated 5 Feb), you have successfully run the recon for this job (on 8 Feb). So it would be helpful if you could explain what you are trying to do and what you want help with.

From the 8 Feb run, it looks like the model recon and run have completed OK, but the archiving has failed as it seems to have run /before/ the model run. I can look into this if you like?

Best regards,
Annette

comment:13 Changed 3 years ago by dilshadshawki

Hi Annette,

You are correct that I did run it again on 8th Feb and I am amazed that I forgot I had done this, yet still sent off this ticket to you. Please forgive me, I will get my head screwed back on!

Yes please, could you look into why it didn't archive? I have set it up to archive to MASS via MOOSE in the UM setup (Post-Processing → Main Switch + General Questions), and the previous issue you helped me with was to do with the archiving.

Could I start the CRUN in the meantime?

Cheers,
Dill

comment:14 Changed 3 years ago by annette

  • Resolution fixed deleted
  • Status changed from closed to reopened

Hi Dill,

I took a copy of your job, and the archiving seems to work OK for me. However I have noticed that my scripts differ from the versions you are using, so I think you may need to re-build the scripts (see #comment:8).

Let me know if this works.

Annette

comment:15 Changed 3 years ago by annette

  • Status changed from reopened to pending

comment:16 Changed 3 years ago by dilshadshawki

Hi Annette,

It seems that the archiving is working after all! I have just checked MASS/MOOSE and the files are being archived successfully!

Many thanks for your help on this!

Let's get this ticket closed shall we??

Best wishes,
Dill

comment:17 Changed 3 years ago by annette

Great, glad it's working.

Annette

comment:18 Changed 3 years ago by annette

  • Keywords time, archiving added; time removed
  • Resolution set to fixed
  • Status changed from pending to closed

