Opened 4 years ago

Closed 4 years ago

#1540 closed help (answered)

ensemble jobs, qsserver error?

Reported by: ggxmy Owned by: um_support
Component: UM Model Keywords:
Cc: Platform: ARCHER
UM Version: 7.3

Description

Dear Grenville,

My jobs failed in a somewhat mysterious way, and I thought it was due to the disk quota (I had exactly 500 GB in /nerc). But now I think it was actually a qsserver failure. A couple of months ago I had this problem and you helped me through CMS. Your answer was to turn off automatic post processing. This cured the problem, but it still seems strange to me…

My job tdwnu ran for 11 months and 28 days and then died. When I resubmitted the job, it failed due to a qsserver failure. Isn't this strange? Is this typical of a qsserver failure?

My job tdwra ran for 2 months without a problem. My first perturbed parameter ensemble (PPE) members derived from it, tdwrb-d, ran for 2 months, but two of them failed to create the monthly mean .pp data for the final month. The .leave file has been overwritten, so I can no longer check it. The remaining PPE members, tdwre-z and tdwsa-e, don't seem to have run at all. All of them seem to have had a qsserver failure.

Do you still think the solution would be turning off automatic post processing? Of course I can try doing it and see what happens, but I will have to wait for hours or maybe until tomorrow before I see the results.

Also, what happens if I turn off automatic post processing? Will the results no longer be archived as .pp files? If so, is there a script to convert a large number of outputs to .pp files?

Thanks.
Best regards,

Dr. Masaru Yoshioka

Change History (2)

comment:1 Changed 4 years ago by ros

  • Reporter changed from Masaru Yoshioka [M.Yoshioka@… to ggxmy

comment:2 Changed 4 years ago by grenville

  • Resolution set to answered
  • Status changed from new to closed

This was partially addressed in external emails; the qsserver issue is under review.

Grenville
