Opened 7 years ago

Closed 7 years ago

#1096 closed help (fixed)

HadGEM2 not resubmitting at qsserver step

Reported by: swr04ojb Owned by: um_support
Component: UM Model Keywords:
Cc: Platform: MONSooN
UM Version: 6.6.3

Description

I'm trying to perform a HadGEM2 simulation (jobid is xinha), and it finishes the first month okay but then fails to resubmit. The error message in the leave files is:

Use of uninitialized value $JOBDIR in concatenation (.) or string at /projects/um1/hg6.6.3/ibm/scripts/qsmoose line 135.
cat: cannot open [YOU
cat: cannot open HAVE
cat: cannot open NEW
cat: cannot open MAIL]
=========================================================
xinha: qsserver failure at Mon Jul 8 14:15:52 GMT 2013
=========================================================
/projects/um1/hg6.6.3/ibm/scripts/qsexecute[825]: assign: not found

/projects/um1/hg6.6.3/ibm/scripts/qspickup: Normal completion
/projects/um1/hg6.6.3/ibm/scripts/qshistprint: Job terminated normally
/projects/um1/hg6.6.3/ibm/scripts/qsresubmit: Error job not resubmitted because of server failure

Further down in the main body, you can see that it fails when it tries to archive a dump:

qsserver: Mon Jul 8 14:15:49 GMT 2013: xinhaa.dax6110 ARCHIVE DUMP
qsmoose: arguments passed are:

xinhaa.dax6110 olbrow /home/olbrow 6.6.3 NRUN /home/olbrow/umui_runs/xinha-189143113

The command to create a set is:

"moo mkset -v moose:crum/xinha"

system error, return code = 2
mkset command-id=108586763 failed: (SSC_TASK_REJECTION) one or more tasks are rejected.

moose:/crum/xinha: (TSSC_PROJECT_NAME_REQUIRED) A project name must be specified.

mkset: failed (2)

qscasedisp: return code after calling qsmoose RCARC=2

On MONSooN my username is olbrow. I have attached a .leave file in case that helps.

Change History (11)

comment:1 Changed 7 years ago by swr04ojb

Correction: it appears I can't upload a file of more than 0.5 MB, and the leave file is 12 MB. If you want me to send the file through, do say. However, the directory on MONSooN should be world-viewable:

/home/olbrow/output/xinha000.xinha.d13189.t133126.leave
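
With a leave file of this size, it can help to pull out only the lines that look like failures. The following is a minimal sketch, not part of the UM scripts: the function names are hypothetical and the patterns are simply the failure markers visible in the excerpts quoted in this ticket, so they will likely need extending for other jobs.

```python
import re

# Failure markers copied from the excerpts quoted in this ticket.
# This list is an assumption, not an exhaustive set of UM error strings.
FAILURE_PATTERNS = [
    r"\[FAIL\]",
    r"system error, return code = \d+",
    r"\bfailed\b",
    r"qsserver failure",
    r"not resubmitted",
]
_FAILURE_RE = re.compile("|".join(FAILURE_PATTERNS))


def find_failures(lines):
    """Return (line_number, text) pairs for lines matching a failure marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if _FAILURE_RE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_leave_file(path):
    """Scan a leave file line by line so a large file is never loaded whole."""
    with open(path, errors="replace") as handle:
        return find_failures(handle)
```

Calling `scan_leave_file` on the path above would return the line numbers to jump to in the full file.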

comment:2 Changed 7 years ago by ros

Hi Oliver,

Can you please go to UMUI window submodel-independent → Compilations and modifications → UM Scripts build and switch on Enable build of UM scripts. This will force a rebuild of the UM scripts - you are picking up one of the central scripts which I don't believe has a recent modification that you require. Save, Process and resubmit and I'm hoping that will fix your problem.

Cheers,
Ros.

comment:3 Changed 7 years ago by swr04ojb

Hi Ros,

thanks for the input. Just one thing before I do that: will it have the effect of rebuilding the executable? This job is to complete a rather long run (circa 650 years) that someone else was managing but has now moved on. We need to run the job forward for about another 300 years, and I think it is probably important to keep the executable the same, if we can.

kind regards,

oliver

comment:4 Changed 7 years ago by ros

Hi Oliver,

No it won't rebuild the model executable. All it will do is extract the UM scripts to your $DATAW/bin directory.

Cheers,
Ros.

comment:5 Changed 7 years ago by swr04ojb

Hi Ros,

I've done that, but unfortunately the run now fails in the .comp step, saying:

Currently Loaded Modulefiles:

1) xlf/v13.1.0.6 2) xlcpp/v11.1.0.6

[FAIL] /projects/lastmil/olbrow/xinha/umbase/cfg/bld.cfg: cannot locate config file, abort at /projects/um1/fcm/bin/../lib/Fcm/ConfigSystem.pm line 539

Build command started on Tue Jul 9 15:07:54 2013.
→Parse configuration: start
Base build: failed

Any suggestions?

cheers,
oliver

comment:6 Changed 7 years ago by ros

Hi Oliver,

Hmmm, I would have expected it to create that directory, but it doesn't actually match your current setup so maybe that's why.

In UMUI window submodel indep → FCM Configuration → FCM Config variables try changing Target m/c root extract directory (UM_ROUTDIR) to be /projects/lastmil/$USER/um.

Cheers,
Ros.

comment:7 Changed 7 years ago by swr04ojb

Hi Ros,

that doesn't seem to have made any difference, I get what appears to be the same error message. And when I've checked, the directory only exists as far as

/projects/lastmil/olbrow/um/xinha

I've tried creating the remaining directories by hand ( umbase/cfg ), just in case that would help, but it didn't :(
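
For reference, the "how far does the path exist" check can be done in one step rather than by listing each directory in turn. This is a small sketch with a hypothetical helper name; it only inspects the filesystem and makes no assumption about how FCM builds its paths.

```python
from pathlib import Path


def deepest_existing(path):
    """Return the longest leading part of `path` that exists on disk."""
    candidate = Path(path)
    while not candidate.exists():
        parent = candidate.parent
        if parent == candidate:  # reached the top without finding anything
            return None
        candidate = parent
    return candidate


# Path taken from the [FAIL] message earlier in this ticket.
print(deepest_existing("/projects/lastmil/olbrow/xinha/umbase/cfg/bld.cfg"))
```

The printed directory is the point at which the expected tree stops, which is where to compare against the extract directory configured in the UMUI.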

cheers,
oliver.

comment:8 Changed 7 years ago by ros

Hi Oliver,

Found the problem. Using the variable $USER won't work as your userids on PUMA and MONSooN are different. The extract is failing as it is trying to write to /projects/lastmil/swr04ojb/um.

So in UMUI window submodel indep → FCM Configuration → FCM Config variables change Target m/c root extract directory (UM_ROUTDIR) to be /projects/lastmil/olbrow/um
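
The mismatch can be illustrated with a short sketch. It only mimics the variable substitution using Python's `string.Template`; it is not how the UMUI or FCM performs the expansion, and `expand_routdir` is a hypothetical name. The two userids are the ones given in this ticket.

```python
from string import Template


def expand_routdir(template, user):
    """Expand $USER in a UM_ROUTDIR-style setting for a given login name."""
    return Template(template).substitute(USER=user)


setting = "/projects/lastmil/$USER/um"
puma_user = "swr04ojb"    # userid on PUMA, where the job is processed
monsoon_user = "olbrow"   # userid on MONSooN, where the extract must land

# Path produced when $USER is resolved with the PUMA userid:
print(expand_routdir(setting, puma_user))
# Path under the MONSooN userid, which is why the value is hard-coded:
print(expand_routdir(setting, monsoon_user))
```

Hard-coding the MONSooN userid in UM_ROUTDIR avoids depending on which machine resolves the variable.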

Cheers,
Ros.

comment:9 Changed 7 years ago by swr04ojb

Hi Ros,

thanks, that has got through the compilation step and is running the first month. I'll update again as to whether it gets through the archive step and continues on properly.

I have since, separately, noticed a second problem, in the output we have so far, which I'll list as a new ticket.

cheers,

Oliver

comment:10 Changed 7 years ago by swr04ojb

This completed the first month fine, with the usual "completed but not resubmitting because this is an NRUN" type message. I think it's safe to say you can close this ticket now. Thanks for the help!

comment:11 Changed 7 years ago by ros

  • Resolution set to fixed
  • Status changed from new to closed