Opened 4 years ago
Closed 4 years ago
#2004 closed help (fixed)
Reconfiguration Randomly Failing
Reported by: | s1374103 | Owned by: | um_support |
---|---|---|---|
Component: | UM Model | Keywords: | reconfiguration |
Cc: | | Platform: | MONSooN |
UM Version: | 8.4 | | |
Description
Dear Helpdesk,
The job I am developing is behaving quite erratically. Sometimes it runs perfectly; sometimes it fails at reconfiguration with the following error:
[NID 00105] 2016-10-21 10:17:29 Exec /projects/ukca-ed/jakel/xmyvw/bin/qxreconf failed: chdir /work/scratch/jtmp/pbs.1067712.xcm00.x8z No such file or directory
/projects/ukca-ed/jakel/xmyvw/bin/qsrecon: Error in dump reconfiguration - see OUTPUT
If it fails at reconfiguration I re-submit the job and after a couple of attempts it usually runs. However, sometimes it does not run. In that case I copy to a new job and repeat the process. Should I be worried about this?
Base job - xlsjc - vn8.4 RJ4.0 CheST+GLOMAP-mode (release version)
My job - xmyvv - nudging, changes to chemistry scheme/emissions, GLOMAP-mode MS4 configuration.
Regards,
Jamie
Change History (6)
comment:1 in reply to: ↑ description Changed 4 years ago by s1374103
- Platform set to MONSooN
comment:2 Changed 4 years ago by willie
Hi Jamie,
In fact, it consistently fails with a message like,
Exec /projects/umadmin/wmcgin/xmyze/bin/qxreconf failed: chdir /work/scratch/jtmp/pbs.1074788.xcm00.x8z No such file or directory
However, it does create a umui_submit_rcf script in the ~/umui_runs/xmyvv… directory which does work - you just qsub this. So there is no need for job copying.
We think this is a problem with MONSooN and we're still investigating.
Regards,
Willie
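For anyone wanting to locate the generated script before running qsub on it by hand, the following is a minimal Python sketch. It assumes that job directories under ~/umui_runs are named with the job ID followed by a hyphen and a number (as in the xmzmd-307095248 directory quoted later in this ticket) and that umui_submit_rcf sits at the top level of that directory; neither is confirmed here. The function names are hypothetical, and nothing is actually submitted.

```python
import glob
import os


def find_rcf_scripts(job_id, runs_dir="~/umui_runs"):
    """Return candidate umui_submit_rcf paths for a job, newest directory first."""
    base = os.path.expanduser(runs_dir)
    # Assumption: directories are named "<job_id>-<number>" and the script
    # lives directly inside them. Adjust the pattern if your layout differs.
    pattern = os.path.join(base, job_id + "-*", "umui_submit_rcf")
    matches = glob.glob(pattern)
    # Sort by the modification time of the containing job directory.
    return sorted(
        matches,
        key=lambda path: os.path.getmtime(os.path.dirname(path)),
        reverse=True,
    )


def qsub_command(script_path):
    """Build the argument list for qsub; this only constructs the command."""
    return ["qsub", script_path]
```

Calling find_rcf_scripts("xmyvv") on the machine that holds the job directories would list the candidates; the first entry could then be passed to qsub_command and run with subprocess.run once you have checked it is the script you expect.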
comment:3 Changed 4 years ago by willie
Hi Jamie,
If you switch off "Use different version of the UM code …" in the FCM options page it should work straight through.
Regards
Willie
comment:4 Changed 4 years ago by s1374103
Hi Willie,
When you say it creates a umui_submit_rcf script and that I should submit it, do I just type 'qsub ~umui_runs/xmyvv' in Puma at the command line? Is this the equivalent of submitting through the UMUI?
Also, for a little while after you replied to my message the model was submitting fine. Now, when I submit through the UMUI I am getting the following message
Initialising SUBMIT...
Writing remote commands file...
Calling MAIN_SCR - local... (This may take several minutes.)
MAIN_SCR: Calling Extract ...
Extracting UMATMOS base repository...
UMATMOS base repository extract is OK
Extracting JULES base repository...
JULES base repository extract is OK
created umscripts sub-directory.
Extracting UMSCRIPTS including any branches...
UMSCRIPTS extract is OK
created umatmos sub-directory.
Extracting UMATMOS including any branches...
UMATMOS extract is OK
created umrecon sub-directory.
Extracting UMRECON including any branches...
UMRECON extract is OK
MAIN_SCR: Extract OK
MAIN_SCR: Submit OK
Logging in to remote machines lander.monsoon-metoffice.co.uk and xcml00...
key_read: uudecode b2:06:df:3e:f5:e4:c9:5d:4d:1f:17:4d:89:1c:90:72 AAAAB3NzaC1yc2EAAAABIwAAAIEArb08RIqZgsa02Lj9pGCxwOOZ2NRRQrKKL/foZF47IkDtgepcyNIy9H4YJkry+grlGoimoMf6qab/ToRpXfzrcTqdI8yygOLxPctI8moOGI5SO4yq+LQ94fk8MlHe69sdmBNdCoIrlRcZo9BJlOr91ibqKR+NlyVC72l+QryJ7Zk= failed
key_read: uudecode b2:06:df:3e:f5:e4:c9:5d:4d:1f:17:4d:89:1c:90:72 AAAAB3NzaC1yc2EAAAABIwAAAIEArb08RIqZgsa02Lj9pGCxwOOZ2NRRQrKKL/foZF47IkDtgepcyNIy9H4YJkry+grlGoimoMf6qab/ToRpXfzrcTqdI8yygOLxPctI8moOGI5SO4yq+LQ94fk8MlHe69sdmBNdCoIrlRcZo9BJlOr91ibqKR+NlyVC72l+QryJ7Zk= failed
key_read: uudecode b2:06:df:3e:f5:e4:c9:5d:4d:1f:17:4d:89:1c:90:72 AAAAB3NzaC1yc2EAAAABIwAAAIEArb08RIqZgsa02Lj9pGCxwOOZ2NRRQrKKL/foZF47IkDtgepcyNIy9H4YJkry+grlGoimoMf6qab/ToRpXfzrcTqdI8yygOLxPctI8moOGI5SO4yq+LQ94fk8MlHe69sdmBNdCoIrlRcZo9BJlOr91ibqKR+NlyVC72l+QryJ7Zk= failed
REMCOMMS 100% 9567 9.3KB/s 00:00
Creating directory...
Copying job files...
Renaming SUBMIT...
Changing SUBMIT permissions...
Running SUBMIT script...
Your job directory on host xcml00 is: /home/jakel/umui_runs/xmzmd-307095248
/home/jakel/umui_runs/xmzmd-307095248/SUBMIT[28]: .: /home/jakel/.profile: cannot open [No such file or directory]
Copying files to directory /projects/ukca-ed/jakel/xmzmd/baserepos/UMATMOS using rsync...
See /projects/ukca-ed/jakel/xmzmd/baserepos/UMATMOS/ext.out for output
Copying files to directory /projects/ukca-ed/jakel/xmzmd/baserepos/JULES using rsync...
See /projects/ukca-ed/jakel/xmzmd/baserepos/JULES/ext.out for output
Copying files to directory /projects/ukca-ed/jakel/xmzmd/umscripts using rsync...
See /projects/ukca-ed/jakel/xmzmd/umscripts/ext.out for output
Copying files to directory /projects/ukca-ed/jakel/xmzmd/umatmos using rsync...
See /projects/ukca-ed/jakel/xmzmd/umatmos/ext.out for output
Copying files to directory /projects/ukca-ed/jakel/xmzmd/umrecon using rsync...
See /projects/ukca-ed/jakel/xmzmd/umrecon/ext.out for output
Connection to xcml00 closed.
Connection to lander.monsoon-metoffice.co.uk closed.
Tidying local directories...
Job submission completed
Do you understand this?
Regards,
Jamie
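When reading a long submission or job log like the one quoted in this comment, it can help to pull out only the lines that look like errors. The following is a rough Python sketch, assuming the log has one message per line. The pattern list is drawn only from messages that appear in this ticket and is not exhaustive, and the function names are hypothetical. A match is a prompt to look more closely, not proof that the job failed.

```python
import re

# Substrings taken from messages quoted in this ticket; illustrative only.
SUSPECT = re.compile(r"failed|cannot open|No such file or directory|Error in")

# Matches the "failed: chdir <dir> No such file or directory" form seen above.
CHDIR_FAILURE = re.compile(r"failed: chdir (\S+) No such file or directory")


def suspect_lines(output):
    """Return the stripped lines of a log that match a suspect pattern."""
    return [line.strip() for line in output.splitlines() if SUSPECT.search(line)]


def missing_chdir_dir(line):
    """Return the directory named in a chdir failure message, or None."""
    match = CHDIR_FAILURE.search(line)
    return match.group(1) if match else None
```

Applied to the reconfiguration error in the description, missing_chdir_dir picks out the scratch directory that could not be entered, which makes it easy to compare the directory names across repeated failures.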
comment:5 Changed 4 years ago by willie
Hi Jamie,
You don't need to qsub manually if you switch off "Use different version of the UM code …" in the FCM options page.
Is xmzmd actually failing? I couldn't find the leave file.
Willie
comment:6 Changed 4 years ago by willie
- Resolution set to fixed
- Status changed from new to closed