#1719 (fixed) Error with CRUN, NEMO restart dump not consistent with the UM .xhist file (owner: annette, reporter: dilshadshawki)
Description

Hello Helpdesk,

I have been trying to run job xlzcg and have managed to run an NRUN, with the output created successfully. But when I then attempt to run a CRUN I get the following message in the .leave file:

/home/dshawk/output/xlzcg000.xlzcg.d15308.t145403.leave
ERROR: The latest NEMO restart dump does not seem to be
       consistent with the UM .xhist file
       This suggests an untidy model failure but you
       may be able to retrieve the run by copying the
       backup dump and xhist files to original location
       and restarting with the appropriate NEMO dump
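For reference, the recovery the message hints at just amounts to putting the backup NEMO dump and .xhist file back where the run expects them before resubmitting. A minimal sketch, assuming purely hypothetical data and backup locations (the real ones depend on how the job is set up):

# All paths and file names below are hypothetical; substitute the job's real ones.
DATADIR=$HOME/dataw/xlzcg        # where the CRUN looks for its dumps (assumption)
BACKUP=$DATADIR/backup           # wherever the backup dump and xhist were written (assumption)

cp "$BACKUP/xlzcgo_21011201_restart_0000.nc" "$DATADIR/"   # backup NEMO restart dump
cp "$BACKUP/xlzcg.xhist" "$DATADIR/"                       # matching UM .xhist file
# ...then resubmit the CRUN pointing at that NEMO dump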

There is also a .comp.leave file, which shouldn't have been produced since I am not compiling anymore with the CRUN:

/home/dshawk/output/xlzcg000.xlzcg.d15308.t145403.comp.leave

Earlier, I tried to run this job using the restart dumps produced by the run in the projects folder: /projects/ukca-imp/dshawk/xlzcg

However, it produces ice restart files which begin from 02 (February), even though the ice restart dump I used begins in 09 (September). This may have something to do with needing to reset the restart header using modify_CICE_header, but that utility no longer exists since the move to the new Cray system; instead I use the following branch fix, as instructed in ticket #1667:

UM: fcm:um-br/jwalton/vn8.2_NEMOCICE_restart_fixes_UKMO
CICE: fcm:cice-br/dev/jwalton/vn4.1m1_restart_date_fix_UKMO
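These branches are normally added via the job's FCM configuration in the UMUI rather than checked out by hand; if you just want to inspect what they change, a rough sketch using standard FCM commands (the working-copy directory name is arbitrary):

fcm info fcm:um-br/jwalton/vn8.2_NEMOCICE_restart_fixes_UKMO            # confirm the branch exists and show its revision
fcm co fcm:cice-br/dev/jwalton/vn4.1m1_restart_date_fix_UKMO cice_fix   # check out a working copy to browse the changes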

I thought this may be a side issue. In any case, I tried to rerun with the restart dumps produced by the run, in /home/dshawk/startdumps/xlzcg/:

xlzcga.da21011201_00
xlzcgo_21011201_restart_0000.nc
anqdhi.restart.1999-12-01-00000

I used a different CICE restart dump (not produced by the run) in order to keep the same month (12), but that gave me a different error. So, rather than risk messing things up further, I tried to rerun everything from scratch with new start dumps, in /home/dshawk/startdumps/xlzcg/:

xkhqaa.da23000901_00
xkhqao_23000901_restart.nc
xkhqai.restart.2300-09-01-00000

But once again I have run into the same problem in the .leave file shown above.

Your help would be very much appreciated.

Cheers, Dill

#1748 (answered) My job has stopped as I have unexpectedly exceeded the time limit of 10800 seconds (owner: annette, reporter: s.varma13)
Description

Hello

I am having problems after I have submitted a run to Monsoon.

The job is xmbxa and is called “cloud reference run 2008 clouds 1” and I am running this on version 8.4.

I have a start date of 1 December 2006 with a run time of 2 years and 1 month. I have selected 16 diagnostics (a mixture of both 2d and 3d). My time profile is T1H which is every hour for the run time. My domain profile is either DALLH or DIAG depending on whether it is 2d or 3d. My usage profile is UPD, stream 60, override size of 32,000, period of 1 day (given the hourly output). Resubmission pattern is one month.

When I submit the run for the first time, it outputs 30 files for December 2006. When I resubmit to do the continuous runs, it stops on the 19th day of January 2007. You can see this here:

cd /projects/ukca-imp/suvar/xmbxa
ls -ltr xmbxaa.pa20070119

In the .leave file (/home/suvar/output/xmbxa000.xmbxa.d15329.t204741.leave) it says I have exceeded the time limit of the job:

=>> PBS: job killed: walltime 10826 exceeded limit 10800
aprun: Apid 229471: Caught signal Terminated, sending to application
Application 229471 is crashing. ATP analysis proceeding...

Something is causing the job to take a lot longer to run as it only completes 1 month and 19 days of a 2 years and 1 month run.
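For what it's worth, the 10800 in that message is the job's wall-clock limit in seconds, i.e. 3 hours, and the run was killed at 10826 s. If the longer runtime is expected, one option is simply to request a longer wall-clock limit for the job; the relevant UMUI setting isn't shown in this ticket, but at the PBS level the limit corresponds to a directive along these lines, with the value purely an example:

#PBS -l walltime=04:00:00   # example only: request 4 hours instead of the current 3-hour (10800 s) limit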

I tried to change the usage profile to 63, as this is the one usually associated with UPD, but when I do that a window pops up saying "disk quota exceeded". It is still happy with 60, which has the same packing profile.

I also thought I could reduce the resubmission pattern from one month to 15 days (under "Input/Output Control and Resources - Resubmission pattern"), as the run stops at day 19, but the same disk quota exceeded window came up.

I also tried to turn off the climate meaning under "Control - Post processing - Dumping and meaning", as I do not need this, and the same window came up.

Many thanks in advance for your help

#1754 (fixed) Archiving failure (owner: annette, reporter: dschro)
Description

I can continue my old job on xcm now, but the archiving doesn't work, probably due to a wrong directory for the STASHmaster file. Any ideas? See below, or /home/dschro/output/xlttf000.xlttf.d15331.t220708.archive.leave

/projects/um1/vn8.0/ctldata/STASHmaster does not exist; it should be vn8.6 instead of vn8.0.
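A quick way to sanity-check that is to see which version directories actually exist under UMDIR (/projects/um1, per the log below); the commands are just a sketch:

ls -d /projects/um1/vn8.0/ctldata/STASHmaster   # expected to fail, matching the error below
ls -d /projects/um1/vn8.6/ctldata/STASHmaster   # should exist if the vn8.6 control data is installed
ls /projects/um1                                # list the UM versions available under UMDIR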

Thanks, David

TMPDIR=JOBTEMP= /scratch/jtmp/pbs.321069.xcm00.x8z
The command to archive is : moo put -f -vv -c=umpp /projects/jsimp/dschro/xlttf/xlttfa.pe2014may moose:crum/xlttf/ape.pp/xlttfa.pe2014may.pp
Moose error: user-error (see Moose docs). Return code = 2
stdout: ### put, command-id=211678696, estimated-cost=604041216byte(s), files=1
### task-id=0, estimated-cost=604041216byte(s), resource=/projects/jsimp/dschro/xlttf/xlttfa.pe2014may
### 2015-11-30 12:35:02 GMT: polled server for ready tasks: #0
; stderr: /projects/jsimp/dschro/xlttf/xlttfa.pe2014may: (ERROR_CLIENT_FILE_FORMAT_CONVERTER) cannot convert file format.
////////////////////////////////////////////////////////////////////////
uk.gov.meto.moose.business.converter.exception.FileFormatException: STDOUT:
[INFO] Using precision: 32
[INFO] File(1): /projects/jsimp/dschro/xlttf/xlttfa.pe2014may
[WARN] Using default STASHmaster as none provided "/projects/um1/vn8.0/ctldata/STASHmaster".

STDERR:
+ dirname /opt/ukmo/mass/moose-monsoon-client-latest/bin/xc40-ffc/umpp2pp.ksh
+ here=/opt/ukmo/mass/moose-monsoon-client-latest/bin/xc40-ffc
+ [ /projects/jsimp/dschro/xlttf/xlttfa.pe2014may '==' -o ]
+ operational=''
+ input=/projects/jsimp/dschro/xlttf/xlttfa.pe2014may
+ output=/scratch/dschro/xlttfa.pe2014may3709185528395248730.tmp
+ . /opt/ukmo/mass/moose-monsoon-client-latest/bin/xc40-ffc/platform_environ
+ dirname /opt/ukmo/mass/moose-monsoon-client-latest/bin/xc40-ffc/umpp2pp.ksh
+ EXEC=/opt/ukmo/mass/moose-monsoon-client-latest/bin/xc40-ffc
+ export EXEC
+ IEEE=/opt/ukmo/mass/moose-monsoon-client-latest/bin/xc40-ffc/um-convieee
+ export IEEE
+ CONVPP=/opt/ukmo/mass/moose-monsoon-client-latest/bin/xc40-ffc/um-convpp
+ export CONVPP
+ UMDIR=/projects/um1
+ export UMDIR
+ TMPDIR=/scratch/jtmp/pbs.321069.xcm00.x8z
+ export TMPDIR
+ MKTEMP=mktemp
+ export MKTEMP
+ ulimit -s unlimited
+ mktemp -d '--tmpdir=/scratch/dschro'
+ CONV_TMPDIR=/scratch/dschro/tmp.4tDmIiRanE
+ export CONV_TMPDIR
+ [ ! -d /scratch/dschro/tmp.4tDmIiRanE ]
+ rm /scratch/dschro/xlttfa.pe2014may3709185528395248730.tmp
+ 2> /dev/null
+ /opt/ukmo/mass/moose-monsoon-client-latest/bin/xc40-ffc/um-convieee -32 /projects/jsimp/dschro/xlttf/xlttfa.pe2014may /scratch/dschro/tmp.4tDmIiRanE/tmp.ieee
[FAIL] STASHMaster directory "/projects/um1/vn8.0/ctldata/STASHmaster" does not exist
[FAIL] Problem with CONVIEEE program
+ ERR=1
+ [ -d /scratch/dschro/tmp.4tDmIiRanE ]
+ rm -r /scratch/dschro/tmp.4tDmIiRanE
+ exit 1
: /projects/jsimp/dschro/xlttf/xlttfa.pe2014may
////////////////////////////////////////////////////////////////////////
put: failed (2)