Opened 12 years ago

Closed 11 years ago

#88 closed help (fixed)

Reconfiguration with new ancil file

Reported by: cbain
Owned by: jeff
Component: UM Model

Description

Hello,

I am trying to run the model with a new ancil file:
/hpcx/devt/n02/n02-ncas/cbain/ancill_in/smow_changedBAIN
to replace the original:
/hpcx/home/n02/n02/umx/vn6.1/ancil/atmos/n320/qrclim.smow

The things I have changed are:
(1) reconfigure and run from reconfiguration
(2) configured ancil files in
Atmosphere > Ancillary and Data Input > Climatologies > Soil Moisture and Snow depth AND Soil:VSMC

The job is xcqid on the umui at puma (under cbain), and on hpcx my error message is:

/hpcx/tmpchkpt/jtmp/l1f402.244310.0/tmp/modscr_xcqid/qsexecute: Executing dump reconfiguration program /hpcx/devt/n02/n02-ncas/cwang/UM_EXEC/reconf.exe

ATTENTION: 0031-408  32 tasks allocated by LoadLeveler, continuing...
 *****************************************************************************
 ERROR!!! in reconfiguration in routine Rcf_Ancil_Atmos
 Error Code:-  211
 Error Message:- REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH
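
When scanning long job output for failures like this one, it can help to pull out just the routine, error code and message. The following is a minimal Python sketch, assuming the banner always has the layout quoted above; the function name `extract_um_errors` is hypothetical and not part of any UM tool.

```python
import re

# Patterns based only on the banner layout quoted in this ticket; other UM
# versions or routines may format their errors differently (assumption).
ROUTINE_RE = re.compile(
    r"ERROR!!!\s+in\s+(?P<stage>\w+)\s+in\s+routine\s+(?P<routine>\S+)"
)
CODE_RE = re.compile(r"Error Code:-\s*(?P<code>-?\d+)")
MESSAGE_RE = re.compile(r"Error Message:-\s*(?P<message>.*\S)")


def extract_um_errors(text):
    """Return a list of dicts, one per error banner found in `text`."""
    errors = []
    current = None
    for line in text.splitlines():
        m = ROUTINE_RE.search(line)
        if m:
            # A new banner starts here; later lines fill in code and message.
            current = {"stage": m.group("stage"), "routine": m.group("routine")}
            errors.append(current)
            continue
        if current is None:
            continue
        m = CODE_RE.search(line)
        if m:
            current["code"] = int(m.group("code"))
            continue
        m = MESSAGE_RE.search(line)
        if m:
            current["message"] = m.group("message")
    return errors


sample = """\
 ERROR!!! in reconfiguration in routine Rcf_Ancil_Atmos
 Error Code:-  211
 Error Message:- REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH
"""
for err in extract_um_errors(sample):
    print(err)
```

In practice the `sample` string would be replaced by the contents of the job output file.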

This looks like the header in the ancillary file does not agree with what the job expects, but when I look at both ancillary files in xconv they seem to have the same format. I created the new ancillary file using a program adapted from some code from Chris Taylor (it has worked for him before).

Any advice on where to start?

Thanks,
Caroline

Change History (29)

comment:1 Changed 12 years ago by jeff

  • Owner changed from um_support to jeff
  • Status changed from new to assigned

Hi Caroline

Looking at your job (xcqid), it seems that in the "Soil: VSMC, hydrological/thermal conductivity etc" umui window you have specified the file smow_changedBAIN. This is incorrect: you need a qrparm.soil file here, not a smow file. Fixing this should get rid of the error message.

Jeff.

comment:2 follow-up: ↓ 3 Changed 12 years ago by cbain

Hi Jeff,

The qrparm.soil file I was using before was in /hpcx/work/n02/n02-hgem/soutten/ancil/global1; however, I think the permissions have changed and I can't see whether the file is still there. When I try to run with it, it crashes:
"Ancillary File does not exist.

File : /hpcx/home/n02/n02/umx/vn6.1/ancil/atmos/n320/qrparm.soil"

The other set of ancillary files I know about is in /hpcx/home/n02/n02/umx/vn6.1/ancil/atmos/n320/, but it only contains qrparm.soil_igbp, qrparm.soil_igbp_sahara and qrparm.soil_sahara. Which of these should be used?

Thanks,
Caroline

comment:3 in reply to: ↑ 2 Changed 12 years ago by jeff

Replying to cbain:

> Hi Jeff,
>
> The qrparm.soil file I was using before was in /hpcx/work/n02/n02-hgem/soutten/ancil/global1; however, I think the permissions have changed and I can't see whether the file is still there. When I try to run with it, it crashes:
> "Ancillary File does not exist.
>
> File : /hpcx/home/n02/n02/umx/vn6.1/ancil/atmos/n320/qrparm.soil"

The global1 file is still there and is readable, but it is at the wrong resolution for your run, so you need to use an n320 file.

> The other set of ancillary files I know about is in /hpcx/home/n02/n02/umx/vn6.1/ancil/atmos/n320/, but it only contains qrparm.soil_igbp, qrparm.soil_igbp_sahara and qrparm.soil_sahara. Which of these should be used?

Are you sure you need to use one of these files? The only reason you need to use a soil file is if you need to change some fields in the dump, i.e. the fields that are configured in the umui.

I can't tell you which file to use; that is up to you. See this document for some information on the files:

http://ncas-cms.nerc.ac.uk/component/option,com_docman/task,doc_download/gid,23/

I believe the igbp files are derived from a newer data set.

Jeff.

comment:4 Changed 12 years ago by cbain

Hi again,

Just to bring this enquiry up to date: the qrparm.soil problem is solved. I am now using the correct file and it works fine. The job is now crashing because of the ancillary file I am using. I created it using an altered version of mkancil.f and, although it looks fine in xconv, it is confusing the model in some way.

Therefore I hope to use another method of creating the ancillary file.

Many thanks,

Caroline

comment:5 Changed 12 years ago by cbain

Hi Jeff,

I have made a new ancil file using xancil. It looks right, has the correct timesteps etc., and the permissions are fine, but the run is still crashing.

ancil file for jobs:

xcqid = /hpcx/devt/n02/n02-ncas/cbain/ancill_in/smow_changedBAIN

xcqig = /hpcx/devt/n02/n02-ncas/cbain/ancill_in/smowUNI_changedBAIN

The error message just says 'A file or device is not appropriate for the lseek system call'.

Please help, I thought I'd covered all the bases.

Caroline

comment:6 Changed 12 years ago by cbain

Hi again,

To update you more, the problem with the ancillary file is the date:

1) xancil produces ancil files with a year of 2000 when it should be 0000.

2) The timestep number is wrong for some months (e.g. for Feb 15th, t=44 when it should be 45).

All of this is on hpcx in /hpcx/devt/n02/n02-ncas/cbain/ancill_in/. I have changed the permissions so you can see it, and my xancil jobs are in the same directory. I am using xancil0.40 on hpcx, as I thought this would be the most up-to-date version. If there is a difference between this xancil and the one you gave Stephen Pickering, please advise me on which is the best one to use.

Thank you,
Caroline

comment:7 Changed 12 years ago by jeff

Hi Caroline

In the xancil panel for Soil Moisture and Snow Depth you have selected "Use dates from NetCDF file", which is why you don't have the dates you specified in Grid configuration. Select "Specify Soil Moisture and Snow Depth ancillary file dates" to use the dates you entered.

The 2 versions of xancil should be the same.

Jeff.

comment:8 Changed 12 years ago by cbain

Hi Jeff,

Thanks for that. It didn't work, but I'm getting closer.

There are some small glitches. For the start date of the ancillary file I want every month to be on the 15th, but I have to enter 14 in the date box in xancil for this to happen (a glitch, but manageable!).

The main problem is that xancil creates an ancil file in which every month has only 30 days, whereas normal ancil files (i.e. Chang's) seem to have the correct varying number of days in each month. I know this is a 'feature' of the climate model, but it seems to knock the weather model over: the error is 'Wrong calendar set in Ancillary File'.
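
The calendar mismatch can be illustrated by counting days from 1 January to the 15th of each month under both calendars. This is a minimal Python sketch, assuming a non-leap year and that the timestep value is a zero-based day offset from 1 January (an assumption; the thread does not say how t is defined).

```python
# Days per month in a non-leap Gregorian year versus a 360-day calendar
# made of twelve 30-day months.
GREGORIAN = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
DAYS_360 = [30] * 12


def mid_month_offsets(month_lengths, day=15):
    """Zero-based day offset from 1 January for `day` of each month."""
    offsets = []
    elapsed = 0  # days in all the months before the current one
    for length in month_lengths:
        offsets.append(elapsed + day - 1)
        elapsed += length
    return offsets


greg = mid_month_offsets(GREGORIAN)
d360 = mid_month_offsets(DAYS_360)
for month, (g, d) in enumerate(zip(greg, d360), start=1):
    print(f"month {month:2d}: gregorian t={g:3d}  360-day t={d:3d}")
```

Under these assumptions 15 February comes out at t=45 in the Gregorian calendar and t=44 in the 360-day calendar, which is the same discrepancy reported earlier in this ticket.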

Also the button 'Define monthly mean to be in middle of month' seems to make no difference to anything!

Hope you can help,
Caroline

comment:9 follow-up: ↓ 10 Changed 12 years ago by cbain

I've just spotted the 'Gregorian' button. But sadly the start date seems to be resetting to the 16th again despite what I put in. Is there a trick here? Could I phone you today please? I'd really like to get this sorted and it seems like it might be something we can fix quickly over the phone.

Caroline

comment:10 in reply to: ↑ 9 Changed 12 years ago by jeff

Replying to cbain:

> I've just spotted the 'Gregorian' button. But sadly the start date seems to be resetting to the 16th again despite what I put in. Is there a trick here? Could I phone you today please? I'd really like to get this sorted and it seems like it might be something we can fix quickly over the phone.
>
> Caroline

Why do you think the dates are for the 16th? I've looked at your file smowUNI_changedBAIN and all the dates seem to be the 15th to me.

Jeff.

comment:11 follow-up: ↓ 12 Changed 12 years ago by cbain

There are still problems but yes, in the last few hours I sorted the problem with the times.

For future users: check the calendar, then type in the correct start time (15/1/0000), tick 'no' for 'Define monthly mean to be in middle of month', then SAVE the job before creating the ancillary file.

HOWEVER: the run is still crashing. There is no clear error as far as I can see; the only odd thing is that the .leave file starts with 'stty: tcgetattr: A specified file does not support the ioctl system call.', which seems strange, but I've no idea what it means. I will send you my leave file directly.

Caroline

comment:12 in reply to: ↑ 11 Changed 12 years ago by jeff

Replying to cbain:

> There are still problems but yes, in the last few hours I sorted the problem with the times.
>
> For future users: check the calendar, then type in the correct start time (15/1/0000), tick 'no' for 'Define monthly mean to be in middle of month', then SAVE the job before creating the ancillary file.

There does seem to be a problem with using the time panel in Grid configuration, I'll look into it. If you use the time panel in Soil Moisture and Snow Depth and similar panels, that seems to work ok.

> HOWEVER: the run is still crashing. There is no clear error as far as I can see; the only odd thing is that the .leave file starts with 'stty: tcgetattr: A specified file does not support the ioctl system call.', which seems strange, but I've no idea what it means. I will send you my leave file directly.

I don't know why that error happens, but it doesn't affect the run. Your problem is in this line:

Row length is larger than maximum defined in AMAXSIZE

You need to increase the size of some of the parameters in amaxsize to match your model configuration. You can do this with an include file mod, i.e. one that ends in .mh. Looking at your job, you don't seem to have any mods other than the PUM mods. Is this correct?

Jeff.

comment:13 Changed 12 years ago by cbain

Yes, this is correct. I am a novice with modsets. I don't understand why my model configuration is playing up, as I've changed nothing in the umui except this ancillary file. Is this an obvious error? Does it happen a lot when ancillary files are changed?

Thanks for your help,

Caroline

comment:14 Changed 12 years ago by jeff

If you look at the include file $UMDIR/vn6.1/normal/include/amaxsize/amaxsize.h, which is the default used, you can see that the maximum model dimensions are 548x325x100, but your domain is 640x481x50, so you need to change ROW_LENGTH_MAX, ROWS_MAX and HORIZ_DIM_MAX. Did you have an old experiment which used to run? If so, what was its umui id?
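
The comparison above can be sketched as a quick check of the domain against the compiled-in limits. This is an illustration in Python, not UM code: the numbers are the ones quoted in this comment, `MODEL_LEVELS_MAX` is a hypothetical label for the third limit, and treating HORIZ_DIM_MAX as the larger of the two horizontal limits is an assumption.

```python
# Limits quoted from the default amaxsize.h in this comment (548x325x100).
# MODEL_LEVELS_MAX is a hypothetical label for the third value.
DEFAULT_LIMITS = {"ROW_LENGTH_MAX": 548, "ROWS_MAX": 325, "MODEL_LEVELS_MAX": 100}

# Domain of the failing job as quoted in this comment (640x481x50).
DOMAIN = {"ROW_LENGTH_MAX": 640, "ROWS_MAX": 481, "MODEL_LEVELS_MAX": 50}


def exceeded_limits(domain, limits):
    """Return {name: (needed, allowed)} for every limit the domain exceeds."""
    over = {
        name: (domain[name], allowed)
        for name, allowed in limits.items()
        if domain[name] > allowed
    }
    # Assumption: HORIZ_DIM_MAX must cover the larger horizontal dimension.
    horiz_needed = max(domain["ROW_LENGTH_MAX"], domain["ROWS_MAX"])
    horiz_allowed = max(limits["ROW_LENGTH_MAX"], limits["ROWS_MAX"])
    if horiz_needed > horiz_allowed:
        over["HORIZ_DIM_MAX"] = (horiz_needed, horiz_allowed)
    return over


problems = exceeded_limits(DOMAIN, DEFAULT_LIMITS)
for name, (needed, allowed) in problems.items():
    print(f"{name}: need at least {needed}, default is {allowed}")
```

With these numbers the two horizontal limits and HORIZ_DIM_MAX are flagged, while the number of levels (50) is within the default of 100.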

Jeff.

comment:15 Changed 12 years ago by jeff

If you look at umui experiment xceqh, this is the standard Global (N320L50) run. This job has lots of mods which you may need. Perhaps you should base your job on this one. Where did your job come from?

Jeff.

comment:16 Changed 12 years ago by cbain

Thanks Jeff,

I got the original job from Chang about a year ago as a base 6.1 job. I've used it many times in various guises.

I just copied across xceqh and changed it so it had my stash (verified) and my new ancillary files in. It has crashed, but again it isn't obvious to me from the .leave file where (do you have any advice for searching these, or is it just experience?).

I will run it again overnight without my ancillary file in to see if this is what is crashing it.

Thanks,
Caroline

comment:17 Changed 12 years ago by cbain

I ran it with only one change (without my ancil file) and it worked. So the problem is the ancil file.

comment:18 Changed 12 years ago by cbain

Hi Jeff,

The problems with my ancil file are sorted, but now the original run is not working. I haven't run it for a month; I just tried to run a job I know worked and it's crashed. Have there been any changes to files/positions etc. on hpcx?

The job is xcqih. The leave file looks ok, and it's not obvious what the problem is. I'm running another job that worked last month to check whether it's an ancil problem or a computer problem. Will get back to you if things become clearer!

Caroline

comment:19 Changed 12 years ago by cbain

I've checked the job, and the difference between it running and not running is whether the soil moisture is configured or not. This doesn't make sense to me, as I ran the same job in November and it worked. There is no obvious error in the leave file.

If you could take a look at it I'd really appreciate it: job is xcqih

Caroline

comment:20 Changed 12 years ago by jeff

Hi Caroline

Could you give me read permission on the files in directory /hpcx/devt/n02/n02-ncas/cbain/ancill_in on hpcx, thanks.

Jeff.

comment:21 Changed 12 years ago by jeff

Hi Caroline

The problem with your run looks to be with the smow ancillary file (smow_changedBAIN). The top level of soil moisture content looks fine but the other three levels aren't correct. The NetCDF file that was used as input to xancil (smow_joined.nc) also has the same problem.

Jeff.

comment:22 Changed 12 years ago by cbain

Hi Jeff,

Just to update you, my jobs now run if I run only 1 day; if I try to run 2 days, they crash. Possible causes:

1) The leave file contains many messages like:

'Variable l_dust is misaligned. This may affect the efficiency of the code'

2) Is the wall time not long enough? I gave it 10800 to run; it takes 3600 for 1 day.
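
As a quick sanity check on the second possibility, the figures given can be compared directly. This is a minimal Python sketch, assuming run time scales linearly with run length and that both figures are in the same unit.

```python
def wallclock_needed(per_day, run_days):
    """Estimated wall time for a run, assuming linear scaling with length."""
    return per_day * run_days


PER_DAY = 3600     # time reported for a 1-day run
REQUESTED = 10800  # wall-time limit requested for the job
RUN_DAYS = 2

needed = wallclock_needed(PER_DAY, RUN_DAYS)
headroom = REQUESTED - needed
print(f"estimated need: {needed}, requested: {REQUESTED}, headroom: {headroom}")
```

Under that assumption a 2-day run needs about 7200 against the 10800 requested, so the wall-time limit alone would not be expected to stop the run.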

Jobs are xcqih, xcqii, xcqik and xcqil

Many thanks,
Caroline

comment:23 Changed 12 years ago by jeff

Hi Caroline

The last few jobs in directory /hpcx/devt/n02/n02-ncas/cbain/um/umui_out have failed because you have exceeded your disk quota on devt. You can see your quota either by using mmlsquota on hpcx or by logging into your web account on https://www.hpcx.ac.uk/.
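
To see which output directories are using the most space before cleaning up, something like the following Python sketch can help. It only totals file sizes on disk; it does not query the quota system, and the directory path is taken from this ticket purely as an example.

```python
import os


def directory_size(path):
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks and files that vanish while we are scanning.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                continue
    return total


def largest_subdirectories(path, top=5):
    """Return up to `top` (size, name) pairs for immediate subdirectories."""
    sizes = []
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((directory_size(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    # Example path from this ticket; substitute your own output directory.
    out_dir = "/hpcx/devt/n02/n02-ncas/cbain/um/umui_out"
    if os.path.isdir(out_dir):
        for size, name in largest_subdirectories(out_dir):
            print(f"{size / 1e9:8.2f} GB  {name}")
```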

Jeff.

comment:24 Changed 12 years ago by cbain

Hi Jeff,

Thanks for all the advice, everything is working and I am getting results back. Is there any way I can still run the model for 2 days but cut down on my computing needs so I can fit into the hour queue on hpcx? The runs have been taking ¾ days to get computing time and they only take just over an hour (I think).

cheers,
Caroline

comment:25 Changed 12 years ago by jeff

Hi Caroline

I can't read your latest jobs, but earlier ones seem to take about an hour a day, so a 2-day run would need about 2 hours. I notice you always recompile the model and reconfiguration; this is not necessary if you're not changing the source code. You can turn this off in the umui, but it probably won't help with the queuing times, which are very long on hpcx at the moment. Are you using CRUNs or are all your runs standalone? If they are standalone, I think all you can do is wait.

Jeff.

comment:26 follow-up: ↓ 27 Changed 12 years ago by lois

Have you thought about using Chang Gui's CRUN hand edit to use automatic resubmission for your longer runs? To do so:

  • do as Jeff advised: turn off reconfiguration and compilation, and set the run length of your job to be 2 days (if that is what you need),
  • set the job submission time to 20 mins or 1 hour, whichever queue you think gets better throughput. Make sure that you have pressed the NEXT button on the submission window to switch on automatic resubmission and that you have set the same job submission time in this window. Set an appropriate resubmission period.
  • make sure that you have set the dumping period appropriately for your resubmission period
  • include Chang's hand edit /home/sws00cw2/um/hand_edits/change_crun
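
The relationship between run length, resubmission period and dumping period in the steps above can be sketched as follows. This is an illustration in Python only: it assumes that "appropriately" means each resubmitted chunk must end on a dump so that the next chunk has something to restart from, and that the periods are given in whole hours. The function name `plan_crun` is hypothetical.

```python
def plan_crun(run_hours, resubmit_hours, dump_hours):
    """Split a run into resubmission chunks and check they end on a dump.

    Returns a list of (start_hour, end_hour) tuples, one per chunk.
    Raises ValueError if the periods are inconsistent.
    """
    if min(run_hours, resubmit_hours, dump_hours) <= 0:
        raise ValueError("all periods must be positive")
    # Assumption: a restart needs a dump at the end of every chunk, so the
    # dumping period must divide the resubmission period exactly.
    if resubmit_hours % dump_hours != 0:
        raise ValueError(
            "resubmission period is not a multiple of the dumping period"
        )
    chunks = []
    start = 0
    while start < run_hours:
        end = min(start + resubmit_hours, run_hours)
        chunks.append((start, end))
        start = end
    return chunks


# A 2-day run resubmitted every 12 hours with a dump every 6 hours.
print(plan_crun(run_hours=48, resubmit_hours=12, dump_hours=6))
```

Each chunk then has to fit within the job submission time chosen for the queue.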

To get better throughput you need to play the queues as well as all the other NCAS users hogging the HPCx queues.

Lois

comment:27 in reply to: ↑ 26 Changed 12 years ago by cbain

Thanks for the speedy advice. Can you be more specific about what I need to turn off in the reconfig?
Is it Sub model independent > Compilations > Modifications for the reconfig > run from standard,

or Start Dump > using the reconfiguration,
or something in the Reconfiguration option?

I thought I needed the reconfiguration to alter the start dump. Or can I replace the start dump with the .astart file I created on my last run?

thanks,
Caroline

comment:28 Changed 12 years ago by lois

You need to do both. Use

Sub model independent > Compilations > Modifications for the reconfig > run from standard

to stop the reconfiguration code being compiled, and switch off

Start Dump > using the reconfiguration

and put in the .astart file that has been created as the start dump.

This way you should be doing no extra compilation or extra reconfiguration.

Lois

comment:29 Changed 11 years ago by jeff

  • Resolution set to fixed
  • Status changed from assigned to closed