Opened 5 years ago

Closed 5 years ago

#1389 closed help (fixed)

Error in .leave file

Reported by: charlie Owned by: annette
Component: UM Model Keywords: ancillary files, reconfiguration
Cc: Platform: ARCHER
UM Version: 6.6.3

Description

Hello again,

Sorry about this, but I have another problem when trying to run another of my jobs. I'm fairly sure the job has been set up correctly (it was copied from another of my jobs, which works okay), and it compiled the executable correctly, but when I tried to run, it only ran for about a second before falling over. I believe I have found the relevant error in the output file:

ERROR!!! in reconfiguration in routine Rcf_Ancil_Atmos
Error Code:- 409
Error Message:- REPLANCA: Current time precedes start time of data
Error generated from processor 0

although I appreciate this might be a red herring. Certainly the start dates of my input data (start dump, SST ancillary, soil moisture ancillary, etc.) do indeed match my start date - they all start on 1/1/1971.

What have I done wrong?

Thanks,

Charlie

Attachments (2)

xkmna000.xkmna.d14283.t124709.leave (297.4 KB) - added by charlie 5 years ago.
xkmna000.xkmna.d14300.t123415.leave (258.0 KB) - added by charlie 5 years ago.

Change History (30)

comment:1 Changed 5 years ago by charlie

Further to my last message: I think I have now identified the reason for the error, but don't know what to do about it.

The most important difference between this job and my others (from which it was copied) is that in this job, for the soil moisture / snow depth ancillary file I am updating soil moisture every day but configuring snow depth. In my other jobs, which work, I am updating both every day.

If I run my new job with daily updating for both of these, it works and runs fine.

So why is it not running when I am trying to only update soil moisture, and configure snow? Ideally this is what I want to do.

Thanks a lot,

Charlie

comment:2 Changed 5 years ago by charlie

Sorry to bother you again, but I was just wondering if someone had had time to look at my last question? Why can I not update soil moisture but configure snow? I realise they are part of the same ancillary file, but they are different fields, are they not?

Many thanks,

Charlie

comment:3 Changed 5 years ago by annette

  • Owner changed from um_support to annette
  • Status changed from new to assigned

Hi Charlie,

I will take a look at this and get back to you later. Sorry for the slow response; we are a bit thin on the ground this week.

Annette

comment:4 Changed 5 years ago by annette

Hi Charlie,

I think I can see what is going on.

Your ancillary file for soil moisture and snow depth: smow_jules_1971-2004 contains time series for both fields. These are daily means time stamped for the middle of the day as the UM expects (eg 1971/01/01-12:00), so this works fine when both fields are to be updated, as you have found.

When configuring a field, the UM reconfiguration will either i) accept a single time field, in which case the date is ignored (as with many of the other fields), or ii) interpolate within a time series to derive a value for the start date. However, since the time series starts from 1971/01/01-12:00 and the run starts at 00:00, the start time falls before the first record, so the interpolation fails and produces the error you have seen.
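As a rough illustration of why the interpolation fails, here is a minimal Python sketch. It is not the UM's REPLANCA code: the function name `interpolation_weight` and the three-day series are assumptions made up for this example. It only shows that a target time before the first record of a time series cannot be bracketed, whereas a target inside the series can.

```python
from datetime import datetime


def interpolation_weight(times, target):
    """Return (index, weight) for linear interpolation of target in sorted times.

    Hypothetical helper for illustration only. Raises ValueError when the
    target lies outside the series, mirroring the message seen in the ticket.
    """
    if not times:
        raise ValueError("No time records")
    if target < times[0]:
        raise ValueError("Current time precedes start time of data")
    if target > times[-1]:
        raise ValueError("Current time is after end time of data")
    for i in range(len(times) - 1):
        if times[i] <= target <= times[i + 1]:
            span = (times[i + 1] - times[i]).total_seconds()
            weight = (target - times[i]).total_seconds() / span
            return i, weight
    # Only reached for a single-record series whose time equals the target.
    return 0, 0.0


# Daily means stamped at midday, as described above (illustrative dates).
times = [datetime(1971, 1, day, 12) for day in (1, 2, 3)]

print(interpolation_weight(times, datetime(1971, 1, 2, 0)))  # (0, 0.5)

try:
    interpolation_weight(times, datetime(1971, 1, 1, 0))  # midnight start
except ValueError as err:
    print(err)
```

With records stamped at 12:00, a 00:00 start on the first day is rejected, while a 12:00 start coincides exactly with the first record.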

I can think of a few ways to fix this:

  • Edit the ancillary file so that snow depth only has a single time field.
  • Start your run at 1971/01/01-12:00 instead.
  • If you really need to start at 00:00 then you could do a two-stage reconfiguration:
    1. Change your start date to 12:00 and run the reconfiguration only. Check the start file produced looks OK and includes your snow depth field.
    2. Change the date back to 00:00, add in the new start file, and switch snow depth to 'not used'. Then set the job to reconfigure and run.

Let me know if you have any further issues.

Annette

comment:5 Changed 5 years ago by annette

  • Keywords ancillary files, reconfiguration added

comment:6 Changed 5 years ago by annette

If you do decide to try the two-stage reconfiguration, it might be best to do steps 1 and 2 under separate jobs so that it is traceable where your start file came from.

Annette

comment:7 Changed 5 years ago by charlie

Annette,

Thanks very much, that's great. I think I will try your first way, i.e. making the snow have a single time field.

Quick question: is there anything extra I need to do in xancil to do this? When I tried to do something similar before (combine 2 netcdf files where the soil moisture and snow depth were different time lengths), xancil gave me an error.

Charlie

comment:8 Changed 5 years ago by charlie

Annette,

Further to my last message: as I feared, I have just tried making my new ancillary file (using only a single time field for snow) and have received the error below. I can only assume this is because the time fields for soil moisture and snow are different, so xancil is giving an error because they are both used and combined to make the ancillary file.

What should I do to get round this?

Charlie

----

Creating namelist file /fs2/n02/n02/cjrw09/ancil/hydro.d/xancil.namelist
Running mkancil executable /fs2/n02/n02/hum/bin/mkancil0.52
Output from /fs2/n02/n02/hum/bin/mkancil0.52 executable:

Writing Soil moisture and snow depth ancillary file
/work/n02/n02/cjrw09/ancil/hydro.d/xx
ERROR: in procedure get_ncdata_r 2 : NetCDF error number -40 :
NetCDF: Index exceeds dimension bound
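
If the assumption above is right (that the two fields have different numbers of time records), a quick pre-check before building the file can flag the mismatch. The sketch below is plain Python and does not reflect how mkancil or xancil work internally; the function name, field names and record counts are all made up for illustration.

```python
def find_short_fields(time_counts):
    """Return names of fields with fewer time records than the longest field.

    time_counts maps a field name to its number of time records.
    Hypothetical helper: it only compares counts, it reads no files.
    """
    if not time_counts:
        return []
    longest = max(time_counts.values())
    return sorted(name for name, n in time_counts.items() if n < longest)


# Illustrative counts only: one year of daily soil moisture, one snow record.
counts = {"soil_moisture": 365, "snow_depth": 1}
print(find_short_fields(counts))  # ['snow_depth']
```

A tool that steps through both fields with the same time index would run off the end of the shorter one, which is consistent with an "Index exceeds dimension bound" error.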

comment:9 Changed 5 years ago by annette

Hi Charlie,

If it won't work in xancil, then I'd suggest using the reconfiguration as I mentioned above. Give it a go and if you get stuck I can help.

Annette

comment:10 Changed 5 years ago by charlie

Annette,

Sorry for the delay on this.

I have now tried doing what you suggested, using the reconfiguration in 2 steps as you advised. Step 1 seemed to work okay, with the reconfiguration working and creating my start file xkmna.astart; however, step 2 only ran for 5 seconds before falling over. I've attached the resulting .leave file, where I think I can see the error again but have no idea what it means.

I have a suspicion however. When I checked my start file at /work/n02/n02/cjrw09/xkmna/xkmna.astart although it contains both a soil moisture and snow depth field, neither looks okay. The soil moisture field just contains stupidly small values (take a look at the first level, which is clearly wrong). Likewise snow depth looks very wrong as well, again containing silly values.

Is this likely to be related to the error? Even if it isn't, what's going wrong?

Can you help?

Thanks,

Charlie

comment:11 Changed 5 years ago by charlie

Sorry to bother you again Annette, but did you have a moment to look at my problem?

Charlie

comment:12 Changed 5 years ago by annette

Hi Charlie,

Sorry for the delay - I was out of the office yesterday. I'm looking at your issue now, so I'll give you an update soon.

Best wishes,
Annette

comment:13 Changed 5 years ago by annette

Hi Charlie,

I have created a start file on Archer for you:

/work/n02/n02/annette/um/xhnkh/xhnkh.astart

This has the snow and soil moisture fields configured in and has the correct start time of 1971/01/01-00:00.

You should be able to run from this dump now without reconfiguring further. Switch snow to "not used" and soil moisture to "updated". You don't need the force_sm_recon.ed hand-edit.

As discussed earlier I did two reconfigurations.

  1. (xhnkg) I took a copy of your job, and set it to "reconfiguration only" with both snow and soil moisture configured. In order to force the recon to read the first time field from the ancillary, I changed the model start date to 1971/01/01-12:00. To change anything other than the start year, however, requires a hand edit:
    /home/annette/um/hand_edits/set_to_midday.ed

  2. (xhnkh) I then made a new job, and reconfigured from the new start file. In order to reset the date back to 00:00, I added the following hand-edit:
    /home/annette/um/hand_edits/set_to_midnight.ed

Hopefully this will work for you now.

Best wishes,
Annette

comment:14 Changed 5 years ago by charlie

Annette,

Very many thanks. Just to clarify: does any of what you have done interfere with the rest of my job set up (with its various branch modifications and other hand edits)? The purpose of this job is to basically update the soil moisture from an external source (which I specify), updating only over a certain region in northern India and configuring everywhere else. To do this, I have another ancillary file (specified as a single level user ancillary file) which is basically a mask telling the model where to update and where to configure - I then do a couple of branch modifications. Will this all still be okay?

I'm pretty sure the answer is yes, because I have just checked the start file you created and it looks okay - the snow field looks normal, and the soil moisture field looks exactly as I would expect (with values over northern India but not elsewhere). It also contains another field, as it should, called "CO surf emissions" which is actually my mask. So I think it's all okay.

So do I just need to point my job to your new start file, and run as usual?

Charlie

comment:15 Changed 5 years ago by annette

Charlie,

The reconfiguration jobs that I ran were based on copies of your job with all the same branches and ancillaries etc. I did recompile the reconfiguration but that was probably unnecessary. The only things I changed were the start dates, snow and soil moisture settings, reconfiguration settings and switching off the force_sm_recon.ed hand-edit.

Therefore what I have done should be the same as reconfiguring at the start of your job. So I assume you can just run as usual now.

Annette

comment:16 Changed 5 years ago by charlie

Thanks very much.

Just 2 more very quick questions:

Firstly, what exactly does force_sm_recon.ed do? You say you have switched it off, but it was one of the hand edits I needed to add in originally in order to carry out this type of experiment (i.e. updating the soil moisture over a certain region and configuring elsewhere). Does switching it off matter?

Secondly, if I wanted to do what you did (to create my own start file), would I need to copy your 2 hand edits to my relevant directory and then just repeat your process?

Charlie

comment:17 Changed 5 years ago by annette

Hi Charlie,

After some digging I think I understand what is going on…

The hand-edit removes the soil moisture field from the list of ancillary fields specified to the reconfiguration. So it should have the same effect as setting the field to be 'not used' and the field will be copied from the dump. I have just run a test with and without the hand-edit to see exactly what happens.

  1. Without the hand-edit, when the field is set to be updated, the soil moisture goes through some sort of processing or resetting by the reconfiguration, which gives small values everywhere (between 0 and 2.6), as in this dump:
    /work/n02/n02/annette/um/xhnki/xhnki.astart.SAVE

  2. With the hand-edit, however, the field is copied more-or-less directly from the dump with values in the range 0 to 48.3, as in this dump:
    /work/n02/n02/annette/um/xhnki/xhnki.astart

    You can see the difference to the original start file here:
    /work/n02/n02/annette/um/xhnki/cumf_summ.annette.d14308.t143237.14968

The start file that I created, however, just copies the ancillary values and has small values outside your defined region:

/work/n02/n02/annette/um/xhnkh/xhnkh.astart

So… I think I misunderstood what you want to do here. I /think/ you need to have the full soil moisture field from the original dump in your start file (which the hand-edit would suggest). And then your ancillary data will be applied by the model. The code in your branch to update the field in the masked region applies to the model REPLANCA routine, and not to the reconfiguration REPLANCA routine (which is held in a different file).

Is this right? In which case I will redo your start file…

I hope this makes sense!

Best wishes,
Annette

comment:18 Changed 5 years ago by charlie

Annette,

Sorry for the delay, and sorry this has been so convoluted - I think you are right, but maybe I should explain exactly what I have done so far, as briefly as possible!

As I said, in all my experiments so far I have been using an external soil moisture ancillary file (which also contains snow). In all my experiments, I have been updating snow everywhere. Snow isn't what I care about! It's the soil moisture - so in my experiments, I have been updating the soil moisture over a given region and configuring everywhere else (and vice versa). To do this, I have been following instructions given to me by Ruth Comer at the Met Office, who has done something similar.

In short, I create a new ancillary file which is nothing more than a mask, in which 1 = grid cells where I want to update soil moisture and 0 = grid cells where I want to configure it. I add this into my job as a User single level ancillary file, and it is selected as "Not used". The soil moisture ancillary file from which I want to update is added into the snow and soil moisture ancillary window, and is selected as "Updated". They are both at

/work/n02/n02/cjrw09/ancil/hydro.d

as are my masks.

I add the hand edit force_sm_recon.ed to the relevant location, as well as the userstash master file userstash_reg_msk.dat again to the relevant location. I then create a new branch, adding in to_land_points_msk.F90 to

src/control/packing_tools

of my new branch as well as modifying

/src/control/ancillaries/replanca-rpanca1a.F90

in my new branch.

So (and my understanding is a little limited here), as you say, force_sm_recon.ed removes soil moisture from the list of ancillary fields, forcing it to be configured from the start dump even though updating from the ancillary is turned on within the umui. Then, with it configured everywhere, the model uses replanca-rpanca1a.F90 (which calls to_land_points_msk.F90) to update the soil moisture in the locations specified by the mask.
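
The masked update described here can be pictured with a small Python sketch. This is not the code in replanca-rpanca1a.F90 or to_land_points_msk.F90; it is a toy over flat lists, with a made-up function name and made-up values, using the mask convention stated above (1 = update from the ancillary, 0 = keep the configured value).

```python
def apply_masked_update(dump_values, ancil_values, mask):
    """Take the ancillary value where mask == 1, else keep the dump value.

    Hypothetical illustration of the idea only; all three sequences must
    describe the same points in the same order.
    """
    if not (len(dump_values) == len(ancil_values) == len(mask)):
        raise ValueError("dump, ancillary and mask must have the same length")
    return [
        ancil if flag == 1 else dump
        for dump, ancil, flag in zip(dump_values, ancil_values, mask)
    ]


# Made-up numbers: only the middle point lies inside the masked region.
print(apply_masked_update([10.0, 20.0, 30.0], [1.0, 2.0, 3.0], [0, 1, 0]))
# [10.0, 2.0, 30.0]
```

Outside the mask the values are whatever the start dump supplied, which is why the start file needs the full soil moisture field rather than the ancillary values.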

Does that make any sense? All of the above works absolutely fine when snow is also set to update, everywhere. The problem has only come about because I want the snow to configure everywhere, whilst still doing the above.

So I think what you say at the end is correct. Certainly, having looked at your 2 versions of the start file, the later version (i.e. the one containing values ranging from 0-48) is what I would expect the values to be. So yes, I think the start file needs to be created including the hand edit, so that the field is taken from the dump and is then configured from then on, apart from where the mask tells it to update from the ancillary file.

Best wishes,

Charlie

comment:19 Changed 5 years ago by annette

Hi Charlie,

That all makes sense and tallies with what I think is going on from looking at your job!

I have a new start file for you then - which has the snow configured and soil moisture from the original dump:

/work/n02/n02/annette/um/xhnkg/xhnkg.astart

There is actually a simpler way to get around the start date mismatch issue, with no need for my hand-edits.

  1. Modify the date of the dump using the change_dump_date tool as follows:
    change_dump_date <start-dump> 1971 1 1 12 0 0

  2. Run the reconfiguration ONLY with the snow configured, soil moisture updated, and the force_sm_recon.ed hand-edit included (as in your original job). The dump date will now match the date of the ancillary data though, so this should work.
  3. Change the date back again:
    change_dump_date <new-dump> 1971 1 1 0 0 0

  4. Switch reconf OFF and set to run from new dump.
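
The two date changes in the steps above can be sketched as a small Python helper that only builds the command strings. The argument order (dump path, then year, month, day, hour, minute, second) is taken from the commands in this ticket; the helper name and the file names in the example are hypothetical, and nothing here actually runs the tool.

```python
import shlex


def change_dump_date_cmd(dump_path, year, month, day, hour=0, minute=0, second=0):
    """Build a change_dump_date command line as a single string.

    Hypothetical convenience wrapper: it formats the command only, so the
    result can be reviewed before being pasted into a shell.
    """
    args = ["change_dump_date", dump_path]
    args += [str(value) for value in (year, month, day, hour, minute, second)]
    return " ".join(shlex.quote(arg) for arg in args)


# Placeholder file names, not real paths from the ticket.
print(change_dump_date_cmd("start.dump", 1971, 1, 1, hour=12))
# change_dump_date start.dump 1971 1 1 12 0 0
print(change_dump_date_cmd("new.astart", 1971, 1, 1))
# change_dump_date new.astart 1971 1 1 0 0 0
```

Building the strings first makes it easy to record, alongside the job, exactly which dates were applied to which file.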

Annette

comment:20 Changed 5 years ago by charlie

Excellent, very many thanks. I don't think I can try running my job with your new start file at the moment, because the system appears to be down.

If I wanted to repeat your process, starting with a new job, to create my own start file (which I probably should), can you just clarify/confirm the process? I'm getting a little bit confused between what you call a start dump and what you call a start file. Am I right in thinking the dumps are the original start dumps, currently at

/work/n02/n02/cjrw09/dumps

and are named, for example, xhgzha.dah1110 for 1 January 1971? In contrast, the start file is what is created when I begin running my job, named JOBID.astart? Is that right? So, if I understand it correctly, I would need to do the following:

1) Copy my existing job

2) Create the model and reconfiguration executable, by setting

Sub-Model independent > Compilation and modifications > Compile options for the model

to "Compile and build the executable named below, then stop", and

Sub-Model independent > Compilation and modifications > Modification for the reconfiguration

to "Compile and build the executable named below"

3) Run this, so that it builds the 2 executables. Once this has run, switch these 2 options back to "Run from existing executable, named below"

4) Modify my original start dump (currently dated at midnight on 1 January 1971) using your tool

change_dump_date /work/n02/n02/cjrw09/dumps/xhgzha.dah1110 1971 1 1 12 0 0

5) Run my job again (using the above start dump) but with the "Perform the reconfiguration step only" box ticked. At this stage, the snow should be configured, the soil moisture should be updated and the hand edit should be included. This will run for a few minutes, and will generate a new start file called /work/n02/n02/cjrw09/JOBID/JOBID.astart which will be dated midday on 1 January 1971 to match the modified start dump

6) For this new start file, change the date back to midnight using your tool

change_dump_date /work/n02/n02/cjrw09/JOBID/JOBID.astart 1971 1 1 0 0 0

7) Switch off the "Perform the reconfiguration step only" box, replace my start dump with this new start file, JOBID.astart, and run

Is that all correct? If so, 2 quick questions: firstly do I need to turn snow to "Not used" before running properly (i.e. at step 7)? Secondly, my original start dump, xhgzha.dah1110, is now dated midday on 1 January 1971, so once I have done all of the above should I change this back (i.e. for future use)?

Sorry to be so basic, but I want to get this right…

Charlie

comment:21 Changed 5 years ago by annette

Hi Charlie,

People tend to use the terms "dumps", "start dumps", "start files", "restart dumps", "restart files", and "restarts" interchangeably, and they really all refer to the same thing.

The "dumps" produced by the model at checkpoint times, named for example, xhgzha.dah1110, contain the model prognostics, plus any diagnostics specified in STASH. These files can be used to restart the model during a CRUN, to extend the run beyond the original time-length, or to start a new job.

The "(re)start" "files/dumps" set in the UMUI have exactly the same file format as the model output dumps, and you can start from a file such as xhgzha.dah1110, without reconfiguring, as long as the dump has the correct model prognostics. These need to be on the correct grid, have the correct land-mask, may need to meet certain consistency requirements, and be at the correct model date. Any non-prognostic fields (ie those outside Section 0) will be ignored.

The reconfiguration therefore is used to prepare a dump to start a new model run. It will strip out any unnecessary prognostics, and ensure any required fields are included. You can also use it to update fields if required. The reconfiguration by convention produces a .astart file.

Hopefully this makes sense. I'll deal with your other questions as a separate comment.

Annette

comment:22 Changed 5 years ago by annette

To answer the second part of your message…

All the steps you have listed seem correct.

You should set snow to be not used at Step 7, but it shouldn't matter because you should not run the reconfiguration again. That is, when you add in JOBID.astart as your new start file, switch off the "Using the reconfiguration" box.

Also I agree it would make sense to change xhgzha.dah1110 back to midnight for future use.

Annette

comment:23 Changed 5 years ago by charlie

Thanks very much indeed, that's all understood. I've just submitted my job using your start file, so if that runs okay I will repeat the process and create my own start file. If it doesn't run okay, I'll let you know! It might be best to perhaps wait a bit before closing this ticket, in case it doesn't run.

Thanks again,

Charlie

comment:24 Changed 5 years ago by charlie

Dear Annette,

Just a quick note to let you know that I have now repeated your process, submitted my job and it appears to be running fine.

I wasn't sure whether something had gone wrong, though, because after I used your change_dump_date tool to change my start dump to midday, e.g.

change_dump_date /work/n02/n02/cjrw09/dumps/xhgzha.dah1110 1971 1 1 12 0 0

and checked it using xconv, the fields were still dated 00:00 on that day. Is this expected? I suppose I was expecting all of the fields to read 12:00 after using your tool.

Charlie

comment:25 Changed 5 years ago by annette

Hi Charlie,

According to Jeff Cole who wrote the tool, that is normal. Only the date in the file header is changed, not the dates associated with the fields which is why the old date shows up in xconv. The header date is what is used by the UM as the "model time", however.
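
A toy Python model may help picture this. It is not the UM dump format or the change_dump_date implementation; the class and function names are invented for illustration. It just shows a file-level header time being changed while the per-field times are left alone.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ToyDump:
    """Invented stand-in for a dump: one header time plus per-field times."""
    header_time: datetime
    field_times: List[datetime]


def set_header_time(dump, new_time):
    """Return a copy with a new header time; field times are untouched."""
    return ToyDump(header_time=new_time, field_times=list(dump.field_times))


midnight = datetime(1971, 1, 1, 0)
original = ToyDump(header_time=midnight, field_times=[midnight, midnight])
changed = set_header_time(original, datetime(1971, 1, 1, 12))

print(changed.header_time)     # 1971-01-01 12:00:00
print(changed.field_times[0])  # 1971-01-01 00:00:00
```

A viewer that reports the per-field times would still show 00:00, while anything that reads the header would see 12:00.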

Annette

comment:26 Changed 5 years ago by charlie

Thanks Annette, that's great.

Thanks again for all your help.

Charlie

comment:27 Changed 5 years ago by charlie

Hello Annette,

Sorry to bother you again - it was all going so well!

My job that we discussed a few weeks ago has been running, but has got to a certain point (February 1978, to be exact) and won't go any further. I have tried restarting from several of my start dumps, e.g. 1 February 1978 and 1 January 1978, but it always gets to the same point, then stops.

I don't understand what's gone wrong. I have looked at the .leave file (attached) and although I can see errors, I don't know which one is the important one.

One of the reasons I don't understand this problem is that I am currently running 4 jobs at once: xkmna-d. These correspond to 4 ensemble members of the same job, so they are absolutely identical apart from the initial start file used - they all start in 1971, but xkmna reconfigures the 1971 start file, xkmnb the 1972 start file, etc. The problem is only occurring with xkmna - all of the other jobs have got past February 1978 and are running fine. So given that they are all identical, why are the others working but not xkmna?

Thanks a lot,

Charlie

comment:28 Changed 5 years ago by annette

  • Resolution set to fixed
  • Status changed from assigned to closed

This last comment has been moved to a new ticket #1414

Annette
