Opened 3 months ago

Closed 5 weeks ago

#2872 closed error (answered)

Model crash with new piControl SST and seaice ancillaries

Reported by: eeac Owned by: jeff
Component: UM Model Keywords: piControl,SST, SIC, ancillaries, HadGEM3-A
Cc: Platform: ARCHER
UM Version: 8.4

Description (last modified by willie)

Hi,

This follows #2844 and seems quite similar to the problems of #1456.

I've been trying to set up a model run with pre-industrial conditions (at least for the SSTs and SICs part). I've created those myself by calculating the multi-model mean of all the available CMIP5 models as a long-term monthly climatology. My aim, once the piControl run is set up successfully, is to use it as a starting point for some sensitivity simulations with different SST ancillaries.

You can see the piControl ancillaries located in /work/n02/n02/eeac/ancil/v1. I'm confident that I've created the ancillaries properly in terms of the attributes matching the job but you can also see the netcdf files used with xancil in the /work/n02/n02/eeac/ancil/nc_files.

There is only one discrepancy: the date in my files (1850) differs from that of the start dump (2035) I'm using, which comes from the job I copied this configuration from, but I can't see that being a problem.

The job id is xokfa and it runs successfully for a model day when I'm using the default climatologies such as qrclim.seaice/sst converted to 360-day calendar. As soon as I add my ancils it fails to converge on the first timestep indicating that the ancils are causing this. The following error can be found in the leave file xokfa000.xokfa.d19100.t175017.leave.

ATP Stack walkback for Rank 0 starting:
bi_linear_h_@bi_linear_h.f90:516
ATP Stack walkback for Rank 0 done
Process died with signal 11: 'Segmentation fault'
Forcing core dumps of ranks 0, 1, 37

There are a few warnings about ancillary file 18 but this doesn't correspond to SST or SIC. There is also something weird about the output on the second timestep where the latitude and longitude values seem a bit odd.

Trying either of those with the default climatologies causes the model to crash possibly because of inconsistencies. I'm mostly worried about my SIC ancillary (only has sea ice fraction) and I'll be happy to try a piControl one from another run but I cannot find one!

Perhaps I might have missed a step when using xancil to create those ancillaries? Either way I cannot pinpoint which one of those or if both are causing this issue, so any help or suggestion would be greatly appreciated.

Cheers,
Andreas

Attachments (2)

sst2.png (53.6 KB) - added by jeff 6 weeks ago.
sst
sifrac.png (40.1 KB) - added by jeff 6 weeks ago.
sea ice

Change History (24)

comment:1 Changed 2 months ago by willie

  • Description modified (diff)

Hi Andreas,

It looks to me as if the wrong land/sea mask has been used to create /work/n02/n02/eeac/ancil/v1/pi.sst. Compare this with the standard one in /work/y07/y07/umshared/ancil/atmos/n96/sst/hadisst_6190/v1/qrclim.sst.

All the ancillaries need to be on the same land/sea mask.

Willie

comment:2 Changed 2 months ago by eeac

Hi Willie,

If it is the SST ancillary that has the problem then this issue might persist as the SIC was calculated using the same land-sea mask.

Not using a land-sea mask when calculating the nc files leads to unrealistic values, a problem which persists when using both options regarding land-sea mask on xancil (calculate land-sea mask based on missing values and don't calculate land-sea mask).

I would definitely need to find a correct way to calculate the SST ancillaries as my plan is to use various SST fields for the sensitivity simulations I have in mind.

What would you advise?

Cheers,
Andreas

comment:3 Changed 2 months ago by jeff

Hi Andreas

Looking at your SST netCDF file it seems to have very unrealistic values around the coastlines. I think you will need to fix this somehow before you can use these files in the UM.

The normal method when creating SST (and SIC) ancillary files is to extrapolate the data over missing data values before using them in xancil. This way you are guaranteed to get sensible values when using the UM land/sea mask. Xconv can do this, as can other utilities. This won't work very well with your current files because of the unrealistic values.
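
As an illustration of what extrapolating over missing data values means, here is a minimal Python sketch. It is not Xconv's algorithm (which is not described in this ticket); it assumes missing points are stored as NaN and simply fills each one with the mean of its valid neighbours, repeating until nothing is missing. The function name and the tiny test grid are hypothetical.

```python
import math

MISSING = float("nan")  # assumed missing-data marker for this sketch


def fill_missing(field, max_passes=100):
    """Fill NaN cells with the mean of their valid 4-neighbours, pass by pass.

    Columns (longitude) wrap around; rows (latitude) do not.
    """
    nrows, ncols = len(field), len(field[0])
    grid = [row[:] for row in field]  # work on a copy
    for _ in range(max_passes):
        holes = [(j, i) for j in range(nrows) for i in range(ncols)
                 if math.isnan(grid[j][i])]
        if not holes:
            break
        updates = {}
        for j, i in holes:
            neighbours = [grid[j][(i - 1) % ncols], grid[j][(i + 1) % ncols]]
            if j > 0:
                neighbours.append(grid[j - 1][i])
            if j < nrows - 1:
                neighbours.append(grid[j + 1][i])
            valid = [v for v in neighbours if not math.isnan(v)]
            if valid:
                updates[(j, i)] = sum(valid) / len(valid)
        if not updates:
            break  # no valid data anywhere to extrapolate from
        for (j, i), value in updates.items():
            grid[j][i] = value
    return grid


# Tiny made-up SST field in K with three missing (land) points.
sst = [
    [271.4, 272.0, MISSING, 273.0],
    [280.0, MISSING, MISSING, 282.0],
    [290.0, 291.0, 292.0, 293.0],
]
filled = fill_missing(sst)
```

A fill like this only gives sensible coastal values if the valid points it starts from are themselves sensible, which is why the unrealistic coastal values need fixing first.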

When you create the ancillary files in xancil you can set the land sea mask to "Don't calculate land mask", this will produce odd looking files but they should work fine in the UM as the UM just reads points which it thinks are sea points. Alternatively you can select "Use land fraction NetCDF file to calculate land mask", this will create a mask based on points with any sea fraction in them, this is what the UM uses as the mask for sea only fields. You can use xconv to convert the model sea fraction file to netCDF and you would need to select "Use Land Fraction NetCDF variable" in the Land Fraction panel in xancil. The second method is probably better as you can see exactly what points the UM is going to read.

Jeff.

comment:4 Changed 2 months ago by eeac

Hi Jeff,

Thanks for your comprehensive suggestions! While waiting for a reply, I went back to square one and saw that I was using a land mask based on a 1x1 resolution converted to N96. I realised this wasn't exactly right, so I created a new N96 land-sea mask with a new script, which in theory should be OK. These new netcdf files are located in /work/n02/n02/eeac/ancil/nc_files and named as v3. If you have the time, can you see if these seem at least a bit better?

I can easily calculate those SST/SIC nc files without a land-sea mask and create the ancils and set it to don't calculate land mask in xancil.

But I do have a couple of questions regarding the options in the SST and SIC panels if you just bear with me for a bit.

Do I change or set anything on the SST panel like enter minimum allowed SST value, or enter the SST value of SIC?
As for the SIC panel the only yes should be for the use of AMIPII ancillary Sea-Ice format or something else?

Cheers,
Andreas

comment:5 Changed 2 months ago by jeff

Hi Andreas

The new SST file looks the same as the old one to me. If you can create the files without a mask then that would definitely be the best idea.

As for your other question, it's been so long since I created the SST panel that I'm not really sure why those options exist. It was probably how things were for the 4.5 version of the UM, so I'm not sure whether they are still applicable. What you should do is look at the original SST file for your job and see if it does anything similar. For the SIC file you probably only need the AMIPII option.

Jeff.

comment:6 Changed 2 months ago by eeac

Hi Jeff,

I appreciate your help, I'll try these couple of things and see if I can get the model to run with the changed ancillaries.

Cheers,
Andreas

comment:7 Changed 2 months ago by eeac

Hi,

After some playing around with the creating of the ancillaries and testing them in model runs I can report the following.

If I don't include a land mask when creating the nc files (/work/n02/n02/eeac/ancil/nc_files/ version 3 files) I do get some unrealistic but not missing values over land. Even if I extrapolate over missing values and use no land-sea mask within xancil, I cannot get the model to run: it fails to converge on the first timestep.

For some reason which I cannot understand, the nc files with no land mask look sort of OK in the ocean, but after creating the ancillaries (v3) 273 K has apparently been added at every grid point!?

The only thing that yielded a somewhat different error was to create the ancils (/work/n02/n02/eeac/ancil/v5/) with the use of an N96 land mask when calculating them and subsequently use xconv to extrapolate over missing values. They do look a bit odd, as Jeff said they would, and this time I'm still getting a segmentation fault but with no failure to converge.

Details of the error can be found in ~/output/xokfa000.xokfa.d19101.t213842.leave while I cannot actually find in the core dumps any specific routine error.

ATP Stack walkback for Rank 12 starting:
  bi_linear_h_@bi_linear_h.f90:505
ATP Stack walkback for Rank 12 done
Process died with signal 11: 'Segmentation fault'
Forcing core dumps of ranks 12, 0, 1, 49
View application merged backtrace tree with: stat-view atpMergedBT.dot
You may need to: module load stat

_pmiu_daemon(SIGCHLD): [NID 04588] [c7-2c2s11n0] [Thu Apr 11 21:49:06 2019] PE RANK 119 exit signal Killed
_pmiu_daemon(SIGCHLD): [NID 04978] [c1-3c2s12n2] [Thu Apr 11 21:49:06 2019] PE RANK 132 exit signal Killed

I do realise that this problem arises from the ancillaries I'd like to use, but I'm at a loss as to what to do next or how to implement Jeff's alternative suggestion (the "Use land fraction NetCDF file to calculate land mask" option in the xancil panel is not clickable). In the xconv SST panel there is no option to create a model sea fraction unless you were referring to the sea ice panel of xconv.

Cheers,
Andreas

comment:8 Changed 2 months ago by jeff

Hi Andreas

To answer the last question first, there was a slight error in my previous reply, "model sea fraction file" should read "model land fraction file", i.e. the UM ancillary file from your job. This file needs to be converted to netCDF format and used in xancil panel "Land Fraction", once you have selected "Use Land Fraction NetCDF variable" in this panel the "Use land fraction NetCDF file to calculate land mask" option should be usable.

It seems to me all the problems are being caused by your initial conversion to a n96 netCDF file, can I ask how you are doing this and what source data you are using?

Jeff.

comment:9 Changed 2 months ago by eeac

Hi Jeff,

I can give it a try by first locating the ancillary land fraction file from my UM job and then following your instructions.

In terms of actually calculating the nc files, they come from all the available CMIP5 piControl runs (regridded to N96 with CDO). With NCL I calculate the long-term monthly climatology of the MMM (>25 models) from the last 150 years of each run, applying the aforementioned N96 land mask before creating the MMM nc files. Since some of the models might have bad or unrealistic values somewhere, the MMM would even them out.
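
The arithmetic behind such a multi-model-mean climatology can be sketched in a few lines of Python. This is not the CDO/NCL workflow used for this ticket; it only shows the calculation at a single grid point, and the function names and the two made-up model series are hypothetical.

```python
def monthly_climatology(series, n_years):
    """Mean of each calendar month over the last n_years of a monthly series.

    series: monthly values for one grid point, starting in January and
    covering whole years only.
    """
    tail = series[-12 * n_years:]
    return [sum(tail[m::12]) / len(tail[m::12]) for m in range(12)]


def multi_model_mean(climatologies):
    """Average the 12-month climatologies of several models month by month."""
    return [sum(month_values) / len(month_values)
            for month_values in zip(*climatologies)]


# Two hypothetical models with three years each; the first year (all zeros)
# falls outside the averaging window and is discarded.
model_a = [0.0] * 12 + [280.0 + m for m in range(12)] * 2
model_b = [0.0] * 12 + [282.0 + m for m in range(12)] * 2

mmm = multi_model_mean(
    [monthly_climatology(s, n_years=2) for s in (model_a, model_b)]
)
```

Note that averaging over models reduces the weight of a bad value from one model but does not remove it, so a few models with unrealistic coastal values can still leave unrealistic values in the mean.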

Andreas

comment:10 Changed 2 months ago by jeff

Hi Andreas

The correct mask to use for the SST/SIC is one generated from the land fraction file, i.e. landfrac == 1 => land, landfrac < 1 => sea. The N96 land mask is the opposite of this, i.e. landfrac == 0 => sea, landfrac > 0 => land. Sea-only fields use the former; land-only fields use the latter. Maybe this will help in generating the correct files.
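
The two conventions can be written out as a short Python sketch. The function names and the three-point land fraction row are made up for illustration; only the comparisons come from the description above.

```python
def sea_points_for_sea_fields(landfrac):
    """Mask for sea-only fields (SST, SIC): a point is sea if landfrac < 1."""
    return [[f < 1.0 for f in row] for row in landfrac]


def land_points_for_land_fields(landfrac):
    """Mask for land-only fields (the N96 land mask): land if landfrac > 0."""
    return [[f > 0.0 for f in row] for row in landfrac]


# Hypothetical row: open ocean, a coastal point, and a fully land point.
landfrac = [[0.0, 0.3, 1.0]]

sea = sea_points_for_sea_fields(landfrac)      # [[True, True, False]]
land = land_points_for_land_fields(landfrac)   # [[False, True, True]]
```

The coastal point with landfrac 0.3 counts as sea for SST/SIC and as land for land fields, so an SST file masked with the N96 land mask has no valid data at exactly those coastal points.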

Jeff.

comment:11 Changed 2 months ago by eeac

Hi Jeff,

I'll try this straight away with the landfrac file. I'm a bit confused, though, about which one the model actually uses.

In UMUI under Ancillary and Input > In file related > Ancillary version files I can see that the versions are located in /work/n02/n02/ukca/ancil_versions/versions_UM8.2_invert_rivers

In that file I can see that

 UM_ANCIL_MASK_DIR=$UM_ANCIL_N96ORCA1DIR/land_sea_mask/etop02/v0
 UM_ANCIL_LANDFRAC_DIR=$UM_ANCIL_N96ORCA1DIR/land_sea_mask/etop02/v0

This corresponds to /work/y07/y07/umshared/ancil/atmos/n96/orca1/land_sea_mask as far as I can see. But there are also the ancils located in /work/y07/y07/umshared/ancil/atmos/n96/land_sea_mask/igbp/v1 so the former are the ones my job actually reads, am I right?

And just to make sure that I'm not doing something in the wrong way, I should use my nc files with my own land-mask implemented when calculating the MMM (otherwise it's the problem I mentioned in last night's comment - xancil adds 273 K everywhere), then use the xconv option to extrapolate over missing values and use the job's land fraction converted nc file to finally create the SST/SIC ancillary.

Please correct me if I'm wrong anywhere!

Cheers,
Andreas

comment:12 Changed 2 months ago by jeff

  • Owner changed from um_support to jeff
  • Status changed from new to accepted

Hi Andreas

Yes the files in land_sea_mask/etop02/v0 seem to be the ones used in your job.

The reason xancil adds 273.15 is that it checks the value of the first non-masked point and, if it's between -150 and 150, assumes the temperature is in degC and converts it to K. The first point in your data has the value 0 so it gets converted; also, most of your land values have the value 91.05. These aren't sensible values so it's no wonder xancil gets a bit confused.

I don't know how you are generating these files but unless the field you input to xancil has sensible values everywhere that the UM thinks is a sea point (using the land frac field as I described in my last message) it's not going to produce a good ancillary file. This includes having odd points around the coasts.
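
The check described above can be mimicked with a small Python sketch. This is not xancil's source code; the function name is hypothetical and whether the -150 and 150 bounds are inclusive is an assumption.

```python
def maybe_convert_to_kelvin(values, is_masked):
    """Add 273.15 to every value if the first non-masked value looks like degC.

    Mimics the behaviour described in this ticket, not xancil's actual code.
    """
    first = next((v for v, m in zip(values, is_masked) if not m), None)
    if first is not None and -150.0 < first < 150.0:  # bounds assumed exclusive
        return [v + 273.15 for v in values]
    return list(values)


# A field already in K but whose first unmasked point is 0: everything shifts.
bad = maybe_convert_to_kelvin([0.0, 91.05, 285.3], [False, False, False])

# The same kind of field with a sensible first point is left alone.
good = maybe_convert_to_kelvin([285.3, 286.1], [False, False])
```

Because the decision rests on a single point, one bad value at the start of the field is enough to shift every grid point by 273.15.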

Maybe produce the SST fields without using a mask, then use the correct mask from the UM land frac file, if this looks ok then feed this to xancil.

Jeff.

comment:13 Changed 2 months ago by eeac

Thanks Jeff,

I appreciate your help once more. I'll keep this ticket updated with my progress. In fact using land fraction produces quite reasonable results in the ancil files (v6), so now fingers crossed the simulation will run successfully.

Cheers,
Andreas

comment:14 follow-up: Changed 2 months ago by eeac

Hi Jeff,

After some extensive testing and various combinations with sst/sic ancillaries created with landmask/landfrac or no landmask at all I can report the following.

Firstly, although my job seems to be using the etop02 landmask/landfrac this is actually not true as they correspond to UM 6.6 versions and are quite different compared to the igbp versions (UM vn8.0) which are the ones that are being used in my configuration (qrclim has the landmask of igbp in both sst and sic ancil).

Secondly, I've managed to create a working version of the SST ancil (/work/n02/n02/eeac/ancil/v7) when using the igbp landmask with xancil and this works when combined with both the sea ice qrclim climatology and James Pope's xlaya Poles Apart pre-industrial version sea-ice. So the problem lies with my sic ancillary file solely.

When using the igbp landfrac to create any ancil it produces really weird results (just take a look at /work/n02/n02/eeac/ancil/v8/seaice_landfrac_igbp.ancil), but when using the landmask instead (same dir as before, seaice_landmask_igbp.ancil) it seems quite similar to qrclim, yet it's not working. xlaya's sea ice ancillary (copied over to /work/n02/n02/eeac/ancil/jp_ancils/) has no landmask calculated, so I tried doing the same in my seaice ancil (seaice_nomask.ancil), but again it does not work: this time there is no failure to converge, but there is a segmentation fault. What do you think is the issue here?

Note that I've replaced the really small but non-zero positive sea-ice values (e.g. ~10^-7 in the tropical Pacific Ocean) with zeros in my new sea-ice nc file (/work/n02/n02/eeac/ancil/nc_files/sic_piControl.nc), thinking that this might have been the issue.

Any advice or suggestions are more than welcome!

Cheers,
Andreas

comment:15 in reply to: ↑ 14 Changed 2 months ago by jeff

Firstly, although my job seems to be using the etop02 landmask/landfrac this is actually not true as they correspond to UM 6.6 versions and are quite different compared to the igbp versions (UM vn8.0) which are the ones that are being used in my configuration (qrclim has the landmask of igbp in both sst and sic ancil).

I don't understand what you mean here, how is the job not using etop02, have you changed it?

Jeff.

comment:16 Changed 2 months ago by eeac

Hi Jeff,

I haven't changed anything, and it seems conflicting that in UMUI in Ancillary version files I can see versions_UM8.2_invert_rivers is being read, which corresponds to the etop02.

UM_ANCIL_MASK_DIR=$UM_ANCIL_N96ORCA1DIR/land_sea_mask/etop02/v0
UM_ANCIL_LANDFRAC_DIR=$UM_ANCIL_N96ORCA1DIR/land_sea_mask/etop02/v0

This seems to be quite confusing as the successfully read qrclim climatology has the landmask of igbp (/work/y07/y07/umshared/ancil/atmos/n96/land_sea_mask/igbp/v1) and not the etop02, and my SSTs with the igbp landmask combined with the qrclim seaice work fine.

Andreas

comment:17 follow-up: Changed 2 months ago by jeff

What do you mean by qrclim?

Jeff.

comment:18 in reply to: ↑ 17 Changed 2 months ago by eeac

Replying to jeff:

What do you mean by qrclim?

These are the standard ancils located in /work/y07/y07/umshared/ancil/atmos/n96/sst/hadisst_6190/v1/ for SST
and /work/y07/y07/umshared/ancil/atmos/n96/seaice/hadisst_6190/v1/ for SIC.

Andreas

comment:19 Changed 6 weeks ago by eeac

Hi Jeff,

I still have problems with this. After playing around with creating the SST/SIC ancillaries with the land fraction of etop02 (see previous messages) and extrapolating over missing data values with xconv, I'm now getting an abort error instead of a segmentation fault. I've tried my new SST/SIC combined with some other working ancillaries, with the same error. It fails to converge on the 2nd timestep, which indicates a problem with the ancillaries, but I would guess it is encouraging that it no longer seg faults.

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
? Error in routine: glue_conv
? Error Code:     3
? Error Message: Mid conv went to the top of the model at point           18 in seg on call  1
? Error generated from processor:   139
? This run generated   8 warnings
????????????????????????????????????????????????????????????????????????????????

The error in the leave file (~/output/xokft000.xokft.d19134.t141348.leave) is

Rank 139 [Tue May 14 14:18:18 2019] [c0-0c0s10n1] application called MPI_Abort(MPI_COMM_WORLD, 9) - process 139
Application 35197244 is crashing. ATP analysis proceeding...

ATP Stack walkback for Rank 139 starting:
  start_thread@pthread_create.c:301
  _new_slave_entry@0x16cd83b
  ni_conv_ctl__cray$mt$p0001@ni_conv_ctl.f90:2450
  glue_conv$glue_conv_mod_@glue_conv-gconv5a.f90:3688
  ereport64$ereport_mod_@ereport_mod.f90:107
  gc_abort_@gc_abort.F90:136
  mpl_abort_@mpl_abort.F90:43
  MPI_ABORT@0x15428ec
  PMPI_Abort@0x15612ec
  MPID_Abort@0x15891a1
  abort@abort.c:92
  raise@pt-raise.c:42
ATP Stack walkback for Rank 139 done
Process died with signal 6: 'Aborted'
Forcing core dumps of ranks 139, 24, 36, 0

What do you reckon?

Cheers,
Andreas

comment:20 Changed 6 weeks ago by jeff

Hi Andreas

I was looking at your output and ancillary files yesterday but they seem to be deleted now. I believe the sst and seaice ancils for the above job were /work/n02/n02/eeac/ancil/sst2 and /work/n02/n02/eeac/ancil/seaice2. I have attached images of them with the land masked out; as you can see, there are unrealistic values around the coasts. I think this is the reason why the model crashes and you need to create files without these values.
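
A check of this kind can also be scripted before running the model. The Python sketch below is hypothetical: it assumes the fields have already been read into nested lists, takes sea points as landfrac < 1, and uses illustrative SST bounds that are not taken from the UM.

```python
def suspicious_sea_points(sst, sic, landfrac, sst_min=271.0, sst_max=310.0):
    """List (row, col, field) for sea points with out-of-range SST or SIC.

    Sea points are those with landfrac < 1. The SST bounds are assumptions
    chosen for illustration; sea-ice fraction must lie in [0, 1].
    """
    problems = []
    for j, row in enumerate(landfrac):
        for i, frac in enumerate(row):
            if frac >= 1.0:
                continue  # pure land point: not read for sea-only fields
            if not (sst_min <= sst[j][i] <= sst_max):
                problems.append((j, i, "sst"))
            if not (0.0 <= sic[j][i] <= 1.0):
                problems.append((j, i, "sic"))
    return problems


# Made-up row: open ocean, a coastal point with bad values, and pure land.
landfrac = [[0.0, 0.5, 1.0]]
sst = [[285.0, 91.05, 0.0]]
sic = [[0.0, 1.2, 0.0]]

problems = suspicious_sea_points(sst, sic, landfrac)
```

Here only the coastal point is reported; the bad values on the pure land point are ignored because a sea-only field is never read there.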

Jeff.

comment:21 Changed 5 weeks ago by eeac

Hi Jeff,

I have deleted those files as I had found a solution to this problem by that time. Turns out, there were some weird values around the coasts in the seaice in the first month and they were not masked no matter what method I was using when creating the ancils. After figuring this out, my job was running OK for 6 model months but it subsequently failed at the turn of the 7th month, due to some unrealistic values in the SSTs.

I tackled those issues by rounding values to 2 decimal places for the seaice and taking a minimum of 271.4 K in my SSTs before feeding them into xancil to create the ancillaries. This way I didn't even have to use any landmask or landfraction when creating those ancils. In fact my ancils look quite similar to another set of pre-industrial SST/SIC ancillaries I got from a relevant job.
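
For anyone hitting the same problem, the two clean-up steps can be sketched in Python as follows. This is an illustration only, not the script actually used; the function names and sample values are made up, and the fields are assumed to be flat lists of numbers.

```python
SST_FLOOR_K = 271.4  # minimum SST applied before running xancil


def clean_sst(values, floor=SST_FLOOR_K):
    """Raise any SST below the floor up to the floor."""
    return [max(v, floor) for v in values]


def clean_sic(values, ndigits=2):
    """Round sea-ice fraction to 2 decimal places, zeroing tiny residues."""
    return [round(v, ndigits) for v in values]


sst_clean = clean_sst([91.05, 271.4, 285.3])   # [271.4, 271.4, 285.3]
sic_clean = clean_sic([1e-7, 0.456, 0.999])    # [0.0, 0.46, 1.0]
```

The rounding removes the very small non-zero sea-ice values that come out of multi-model averaging, and the floor removes unphysically cold SSTs.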

The model runs smoothly now, tested for a couple of years and the results look OK.

Feel free to close this ticket.

Best,
Andreas

comment:22 Changed 5 weeks ago by jeff

  • Resolution set to answered
  • Status changed from accepted to closed