Opened 5 months ago
Closed 2 months ago
#3393 closed help (fixed)
Problems with adapting nudging code
Reported by: | jlgarcia | Owned by: | um_support |
---|---|---|---|
Component: | UM Model | Keywords: | nudging, atmos_main |
Cc: | | Platform: | |
UM Version: | 10.9 | | |
Description
Hi,
My suite u-by341 has been failing at the atmos_main task and I can't find a way to fix it.
This suite is derived from my suite u-bx823, which is itself derived from u-as019, a standard GA7.1 suite for vn10.9. Note that u-bx823 runs fine.
I then copied u-bx823 to u-by341 and turned on nudging there, but found that the nudging code does not support nudging within specific regions bounded by latitude and longitude.
Matt Brown, in my group, has several branches for vn10.3 that do this (see e.g. branches/dev/mattbrown/vn10.3_um10.3_nudge_sep_regions@52914).
I adapted his code to the nudging code from u-bx823 in u-by341; it compiles, and the fcm_make task runs without error.
However, the atmos_main task fails and I don't know why. The error log file is full of messages like this one:

    lib-4211 : UNRECOVERABLE library error
      A WRITE operation tried to write a record that was too long.
    Encountered during a sequential formatted WRITE to an internal file (character variable)

And the final bit of the log file says:

    ATP Stack walkback for Rank 425 starting:
      [empty]@0xffffffffffffffff
      um_main_@um_main.F90:20
      um_shell_@um_shell.F90:652
      u_model_4a_@u_model_4A.F90:370
      atm_step_4a_@atm_step_4A.F90:5243
      nudging_main1$nudging_main1_mod_@nudging_main1-nudging_main1.F90:591
      nudging_netcdf_loader$nudging_main1$nudging_main1_mod_@nudging_main1-nudging_main1.F90:912
      nudging_getfilename$nudging_filename_mod_@nudging_filename_mod.F90:131

The failure appears to be in nudging_filename_mod.F90, but I didn't modify that file. This makes me wonder whether Matt's nudging implementation is incompatible with vn10.9, and whether I should simply use a suite at the same UM version as his branches.
Could you advise on this?
All the best,
Change History (6)
comment:1 Changed 5 months ago by jeff
comment:2 Changed 5 months ago by jlgarcia
Hi Jeff,
Thanks!
Yes, the problem is with the file-name strings for the nudging data (ERA-Interim in this case).
I use the same data path, '/projects/ukca-admin/analyses/era-in', in other suites and it works fine.
Printing the three strings in the write statement shows the following lines:

    /projects/ukca-admin/analyses/era-in ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@~@������^@^@^@^@^@^@^@^@/ecm-e40_1deg-model-levs_1989090100_all.nc
    /projects/ukca-admin/analyses/era-in ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@~@������^@^@^@^@^@^@^@^@/ecm-e40_1deg-model-levs_1989090106_all.nc

I don't know whether the ^@ characters are supposed to be there; perhaps TRIM is not working properly here. I am a bit lost.
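The ^@ sequences are NUL characters (ACHAR(0)). TRIM removes only trailing blanks, so NULs and other stray bytes at the end of a string are left in place and the string keeps its full declared length. A small diagnostic like the one below could help when printing such strings, by reporting where the first non-printable character sits. This is a sketch only; the module and function names are hypothetical and not part of the UM.

```fortran
module string_diag
  implicit none
contains
  ! Hypothetical diagnostic helper (not UM code).
  ! Returns the position of the first character of s outside printable
  ! ASCII (codes 32 to 126), or 0 if every character is printable.
  ! NUL (achar(0)) and stray high bytes both count as non-printable.
  pure function first_nonprintable(s) result(pos)
    character(len=*), intent(in) :: s
    integer :: pos
    integer :: i, code
    pos = 0
    do i = 1, len(s)
      code = iachar(s(i:i))
      if (code < 32 .or. code > 126) then
        pos = i
        return
      end if
    end do
  end function first_nonprintable
end module string_diag
```

For example, for a 16-character buffer holding 'abc' with NULs in positions 13 to 16, first_nonprintable returns 13, while LEN(TRIM(...)) stays 16 because TRIM does not treat NUL as a blank.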
comment:3 Changed 5 months ago by jeff
Hi
The directory path string is read from a namelist on PE0 and then broadcast to all the other PEs. As far as I can tell, the nudging code does this correctly. The problem arises when the correct string on PE0 is broadcast to the other PEs: the last 32 characters don't seem to be sent, so on those PEs they are left undefined (mostly NUL characters). I don't know why this happens; it may be a problem with the Cray MPI library.
There is an easy fix: in nudging_input_mod.F90, change these lines

        my_nml % ndg_datapath = ndg_datapath
      ENDIF

to be

        my_nml % ndg_datapath = ndg_datapath
      ELSE
        my_nml % ndg_datapath = ' '
      END IF

This should work around the problem by setting the string to blanks on the other PEs before the string from PE0 is copied over it.
Jeff.
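To illustrate why blank-initializing the string helps, here is a serial stand-in for the receiving side of the broadcast. No MPI is involved: the incomplete transfer is simulated by copying only the first nsent characters of the PE0 string, and the names receive_path and nsent are hypothetical. Because the buffer is blanked first, any characters the broadcast fails to deliver stay as blanks, so TRIM recovers the path.

```fortran
module bcast_sim
  implicit none
contains
  ! Hypothetical serial simulation (no MPI): dest plays the role of the
  ! namelist string on a non-zero PE, root_copy the string on PE0, and
  ! nsent the number of characters that actually arrive.
  subroutine receive_path(dest, root_copy, nsent)
    character(len=*), intent(out) :: dest
    character(len=*), intent(in)  :: root_copy
    integer, intent(in)           :: nsent
    integer :: n
    ! Equivalent of my_nml % ndg_datapath = ' ' on the non-zero PEs.
    dest = ' '
    ! Copy only the characters that "arrive"; the rest keep the blank fill.
    n = min(nsent, len(dest), len(root_copy))
    if (n > 0) dest(1:n) = root_copy(1:n)
  end subroutine receive_path
end module bcast_sim
```

For example, with 128-character buffers holding '/projects/ukca-admin/analyses/era-in' and nsent = 96 (the last 32 characters missing), TRIM(dest) returns the full path: the 36-character path arrives intact and the undelivered tail is blank rather than undefined.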
comment:4 Changed 5 months ago by pmcguire
- Summary changed from Problems with adapting nuding code to Problems with adapting nudging code
comment:5 Changed 3 months ago by jlgarcia
Thanks Jeff,
We tried this in the group and it appears to have worked.
All the best.
comment:6 Changed 2 months ago by ros
- Resolution set to fixed
- Status changed from new to closed
Hi
If you look at nudging_filename_mod.F90, line 131, you can see it uses a WRITE statement to copy three strings into another string. My guess is that the strings are too long for the size of dataname1. You could print out the strings to see what is happening.
As an aside, a WRITE statement is unnecessary here; you could copy the strings directly (making sure the destination is large enough, of course). An internal WRITE like this is normally used to convert integer or real values into a string.
Jeff.
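A minimal sketch of the direct-copy approach, assuming the file name is built as directory, then '/', then file name (the actual code may split the pieces differently, and join_path is a hypothetical name). It concatenates with TRIM and checks the length first, because a plain character assignment silently truncates a right-hand side that is too long instead of raising an error.

```fortran
module path_join
  implicit none
contains
  ! Hypothetical helper (not UM code): builds TRIM(dir)//'/'//TRIM(fname)
  ! into full by direct assignment rather than an internal WRITE.
  ! ok is .false. (and full is left blank) if the result would not fit.
  subroutine join_path(dir, fname, full, ok)
    character(len=*), intent(in)  :: dir, fname
    character(len=*), intent(out) :: full
    logical, intent(out)          :: ok
    integer :: needed
    needed = len_trim(dir) + 1 + len_trim(fname)
    ok = needed <= len(full)
    if (ok) then
      full = trim(dir) // '/' // trim(fname)
    else
      ! Assignment would silently truncate, so report failure instead.
      full = ' '
    end if
  end subroutine join_path
end module path_join
```

Returning a flag rather than truncating lets the caller stop with a clear message naming the offending path, instead of the lib-4211 error or a silently shortened file name.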