Opened 9 years ago

Closed 9 years ago

#773 closed help (fixed)

idealised run which fails during execution under phase3

Reported by: beh27 Owned by: willie
Component: UM Model Keywords: compiler optimisation
Cc: Platform:
UM Version: 7.5

Description

Hi,

I have an idealised UM job (xgdyp) which worked fine under phase2b but is having some issues under phase3. I have implemented all the recommended changes and have recompiled the code for reconfiguration and execution from scratch; the code extract and executable build seem to work fine, and so does the reconfiguration step. The problem is in the execution: by the second time step the domain is filled with NaNs. As I say, this job was working fine under phase2b. I have checked the idealised namelist file (/home/n02/n02/beh27/work/namelists/idealise_green) and it all seems fine, as it should, since it is no different from before. Do you have any idea what the problem might be?

Other points that might be worth knowing are that I am reconfiguring orography in the model (this process seems to be working fine) and that I have an FCM modification which I created (fcm:um_br/dev/beh27/VN7.5_exchangecoeffoff/src) which turns off surface fluxes of heat and momentum. Does this need to be changed or redone for phase3?

Most recent .comp/.leave files can be found at:
/home/n02/n02/beh27/um/umui_out/xgdyp000.xgdyp.d12013.t173721

I have changed the permissions so you should be able to access the files I've mentioned. Let me know if you need any more info.

Thanks,

Ben (beh27)

Change History (8)

comment:1 Changed 9 years ago by beh27

Thought I'd add these pieces of info in case they prove useful…

# rechecking file xgdyp000.xgdyp.d12013.t173721.leave I found this 'STOP' early on which I haven't seen before in any of my outputs…

/work/n02/n02/beh27/xgdyp/bin/qsexecute: Executing setup

STOP

/work/n02/n02/beh27/xgdyp/bin/qssetup: Job terminated normally

/work/n02/n02/beh27/xgdyp/bin/qsexecute: Executing model run

# it appears that the NaNs start cropping up after the running of 'Atmos_Physics2' - line 2550

# also, for comparison, xgdyp000.xgdyp.d11315.t164020.leave gives a .leave file for the last time this model ran correctly under phase2b
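
For anyone wanting to pin down where the NaNs first appear in a .leave file, here is a minimal sketch. It assumes the NaNs are printed as a standalone token such as "NaN" or "nan" in the text output; the function name `first_nan_line` is made up for illustration and is not part of the UM scripts.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumption: NaNs show up in the output as a standalone token ("NaN", "nan", "NAN").
# The lookarounds stop words such as "nanosecond" from matching.
NAN_TOKEN = re.compile(r"(?<![A-Za-z0-9_])nan(?![A-Za-z0-9_])", re.IGNORECASE)


def first_nan_line(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, line text) for the first line containing NaN.

    Returns None when no line contains a NaN token.
    """
    for number, text in enumerate(lines, start=1):
        if NAN_TOKEN.search(text):
            return number, text.rstrip("\n")
    return None


# Hypothetical usage on a .leave file (the path is whatever your job produced):
#
#     with open(path_to_leave_file, errors="replace") as handle:
#         print(first_nan_line(handle))
```

Running it on both the failing and the last working .leave file should show whether the NaNs really do start at the same point in the run.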

Thanks,

Ben

comment:2 Changed 9 years ago by willie

Hi Ben,

In your leave file it says

error halo_j too small 2

In the standard jobs, the halo size for boundaries of PEs is 4. There are many differences between this job and the standard xfqca (or b, I don't know the source), and any of these could cause the problems you're seeing.

Regards,

Willie

comment:3 Changed 9 years ago by beh27

Hi Willie,

Thanks for the input. Yes, I see what you mean, but I'm reluctant to think this is the problem, as I went back to previous runs when the model worked fine on phase2b and they all showed this error too. I could always create a new LBC file, though, and might do this if I can't find the problem elsewhere. Any other ideas? My thought is that it must have something to do with the implementation of some physics in the model, as the model initialises fine but then quickly destabilises. The fixed BCs are the only part of the model which shows data past the initialisation.

Thanks,

Ben

comment:4 Changed 9 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to assigned

Hi Ben,

If job xgdyo was the working one, you could use the UMUI to compare it with xgdyp, to see what the differences were. There have been no changes to the exchangecoeffoff branch for six months and you have asked for a full build, so this is unlikely to be a problem.

I am unable to comment on the physics/meteorology.

Regards,

Willie

comment:5 Changed 9 years ago by beh27

Hi Willie,

I have conducted further tests and think it has something to do with the executable generated when I include my FCM branch. For example, jobs xgdyb and xgdyd are identical except that one has my FCM branch compiled into the executable. xgdyb runs fine and xgdyd behaves as described above, i.e. generation of NaNs in the first model step. Any ideas why this might be? The FCM branch was functioning really well under phase2b, so I don't think it's likely to be the fault of the specific code change, but correct me if I'm wrong on this.

In addition, I have managed to get a run going using an executable made under phase2b (job xgdyp). How has that been possible if a new executable is required under phase3? Is there any reason why I shouldn't just continue to use this executable? (The results of the phase3 run do differ ever so slightly from the identical run from phase2b.)

Thanks,

Ben

comment:6 Changed 9 years ago by willie

Hi Ben,

Since the only difference between the phase 3 jobs xgdyb and xgdyd is the exchangecoeffoff branch, this is the cause of the problem. However, since it worked in phase2b, the issue is the way the code is compiled rather than the code change itself. The branch uses an optimisation of -O2. You should reduce this to -O0 by including a compiler override file with the following line:

bld::tool::fflags::UM::atmosphere::boundary_layer %fflags64_mpp -O0

(Your code change modifies the file fcdch_sea.F90 which occurs in the boundary layer section).

To include the override file, go to the UMUI page Compilation and Modifications > Um User override files and enter the filename, including the path, in the bottom table. You don't need the flux_rho override.

You then need to compile, build and run xgdyd.

As regards xgdyp/n, our advice is that executables built prior to the upgrade are not guaranteed to run on the new processors. Please make sure your models have been rebuilt.

I hope that helps.

Regards,

Willie

comment:7 Changed 9 years ago by beh27

Hi Willie,

Perfect! That works fine now. Thanks for taking the time to look into this!

Best,

Ben

comment:8 Changed 9 years ago by willie

  • Keywords compiler optimisation added
  • Resolution set to fixed
  • Status changed from assigned to closed