Opened 9 years ago

Closed 8 years ago

#774 closed help (fixed)

Segmentation Fault in UM7.3

Reported by: zss21
Owned by: willie
Component: UM Model
Keywords:
Cc:
Platform:
UM Version: 7.3

Description

Hi,

I am having trouble running a nudged N96L63 job on phase3 that previously ran on phase2b. The job is xftbu and the latest .leave file is
xftbu000.xftbu.d12019.t150728.leave

I have updated the overrides etc. for phase3, and an additional override has been added to compile the netCDF libraries. The job now compiles okay but then crashes with the error:

_pmii_daemon(SIGCHLD): [NID 02126] [c7-0c2s7n2] [Thu Jan 19 15:50:15 2012] PE 240 exit signal Segmentation fault
[NID 02443] 2012-01-19 15:50:15 Apid 1572231: initiated application termination
diff: /work/n02/n02/zss21/tmp/tmp.hector-xe6-14.29233/xftbu.xhist: No such file or directory
qsexecute: Copying /work/n02/n02/zss21/um/xftbu/xftbu.thist to backup thist file /work/n02/n02/zss21/um/xftbu/xftbu.thist_keep
xftbu: Run failed
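
For a long .leave file, the failing PE and signal can be picked out with a short script. The following is only a minimal sketch: it assumes the `_pmii_daemon` line format shown in the excerpt above, and the function name `find_failed_pes` and the regular expression are illustrative, not part of the UM or its scripts.

```python
import re

# Assumed line format, copied from the excerpt above:
# _pmii_daemon(SIGCHLD): [NID 02126] [c7-0c2s7n2] [Thu Jan 19 15:50:15 2012] PE 240 exit signal Segmentation fault
EXIT_LINE = re.compile(
    r"\[NID (?P<nid>\d+)\]\s+\[(?P<node>[^\]]+)\]\s+\[(?P<when>[^\]]+)\]\s+"
    r"PE (?P<pe>\d+) exit signal (?P<signal>.+)$"
)


def find_failed_pes(lines):
    """Return one dict per line that reports a PE killed by a signal."""
    failures = []
    for line in lines:
        match = EXIT_LINE.search(line.rstrip("\n"))
        if match:
            info = match.groupdict()
            info["pe"] = int(info["pe"])  # PE number as an integer
            failures.append(info)
    return failures


# Lines taken from the error output quoted above.
sample = [
    "_pmii_daemon(SIGCHLD): [NID 02126] [c7-0c2s7n2] [Thu Jan 19 15:50:15 2012] "
    "PE 240 exit signal Segmentation fault",
    "[NID 02443] 2012-01-19 15:50:15 Apid 1572231: initiated application termination",
    "xftbu: Run failed",
]

failures = find_failed_pes(sample)
for failure in failures:
    print(f"PE {failure['pe']} on {failure['node']}: {failure['signal']}")
```

To scan a real file, pass an open file object instead of the sample list, since the function accepts any iterable of lines.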

Thanks,

Zadie

Change History (8)

comment:1 Changed 9 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Zadie,

Could you give me read permission on the core file in /work/n02/n02/zss21/um/xftbu please?

Regards

Willie

comment:2 Changed 9 years ago by zss21

Hi Willie,

The file should now be readable.
Thanks,

Zadie

comment:3 Changed 9 years ago by willie

Hi Zadie,

The reconfiguration has succeeded, but the model run fails. We need more output, so please try the following:

In Atmos > Science > Sect by Sect > Sect 13, push the Diag_prn button and select "flush print buffers ..".

In Output Options, switch on subroutine timers.

In Compile options for model, select debug instead of high.

Then run again.

Regards

Willie

comment:4 Changed 9 years ago by zss21

Hi Willie,

I have made the changes above, and the new .leave file is xftbu000.xftbu.d12020.t135333.leave.

Thanks,

Zadie

comment:5 Changed 9 years ago by willie

Hi Zadie,

Thanks. The debugger shows that the model hits a segmentation fault in the first atm_step, at the call to nudging_main1. This can be caused by a mismatch between the argument list in the subroutine call and the one in its definition, or by one of the arguments being unallocated.

Regards,

Willie

comment:6 Changed 9 years ago by willie

Hi Zadie,

Another possibility, if you are sure that you have made no code changes, is that the Cray compiler we use on phase3 has over-optimised the code. I am not sure precisely which section nudging_main1 appears in, but we could reduce the optimisation. The atmosphere section uses an optimisation level of -O2. You should reduce this to -O0 by including a compiler override file with the following line:

bld::tool::fflags::UM::atmosphere %fflags64_mpp -O0

To include the override file, go to the UMUI page Compilation and Modifications > UM User override files and enter the filename, including the path, in the bottom table.

You then need to compile, build and run.

Regards,

Willie

comment:7 Changed 9 years ago by zss21

Hi Willie,

Thanks for the suggestions. Unfortunately, reducing the optimisation still gives the same error, so I will investigate the nudging call. I am currently testing a non-nudged version to confirm that the nudging is the cause of the problem.
Thanks,

Zadie

comment:8 Changed 8 years ago by willie

  • Resolution set to fixed
  • Status changed from accepted to closed